[tor-bugs] #24594 [Core Tor/Tor]: Protocol warning: Expiring stuck OR connection to fd...
#24594: Protocol warning: Expiring stuck OR connection to fd...
-------------------------+-------------------------------------------------
Reporter: dgoulet | Owner: (none)
Type: defect | Status: new
Priority: Medium | Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor | Version:
Severity: Normal | Keywords: tor-sched, libevent, tor-connection
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
-------------------------+-------------------------------------------------
In theory this is only a protocol warning, so it shouldn't be too
problematic, but I think it is worth looking at. I've been seeing many of
these on a test relay I have (capped at 200KB/s) using the KIST scheduler
(relay addr/port redacted):
{{{
Expiring stuck OR connection to fd 380 (IP:PORT). (3747888 bytes to flush;
3000 seconds since last write)
}}}
This is pretty big, 3.7MB stuck in the `outbuf` of a connection. The
`3000` seconds since last write means that
`connection_handle_write_impl()` hasn't been called which is *very*
surprising in the first place.
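For anyone wanting to pull the numbers out of these warnings, here is a
minimal sketch in Python. The regex and the helper name
`parse_stuck_warning` are my own assumptions, not anything from Tor, and
the sample line is the warning above joined back onto a single line. The
drain-time estimate assumes 200KB/s means 200 * 1000 bytes per second.

```python
import re

# The warning quoted above, joined onto one line (the mail wrapped it).
LOG_LINE = (
    "Expiring stuck OR connection to fd 380 (IP:PORT). "
    "(3747888 bytes to flush; 3000 seconds since last write)"
)

# Hypothetical pattern written for this message only.
PATTERN = re.compile(
    r"Expiring stuck OR connection to fd (?P<fd>\d+) \((?P<peer>[^)]*)\)\. "
    r"\((?P<bytes>\d+) bytes to flush; "
    r"(?P<idle>\d+) seconds since last write\)"
)


def parse_stuck_warning(line):
    """Return the fields of a stuck-connection warning, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "fd": int(match.group("fd")),
        "peer": match.group("peer"),
        "bytes_to_flush": int(match.group("bytes")),
        "idle_seconds": int(match.group("idle")),
    }


info = parse_stuck_warning(LOG_LINE)

# 3000 seconds expressed in minutes.
idle_minutes = info["idle_seconds"] / 60

# Size of the stuck outbuf in (decimal) megabytes.
megabytes = info["bytes_to_flush"] / 1e6

# Time needed to drain the outbuf at the relay's cap, assuming
# 200KB/s == 200 * 1000 bytes per second.
drain_seconds = info["bytes_to_flush"] / (200 * 1000)
```

Under that assumption the whole outbuf would drain in under 20 seconds at
the configured cap, which is tiny compared to the 3000 seconds of silence.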
There are currently two ways for the handle write function to be called:
either through the libevent `write_event`, which fires every time the
socket is *ready* to write (think of it as `POLLOUT` from poll()), or
directly from the KIST scheduler when cells are put in the outbuf.
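To make the `POLLOUT` comparison concrete, here is a minimal sketch. It is
not Tor code: Python's `selectors` module stands in for libevent, a local
`socketpair()` stands in for an OR connection, and the function names are
made up. It only illustrates one ordinary way a write-ready event can stay
silent: the kernel send buffer is full and the peer isn't reading.

```python
import selectors
import socket


def write_ready(sel, timeout=0.1):
    """True if the registered socket is reported ready to write."""
    return bool(sel.select(timeout))


def demo_write_readiness():
    """Observe write readiness on a fresh, a full, and a drained socket."""
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)

    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_WRITE)

    results = {}

    # A fresh socket has an empty send buffer.
    results["fresh"] = write_ready(sel)

    # Fill the send buffer; the peer never reads, so this ends with
    # BlockingIOError once the kernel refuses more data.
    chunk = b"x" * 65536
    queued = 0
    try:
        while True:
            queued += a.send(chunk)
    except BlockingIOError:
        pass
    results["full"] = write_ready(sel)

    # Let the peer drain everything that was queued.
    drained = 0
    try:
        while drained < queued:
            drained += len(b.recv(65536))
    except BlockingIOError:
        pass
    results["drained"] = write_ready(sel)

    results["queued"] = queued
    results["received"] = drained

    sel.unregister(a)
    sel.close()
    a.close()
    b.close()
    return results


results = demo_write_readiness()
```

On the Linux box I would expect `fresh` and `drained` to be ready and
`full` not to be, but the exact thresholds are kernel and socket-type
specific, so treat this as an illustration rather than a statement about
TCP sockets in Tor.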
This is worrying because it means that KIST did in fact put 3.7MB of cells
on the outbuf thinking the socket had its TCP buffer stable enough to put
that data in but somehow none got written on the socket.
One possibility is that KIST flushed cells on the connection, then tried
to write them to the network, and that failed. The TCP information of the
socket is still intact and, because KIST doesn't check for errors (#24449),
nothing happened. Then, somehow, after those 3.7MB were put in the outbuf,
the channel was never scheduled again for a write because KIST had no idea
that anything was left in the outbuf from the previous flush to the network.
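The hypothesis above can be sketched as a toy model. Everything here is
hypothetical (`ToyConnection`, `ToyScheduler`, the `check_errors` switch)
and says nothing about how KIST is actually implemented; it only shows
how ignoring a failed write can leave bytes in an outbuf that nobody comes
back for.

```python
class ToyConnection:
    """Hypothetical connection with an outbuf and a fallible write."""

    def __init__(self):
        self.outbuf = bytearray()
        self.write_fails = False
        self.bytes_written = 0

    def handle_write(self):
        """Try to flush the outbuf; return bytes written, or -1 on error."""
        if self.write_fails:
            return -1
        written = len(self.outbuf)
        self.bytes_written += written
        self.outbuf.clear()
        return written


class ToyScheduler:
    """Hypothetical scheduler that may or may not check write errors."""

    def __init__(self, check_errors):
        self.check_errors = check_errors
        self.needs_retry = []  # connections the scheduler will revisit

    def flush_cells(self, conn, cells):
        """Queue cells on the outbuf, then attempt an immediate write."""
        conn.outbuf += cells
        result = conn.handle_write()
        if self.check_errors and result < 0:
            # Only an error-checking scheduler remembers the failure.
            self.needs_retry.append(conn)


def run(check_errors):
    """Flush 1000 bytes onto a connection whose write fails."""
    conn = ToyConnection()
    conn.write_fails = True
    sched = ToyScheduler(check_errors)
    sched.flush_cells(conn, b"\x00" * 1000)
    return len(conn.outbuf), len(sched.needs_retry)


stuck_without_check = run(check_errors=False)
stuck_with_check = run(check_errors=True)
```

In both runs the bytes stay in the outbuf, but only the error-checking
variant has a record that the connection needs another look. In the other
variant the only thing left that could flush the data is the write event.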
So then it comes down to the `write_event` to write those cells flushed by
KIST. Without a `POLLOUT` event on the socket, nothing will happen, so the
question I have is: how can this event never have fired for 50 minutes? I
kind of feel that the TCP timeout would have kicked in by then if there
was really a problem... ? But also, that is a _long_ time for an idle
connection?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24594>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online