On Tue, Feb 19, 2013 at 11:57 AM, Catalin Patulea <catalinp@xxxxxxxxxx> wrote: > On Tue, Feb 19, 2013 at 11:49 AM, Nick Mathewson <nickm@xxxxxxxxxxxxx> wrote: >> Could you be more specific about the good results, and how to >> reproduce them? Or how to reproduce/observe the undesired behavior? >> It would also be good to have an explanation of how the patch is >> supposed to work. > I had a lot of trouble reproducing this bug. It is exacerbated by high > requests parallelism, where there is a high chance of having fds > numbers get reused by the kernel and the evmap falls out of sync with > the kernel's idea of which fds are and aren't used. > > I think the trigger is a connection that times out. Then > evhttp_connection_reset_ closes the fd (evutil_closesocket) but the > bufferevent still thinks it owns the fd and the fd is still present in > the underlying evmap/epoll implementation. > > Then the kernel hands the same fd to a new bufferevent and that > bufferevent starts getting events for the old one. This would cause > assertions in the epoll layer like "Epoll ADD(X) on fd Y failed", > Valgrind invalid reads or invalid writes and occasionally crashes. Ah, there's the key. Those errors can happen -- especially with epoll -- when you close an fd without deleting all the events attached to it, especially if you manipulate those events later. But the patch isn't quite optimal for that -- we should do the bufferevent_setfd() before we close the socket, not after. That way there's never a non-deleted event that refers to the fd. How does the attached patch look to you? Does it work as well/any better? -- Nick
Attachment:
http.diff
Description: Binary data