Re: [Libevent-users] Re: error response from select on Windows



On Fri, Feb 28, 2014 at 4:12 PM, Nate Rosenblum <nater@xxxxxxxxxxxxxx> wrote:
>> Recently I've been investigating the mysterious termination of my event
>> dispatch loop on a Windows system, with libevent 2.0.21. This has happened
>> extremely rarely, and only in an EC2 virtualized environment. The system is
>> not OOMing, so I suspect an error return from select in win32_dispatch
>> (which is the only place that dispatch could return an error). Assume for
>> the sake of discussion that I'm
>
>
> Ok, I have tracked this down to a call to `bufferevent_free` on a
> bufferevent that is outstanding in win32_dispatch's invocation of select.
> This is happening at the behest of an application-level timeout. The
> algorithm looks roughly like this:
>
>     bufev = bufferevent_socket_new(...);
>     bufferevent_setcb(..., connectionCb, ...);
>     bufferevent_socket_connect_hostname(...);
>     // Above repeated for several alternative connections
>
>     // ...
>
>     // Register application-level global timeout
>     timeout = evtimer_new(base, timeoutCb, ...);
>     evtimer_add(timeout, tv);
>
>     // ...
>
>     void timeoutCb(int, short, arg) {
>          // bufev is the bufferevent for the connection above
>          // (really, several are processed)
>          bufferevent_free(bufev);
>     }
>
> If the timeout fires while we're still waiting for a response on the connect
> for the underlying fd and we're using a select-based backend, the close will
> cause select to return an error and the dispatch loop will bail out. This is
> certainly the case for both select and win32select backends; I have not
> checked whether closing the descriptor also causes the kqueue or *poll
> interfaces to bail out.
>
> Is what I am doing even reasonable? The documentation for bufferevent_free
> implicitly suggests that it's ok to call while an operation is outstanding,
> but it looks to me like doing so will break any select-based implementation.

There have been some longstanding issues with getting
bufferevent_free() to work from one thread while the bufferevent is
active in another.  We've been trying to get them all straightened out
in the latest libevent 2.1 master branch, but apparently there are
some problems left.

It seems to me that the right response here may be for the select loop
to treat this error as a non-error, and retry. (If we're worried about
looping forever, we could ignore the error only when there's a
pending notification from another thread, indicating that some other
thread has changed the list of pending events.)
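
To make the retry idea concrete, here is a minimal sketch in plain POSIX C.
It is not libevent code: select_retrying_on_ebadf() and prune_closed_fds()
are hypothetical names, and it assumes the "bad descriptor" case shows up as
EBADF (the win32 backend would see a Winsock error code via
WSAGetLastError() instead). Rather than waiting for a cross-thread
notification, it guards against looping forever with a retry cap and by
giving up when nothing could be pruned.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: remove descriptors from `set` that are no longer
 * open. Returns how many were removed. */
int prune_closed_fds(fd_set *set, int maxfd)
{
    int removed = 0;
    for (int fd = 0; fd <= maxfd; fd++) {
        if (FD_ISSET(fd, set) && fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            FD_CLR(fd, set);
            removed++;
        }
    }
    return removed;
}

/* Hypothetical wrapper: wait for readability on `watched`, treating EBADF
 * as recoverable. Returns what select() returns, or -1 once the error is
 * not recoverable or `max_retries` is exhausted. */
int select_retrying_on_ebadf(fd_set *watched, int maxfd, fd_set *ready,
                             const struct timeval *timeout, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        struct timeval tv = *timeout; /* select() may modify its timeout */
        *ready = *watched;            /* select() overwrites the set it is given */

        int n = select(maxfd + 1, ready, NULL, NULL, &tv);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: just retry */
        if (errno != EBADF)
            return -1;                /* a genuine error: report it */
        if (prune_closed_fds(watched, maxfd) == 0)
            return -1;                /* nothing stale found: do not spin */
    }
    return -1;
}
```

The caller keeps `watched` as its long-lived interest set; after a
successful return, `ready` holds the readable descriptors and `watched` no
longer contains any descriptor that was closed underneath the loop.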

The alternative is for the close() to happen in the "finalize
callback" that runs from the main thread in the master branch, and to
make sure that happens after the event_del calls have had their effect.
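
The ordering that matters here (stop watching first, close second, both
from the loop's own thread) can be sketched without any libevent
internals. Everything below is hypothetical: struct loop_state,
request_close() and finalize_pending() are illustrative names, not the
finalize-callback API, and a real multi-threaded version would also need
a lock around the queue and a way to wake the loop.

```c
#include <sys/select.h>
#include <unistd.h>

#define MAX_PENDING 64

/* Hypothetical loop state: the set select() waits on, plus descriptors
 * whose close has been requested but not yet carried out. */
struct loop_state {
    fd_set watched;
    int pending[MAX_PENDING];
    int npending;
};

void loop_init(struct loop_state *ls)
{
    FD_ZERO(&ls->watched);
    ls->npending = 0;
}

/* Called where the application would otherwise close(fd) directly, for
 * example from a timeout callback. The descriptor stays open and watched.
 * Returns 0 on success, -1 if the queue is full. */
int request_close(struct loop_state *ls, int fd)
{
    if (ls->npending == MAX_PENDING)
        return -1;
    ls->pending[ls->npending++] = fd;
    return 0;
}

/* Called by the loop thread between select() calls: remove each queued
 * descriptor from the watched set first, then close it. Returns the
 * number of descriptors successfully closed. */
int finalize_pending(struct loop_state *ls)
{
    int closed = 0;
    for (int i = 0; i < ls->npending; i++) {
        int fd = ls->pending[i];
        FD_CLR(fd, &ls->watched);
        if (close(fd) == 0)
            closed++;
    }
    ls->npending = 0;
    return closed;
}
```

Because the close only ever happens after the descriptor has left the
watched set, select() is never handed a descriptor that has already been
closed.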

Any interest in helping to track this down? :)  One thing that might
help is a little test program that tries to provoke this bug.  Beyond
that, improving the finalize-callback code in the master branch would
also be of benefit.
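
As a starting point for such a test, the underlying behavior can be
provoked without libevent at all. The sketch below is plain POSIX C and
only shows that select() fails when a descriptor in its set has already
been closed; select_errno_after_close() is a made-up name, and a real
reproducer would still need to drive this through bufferevent_free() and
the dispatch loop, and on Windows read the error from WSAGetLastError()
rather than errno.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns the errno reported by select() when a watched descriptor was
 * closed before the call, 0 if select() unexpectedly succeeded, or -1 if
 * the pipe could not be created. */
int select_errno_after_close(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(p[0], &rset);
    int nfds = p[0] + 1;

    close(p[0]); /* stands in for the close done while freeing the bufferevent */

    struct timeval tv = {0, 0}; /* poll once, do not block */
    int n = select(nfds, &rset, NULL, NULL, &tv);
    int err = (n == -1) ? errno : 0;

    close(p[1]);
    return err;
}
```

On a POSIX system this should return EBADF, which is the kind of error a
select-based dispatch loop would then have to either tolerate or avoid.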

best wishes,
-- 
Nick