[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [Libevent-users] bufferevent openssl deadlock

On Fri, Feb 24, 2012 at 8:14 PM, Amarin Phaosawasdi
<amarinph@xxxxxxxxxxxx> wrote:
> Hello,
> About a year ago, there was a thread about deadlocking in
> bufferevent_openssl in multi-threaded mode
> (http://archives.seul.org/libevent/users/Jan-2011/msg00019.html).
> We've run into the same problem.
> Our minimal fix (as opposed to the patch mentioned, which adds atomic
> operations to evthread) for our specific use case was this :
> - expose event_base_lock() and event_base_unlock() to library users
> - in the user code make sure to lock the event_base before writing to an
> openssl bufferevent, so no callbacks can be fired during that write (which
> results in a deadlock).
> Does anybody have feedback on better ways to workaround this?

So, this wants a general fix, not just a workaround.  Right now, I am
not sure what the general fix is.  I started writing up some notes,
but I don't think they actually point to an obvious conclusion yet.
More insight would be welcome!

First off, the bufferevents are a red herring: the problem is the way
that Libevent handles attempts to event_del an event whose callback is
currently running in a different thread.  Right now, Libevent makes
the thread that is not running the callback wait until the callback is
done running.  (This same behavior happens with event_add and
event_active on signal events.)

The problem shows up when we have other locks, like those used by
bufferevents.  If the thread calling event_del() holds a lock on some
resource that the callback wants, then we wind up in a deadlock where
the event_del() thread is waiting for the callback to finish, and the
callback is waiting for the lock.

So why do this? Why wait at all?  When I wrote that code (back in
6b4b77a26), I think the rationale was this: there is lots of code
where you want to event_del() an event, free it, and free the data
structure that was in the event's callback_arg.  When using Libevent
in a single-threaded mode, this is completely safe, since either the
event is not running, or you are
doing all of this from the event's callback and you presumably know
not to use the callback_arg after it's freed.  But if your program is
multithreaded, you run into a problem: the callback might be running
at the time, merrily using the callback_arg and event you're about to
free.  So the thread doing the event_del() might free a callback_arg
or an event that was in use.

Here are all the fixes I can think of quickly, in brainstorming order:

   - Remove the wait entirely: make it the responsibility of the
     multithreaded program to ensure that the event callback is not
     running before you free the event's callback_arg.  But the
     problem here is, how can you do that?  You can event_del() the
     event to make sure that it is not pending or not active, but if the
     callback is running, it's running.

   - Remove the wait, and give multithreaded programs a way to make
     their cleanup happen _after_ the event is done running (if it's
     running).  One sketch interface would make event_free() detect
     whether the event is running.  If so, event_free() might return
     without freeing the event.  Once the callback was finished,
     another callback would run to do whatever it was that the
     deleting thread hoped to do.

     We'd need a corresponding event_unassign() for events not created
     with event_new().

   - What else?

Obviously, anything here that would place a new requirement on
multithreaded programs should be written in a way so that currently
working programs won't break because of it.

To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users    in the body.