
Re: [Libevent-users] Event base exiting from loop unexpectedly.




(I assume you're trying this on Libevent 2.0.3-alpha, btw.  Earlier
versions had big threading bugs.  There were also a few minor
threading bugs in Libevent 2.0.3-alpha, of course; have you tried
reproducing using the latest development code from the Git repository?
[1]  I believe it fixed a bug specifically related to the 1-return
case from event_base_loop() back in commit da1718b2. [2])

The exiting bug was with that specific version of libevent (2.0.3).
Shortly after I posted, I tried the latest version from the Git
repository, and that somewhat worked. What that means is:

1) It stopped exiting and started crashing instead.

The reason for the crashing was that each request is essentially
queued with one of many worker threads (which do blocking disk I/O).
Sometimes the connection would die while the request was still in the
queue. This caused the error_event_cb to free the bev, so the worker's
later access to it crashed (definitely not libevent's fault).

This was trivially "fixed" by adding an atomic int to the callback ptr,
which is updated at various stages in the program, so that the
error_cb will not free the bev until after the request has been
processed, regardless of the state of the connection.

One would think this would fix it, but it doesn't. Even though the
int is atomic, and all the ops on it are CAS (atomic) swaps, the
synchronization isn't stable. It ends up causing
random race conditions. I analyzed this quite thoroughly, but
got nowhere.
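
For illustration only: the deferred-free scheme described above is often
structured as a reference count, where the event-loop side and each queued
request hold one reference and whoever drops the last one frees the bev.
The sketch below is not the code from this program and is not claimed to
cure the races reported here. The names conn_ctx, conn_ctx_new,
conn_ctx_ref and conn_ctx_unref are hypothetical, and a plain void pointer
stands in for the struct bufferevent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-connection context (assumption, not from the post). */
struct conn_ctx {
    atomic_int refs;              /* one reference per party still using it */
    void *bev;                    /* stand-in for the bufferevent pointer */
    void (*free_bev)(void *bev);  /* stand-in for bufferevent_free */
};

static struct conn_ctx *conn_ctx_new(void *bev, void (*free_bev)(void *))
{
    struct conn_ctx *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    atomic_init(&c->refs, 1);     /* the event-loop side owns the first ref */
    c->bev = bev;
    c->free_bev = free_bev;
    return c;
}

/* Call before handing the context to a worker queue. */
static void conn_ctx_ref(struct conn_ctx *c)
{
    int old = atomic_fetch_add(&c->refs, 1);
    assert(old > 0);              /* referencing a dead context is a bug */
    (void)old;
}

/* Called by the worker when done, and by the error callback when the
 * connection dies. Returns true if this call released the last reference
 * and therefore freed the bev and the context. */
static bool conn_ctx_unref(struct conn_ctx *c)
{
    if (atomic_fetch_sub(&c->refs, 1) != 1)
        return false;             /* someone else still holds a reference */
    c->free_bev(c->bev);
    free(c);
    return true;
}
```

With this shape the error callback never frees directly. It only calls
conn_ctx_unref, so a request that is still queued keeps the bev alive until
the worker drops its own reference.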

2) I then completely removed all bufferevent_free operations, which
fixed that problem and caused another:
deadlocks, specifically in the event loop, as soon as the number of
requests exceeds a couple of hundred (or thereabouts).

3) The above problem was also fixed, but the way I did that was not
ideal. I removed all bufferevent manipulations from the other threads and
instead process them all in a single thread, with the other threads sending
their manipulations to it (I ran the event base loop within this
thread as well, using the EVLOOP_NONBLOCK flag). The disadvantage
is that the thread running the event loop (and processing
the writes/reads/frees) ends up using 100% of one core, all the time,
because the event loop is run in this fashion, basically:

while (1)
{
        /* ..get anything to process.. */
        event_base_loop(base, EVLOOP_NONBLOCK);
}

The ..get anything to process.. step has no locks, nor does the writing to
that area. Essentially, it is a lock-free, queue-like structure, which is
quite efficient across multiple threads. The event base itself is
created with NO_LOCKS (EVENT_BASE_FLAG_NOLOCK), but the bufferevents are
created with the THREADSAFE opt (BEV_OPT_THREADSAFE). Without this, the
entire thing still locked up and/or crashed (linked with -levent_pthreads).
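
As a rough sketch of what such a lock-free, queue-like structure can look
like (the actual one used here is not shown in this message), the following
uses C11 atomics: any thread pushes with a CAS loop, and the single
event-loop thread takes the whole list with one atomic exchange. The names
cmd, cmd_queue, cmd_queue_push and cmd_queue_drain, and the op/arg fields,
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical command sent to the event-loop thread. */
struct cmd {
    struct cmd *next;
    int op;       /* assumption: which manipulation to do (write, free, ...) */
    void *arg;    /* assumption: e.g. the bufferevent to act on */
};

struct cmd_queue {
    _Atomic(struct cmd *) head;   /* most recently pushed command */
};

static void cmd_queue_init(struct cmd_queue *q)
{
    atomic_init(&q->head, NULL);
}

/* Any thread: push one command without taking a lock. */
static void cmd_queue_push(struct cmd_queue *q, struct cmd *c)
{
    assert(c != NULL);
    struct cmd *old = atomic_load(&q->head);
    do {
        c->next = old;            /* old is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak(&q->head, &old, c));
}

/* Event-loop thread only: detach everything pushed so far and return it
 * in the order it was pushed (the detached list is newest-first, so it
 * is reversed here). Returns NULL when nothing is pending. */
static struct cmd *cmd_queue_drain(struct cmd_queue *q)
{
    struct cmd *list = atomic_exchange(&q->head, NULL);
    struct cmd *fifo = NULL;
    while (list != NULL) {
        struct cmd *next = list->next;
        list->next = fifo;
        fifo = list;
        list = next;
    }
    return fifo;
}
```

The loop thread would call cmd_queue_drain where the loop above has
"..get anything to process..", apply each command to its bufferevent, and
then run event_base_loop. Because the consumer always detaches the entire
list at once, this shape avoids the ABA problem that a pop-one-node
lock-free stack has.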

This isn't ideal. Nonetheless..

You'd think that at this point there would be no more problems, but
unfortunately there are.

4) Doing all the above will get you into the thousands of connections with
bufferevents. At around the 20-30k connections mark, you'll find that
libevent will randomly crash. It crashes when trying to expand a buffer
on a write. There's plenty of memory available, and it's requesting only
8 bytes or so, but it crashes regardless. I believe the exact place
is an if that checks whether a chain is pinned, or something like that. I
spent some time with it in gdb, but have forgotten the exact line of code
now.

Unfortunately, at this point I gave up on libevent and wrote my own
implementation, which seems to be better suited to my purposes and is
more stable.

I don't currently have the time to write a test, but I hope I've given you
enough information to help you track down some of the issues.

Let me know if you need more information.

Thanks!

P.S.: I tried running the program in valgrind at the time of the bug (with
2.0.3). Unfortunately, valgrind was far too slow for me to get anything
useful. Nonetheless, as it stopped exiting when I upgraded, I think it's
safe to assume that that problem was not in my code.

P.P.S.: All of the above is on Linux (64-bit).



      
