
[Libevent-users] Buffer event race condition



I'm at my wit's end with a libevent threading bug.  As part of a client disconnect routine, I manually call the errorcb with BEV_EVENT_EOF (the error=16 in the backtrace below).  The idea was to keep all the cleanup code in one callback, but I'm beginning to think this was ill-conceived.  Somehow, libevent is entering a condition wait, and it looks like the event base never releases the condition variable's mutex.

The worker threads are passed a *bev and a *ctx pointer through a mutex-protected singly linked list.  When a client disconnects, a reaper runs in the errorcb to remove any pending work entries that match that *bev pointer.  The reaper holds the same mutex the workers use, so a worker should never consume a null *bev.  My guess is that the errorcb is being called twice.
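
For readers who don't want to dig through the repository, here is a minimal sketch of the queue-and-reaper scheme described above.  It is not the craftd code: the names (work_item, work_queue, wq_push, wq_pop, wq_reap) are hypothetical, the *bev pointer is treated as an opaque void * so the sketch builds without libevent, and entries are pushed at the head of the list for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work entry; the real one would carry a struct bufferevent *. */
struct work_item {
    void *bev;                /* opaque stand-in for the bufferevent pointer */
    void *ctx;                /* per-client context */
    struct work_item *next;
};

/* Singly linked list guarded by one mutex. */
struct work_queue {
    pthread_mutex_t lock;
    struct work_item *head;
};

static void wq_init(struct work_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
}

/* Add an entry at the head; returns 0 on success, -1 if malloc fails. */
static int wq_push(struct work_queue *q, void *bev, void *ctx)
{
    struct work_item *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->bev = bev;
    it->ctx = ctx;
    pthread_mutex_lock(&q->lock);
    it->next = q->head;
    q->head = it;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Worker side: detach one entry (caller frees it), or NULL if empty. */
static struct work_item *wq_pop(struct work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    struct work_item *it = q->head;
    if (it != NULL)
        q->head = it->next;
    pthread_mutex_unlock(&q->lock);
    return it;
}

/* Reaper: drop every pending entry for bev; returns how many were removed. */
static size_t wq_reap(struct work_queue *q, const void *bev)
{
    size_t removed = 0;
    pthread_mutex_lock(&q->lock);
    struct work_item **pp = &q->head;
    while (*pp != NULL) {
        if ((*pp)->bev == bev) {
            struct work_item *dead = *pp;
            *pp = dead->next;   /* unlink, then free the matching entry */
            free(dead);
            removed++;
        } else {
            pp = &(*pp)->next;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return removed;
}
```

Note that in this sketch the mutex only covers the list itself.  An entry that a worker has already popped is no longer on the list, so the reaper cannot remove it, and that worker still holds the *bev pointer after the reaper returns.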

It may be a long shot to ask such a broad question, but any advice is appreciated.

Here are the backtraces of the worker thread (6) and the event base thread (1):

Thread 6 (Thread 0xf5de5b70 (LWP 25688)):
#0  0xffffe430 in __kernel_vsyscall ()
#1  0xf7f6d625 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xf7fd3cf0 in evthread_posix_cond_wait (_cond=0x8061018, _lock=0x8060598,
    tv=0x0) at evthread_pthread.c:156
#3  0xf7fa15f5 in debug_cond_wait (_cond=0x8061018, _lock=0x80606c0, tv=0x0)
    at evthread.c:222
#4  0xf7f9f14f in event_del_internal (ev=0x8061558) at event.c:2142
#5  0xf7f9efdc in event_del (ev=0x8061558) at event.c:2110
#6  0xf7faa75b in be_socket_destruct (bufev=0x8061508)
    at bufferevent_sock.c:583
#7  0xf7fa8edc in _bufferevent_decref_and_unlock (bufev=0x8061508)
    at bufferevent.c:601
#8  0xf7fa9128 in bufferevent_free (bufev=0x8061508) at bufferevent.c:659
#9  0x0805365e in errorcb (bev=0x8061508, error=16, ctx=0x8061410)
    at craftd.c:173
#10 0x0805556f in packetdecoder (pkttype=255 '\377', pktlen=11, bev=0x8061508,
    player=0x8061410) at network/decoder.c:292
#11 0x080549f4 in run_worker (arg=0xffffd238) at network/worker.c:145
#12 0xf7f6893e in start_thread () from /lib/libpthread.so.0
#13 0xf7edcd9e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xf7dea6c0 (LWP 25681)):
#0  0xffffe430 in __kernel_vsyscall ()
#1  0xf7f70319 in __lll_lock_wait () from /lib/libpthread.so.0
#2  0xf7f6b5c2 in _L_lock_543 () from /lib/libpthread.so.0
#3  0xf7f6b455 in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0xffffcfd8 in ?? ()
#5  0xf7dea6c0 in ?? ()
#6  0x08060468 in ?? ()
#7  0xf7fd4ff4 in ?? () from /usr/local/lib/libevent_pthreads-2.0.so.5
#8  0xf7fa99f0 in bufferevent_socket_outbuf_cb (buf=Cannot access memory at address 0x6459
) at bufferevent_sock.c:119
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

If you're willing to swim through some spaghetti code, the errorcb is on line 125:
https://github.com/kev009/craftd/blob/781c2ec4503fe1c96fce23f29239226019f44d81/src/craftd.c#L125

Regards,
Kevin