On Mon, Aug 6, 2012 at 9:42 PM, Matthieu Nottale
<mnottale@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi.
I'm experiencing a deadlock on 2.0.19 when calling bufferevent_free from thread A while thread B is in event_base_dispatch.
Here are the two relevant backtraces:
(gdb) bt
#0 0xb7fe1424 in __kernel_vsyscall ()
#1 0xb7d1c48c in pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:169
#2 0xb7f8f2dc in evthread_posix_cond_wait ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#3 0xb7f776a0 in event_del_internal ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#4 0xb7f7752d in event_del () from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#5 0xb7f83b12 in be_socket_destruct ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#6 0xb7f82172 in _bufferevent_decref_and_unlock ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#7 0xb7f823c9 in bufferevent_free ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
(gdb) thread 10
[Switching to thread 10 (Thread 0xb6a6db70 (LWP 18334))]#0 0xb7fe1424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7fe1424 in __kernel_vsyscall ()
#1 0xb7d1f0b9 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb7d1a559 in _L_lock_859 () from /lib/i386-linux-gnu/libpthread.so.0
#3 0xb7d1a3eb in __pthread_mutex_lock (mutex=0xb6100780) at pthread_mutex_lock.c:82
#4 0xb7f8f0cb in evthread_posix_lock ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#5 0xb7f82064 in _bufferevent_incref_and_lock ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#6 0xb7f82f7f in bufferevent_writecb ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#7 0xb7f74d91 in event_persist_closure ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#8 0xb7f74ea7 in event_process_active_single_queue ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#9 0xb7f7510c in event_process_active ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#10 0xb7f75764 in event_base_loop ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#11 0xb7f751a2 in event_base_dispatch ()
from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
The way I understand it:
In event_del_internal(), the pthread_cond_wait is done while still holding the bufev lock (taken in bufferevent_free and still held inside _bufferevent_decref_and_unlock).
In event_process_active_single_queue, the callback is bufferevent_writecb, which tries to acquire the bufev lock. The condition variable will only be signaled when bufferevent_writecb returns, which is never going to happen because that thread is blocked on the bufev lock: deadlock.
The code can reach this point because event_process_active_single_queue temporarily releases th_base_lock before calling the callback function, which leaves a window for the other thread to acquire it, test (base->current_event == ev), and enter the pthread_cond_wait.
Said differently:
1) Thread A:
bufferevent_free
  BEV_LOCK(bufev)
  _bufferevent_decref_and_unlock
    be_socket_destruct
      event_del
        ACQUIRE(th_base_lock)
        event_del_internal(event ev)
          base = ev->ev_base;
          if (base->current_event == ev )
            EVTHREAD_COND_WAIT(base->current_event_cond, base->th_base_lock);
2) Thread B:
event_base_dispatch
  event_base_loop
    EVBASE_ACQUIRE_LOCK(base, th_base_lock);
    event_process_active
      event_process_active_single_queue
        base->current_event = ev;
        EVBASE_RELEASE_LOCK(base, th_base_lock)
        USER_CB
          bufferevent_writecb
            _bufferevent_incref_and_lock(bufferevent bufev)
              BEV_LOCK(bufev);
Any idea how to fix this? I can't see a way out.
What if you free the bufferevent on the thread running event_base_dispatch?
Create a new event with a callback that frees the bufferevent, and make it active from the thread that calls bufferevent_free today.