
Re: [Libevent-users] Deadlock when calling bufferevent_free from an other thread





On Mon, Aug 6, 2012 at 9:42 PM, Matthieu Nottale <mnottale@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi.

I'm experiencing a deadlock on 2.0.19 when calling bufferevent_free from thread A while thread B is in event_base_dispatch.

Here are the two relevant backtraces:

(gdb) bt
#0  0xb7fe1424 in __kernel_vsyscall ()
#1  0xb7d1c48c in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:169
#2  0xb7f8f2dc in evthread_posix_cond_wait ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#3  0xb7f776a0 in event_del_internal ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#4  0xb7f7752d in event_del () from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#5  0xb7f83b12 in be_socket_destruct ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#6  0xb7f82172 in _bufferevent_decref_and_unlock ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#7  0xb7f823c9 in bufferevent_free ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so



(gdb) thread 10
[Switching to thread 10 (Thread 0xb6a6db70 (LWP 18334))]#0  0xb7fe1424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7fe1424 in __kernel_vsyscall ()
#1  0xb7d1f0b9 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2  0xb7d1a559 in _L_lock_859 () from /lib/i386-linux-gnu/libpthread.so.0
#3  0xb7d1a3eb in __pthread_mutex_lock (mutex=0xb6100780) at pthread_mutex_lock.c:82
#4  0xb7f8f0cb in evthread_posix_lock ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#5  0xb7f82064 in _bufferevent_incref_and_lock ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#6  0xb7f82f7f in bufferevent_writecb ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#7  0xb7f74d91 in event_persist_closure ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#8  0xb7f74ea7 in event_process_active_single_queue ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#9  0xb7f7510c in event_process_active ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#10 0xb7f75764 in event_base_loop ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
#11 0xb7f751a2 in event_base_dispatch ()
   from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so



The way I understand it:
in event_del_internal(), the pthread_cond_wait is done while still holding the bufev lock (taken by BEV_LOCK in bufferevent_free and not yet released by _bufferevent_decref_and_unlock).
in event_process_active_single_queue, the callback is bufferevent_writecb, which tries to acquire the bufev lock. The condition variable will only be signaled when bufferevent_writecb returns, which is never going to happen because that thread is blocked on the lock -> deadlock.

The code can reach this point because event_process_active_single_queue temporarily releases th_base_lock before calling the callback function, which leaves a window for the other thread to acquire it, test for (base->current_event == ev ) and enter the pthread_cond_wait.

Said differently:
1)
  bufferevent_free
      BEV_LOCK(bufev)
     _bufferevent_decref_and_unlock
         be_socket_destruct
           event_del
               ACQUIRE(th_base_lock)
               event_del_internal(event ev)
                  base = ev->ev_base;
                  if (base->current_event == ev )
                       EVTHREAD_COND_WAIT(base->current_event_cond, base->th_base_lock);



2)
 event_base_dispatch
 event_base_loop
    EVBASE_ACQUIRE_LOCK(base, th_base_lock);
    event_process_active
      event_process_active_single_queue
         base->current_event = ev;
         EVBASE_RELEASE_LOCK(base, th_base_lock)
         USER_CB
            bufferevent_writecb
              _bufferevent_incref_and_lock(bufferevent bufev)
                  BEV_LOCK(bufev);
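
The two call paths above can be modelled in miniature with plain pthreads. This is only a sketch and not libevent code: base_lock, bev_lock, callback_running and model_deadlock are hypothetical stand-ins for th_base_lock, the bufev lock, the (base->current_event == ev) test and the scenario as a whole. It uses pthread_mutex_trylock and pthread_cond_timedwait so that it can report the stuck state instead of actually hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for th_base_lock, the bufev lock and current_event_cond. */
static pthread_mutex_t base_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t current_event_cond = PTHREAD_COND_INITIALIZER;
static bool callback_running;     /* models base->current_event == ev */
static bool writer_would_block;   /* set if the callback cannot take bev_lock */
static sem_t cb_entered, bev_held, freer_gave_up;

/* Path 2: the dispatch thread running the write callback. */
static void *loop_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&base_lock);
    callback_running = true;
    pthread_mutex_unlock(&base_lock);   /* the window opens here */
    sem_post(&cb_entered);

    sem_wait(&bev_held);                /* let the other thread win the race */
    if (pthread_mutex_trylock(&bev_lock) == EBUSY)
        writer_would_block = true;      /* a real BEV_LOCK would block here */
    else
        pthread_mutex_unlock(&bev_lock);

    /* The real callback never returns; the model stays "inside" it
     * until the freeing thread has given up waiting. */
    sem_wait(&freer_gave_up);
    pthread_mutex_lock(&base_lock);
    callback_running = false;
    pthread_cond_broadcast(&current_event_cond);
    pthread_mutex_unlock(&base_lock);
    return NULL;
}

/* Path 1: the thread calling the model's equivalent of bufferevent_free.
 * Returns 1 if both sides were stuck, 0 otherwise. */
int model_deadlock(void)
{
    pthread_t loop;
    struct timespec deadline;
    bool freer_timed_out = false;
    int rc;

    callback_running = false;
    writer_would_block = false;
    sem_init(&cb_entered, 0, 0);
    sem_init(&bev_held, 0, 0);
    sem_init(&freer_gave_up, 0, 0);
    rc = pthread_create(&loop, NULL, loop_thread, NULL);
    assert(rc == 0);

    sem_wait(&cb_entered);
    pthread_mutex_lock(&bev_lock);      /* BEV_LOCK(bufev) */
    sem_post(&bev_held);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000L * 1000L;   /* give up after 200 ms */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&base_lock);     /* ACQUIRE(th_base_lock) */
    while (callback_running && !freer_timed_out) {
        if (pthread_cond_timedwait(&current_event_cond, &base_lock,
                                   &deadline) == ETIMEDOUT)
            freer_timed_out = true;     /* nobody signalled: stuck */
    }
    pthread_mutex_unlock(&base_lock);
    pthread_mutex_unlock(&bev_lock);

    sem_post(&freer_gave_up);
    pthread_join(loop, NULL);
    sem_destroy(&cb_entered);
    sem_destroy(&bev_held);
    sem_destroy(&freer_gave_up);
    return (writer_would_block && freer_timed_out) ? 1 : 0;
}
```

Calling model_deadlock() should return 1: the callback side finds the bufev-like lock taken, and the freeing side waits on the condition variable while holding that same lock, so neither can make progress without the timeouts.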

Any idea how to fix this? I can't see a way out.

What if you free the bufferevent on the thread running event_base_dispatch?

Create a new event with a callback that frees that bufferevent. Make it active from the thread that calls bufferevent_free today.
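
To illustrate the idea without depending on libevent itself, here is a pthread-only sketch of handing the free over to the loop thread. All names in it (struct conn, struct loop, request_free, free_conn_cb, deferred_free_demo) are hypothetical. In a real program the hand-off would presumably be an event created with event_new() and triggered from the other thread with event_active(), with its callback calling bufferevent_free().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct conn {                  /* hypothetical stand-in for a bufferevent */
    pthread_mutex_t lock;
};

struct loop {                  /* hypothetical stand-in for an event_base */
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct conn *to_free;      /* deferred free request, NULL if none */
    bool freed_by_loop;
};

/* Runs only on the loop thread, so it can never overlap with one of
 * the connection's own callbacks. */
static void free_conn_cb(struct conn *c)
{
    pthread_mutex_destroy(&c->lock);
    free(c);
}

/* The loop thread: sleeps until a free request arrives, then runs it
 * like any other callback, with the loop lock released. */
static void *loop_main(void *arg)
{
    struct loop *l = arg;
    struct conn *c;

    pthread_mutex_lock(&l->lock);
    while (l->to_free == NULL)
        pthread_cond_wait(&l->wake, &l->lock);
    c = l->to_free;
    l->to_free = NULL;
    pthread_mutex_unlock(&l->lock);

    free_conn_cb(c);
    l->freed_by_loop = true;   /* read by the caller only after join */
    return NULL;
}

/* Called from any other thread: posts the request and returns at once.
 * It never waits for a running callback, so it cannot deadlock on one. */
static void request_free(struct loop *l, struct conn *c)
{
    pthread_mutex_lock(&l->lock);
    l->to_free = c;
    pthread_cond_signal(&l->wake);
    pthread_mutex_unlock(&l->lock);
}

/* Returns 1 if the connection was freed by the loop thread. */
int deferred_free_demo(void)
{
    struct loop l;
    struct conn *c = malloc(sizeof(*c));
    pthread_t tid;
    int rc;

    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    pthread_mutex_init(&l.lock, NULL);
    pthread_cond_init(&l.wake, NULL);
    l.to_free = NULL;
    l.freed_by_loop = false;

    rc = pthread_create(&tid, NULL, loop_main, &l);
    assert(rc == 0);

    request_free(&l, c);       /* replaces the direct cross-thread free */
    pthread_join(tid, NULL);

    pthread_cond_destroy(&l.wake);
    pthread_mutex_destroy(&l.lock);
    return l.freed_by_loop ? 1 : 0;
}
```

The point of the design is that the requesting thread only posts a request and returns; the object is destroyed by the same thread that runs its callbacks, so the destructor never has to wait for a callback that is itself waiting on the object's lock.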