[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [Libevent-users] deadlock in callbacks



Hi Azat,

The backtraces I included are from the only threads waiting while holding a lock. The other threads are providing convincing signs of good health in the trace log.

Definitely the write callback was called tens of thousand of times before incurring in the deadlock, and I suspect luck or chance has little to do with the deadlock. It may have more to do with the fact that the read callback receiving a message send it down a bufferevent with the write callback handled by another thread?Â

The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock has not appeared again. Before it took about one hour of stress test to appear. If removing the callback lock solved the problem, please help me understand what was happening in the first place and if for instance the added lock inside the callback was unnecessary or even the cause of the deadlock itself.

Also, since enabling OpenSSL is coming next, I was wondering if you have a word of wisdom or two, on how to better tackle the task at hands or which version of libevent is best suited for the job. I am using the head of the tree, or to do some debugging if that it was it takes, I would be okay with it. I am afraid that locking the callbacks may be needed with OpenSSL enabled.

Thanks for the useful insights.

-Marco G.

On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote:
> I am running into a deadlock with both libevent 2.0 and 2.1 that I cannot
> explain. One thread is executing a write callback and the other thread a
> read callback of a different bufferevent. Since they are both accessing the
> same data structure passed to the callbacks (arg=0x630c40), I am using a
> rwlock in the data structure. I did not expect though the evbuffer internal
> lock to interfere with the lock used to synchronize access to the data
> structure shared between the callbacks.
>
> I am including the backtraces and the relevant portions of code. I would
> appreciate if someone can give me any suggestion on how to work around this.
>
> Thanks,
>
> -Marco G.
>
> (gdb) t 1
> [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))]
> (gdb) bt
> #0Â __lll_lock_wait () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1Â 0x00007ffff7770672 in _L_lock_953 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2Â 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x620200) at
> ../nptl/pthread_mutex_lock.c:114
> #3Â 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x630930) at
> buffer.c:610

I don't sure that this two backtraces could have deadlock. Do you?
Could it call write callback again and again, but you was so lucky that
you always see such backtrace?

Maybe some other thread acquire some locks in other order?
(I bet to this one).

Also did you check program under memcheck/drd/helgrind?
Again if you was so *lucky* that you overwrite lock of evbuffer by your
own, but it is so unlikely, that you could just ignore this.

And if you could write 50-100L sample to reproduce the issue, it will be
very helpful.

Cheers,
Azat.

> #4Â 0x000000000040494b in device_write_cb (bev=0x123f640, arg=0x630c40) at
> cometa.c:667
> #5Â 0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0, iotype=4,
> bufev=0x123f640) at bufferevent-internal.h:369
> #6Â bufferevent_writecb (fd=<optimized out>, event=<optimized out>,
> arg=0x123f640) at bufferevent_sock.c:297
> #7Â 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
> base=0x6218d0) at event.c:1531
> #8Â event_process_active_single_queue (base=base@entry=0x6218d0,
> activeq=0x621d20, max_to_process=max_to_process@entry=2147483647,
>Â Â Âendtime=endtime@entry=0x0) at event.c:1590
> #9Â 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at
> event.c:1689
> #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912
> #11 0x000000000040a700 in main () at cometa.c:1571
>
>
> bufferevent 0x123f640 write callback:
> 665Â Â pthread_rwlock_wrlock(&device->lock);
> 666Â Â if ((device->pending && device->state == WAITING_STATE) ||
> device->ws_pending) {
> 667Â Â Âint l = evbuffer_get_length(device_bev);
> 668Â Â Âpthread_rwlock_unlock(&device->lock);
> 669Â Â Âoutput = bufferevent_get_output(device->pending->bufev);
> 670 if (evbuffer_get_length(output) == 0) {
> 671 /* free and close the connection */
> 672 pthread_rwlock_wrlock(&device->lock);
> 673Â Â Âfree_http_connection(device->pending);
> 674Â Â Âdevice->pending = NULL;
> 675Â Â Âdevice->state = IDLE_STATE;
> 676Â Âdevice->chunk_length = 0;
> 677Â Â Âdevice->ntoread = -1;
> 678
> 679Â Â Âevbuffer_drain (device_bev, l);
> 680Â Âsem_post(&device->active);
> 681Â Âpthread_rwlock_unlock(&device->lock);
> 682Â Â Â}
> 683Â Â } else {
> 684Â Â Âpthread_rwlock_unlock(&device->lock);
> 685Â Â }
>
>
> (gdb) t 3
> [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))]
> (gdb) bt
> #0Â pthread_rwlock_wrlock () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
> #1Â 0x0000000000404667 in device_read_cb (bev=0x630730, arg=0x630c40) at
> cometa.c:604
> #2Â 0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0, iotype=2,
> bufev=0x630730) at bufferevent-internal.h:366
> #3Â bufferevent_readcb (fd=34, event=<optimized out>, arg=0x630730) at
> bufferevent_sock.c:187
> #4Â 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
> base=0x7fffe8000900) at event.c:1531
> #5Â event_process_active_single_queue (base=base@entry=0x7fffe8000900,
> activeq=0x7fffe8000d50,
>Â Â Âmax_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0)
> at event.c:1590
> #6Â 0x00007ffff7ba82cf in event_process_active (base=0x7fffe8000900) at
> event.c:1689
> #7Â event_base_loop (base=0x7fffe8000900, flags=0) at event.c:1912
> #8Â 0x000000000040a1ff in device_loop (arg=0x0) at cometa.c:1437
> #9Â 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at
> pthread_create.c:312
> #10 0x00007ffff689700d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> bufferevent 0x630730 read call back:
> 604 pthread_rwlock_wrlock(&device->lock);
> 605Â Â if (device->state != WAITING_STATE) {
> 606
> 607Â Âdevice->chunk_length = 0;
> 608Â Â Âdevice->ntoread = -1;
> 609Â Â Â/* in case of state == WAITING_STATE the counters reset is done in
> the write callback after sending the response */
> 610Â Â }
> 611 pthread_rwlock_unlock(&device->lock);
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users  in the body.