[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [Libevent-users] deadlock in callbacks



On Thu, Feb 12, 2015 at 09:16:20AM -0800, Marco Graziano wrote:
> Hi Azat,

Hi Marco,

> 
> The backtraces I included are from the only threads waiting while holding a
> lock. The other threads are providing convincing signs of good health in
> the trace log.

Could you please post backstraces for all available threads?
(I hope you don't have 100 of them).

> 
> Definitely the write callback was called tens of thousand of times before
> incurring in the deadlock, and I suspect luck or chance has little to do
> with the deadlock. It may have more to do with the fact that the read
> callback receiving a message send it down a bufferevent with the write
> callback handled by another thread?
> 
> The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock has not
> appeared again. Before it took about one hour of stress test to appear. If
> removing the callback lock solved the problem, please help me understand
> what was happening in the first place and if for instance the added lock
> inside the callback was unnecessary or even the cause of the deadlock
> itself.
> 
> Also, since enabling OpenSSL is coming next, I was wondering if you have a
> word of wisdom or two, on how to better tackle the task at hands or which
> version of libevent is best suited for the job. I am using the head of the
> tree, or to do some debugging if that it was it takes, I would be okay with
> it. I am afraid that locking the callbacks may be needed with OpenSSL
> enabled.

I prefer to use the latest versions, to be up-to-date.
bufferevent_openssl doesn't require bufferevent locking - look at
bufferevent_openssl_new_impl() in bufferevent_openssl.c

> 
> Thanks for the useful insights.
> 
> -Marco G.

And for the next time: could you avoid top-posting please.

Cheers,
Azat.

> 
> On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <a3at.mail@xxxxxxxxx> wrote:
> 
> > On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote:
> > > I am running into a deadlock with both libevent 2.0 and 2.1 that I cannot
> > > explain. One thread is executing a write callback and the other thread a
> > > read callback of a different bufferevent. Since they are both accessing
> > the
> > > same data structure passed to the callbacks (arg=0x630c40), I am using a
> > > rwlock in the data structure. I did not expect though the evbuffer
> > internal
> > > lock to interfere with the lock used to synchronize access to the data
> > > structure shared between the callbacks.
> > >
> > > I am including the backtraces and the relevant portions of code. I would
> > > appreciate if someone can give me any suggestion on how to work around
> > this.
> > >
> > > Thanks,
> > >
> > > -Marco G.
> > >
> > > (gdb) t 1
> > > [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))]
> > > (gdb) bt
> > > #0  __lll_lock_wait () at
> > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > #1  0x00007ffff7770672 in _L_lock_953 () from
> > > /lib/x86_64-linux-gnu/libpthread.so.0
> > > #2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x620200) at
> > > ../nptl/pthread_mutex_lock.c:114
> > > #3  0x00007ffff7b9686e in evbuffer_get_length (buffer=0x630930) at
> > > buffer.c:610
> >
> > I don't sure that this two backtraces could have deadlock. Do you?
> > Could it call write callback again and again, but you was so lucky that
> > you always see such backtrace?
> >
> > Maybe some other thread acquire some locks in other order?
> > (I bet to this one).
> >
> > Also did you check program under memcheck/drd/helgrind?
> > Again if you was so *lucky* that you overwrite lock of evbuffer by your
> > own, but it is so unlikely, that you could just ignore this.
> >
> > And if you could write 50-100L sample to reproduce the issue, it will be
> > very helpful.
> >
> > Cheers,
> > Azat.
> >
> > > #4  0x000000000040494b in device_write_cb (bev=0x123f640, arg=0x630c40)
> > at
> > > cometa.c:667
> > > #5  0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0,
> > iotype=4,
> > > bufev=0x123f640) at bufferevent-internal.h:369
> > > #6  bufferevent_writecb (fd=<optimized out>, event=<optimized out>,
> > > arg=0x123f640) at bufferevent_sock.c:297
> > > #7  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
> > > base=0x6218d0) at event.c:1531
> > > #8  event_process_active_single_queue (base=base@entry=0x6218d0,
> > > activeq=0x621d20, max_to_process=max_to_process@entry=2147483647,
> > >     endtime=endtime@entry=0x0) at event.c:1590
> > > #9  0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at
> > > event.c:1689
> > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912
> > > #11 0x000000000040a700 in main () at cometa.c:1571
> > >
> > >
> > > bufferevent 0x123f640 write callback:
> > > 665    pthread_rwlock_wrlock(&device->lock);
> > > 666    if ((device->pending && device->state == WAITING_STATE) ||
> > > device->ws_pending) {
> > > 667     int l = evbuffer_get_length(device_bev);

How device_dev is installed?
(evbuffer_new()/evbuffer_enable_locking())

> > > 668     pthread_rwlock_unlock(&device->lock);
> > > 669     output = bufferevent_get_output(device->pending->bufev);
> > > 670 if (evbuffer_get_length(output) == 0) {
> > > 671 /* free and close the connection */
> > > 672 pthread_rwlock_wrlock(&device->lock);
> > > 673     free_http_connection(device->pending);
> > > 674     device->pending = NULL;
> > > 675     device->state = IDLE_STATE;
> > > 676   device->chunk_length = 0;
> > > 677     device->ntoread = -1;
> > > 678
> > > 679     evbuffer_drain (device_bev, l);
> > > 680   sem_post(&device->active);
> > > 681   pthread_rwlock_unlock(&device->lock);
> > > 682     }
> > > 683    } else {
> > > 684     pthread_rwlock_unlock(&device->lock);
> > > 685    }
> > >
> > >
> > > (gdb) t 3
> > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))]
> > > (gdb) bt
> > > #0  pthread_rwlock_wrlock () at
> > > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
> > > #1  0x0000000000404667 in device_read_cb (bev=0x630730, arg=0x630c40) at
> > > cometa.c:604
> > > #2  0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0,
> > iotype=2,
> > > bufev=0x630730) at bufferevent-internal.h:366
> > > #3  bufferevent_readcb (fd=34, event=<optimized out>, arg=0x630730) at
> > > bufferevent_sock.c:187
> > > #4  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
> > > base=0x7fffe8000900) at event.c:1531
> > > #5  event_process_active_single_queue (base=base@entry=0x7fffe8000900,
> > > activeq=0x7fffe8000d50,
> > >     max_to_process=max_to_process@entry=2147483647,
> > endtime=endtime@entry=0x0)
> > > at event.c:1590
> > > #6  0x00007ffff7ba82cf in event_process_active (base=0x7fffe8000900) at
> > > event.c:1689
> > > #7  event_base_loop (base=0x7fffe8000900, flags=0) at event.c:1912
> > > #8  0x000000000040a1ff in device_loop (arg=0x0) at cometa.c:1437
> > > #9  0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at
> > > pthread_create.c:312
> > > #10 0x00007ffff689700d in clone () at
> > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > >
> > > bufferevent 0x630730 read call back:
> > > 604 pthread_rwlock_wrlock(&device->lock);
> > > 605    if (device->state != WAITING_STATE) {
> > > 606
> > > 607   device->chunk_length = 0;
> > > 608     device->ntoread = -1;
> > > 609     /* in case of state == WAITING_STATE the counters reset is done
> > in
> > > the write callback after sending the response */
> > > 610    }
> > > 611 pthread_rwlock_unlock(&device->lock);
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users    in the body.