
Re: [Libevent-users] 100% cpu utilization with openssl bufferevent.



Hello All,

It is possible that you are seeing the same problem that I
observed in my ulevpoll backend for the epoll syscall.

The kernel's wait queue processing does not distinguish
whether the event is POLLIN or POLLOUT when both events are
reported through a single wait queue. For poll and select, the
layer that reports events back to userspace rechecks the
condition: it calls the poll file operation of the device or
socket, finds that there is no real event, and does not return
to userspace, so the behavior visible to userspace is correct.
Some cycles are wasted on the unneeded context switch in the
kernel and on the event recheck, but not much more.

The problem arises in the epoll case, because the kernel does not
perform the full recheck before POLLIN is set on the epoll fd
itself. This means that userspace is woken up. The call that
obtains the list of active events then does the full check of
the conditions, and because no match is found the kernel returns 0.

In my case the problem showed up when I enabled debugging
that prints the result of each epoll wait call.
I had registered fd 0 (console input) for POLLIN only,
to be able to terminate the program or command it
from the terminal. When printf wrote debugging information
to the console, the write condition changed and caused
POLLIN to be set on the epoll fd, even though fd 0 had been
registered only for POLLIN. Because the epoll fd was active, the
kernel finished the wait, but when the events were read there was
no real event. My debugging code nevertheless wrote a report about
that to the console, which set POLLIN on the epoll fd again
=> busy loop.
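
The registration described above can be sketched as follows. This is
only an illustration under assumptions: a Unix socket pair stands in
for the console, and the names ready_now and demo_input_only are
hypothetical. It shows the counts epoll hands back (0 after output on
the watched fd, 1 after real input). The spurious wakeup itself cannot
be observed this way, because the wait uses a zero timeout.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Poll the epoll set without blocking; returns the number of ready fds. */
static int ready_now(int epfd)
{
    struct epoll_event out[4];
    return epoll_wait(epfd, out, 4, 0);
}

/*
 * Hypothetical demo: watch one end of a socket pair for input only,
 * then produce output on that same fd.
 * Returns 0 if output alone reported no event and real input reported
 * one, 1 if the counts differed, -1 on setup failure.
 */
int demo_input_only(void)
{
    int sv[2], epfd, after_output, after_input, result = -1;
    struct epoll_event ev = { 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_sockets;

    ev.events = EPOLLIN;            /* input only, like fd 0 in the text */
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto close_all;

    /* Output on the watched fd changes its write state, not its read state. */
    if (write(sv[0], "x", 1) != 1)
        goto close_all;
    after_output = ready_now(epfd);

    /* Real input for the watched fd arrives from the peer. */
    if (write(sv[1], "y", 1) != 1)
        goto close_all;
    after_input = ready_now(epfd);

    result = (after_output == 0 && after_input == 1) ? 0 : 1;

close_all:
    close(epfd);
close_sockets:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling demo_input_only() from a small main() and checking for a 0
return is enough to try it on a given machine.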

The problem was analyzed by Davide Libenzi, who provided
a solution that allows the kernel to distinguish correctly
between event types. It corrects the epoll behavior and has a
significant positive effect on performance in some socket use
scenarios, even ones unrelated to epoll. The patch is integrated
in mainline kernel 2.6.30 and later:

37e5540b3c9d838eb20f2ca8ea2eb8072271e403
PATCH: epoll keyed wakeups: make sockets use keyed wakeups

http://thread.gmane.org/gmane.linux.kernel/786236
http://article.gmane.org/gmane.linux.kernel/790696/match=epoll
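
A program can compare the running kernel release against 2.6.30 at
startup. The sketch below is a rough aid, not a definitive test: the
function names are hypothetical, and a version number alone does not
prove the patch is present or absent, since a vendor kernel may carry
backports.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Compare a kernel release string such as "2.6.32-5-amd64" against
 * major.minor.patch. Returns 1 if the release is at least that version,
 * 0 if it is older, -1 if the string cannot be parsed.
 */
int release_at_least(const char *release, int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;

    /* Some releases carry only two numbers ("3.0-rc1"); patch stays 0. */
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return -1;
    if (a != major)
        return a > major;
    if (b != minor)
        return b > minor;
    return c >= patch;
}

/* Same check for the running kernel, using uname(2). */
int running_kernel_at_least(int major, int minor, int patch)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, major, minor, patch);
}
```

For the case discussed here the call would be
running_kernel_at_least(2, 6, 30).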

You can easily check whether this is the cause of your troubles
by running the same code on a 2.6.30+ kernel.
If you need correct behavior on older kernels as well,
it can be problematic. Basically, when an event count of 0
is reported, you must not do any I/O on, or make any changes
related to, any of the FDs registered in epoll.
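
One way to follow that rule is to keep the bookkeeping for empty
wakeups purely in memory. The sketch below assumes a hypothetical
wrapper, wait_quietly, and a hypothetical struct wait_stats. It counts
wakeups instead of printing them, so nothing is written to a registered
fd when the count is 0. A plain timeout also returns 0, and the same
rule applies to it.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Counters kept in memory so that bookkeeping needs no I/O (illustrative). */
struct wait_stats {
    unsigned long wakeups;        /* epoll_wait calls that returned >= 0 */
    unsigned long empty_wakeups;  /* of those, calls that reported 0 events */
};

/*
 * Wrapper around epoll_wait. Returns the number of ready events,
 * 0 when nothing is ready (or the call was interrupted), -1 on error.
 */
int wait_quietly(int epfd, struct epoll_event *out, int max_events,
                 int timeout_ms, struct wait_stats *st)
{
    int n = epoll_wait(epfd, out, max_events, timeout_ms);

    if (n < 0)
        return (errno == EINTR) ? 0 : -1;  /* interrupted: caller retries */

    st->wakeups++;
    if (n == 0) {
        /*
         * No event: do not printf(), write() or epoll_ctl() on any
         * registered fd here. Only memory is touched.
         */
        st->empty_wakeups++;
    }
    return n;
}
```

The counters can be reported later, for example at exit or over a
channel that is not registered in the same epoll set.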

Best wishes,


                Pavel Pisa
    e-mail:     pisa@xxxxxxxxxxxxxxxx
    www:        http://cmp.felk.cvut.cz/~pisa
    university: http://dce.felk.cvut.cz/
    company:    http://www.pikron.com/



On Thursday 29 April 2010 18:35:30 Nick Mathewson wrote:
> On Thu, Apr 29, 2010 at 5:19 AM, Sebastian Sjöberg
> <Sebastian.Sjoberg@xxxxxxxx> wrote:
> > Hi,
> >
> > I've encountered a problem with openssl bufferevents where libevent
> > reports fd:s as writeable but no action is being taken.
>
>  [...]
>
> > There is no problem when I'm connecting without tls so I think this is an
> > issue with openssl bufferevents and my guess is that somehow the write
> > events that openssl bufferevents sets up sometimes doesn't get removed or
> > disabled properly.
> >
> > Is this an issue that someone else has seen and does anyone have any
> > pointers on how to debug this problem?
>
> I haven't run into this myself yet, but the openssl code is relatively
> new, and probably has some bugs left.
>
> To clarify, it seems that the problem is that Libevent bufferevent
> openssl code never deletes the relevant read events, even though it
> isn't actually interested in reading?  Or the problem is that epoll is
> returning immediately but not making any events active?
>
> If it's the first problem, I'd try adding debugging messages to the
> points in bufferevent_openssl that call event_add, event_del, and
> _bufferevent_add_event, along with debugging statements to display the
> return values of SSL_read and SSL_write, to see at what point we're
> supposed to be deleting the relevant read event but not really doing
> it.
>
> If it's the second problem, I'd start by testing whether stuff begins
> to work when you set the EVENT_NOEPOLL environment variable.  If so,
> then the bug is probably with the epoll backend -- or at least, it
> requires the epoll backend to appear.  To debug this, I'd add
> debugging messages to the loop in epoll_dispatch that calls
> evmap_io_active to tell me whenever it decided not to call
> evmap_io_active, and I'd have evmap_io_active tell me whenever it made
> 0 events become active.
>
> With any luck, the debugging output should help figure out exactly
> what's going wrong here.
>
> I'm afraid I'm about to be away from the internet for tomorrow and the
> weekend, so I won't be able to help much more until early next week.
> Good luck!
>
> yrs,
> --
> Nick
> ***********************************************************************
> To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
> unsubscribe libevent-users    in the body.
