Re: [Libevent-users] Linux, epoll, libevent, regressions, stability, and 2.0.9



This kernel bug is entirely nonobvious so personally I'd either drop the
changelist version entirely for now or default to the conservative
version.

If the latter, people who want the improved performance and understand
that the program they are using will work fine can turn it on (or
recommend to their users that they do).

I like the idea (if possible) of an event_safe_dup or something which
works correctly for both versions.

Regards


On Sat, Nov 20, 2010 at 03:25:05AM -0500, Nick Mathewson wrote:
> I come to you with a heavy heart.
> 
> So, as Gilad found out a week or so ago, Linux has some serious issues
> when using epoll() with dup().  If you have two fds that refer to the
> same file, and you tell epoll_ctl() to listen for events on one, and
> then you close it but leave the other one open, epoll_wait() will
> continue to report the events that you registered on that fd ad
> infinitum, until all fds for the file are closed.
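The behavior described above can be reproduced from userspace without Libevent. What follows is a minimal sketch, not anything taken from Libevent itself: it assumes Linux and uses Python's select.epoll (a thin wrapper over epoll_create/epoll_ctl/epoll_wait) with a pipe standing in for the file. The function name is made up for illustration.

```python
import os
import select


def stale_events_after_close():
    """Return what epoll reports before and after closing a dup()ed fd.

    Returns None on platforms without epoll (anything but Linux).
    """
    if not hasattr(select, "epoll"):
        return None

    r, w = os.pipe()
    dup_r = os.dup(r)              # second fd referring to the same open file
    ep = select.epoll()
    try:
        os.write(w, b"x")          # make the read end readable
        ep.register(r, select.EPOLLIN)   # epoll_ctl(EPOLL_CTL_ADD) on r only

        before = ep.poll(0)        # event reported for r, as expected
        os.close(r)                # close the registered fd; dup_r keeps the file open
        after_close = ep.poll(0)   # polled again with r already closed

        os.close(dup_r)            # now the last fd for the file is gone
        after_all_closed = ep.poll(0)
        return before, after_close, after_all_closed
    finally:
        ep.close()
        os.close(w)
```

If the kernel behaves as described, the second poll should still report the event for the already-closed fd number, and only the third poll (after every fd for the file is closed) should come back empty.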
> 
> Older versions of Libevent (1.4 and earlier) had no problem here,
> since event_add() and event_del() would call epoll_ctl() immediately.
> But in Libevent 2.0, we decided to fix another problem.  In practice,
> it's very frequent for a program to add and delete the same fd several
> times between dispatching.  For example, you might flush all your data
> to an fd, delete its write event, then invoke a callback which queues
> more data to flush, thus adding another write event.  In practice, it
> wasn't unusual to add and delete the same fd three or four times
> between dispatches.
> 
> So to save time in Libevent 2.0, we have a "changelist" mechanism that
> queues up all the modifications for a backend (currently only epoll
> and kqueue use it) so they can all get handled at once when we go to
> dispatch.  This is what epoll has been using since 2.0.4-alpha, back
> in February.
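To make the idea concrete, here is a toy model of a changelist. It is only a sketch under simplifying assumptions: one event type per fd, a Python set standing in for the kernel's interest list, and a counter standing in for epoll_ctl() calls. The class and its method names are hypothetical and are not Libevent's actual data structures.

```python
class Changelist:
    """Toy model: queue add/del requests per fd, flush them once at dispatch."""

    def __init__(self):
        self.registered = set()   # fds the (pretend) kernel knows about
        self.pending = {}         # fd -> True (want registered) / False (want removed)
        self.syscalls = 0         # how many epoll_ctl() calls we would have made

    def add(self, fd):
        self.pending[fd] = True   # overwrites any earlier queued change for fd

    def delete(self, fd):
        self.pending[fd] = False

    def flush(self):
        """Called just before dispatch: apply only the net change per fd."""
        for fd, want in self.pending.items():
            if want and fd not in self.registered:
                self.registered.add(fd)
                self.syscalls += 1        # stands in for EPOLL_CTL_ADD
            elif not want and fd in self.registered:
                self.registered.discard(fd)
                self.syscalls += 1        # stands in for EPOLL_CTL_DEL
        self.pending.clear()
```

In this model, a delete/add/delete/add sequence on an already-registered fd between two flushes costs no simulated system calls at all, where calling epoll_ctl() immediately would have cost four.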
> 
> But the changelist code won't work with current Linux epoll_ctl() and
> dup(), since if the user deletes an event then closes a dupped fd, we
> really need the event_del() to call epoll_ctl() immediately, or else
> we'll hit the kernel issue.
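The ordering problem can be sketched the same way. This is again a rough illustration assuming Linux and Python's select.epoll, with a hypothetical function name: deleting before close() models event_del() calling epoll_ctl() at once, while deleting after close() models a queued change that is only applied at dispatch time.

```python
import os
import select


def events_after_delete(delete_before_close):
    """Return epoll.poll() results after deleting and closing a dup()ed fd.

    delete_before_close=True models event_del() calling epoll_ctl() at once;
    False models a queued delete that is only applied after close().
    Returns None where epoll is unavailable (non-Linux).
    """
    if not hasattr(select, "epoll"):
        return None

    r, w = os.pipe()
    dup_r = os.dup(r)              # keeps the underlying file open
    ep = select.epoll()
    try:
        os.write(w, b"x")          # make the read end readable
        ep.register(r, select.EPOLLIN)

        if delete_before_close:
            ep.unregister(r)       # EPOLL_CTL_DEL while r is still valid
            os.close(r)
        else:
            os.close(r)
            try:
                ep.unregister(r)   # too late: r is no longer a valid fd
            except OSError:
                pass               # EBADF on Python 3.9+; older versions ignore it
        return ep.poll(0)
    finally:
        ep.close()
        os.close(dup_r)
        os.close(w)
```

With the delete issued first, the poll should come back empty; with the delete deferred past close(), the registration can no longer be removed by fd number and the event should keep being reported while the dup is open.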
> 
> I hacked up a variant epoll backend to see how hard it would be to
> revert to the old behavior.  It turns out it isn't so tricky.  Doing
> so, of course, means reverting to the old behavior where we would do
> way more epoll_ctl() system calls than necessary.  Also, the reverted
> backend is not nearly so tested as the changelist-based backend.
> 
> So, I see 4 options for Libevent 2.0.  Here are two options that I am
> NOT considering so much:
>   * Include only the changelist backend.  Programs like Gilad's will
> have no way to use an O(1) backend.  Too bad for them!
>   * Include only the non-changelist backend.  Everybody using epoll
> will need to do extra epoll_ctl() calls whether they do dup() or not.
> Too bad for them!
> 
> Here are the two options that I *am* considering:
>   * Include both backends; make the existing changelist backend on by
> default.  The problem here is that it represents a genuine regression
> against Libevent 1.4, and I really hate having regressions.  A library
> that accepts regressions for well-formed code using older versions is
> IMO being very rude to its users, and encouraging people to worry
> about upgrading.
>   * Include both backends; make the non-changelist backend on by
> default.  The problems here are that a) the non-changelist backend is
> slower, and most people won't do whatever is necessary to activate the
> faster one, and b) the non-changelist backend has not had nearly as
> much testing as the current changelist-based backend.  If we do this,
> the lack of testing means we cannot possibly call 2.0.9
> "2.0.9-stable"; we'll need to call it "-rc" instead. :/
> 
> I am currently leaning towards the last option.  Efficiency is
> important, but even more important is knowing that if you wrote a
> program using Libevent version N, your program will still work when
> Libevent N+1 is released.  Having to set an option to enable extra
> performance is more acceptable than having to set an option to enable
> backward compatibility.
> 
> Or at least that's what I think tonight.  Please, let me know if I'm
> wrong.  But keep in mind that if you argue that it's okay for Libevent
> 2.0 to break a well-behaved Libevent 1.4 program, you are also arguing
> that it's okay for Libevent 2.1 to break any program that you are
> writing for Libevent today.
> 
> Software-is-hard.-Let's-go-shopping-ly yrs,
> -- 
> Nick
> ***********************************************************************
> To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
> unsubscribe libevent-users    in the body.