[Libevent-users] Linux, epoll, libevent, regressions, stability, and 2.0.9



I come to you with a heavy heart.

So, as Gilad found out a week or so ago, Linux has some serious issues
when using epoll() with dup().  If you have two fds that refer to the
same file, and you tell epoll_ctl() to listen for events on one, and
then you close it but leave the other one open, epoll_wait() will
continue to report the events that you registered on that fd ad
infinitum, until all fds for the file are closed.
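
To make the kernel behavior concrete, here is a minimal sketch that
reproduces it with a pipe.  The function name events_after_close() and
its delete_first parameter are made up for illustration; only the
system calls are real.  It registers the read end of a pipe with
epoll, dup()s it, makes it readable, closes the registered fd, and
then asks epoll_wait() what it sees.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns how many events epoll_wait() reports
 * after the registered fd is closed while a dup() of it stays open.
 * Returns -1 if the setup fails. */
int events_after_close(int delete_first)
{
    int pipefd[2], epfd, dupfd, n = -1;
    struct epoll_event ev, out;

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create(1);
    dupfd = dup(pipefd[0]);          /* second fd for the same file */
    if (epfd < 0 || dupfd < 0)
        goto done;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto done;
    if (write(pipefd[1], "x", 1) != 1)   /* make the read end readable */
        goto done;

    /* The Libevent 1.4 pattern: tell the kernel right away. */
    if (delete_first)
        epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], &ev);

    close(pipefd[0]);                /* dupfd still refers to the file */
    pipefd[0] = -1;

    n = epoll_wait(epfd, &out, 1, 0);    /* zero timeout: just poll */
done:
    if (pipefd[0] >= 0)
        close(pipefd[0]);
    if (dupfd >= 0)
        close(dupfd);
    if (epfd >= 0)
        close(epfd);
    close(pipefd[1]);
    return n;
}
```

Given the behavior described above, events_after_close(0) should
report one event for an fd that is already closed, while
events_after_close(1) should report none, because the EPOLL_CTL_DEL
reached the kernel before the close().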

Older versions of Libevent (1.4 and earlier) had no problem here,
since event_add() and event_del() would call epoll_ctl() immediately.
But in Libevent 2.0, we decided to fix another problem.  Programs very
often add and delete the same fd several times between dispatches.
For example, you might flush all your data to an fd, delete its write
event, then invoke a callback which queues more data to flush, thus
adding another write event.  In practice, it wasn't unusual to see the
same fd added and deleted three or four times between dispatches.

So to save time in Libevent 2.0, we have a "changelist" mechanism that
queues up all the modifications for a backend (currently only epoll
and kqueue use it) so they can all get handled at once when we go to
dispatch.  This is what epoll has been using since 2.0.4-alpha, back
in February.
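
The idea can be sketched roughly as follows.  This is not Libevent's
actual changelist code; the struct, the function names, and the fixed
MAX_FDS table are all invented for illustration, and the place where a
real backend would call epoll_ctl() is only marked with a comment.
The sketch tracks what the kernel was last told versus what the
program currently wants, and only counts an operation at flush time
when the two differ.

```c
#include <string.h>

#define MAX_FDS   64      /* illustrative fixed table size */
#define WANT_READ  1
#define WANT_WRITE 2

enum net_op { OP_NONE, OP_ADD, OP_MOD, OP_DEL };

struct changelist {
    unsigned char kernel[MAX_FDS];  /* what the kernel was last told */
    unsigned char wanted[MAX_FDS];  /* what the program wants now */
};

void changelist_init(struct changelist *cl)
{
    memset(cl, 0, sizeof(*cl));
}

/* Adding or deleting an event only updates the table; no system call. */
void changelist_add(struct changelist *cl, int fd, int events)
{
    if (fd >= 0 && fd < MAX_FDS)
        cl->wanted[fd] |= (unsigned char)events;
}

void changelist_del(struct changelist *cl, int fd, int events)
{
    if (fd >= 0 && fd < MAX_FDS)
        cl->wanted[fd] &= (unsigned char)~events;
}

/* The single operation still owed to the kernel for this fd. */
enum net_op changelist_net_op(const struct changelist *cl, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || cl->kernel[fd] == cl->wanted[fd])
        return OP_NONE;
    if (cl->kernel[fd] == 0)
        return OP_ADD;
    if (cl->wanted[fd] == 0)
        return OP_DEL;
    return OP_MOD;
}

/* Called just before dispatch; returns how many operations were needed. */
int changelist_flush(struct changelist *cl)
{
    int fd, ops = 0;

    for (fd = 0; fd < MAX_FDS; fd++) {
        if (changelist_net_op(cl, fd) != OP_NONE) {
            /* a real backend would call epoll_ctl() here */
            cl->kernel[fd] = cl->wanted[fd];
            ops++;
        }
    }
    return ops;
}
```

With this scheme, deleting and re-adding a write event on the same fd
several times between flushes costs no operations at all.  The flip
side is that after a changelist_del() the pending OP_DEL sits in the
table until the next flush, so the kernel does not hear about it
immediately.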

But the changelist code won't work with current Linux epoll_ctl() and
dup(), since if the user deletes an event then closes a dupped fd, we
really need the event_del() to call epoll_ctl() immediately, or else
we'll hit the kernel issue.

I hacked up a variant epoll backend to see how hard it would be to
revert to the old behavior.  It turns out it isn't so tricky.  Doing
so, of course, means reverting to the old behavior where we would do
way more epoll_ctl() system calls than necessary.  Also, the reverted
backend is nowhere near as well tested as the changelist-based backend.

So, I see 4 options for Libevent 2.0.  Here are two options that I am
NOT considering so much:
  * Include only the changelist backend.  Programs like Gilad's will
have no way to use an O(1) backend.  Too bad for them!
  * Include only the non-changelist backend.  Everybody using epoll
will need to do extra epoll_ctl() calls whether they do dup() or not.
Too bad for them!

Here are the two options that I *am* considering:
  * Include both backends; make the existing changelist backend on by
default.  The problem here is that it represents a genuine regression
against Libevent 1.4, and I really hate having regressions.  A library
that accepts regressions for well-formed code using older versions is
IMO being very rude to its users, and encouraging people to worry
about upgrading.
  * Include both backends; make the non-changelist backend on by
default.  The problems here are that a) the non-changelist backend is
slower, and most people won't do whatever is necessary to activate the
faster one, and b) the non-changelist backend has had nowhere near as
much testing as the current changelist-based backend.  If we do this,
the lack of testing means we cannot possibly call 2.0.9
"2.0.9-stable"; we'll need to call it "-rc" instead. :/

I am currently leaning towards the last option.  Efficiency is
important, but even more important is knowing that if you wrote a
program using Libevent version N, your program will still work when
Libevent N+1 is released.  Having to set an option to get extra
performance is more acceptable than having to set an option to get
backward compatibility.

Or at least that's what I think tonight.  Please, let me know if I'm
wrong.  But keep in mind that if you argue that it's okay for Libevent
2.0 to break a well-behaved Libevent 1.4 program, you are also arguing
that it's okay for Libevent 2.1 to break any program that you are
writing for Libevent today.

Software-is-hard.-Let's-go-shopping-ly yrs,
-- 
Nick