[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [Libevent-users] epoll erros



[summary: Nick looks for a bug, and finds only weirdness.  But I think
we're on to something, and we *are* making progress.  Big find: there
doesn't seem to be anything wrong with the changlist logic; something
weird is up with epoll here. Could other people please help me look
into this?]

On Fri, Oct 22, 2010 at 12:20 AM, Gilad Benjamini
<gilad@xxxxxxxxxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Nick Mathewson <nickm@xxxxxxxxxxxxx>
>> Sent: Tuesday, September 21, 2010 7:43 PM
>> To: libevent-users@xxxxxxxxxxxxx
>> Subject: Re: [Libevent-users] epoll erros
>> ...
>>
>> I just added a couple more detailed debugging messages in the git
>> master branch to help try to figure out what exactly we're doing to
>> epoll fd, and why.  If you can get the same output with the new stuff
>> from git master, that would be great.
>>
>
> I am correct to assume these additions made it into 2.0.8-rc ?
> Assuming they did, I am attaching the output when run with 2.0.8-rc.
>
> If that's not enough, please direct me to the specific changes and I'll run
> with them

Hm.  Here's the first instance of the bug at issue here.  Let's see if
it means anything.  I've cut out some lines only having to do with
timeout events, so everything that remains is about the
soon-to-be-infamous fd 13:

> [debug] event_del: 0x801e7618 (fd 13), callback 0x80105870

Here's the first appearance of fd 13.  It's in event 0x801e7618, which
I'll be calling 618 for short.  Note that this is a delete of an event
that hasn't been added yet.  That's allowed.

> [debug] event_add: event: 0x801e7618 (fd 13),  EV_WRITE  call 0x80105870

Then we add that event again.  It's an EV_WRITE event.

> [debug] Epoll ADD(4) on fd 13 okay. [old events were 0; read change was 0; write change was 1]

And now fd 13 hits epoll for the first time.  We add it for
writing(4==EPOLLOUT), since the write change is 1 (1==EV_CHANGE_ADD)

> [debug] epoll_dispatch: epoll_wait reports 1
> [debug] event_active: 0x801e7618 (fd 13), res 4, callback 0x80105870
> [debug] event_process_active: event: 0x801e7618,  EV_WRITE call 0x80105870

The event triggers! We process it....

> [debug] event_del: 0x801e7618 (fd 13), callback 0x80105870

Then delete it.

> [debug] event_add: event: 0x801e7618 (fd 13), EV_READ   call 0x80105870

Now we add it.  This time 618 is a write event on fd 13.  Interesting!
 It looks like the architecture of this program is to have one event
per fd (or per fd of this kind), and use event_assign [or event_set if
this is legacy code] to switch it between reading and writing as
needed.

This is allowed, but there's a possible bug to watch out for: you must
never event_assign [or event_set] an event that has been added without
first deleting it.  That is almost certainly not happening here, since
Gilad has said that his code is calling event_enable_debug_mode(), and
that is supposed to catch any attempt to event_assign or event_set on
an added event.

It won't catch attempts to manually reassign the event's ev_events
while the event is added.  I sure hope nobody is doing that.  Don't
touch struct event's fields by hand.

So this is a little bug-prone in the wrong hands, but there's no
actual bug evident here yet.

> [debug] Epoll MOD(1) on fd 13 okay. [old events were 4; read change was 1; write change was 2]

So now we call epoll_ctl again, with command EPOLL_MOD(1==EPOLLIN).
MOD is right, because fd 13 was added before.  The write
change(EV_CHANGE_DEL) is correct because we were previously writing.
The read change is right because we added a read.  The old events were
indeed 4==EV_WRITE.

> [debug] epoll_dispatch: epoll_wait reports 1
> [debug] event_active: 0x801e7618 (fd 13), res 2, callback 0x80105870
> [debug] event_process_active: event: 0x801e7618, EV_READ  call 0x80105870

The read event triggers.

> [debug] epoll_dispatch: epoll_wait reports 1
> [debug] event_active: 0x801e7618 (fd 13), res 2, callback 0x80105870
> [debug] event_process_active: event: 0x801e7618, EV_READ  call 0x80105870

The read event triggers again.

> [debug] event_del: 0x801e7618 (fd 13), callback 0x80105870
> [debug] event_add: event: 0x801e7618 (fd 13),  EV_WRITE  call 0x80105870
> [debug] event_del: 0x801e7618 (fd 13), callback 0x80105870
> [debug] event_add: event: 0x801e7618 (fd 13), EV_READ   call 0x80105870

We delete it, then it gets added as a write event, and deleted, and
added again as a read event.

So that's event_del(read), event_add(write), event_del(write), and
event_add(read).  ADD-then-DEL  coalesces into a no-op, but
DEL-then-ADD coalesces into an ADD that might be redundant.

> [debug] Epoll ADD(1) on fd 13 gave File exists: ADD was redundant

Yup.

> [debug] epoll_dispatch: epoll_wait reports 1
> [debug] event_active: 0x801e7618 (fd 13), res 2, callback 0x80105870
> [debug] event_process_active: event: 0x801e7618, EV_READ  call 0x80105870
> [debug] event_del: 0x801e7618 (fd 13), callback 0x80105870

The read triggers, then we delete it.

> [debug] event_process_active: event: 0x801e76a8,   call 0x80102126
> [debug] Epoll DEL(1) on fd 13 gave Bad file descriptor: DEL was unnecessary.

When we go to delete it, we get an EBADF.  This happens when the file
is closed then we try to do an operation on it.  It should be fine.

> [debug] epoll_dispatch: epoll_wait reports 0
> [debug] event_process_active: event: 0x801d5a98,   call 0x80102126
> [debug] event_del: 0x801e76a0 (fd 13), callback 0x80105870
> [debug] event_add: event: 0x801e76a0 (fd 13),  EV_WRITE  call 0x80105870
> [debug] event_del: 0x801e76a0 (fd 13), callback 0x80105870
> [debug] event_add: event: 0x801e76a0 (fd 13), EV_READ   call 0x80105870

Now here comes a new event on fd 13: 0x801e76a0 (6a0 for short).
Let's see what happens.  We delete an (unknown) event, then add it for
writing, then delete that, then add it again for reading.

> [warn] Epoll ADD(1) on fd 13 failed.  Old events were 0; read change was 1 (add); write change was 0 (none): File exists

And boom, there's the bug.  This is not consistent with the sequence
above; if you replay the Epoll lines above with a test socketpair, and
try to close and reopen the socket at the appropriate time, you'll
find that you don't get the EEXIST here.  I've attached a quick test
program to demonstrate that this sequence of operations shouldn't get
an EEXIST.

To recap, the sequence of epoll events on 13 was:
1: [debug] Epoll ADD(4) on fd 13 okay. [old events were 0; read change
was 0; write change was 1]
2: [debug] Epoll MOD(1) on fd 13 okay. [old events were 4; read change
was 1; write change was 2]
3: [debug] Epoll ADD(1) on fd 13 gave File exists: ADD was redundant
4: (close the fd here, or else you won't see an EBADF in the next step)
5: [debug] Epoll DEL(1) on fd 13 gave Bad file descriptor: DEL was unnecessary.
6: (reopen the fd here, or else you *will* see an EBADF in the next step.)
7: [warn] Epoll ADD(1) on fd 13 failed.  Old events were 0; read
change was 1 (add); write change was 0 (none): File exists

But when I run my test program with Linux 2.6.34.7, I get:
1: add(OUT)	OK
2: mod(IN)	OK
3: add(IN)	17: File exists
4: close
5: del(IN)	9: Bad file descriptor
6: reopen
7: add(IN)	OK

So, something's up!  The logs have an eexist, an EEXIST, but the Epoll
operations above don't cause an EEXIST on their own.

A couple of possibilities, none of them very pleasant looking:
 1) The debug logs are wrong.  There are more epoll_ctl operations
going on than are getting logged.
 2) I am wrong about the close; something else causes the EBADF on step 5.
 3) I am wrong about the reopen; something else prevents an EBADF at step 7.
 4) The debug logs are confused somehow; the events are all happening,
and no other events are happening, but things are somehow not as they
seem and there is missing information that would make this all clear.
 5) My test harness is somehow wrong for your situation; maybe if you
use pipes or something, everthing is

1 seems unlikely from a simple read-through of the libevent code:
there is no way to call epoll_ctl without a message getting logged as
far as I can tell.

If 2, then what else might cause an EBADF at step 5?  If 3, then what
else might prevent an EBADF at step 7? (A cursory read of
fs/eventpoll.c in the Linux 2.6.34.7 source turns up nothing for
either.)

4 is conceivable under a couple of weird circumstances.  Might there
be more than one event_base in the application, so that fd 13 is
actually getting added/removed from separate event_bases?  Might there
be multithreading issues?  Is the application forking, then calling
(or not calling) event_reinit, then using Libevent stuff?

5 is also conceivable; if you can find some way to tweak the attached
program so that step 7 gets an eexist from a series of operations
similar to those in your program as logged above, I'd love to know.

-- 
Nick
#include <sys/epoll.h>
#include <unistd.h>
#include <sys/fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#include <sys/socket.h>

#define DIE(s) do { perror(s); exit(1); } while(0)

int fds[2] = {-1,-1};
int epoll_fd;

int opnum=0;

void
init(void)
{
	epoll_fd = epoll_create(32000);
	if (epoll_fd < 0)
		DIE("epoll_create");

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		DIE("socketpair (1)");
}

void
stop(void)
{
	printf("%d: close\n", ++opnum);
	close(fds[0]);
	close(fds[1]);
}

void
reopen(void)
{
	int newfds[2];
	printf("%d: reopen\n", ++opnum);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, newfds) < 0)
		DIE("socketpair (2)");
	if (newfds[0] != fds[0] || newfds[1] != fds[1]) {
		printf("reopen failed to get the same fds\n");
		exit(1);
	}
}

void
op(int ctl, int arg)
{
	struct epoll_event ev;
	const char *opname, *argname;
	++opnum;
	switch (ctl) {
	case EPOLL_CTL_ADD: opname = "add"; break;
	case EPOLL_CTL_DEL: opname = "del"; break;
	case EPOLL_CTL_MOD: opname = "mod"; break;
	default:
		abort();
	}
	switch (arg) {
	case EPOLLIN: argname = "IN"; break;
	case EPOLLOUT: argname = "OUT"; break;
	case EPOLLIN|EPOLLOUT: argname = "IN|OUT"; break;
	default:
		abort();
	}

	printf("%d: %s(%s)\t", opnum, opname, argname);
	fflush(stdout);

	memset(&ev, 0, sizeof(ev));
	ev.events = arg;
	if (epoll_ctl(epoll_fd, ctl, fds[0], &ev)<0) {
		printf("%d: %s\n", errno, strerror(errno));
	} else {
		printf("OK\n");
	}
}

static void add(int what) { op(EPOLL_CTL_ADD, what); }
static void del(int what) { op(EPOLL_CTL_DEL, what); }
static void mod(int what) { op(EPOLL_CTL_MOD, what); }

int
main(int argc, char **argv)
{
	init();

	add(EPOLLOUT);
	mod(EPOLLIN);
	add(EPOLLIN);
	stop();
	del(EPOLLIN);
	reopen();
	add(EPOLLIN);

	return 0;
}