
RE: [Libevent-users] Read failures on Unix socket



I'm actually giving up on this.
IMHO epoll is buggy when it comes to duplicated file descriptors, and
possibly also buggy when file descriptors are just reused. I have seen, for
example, epoll alert on both closed file descriptors and on file descriptors
which were not actually part of the set.
Unfortunately, the people in the LKML don't agree with me.
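
For anyone who wants to reproduce this kind of alert outside of libevent, here
is a minimal C sketch. It assumes Linux and uses a pipe instead of a Unix
socket; the function name stale_event_after_close is made up for illustration
and is not part of libevent or of the original program. It registers a dup'ed
descriptor with epoll, closes it without removing it first, and then looks at
what epoll_wait() reports. As far as the epoll(7) man page describes it, a
registration is only dropped on close once every descriptor referring to the
same open file description has been closed, which would be consistent with the
result.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Registers dup(pipe read end) with epoll, closes the duplicate without
 * removing it, then makes the pipe readable.
 * Returns the fd number reported by epoll_wait(), or -1 if nothing was
 * reported or setup failed. *closed_fd receives the number of the closed
 * duplicate; *del_errno receives errno from the EPOLL_CTL_DEL attempted
 * after close (0 if that call succeeded). */
int stale_event_after_close(int *closed_fd, int *del_errno)
{
    int p[2];
    struct epoll_event ev, out;
    int epfd, d, reported = -1;

    assert(closed_fd != NULL && del_errno != NULL);
    *closed_fd = -1;
    *del_errno = 0;

    if (pipe(p) != 0)
        return -1;
    epfd = epoll_create1(0);
    d = dup(p[0]);               /* second fd, same open file description */

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = d;

    if (epfd >= 0 && d >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, d, &ev) == 0) {
        close(d);                /* closed while still registered */
        *closed_fd = d;
        d = -1;

        /* Too late to remove it by number: the fd is no longer valid. */
        if (epoll_ctl(epfd, EPOLL_CTL_DEL, *closed_fd, &ev) != 0)
            *del_errno = errno;

        /* Make the underlying pipe readable and see what epoll reports. */
        if (write(p[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 100) == 1)
            reported = out.data.fd;
    }

    if (d >= 0)
        close(d);
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return reported;
}
```

Called from a small main(), this reports the number of the already-closed
duplicate, and the EPOLL_CTL_DEL attempt fails with EBADF, much like the
close(15) / EPOLL_CTL_DEL pair in the strace output quoted below.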

Considering that "dup" is the source of my problems, and that epoll was the
trigger for using dup in the first place, I will revert to using "select"
and avoid the issue altogether.

If someone else is able to push the epoll issues forward, that's great. I'll
be glad to provide insights from my own experience (off this list, as it's
not libevent-relevant).

> -----Original Message-----
> From: owner-libevent-users@xxxxxxxxxxxxx
> [mailto:owner-libevent-users@xxxxxxxxxxxxx] On Behalf Of Gilad Benjamini
> Sent: Thursday, November 04, 2010 10:01 PM
> To: libevent-users@xxxxxxxxxxxxx
> Subject: RE: [Libevent-users] Read failures on Unix socket
> 
> > -----Original Message-----
> > From: nick.a.mathewson@xxxxxxxxx [mailto:nick.a.mathewson@xxxxxxxxx] On
> > Behalf Of Nick Mathewson
> > Sent: Thursday, November 04, 2010 12:20 PM
> > To: Gilad Benjamini
> > Subject: Re: [Libevent-users] Read failures on Unix socket
> >
> > On Thu, Nov 4, 2010 at 2:09 PM, Gilad Benjamini
> > <gilad@xxxxxxxxxxxxxxxxx> wrote:
> > >> Hm.  Looking at the epoll output in the "nonblocking" case, it doesn't
> > >> look like Libevent is doing anything weird here: epoll_wait() is
> > >> honest-to-goodness saying "Okay to read on fd 15"...
> > >>
> 
> 
> Looking again at the same output, I noticed something interesting (output
> below shows only those strace lines which seem relevant):
> - 22:55:21 dup(12)                        = 15
> - 22:55:21 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLOUT, {u32=15, u64=15}}) = 0
> - 22:55:21 epoll_wait(4, {{EPOLLOUT, {u32=15, u64=15}}}, 32, 865) = 1
> - 22:55:21 epoll_ctl(4, EPOLL_CTL_MOD, 15, {EPOLLIN, {u32=15, u64=15}}) = 0
> - 22:55:21 epoll_wait(4, {{EPOLLIN, {u32=15, u64=15}}, {EPOLLIN, {u32=14, u64=14}}, {EPOLLIN, {u32=13, u64=13}}}, 32, 812) = 3
> - 22:55:21 close(15)                      = 0
> - 22:55:21 epoll_ctl(4, EPOLL_CTL_DEL, 15, {EPOLLIN, {u32=15, u64=15}}) = -1 EBADF (Bad file descriptor)
> - 22:55:30 socket(PF_FILE, SOCK_DGRAM, 0) = 15
> - 22:55:30 bind(15, {sa_family=AF_FILE, path="/var/log/snort/snort_alert"...}, 110) = 0
> - 22:55:30 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=15, u64=15}}) = 0
> - 22:55:51 epoll_wait(4, {{EPOLLIN, {u32=16, u64=16}}, {EPOLLIN, {u32=15, u64=15}}, {EPOLLIN, {u32=14, u64=14}}, {EPOLLIN, {u32=13, u64=13}}}, 32, 1000) = 4
> - 22:55:51 recvfrom(15,   - ... at this point the application hangs
> 
> This led me to suspect that perhaps the event I am getting on fd 15
> actually belongs to the previous owner of 15.
> 
> My first test was to replace "dup(fd)" with "dup2(fd,200+x) ; x++".
> Result: The problem with the UNIX socket disappeared. Instead, there was a
> similar problem with a socket which had a 200+ file descriptor; i.e. the
> problem has shifted to the duplicated descriptor.
> Conclusion: Either epoll or epoll+libevent apparently deliver events to the
> wrong file descriptor.
> 
> My second test was triggered by another thing I saw. While my code deletes
> the event on the duplicated fd BEFORE closing the file descriptor, libevent
> actually deletes the descriptor from epoll at a later point. I understand
> there is some queuing mechanism involved. I tried "convincing" my code to do
> things in the right order, by deleting the event, and then setting a timer
> to close the file descriptor 10 milliseconds later.
> Result: The fd was deleted from the epoll set BEFORE it was closed, and my
> code seems to work perfectly.
> Conclusion: libevent's delayed delete doesn't combine well with epoll. Any
> chance of changing that? I could think of an API to flush the queue, but
> that seems like a solution that involves the libevent user too much in the
> implementation.
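
The ordering that worked in the second test can be sketched with raw epoll
calls. This is only an illustration under assumptions: it uses a pipe rather
than the Unix socket from the trace, it calls epoll_ctl() directly instead of
going through libevent (whose internal queuing is not shown here), and the
function name events_after_delete_then_close is hypothetical. The point is
just that EPOLL_CTL_DEL is issued while the duplicated descriptor is still
open, and only then is the descriptor closed.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Registers dup(pipe read end) with epoll, removes it from the epoll set
 * FIRST, then closes it, then makes the pipe readable.
 * Returns the number of events epoll_wait() reports afterwards, or -1 on
 * setup failure. *del_rc receives the return value of EPOLL_CTL_DEL. */
int events_after_delete_then_close(int *del_rc)
{
    int p[2];
    struct epoll_event ev, out;
    int epfd, d, n = -1;

    assert(del_rc != NULL);
    *del_rc = -1;

    if (pipe(p) != 0)
        return -1;
    epfd = epoll_create1(0);
    d = dup(p[0]);               /* second fd, same open file description */

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = d;

    if (epfd >= 0 && d >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, d, &ev) == 0) {
        /* Remove the registration while the descriptor is still valid... */
        *del_rc = epoll_ctl(epfd, EPOLL_CTL_DEL, d, &ev);
        /* ...and only then close it. */
        close(d);
        d = -1;

        /* The pipe becomes readable, but nothing is registered any more. */
        if (write(p[1], "x", 1) == 1)
            n = epoll_wait(epfd, &out, 1, 100);
    }

    if (d >= 0)
        close(d);
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return n;
}
```

With this order the EPOLL_CTL_DEL call succeeds and epoll_wait() times out
without reporting anything for the closed descriptor.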
> 
> 
> My apologies for the long mail.
> I'll be glad to hear your thoughts.
> 
> 
> 
> ***********************************************************************
> To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
> unsubscribe libevent-users    in the body.
