
RE: [Libevent-users] Read failures on Unix socket



> -----Original Message-----
> From: nick.a.mathewson@xxxxxxxxx [mailto:nick.a.mathewson@xxxxxxxxx] On
> Behalf Of Nick Mathewson
> Sent: Thursday, November 04, 2010 12:20 PM
> To: Gilad Benjamini
> Subject: Re: [Libevent-users] Read failures on Unix socket
> 
> On Thu, Nov 4, 2010 at 2:09 PM, Gilad Benjamini
> <gilad@xxxxxxxxxxxxxxxxx> wrote:
> >> Hm.  Looking at the epoll output in the "nonblocking" case, it doesn't
> >> look like Libevent is doing anything weird here: epoll_wait() is
> >> honest-to-goodness saying "Okay to read on fd 15"...
> >>


Looking again at the same output, I noticed something interesting (the
output below shows only the strace lines that seem relevant):
- 22:55:21 dup(12)                        = 15
- 22:55:21 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLOUT, {u32=15, u64=15}}) = 0
- 22:55:21 epoll_wait(4, {{EPOLLOUT, {u32=15, u64=15}}}, 32, 865) = 1
- 22:55:21 epoll_ctl(4, EPOLL_CTL_MOD, 15, {EPOLLIN, {u32=15, u64=15}}) = 0
- 22:55:21 epoll_wait(4, {{EPOLLIN, {u32=15, u64=15}}, {EPOLLIN, {u32=14, u64=14}}, {EPOLLIN, {u32=13, u64=13}}}, 32, 812) = 3
- 22:55:21 close(15)                      = 0
- 22:55:21 epoll_ctl(4, EPOLL_CTL_DEL, 15, {EPOLLIN, {u32=15, u64=15}}) = -1 EBADF (Bad file descriptor)
- 22:55:30 socket(PF_FILE, SOCK_DGRAM, 0) = 15
- 22:55:30 bind(15, {sa_family=AF_FILE, path="/var/log/snort/snort_alert"...}, 110) = 0
- 22:55:30 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=15, u64=15}}) = 0
- 22:55:51 epoll_wait(4, {{EPOLLIN, {u32=16, u64=16}}, {EPOLLIN, {u32=15, u64=15}}, {EPOLLIN, {u32=14, u64=14}}, {EPOLLIN, {u32=13, u64=13}}}, 32, 1000) = 4
- 22:55:51 recvfrom(15,   - ... at this point the application hangs

This led me to suspect that the event I am getting on fd 15 actually
belongs to the previous owner of descriptor 15.
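That suspicion matches documented epoll behaviour: the epoll(7) man page
notes that closing a descriptor removes it from an epoll set only once all
descriptors referring to the same underlying open file description have been
closed. Since fd 12 still refers to the description that dup(12) gave fd 15,
close(15) leaves the registration in place, and the later EPOLL_CTL_DEL on 15
fails with EBADF, exactly as in the trace. The sketch below tries to
reproduce that pattern in isolation. It is only an illustration under stated
assumptions: a pipe stands in for the Unix socket, and the function name
stale_registration_demo is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo (a pipe stands in for the Unix socket): register a
   dup()ed descriptor, close it without EPOLL_CTL_DEL, then check whether
   epoll_wait() still reports it. Returns the number of ready events, or -1
   on setup failure. *closed_fd receives the number of the closed duplicate,
   *reported_fd the data.fd of the first reported event (-1 if none). */
static int stale_registration_demo(int *closed_fd, int *reported_fd)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    /* Like "dup(12) = 15" in the trace: a second descriptor referring to
       the same open file description as p[0]. */
    int dupfd = dup(p[0]);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = dupfd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, dupfd, &ev) != 0)
        return -1;

    /* Close the duplicate first. p[0] keeps the open file description
       alive, so the interest-list entry is not removed ... */
    close(dupfd);
    /* ... and it can no longer be removed by number: EBADF, as in the trace. */
    int del = epoll_ctl(epfd, EPOLL_CTL_DEL, dupfd, &ev);
    int del_errno = errno;
    assert(del == -1 && del_errno == EBADF);
    (void)del;
    (void)del_errno;

    /* Make the underlying pipe readable. */
    if (write(p[1], "x", 1) != 1)
        return -1;

    struct epoll_event out[4];
    int n = epoll_wait(epfd, out, 4, 100);
    *closed_fd = dupfd;
    *reported_fd = n > 0 ? out[0].data.fd : -1;

    close(p[0]);
    close(p[1]);
    close(epfd);
    return n;
}
```

On Linux one would expect a single ready event whose data.fd is the number
of the already-closed duplicate. If a new socket had meanwhile been given
that number (as socket() = 15 was in the trace), the application would read
from the new socket instead, which is consistent with the hang in recvfrom().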

My first test was to replace "dup(fd)" with "dup2(fd, 200 + x); x++".
Result: The problem with the UNIX socket disappeared. Instead, a similar
problem appeared on a socket with a descriptor above 200; i.e., the problem
shifted to the duplicated descriptor.
Conclusion: Either epoll or epoll+libevent apparently delivers events to the
wrong file descriptor.

My second test was triggered by something else I saw. While my code deletes
the event on the duplicated fd BEFORE closing the file descriptor, libevent
actually removes the descriptor from epoll at a later point. I understand
there is some queuing mechanism involved. I tried "convincing" my code to do
things in the right order by deleting the event and then setting a timer to
close the file descriptor 10 milliseconds later.
Result: The fd was removed from the epoll set BEFORE it was closed, and my
code seems to work perfectly.
Conclusion: libevent's delayed delete doesn't combine well with epoll. Any
chance of changing that? I could think of an API to flush the queue, but
that seems like a solution that involves the libevent user too much in the
implementation.
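The ordering problem in the second test can be modelled without libevent.
The sketch below is a deliberately simplified, hypothetical change queue:
struct changelist, changelist_queue(), changelist_flush() and run_scenario()
are illustrative names, not libevent's API, and libevent's real internals
differ. Queued epoll_ctl() operations are applied only on flush, which here
plays the role of whatever point libevent applies its deferred deletes (and
of the "flush the queue" API suggested above). Flushing before close() lets
the EPOLL_CTL_DEL succeed, which is what the 10 ms timer achieves; flushing
after close() reproduces the EBADF and the stale event.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_CHANGES 16

/* Hypothetical deferred change queue: epoll_ctl() calls are recorded and
   only applied when changelist_flush() runs. */
struct change { int op; int fd; uint32_t events; };

struct changelist {
    int epfd;
    int n;
    struct change pending[MAX_CHANGES];
};

static void changelist_queue(struct changelist *cl, int op, int fd, uint32_t events)
{
    assert(cl->n < MAX_CHANGES);
    cl->pending[cl->n++] = (struct change){ op, fd, events };
}

/* Applies all queued changes; returns how many epoll_ctl() calls failed. */
static int changelist_flush(struct changelist *cl)
{
    int failures = 0;
    for (int i = 0; i < cl->n; i++) {
        struct epoll_event ev = { .events = cl->pending[i].events,
                                  .data.fd = cl->pending[i].fd };
        if (epoll_ctl(cl->epfd, cl->pending[i].op, cl->pending[i].fd, &ev) != 0)
            failures++;
    }
    cl->n = 0;
    return failures;
}

/* Registers a dup()ed pipe end, queues its removal, closes it, makes the
   pipe readable, and returns how many events epoll_wait() still reports
   (-1 on setup failure). *flush_failures counts failed epoll_ctl() calls. */
static int run_scenario(int flush_before_close, int *flush_failures)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    struct changelist cl = { .epfd = epoll_create1(0), .n = 0 };
    if (cl.epfd < 0)
        return -1;

    int dupfd = dup(p[0]);
    changelist_queue(&cl, EPOLL_CTL_ADD, dupfd, EPOLLIN);
    *flush_failures = changelist_flush(&cl);

    /* The application "deletes the event" before closing ... */
    changelist_queue(&cl, EPOLL_CTL_DEL, dupfd, EPOLLIN);
    if (flush_before_close)
        *flush_failures += changelist_flush(&cl);   /* DEL applied in time */
    close(dupfd);
    if (!flush_before_close)
        *flush_failures += changelist_flush(&cl);   /* DEL fails with EBADF */

    if (write(p[1], "x", 1) != 1)
        return -1;
    struct epoll_event out[4];
    int n = epoll_wait(cl.epfd, out, 4, 100);

    close(p[0]);
    close(p[1]);
    close(cl.epfd);
    return n;
}
```

With flush_before_close set, one would expect no failures and no events;
without it, one failed DEL and one stale event for the closed descriptor.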


My apologies for the long mail.
I'll be glad to hear your thoughts.


