Thanks.

I tried closing and reopening the UDP socket. It helps. I suppose that this is a Linux UDP networking problem.

The socket flag SO_BSDCOMPAT does not help.

Reopening the socket is not very convenient, of course, but as this is an exceptional situation (file descriptor overload), it may be OK.
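
The reopen workaround described above could look roughly like the following. This is only a minimal sketch under stated assumptions: `reopen_udp_socket` is a hypothetical helper name, not something from the original code, and it assumes the socket was bound to a local address that should be kept. If the descriptor is registered with Libevent, the old event would also have to be removed and a new one added for the new descriptor, which is not shown here.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): replace a stuck UDP socket
 * with a fresh one bound to the same local address.
 * Returns the new descriptor, or -1 on failure. */
int reopen_udp_socket(int old_fd)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof(addr);
    int one = 1;
    int fd;

    /* Remember where the old socket was bound. */
    memset(&addr, 0, sizeof(addr));
    if (getsockname(old_fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    /* Close first so that the local port is free for the new socket. */
    close(old_fd);

    fd = socket(addr.ss_family, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Best effort; the bind below is what actually matters. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    if (bind(fd, (struct sockaddr *)&addr, len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Any datagrams still queued on the old socket are lost when it is closed, which seems acceptable only because the situation is exceptional.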

Oleg

On Fri, Jul 19, 2013 at 8:25 PM, Oleg Moskalenko <mom040267@xxxxxxxxx> wrote:
Thanks.

OK, I checked that... The error returned by SO_ERROR is always 0.

The socket is actually "alive": it would accept messages if they were sent to it.

I do not see anything like that in non-Linux *NIXes. I am trying to figure out how it can be fixed at all. Does it make any sense?

I tried to change it to recvmsg. No changes... This is what I see from strace:
..................................
[pid 24100] recvmsg(8, 0x7fff5e4803a0, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 24100] epoll_wait(4, {{EPOLLERR, {u32=8, u64=8}}}, 32, 314) = 1
[pid 24100] clock_gettime(CLOCK_MONOTONIC, {254928, 717051324}) = 0
[pid 24100] gettimeofday({1374289798, 822877}, NULL) = 0
[pid 24100] recvmsg(8, 0x7fff5e4803a0, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 24100] epoll_wait(4, {{EPOLLERR, {u32=8, u64=8}}}, 32, 313) = 1
[pid 24100] clock_gettime(CLOCK_MONOTONIC, {254928, 718103692}) = 0
[pid 24100] gettimeofday({1374289798, 823914}, NULL) = 0
....................................
Basically, the socket goes into a "gray" state: neither dead nor fully alive.

I wonder if I am seeing the results of the "new", weird Linux UDP behavior (RFC 1122) that many are complaining about, for example:
http://web.mit.edu/Ghudson/info/linux.icmp

Oleg

On Fri, Jul 19, 2013 at 8:28 AM, Nick Mathewson <nickm@xxxxxxxxxxxxx> wrote:
On Fri, Jul 19, 2013 at 9:31 AM, Oleg Moskalenko <mom040267@xxxxxxxxx> wrote:
> Thank you Azat for the suggestion. It seems to me that UDP sockets are
> offenders, somehow it happens only in Linux (I know Linux has some weird UDP
> behavior):
>
> Process 20828 attached with 5 threads - interrupt to quit
> [pid 20831] clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
> [pid 20832] clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
> [pid 20831] <... clock_gettime resumed> {205614, 271115090}) = 0
> [pid 20831] gettimeofday( <unfinished ...>
> [pid 20832] <... clock_gettime resumed> {205614, 271926086}) = 0
> [pid 20831] <... gettimeofday resumed> {1374240484, 377784}, NULL) = 0
> [pid 20832] gettimeofday( <unfinished ...>
> [pid 20831] epoll_wait(20, <unfinished ...>
> [pid 20829] clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
> [pid 20830] clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
> [pid 20832] <... gettimeofday resumed> {1374240484, 378418}, NULL) = 0
> [pid 20832] epoll_wait(16, <unfinished ...>
> [pid 20830] <... clock_gettime resumed> {205614, 273231001}) = 0
> [pid 20829] <... clock_gettime resumed> {205614, 272801617}) = 0
> [pid 20829] gettimeofday( <unfinished ...>
> [pid 20830] gettimeofday( <unfinished ...>
> [pid 20829] <... gettimeofday resumed> {1374240484, 379094}, NULL) = 0
> [pid 20829] epoll_wait(28, <unfinished ...>
> [pid 20830] <... gettimeofday resumed> {1374240484, 379317}, NULL) = 0
> [pid 20830] epoll_wait(24, <unfinished ...>
> [pid 20828] recvfrom(8, 0x7fff61df20c0, 4, 2, 0xa9bc20, 0x7fff61df20bc) = -1
> EAGAIN (Resource temporarily unavailable)
> [pid 20828] epoll_wait(4, {{EPOLLERR, {u32=8, u64=8}}}, 32, 19) = 1
> [pid 20828] clock_gettime(CLOCK_MONOTONIC, {205614, 277088474}) = 0
> [pid 20828] gettimeofday({1374240484, 386338}, NULL) = 0
> [pid 20828] recvfrom(8, 0x7fff61df20c0, 4, 2, 0xa9bc20, 0x7fff61df20bc) = -1
> EAGAIN (Resource temporarily unavailable)
> [pid 20828] epoll_wait(4, {{EPOLLERR, {u32=8, u64=8}}}, 32, 12) = 1
> [pid 20828] clock_gettime(CLOCK_MONOTONIC, {205614, 286419826}) = 0
> [pid 20828] gettimeofday({1374240484, 392232}, NULL) = 0
> [pid 20828] recvfrom(8, 0x7fff61df20c0, 4, 2, 0xa9bc20, 0x7fff61df20bc) = -1

Hm. So, epoll_wait is reporting EPOLLERR on fd 8. The Libevent
epoll.c code treats EPOLLERR as (EV_READ|EV_WRITE). But when you
recvfrom on the socket, it only says EAGAIN.
So your program sensibly decides to keep listening for events on fd 8,
and epoll keeps telling you that there was an error.
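
To make the loop described above concrete, here is a simplified illustration of that kind of mapping. It is a sketch only, not Libevent's actual code: `SKETCH_READ`, `SKETCH_WRITE` and `sketch_map_epoll_events` are hypothetical names standing in for EV_READ, EV_WRITE and the backend's internal translation step.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical stand-ins for Libevent's EV_READ and EV_WRITE. */
#define SKETCH_READ  0x1
#define SKETCH_WRITE 0x2

/* Translate an epoll event mask into read/write readiness flags.
 * An error or hangup is reported as "readable and writable", so the
 * application's callback runs and is expected to find the error itself. */
int sketch_map_epoll_events(uint32_t what)
{
    int ev = 0;

    if (what & (EPOLLHUP | EPOLLERR)) {
        ev = SKETCH_READ | SKETCH_WRITE;
    } else {
        if (what & EPOLLIN)
            ev |= SKETCH_READ;
        if (what & EPOLLOUT)
            ev |= SKETCH_WRITE;
    }
    return ev;
}
```

With a mapping like this, a read callback is invoked for a bare EPOLLERR. If the subsequent recvfrom returns EAGAIN and nothing clears the error condition, the next epoll_wait reports EPOLLERR again, which matches the pattern in the strace output.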
Assuming that this recvfrom is in your code, I'll echo Vsevolod's
question: what happens when you call getsockopt(...SO_ERROR...) on
the socket in the event handler that calls the recvfrom, to see what
the queued error is?
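
For reference, the check asked about here might be sketched as follows. `pending_socket_error` is a hypothetical helper name, and the sketch assumes it would be called from the event handler, just before or after the recvfrom.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: fetch the pending error queued on a socket.
 * Returns the queued errno value (0 if there is none), or -1 if
 * getsockopt itself fails, for example on an invalid descriptor.
 * Reading SO_ERROR also clears the pending error on the socket. */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

A nonzero result could be passed to strerror() for logging; a result of 0 means that no error is queued on the socket at that moment.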
--
Nick
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users in the body.