Re: [Libevent-users] Signals and priority queues
On Jan 13, 2012, at 7:29 AM, Nick Mathewson wrote:
> On Fri, Jan 13, 2012 at 7:47 AM, Ralph Castain <rhc@xxxxxxxxxxxx> wrote:
>> I've been digging further into this, and I believe I have much of it resolved now. However, I have encountered a problem that appears to be something in libevent itself.
>>
>> I configured libevent with debug enabled, and turned it on at execution - and was barraged by:
>>
>> [warn] select: Invalid argument
>>
>> Digging further into the reason, I found that the message comes from the following code in select_dispatch (file select.c):
>
> Weird that you're using select.c; nearly any other backend would be faster.
It's on a Mac, so select is the only option, and speed isn't really an issue. We force that configuration there for OMPI purposes. :-/
* Default to select() on OS X and poll() everywhere else because
* various parts of OMPI / ORTE use libevent with pty's. pty's
* *only* work with select on OS X (tested on Tiger and Leopard);
* we *know* that both select and poll work with pty's everywhere
* else we care about (other mechanisms such as epoll *may* work
* with pty's -- we have not tested comprehensively with newer
* versions of Linux, etc.). So the safe thing to do is:
*
* - On OS X, default to using "select" only
* - Everywhere else, default to using "poll" only (because poll
* is more scalable than select)
>
>>
>> res = select(nfds, sop->event_readset_out,
>> sop->event_writeset_out, NULL, tv);
>>
>> EVBASE_ACQUIRE_LOCK(base, th_base_lock);
>>
>> check_selectop(sop);
>>
>> if (res == -1) {
>> if (errno != EINTR) {
>> event_warn("select");
>> return (-1);
>> }
>>
>> return (0);
>> }
>>
>> The timeout value being supplied to select_dispatch is being corrupted after the first time through the routine: it comes into the routine the first time as {0, 0}, but is an illegal value thereafter. Resetting the timeout to the original value resolves the problem.
>
> What kind of illegal value are you seeing,
1326467251, 774650
> coming from where?
I'm not sure who calls "select_dispatch" - the value is passed into it.
> Are you
> using the common_timeout code?
This is just flowing through from a call to event_loop; I'm not sure of the call path that takes us down to select_dispatch.
> What are you doing to "reset the
> timeout" ?
Just hacked things to save the value from the first call into the function, then replace it if there is a problem:
static struct timeval rhctv;
static int rhcfirst = 1;
static int rhccnt = 0;
static int rhcretry = 0;

static int
select_dispatch(struct event_base *base, struct timeval *tv)
{
    int res = 0, i, j, nfds;
    struct selectop *sop = base->evbase;

    /* Remember the first timeout we see (tv may be NULL for "wait forever") */
    if (1 == rhcfirst && tv != NULL) {
        fprintf(stderr, "ORIGINAL TV %d sec %d usec\n", (int)tv->tv_sec, (int)tv->tv_usec);
        rhctv.tv_sec = tv->tv_sec;
        rhctv.tv_usec = tv->tv_usec;
        rhcfirst = 0;
    }
    rhccnt++;
    rhcretry = 0;

    check_selectop(sop);

    if (sop->resize_out_sets) {
        fd_set *readset_out = NULL, *writeset_out = NULL;
        size_t sz = sop->event_fdsz;
        if (!(readset_out = mm_realloc(sop->event_readset_out, sz)))
            return (-1);
        sop->event_readset_out = readset_out;
        if (!(writeset_out = mm_realloc(sop->event_writeset_out, sz))) {
            /* We don't free readset_out here, since it was
             * already successfully reallocated. The next time
             * we call select_dispatch, the realloc will be a
             * no-op. */
            return (-1);
        }
        sop->event_writeset_out = writeset_out;
        sop->resize_out_sets = 0;
    }

    memcpy(sop->event_readset_out, sop->event_readset_in,
        sop->event_fdsz);
    memcpy(sop->event_writeset_out, sop->event_writeset_in,
        sop->event_fdsz);

    nfds = sop->event_fds + 1;

retry:
    EVBASE_RELEASE_LOCK(base, th_base_lock);

    res = select(nfds, sop->event_readset_out,
        sop->event_writeset_out, NULL, tv);

    EVBASE_ACQUIRE_LOCK(base, th_base_lock);

    check_selectop(sop);

    if (res == -1) {
        if (errno != EINTR) {
            event_warn("select");
            if (tv != NULL) {
                /* time_t and suseconds_t vary in width, so cast for %ld */
                fprintf(stderr, "TV OUT OF SPEC AT CNT %d: value %ld:%ld\n",
                    rhccnt, (long)tv->tv_sec, (long)tv->tv_usec);
                /* Hack: put back the first value we saw and try again */
                tv->tv_sec = rhctv.tv_sec;
                tv->tv_usec = rhctv.tv_usec;
            }
            if (0 == rhcretry) {
                rhcretry = 1;
                goto retry;
            } else {
                exit(0);
            }
            return (-1);
        }

        return (0);
    }
...
Retrying select with the corrected value always succeeds. The value is clearly being overwritten somewhere, but I don't know enough of libevent's internal call sequence to figure out where or why. Note that this happens after several passes through the event create/activate sequence we were discussing. I'm going to try to see whether a minimal reproducer can be built from that code.
>
> --
> Nick
> ***********************************************************************
> To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
> unsubscribe libevent-users in the body.