
Re: [Libevent-users] Asynchronous writes in the event loop model



On Sun, Jan 22, 2012 at 9:44 AM, Frank Schoep <frank@xxxxxxx> wrote:
> On 19 jan 2012, at 21:53, Nick Mathewson wrote:
>> …
>> The usual way is with events that you manually activate.  You'll need
>> to use libevent 2 for threading support, and call one of the
>> appropriate functions at startup time to turn on threading support.
>
> I've been thinking about this over the past days and I wondered – how do POSIX signal callbacks behave under multithreaded conditions?

Sadly, this whole business is a huge pile of gunk.

The interaction between signals and POSIX threads (without even
bringing Libevent into the picture) is really, really stupid, mainly
because the original pthreads spec was designed to allow pure-userspace
implementations where all threads share one process, as well as
implementations where each thread _is_ its own process.
Basically, under pthreads, any signal sent to the _process_ is
received by an *arbitrary* thread.  It could be the same thread every
time; it could be a randomly chosen thread.

(BTW, some details above are surely wrong: sometimes I feel like there
is an evil conspiracy that changes signal semantics around behind my
back whenever I'm not looking... but really, they're just hard.)

> Could I add a signal callback to the libevent loop running on one thread and have another thread raise that signal, invoking the callback on the other thread? Do I still need to use threading support in that case, or will the raised signal always properly register in a 'vanilla' libevent loop? Has anyone ever tried this?

This wouldn't actually be any faster than the event_active() approach.
 In most cases[*], Libevent handles signals by installing a signal
handler with signal().  The handler uses a socketpair to communicate
with the event base, so no matter what thread it gets run in, the
event_base finds out.
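
(To make that concrete, here's roughly the shape of that trick as
you'd write it in your own code.  This is NOT Libevent's internal
code; the notify_fds and on_notify names are made up for the example,
it's Unix-flavored, and error checking is skipped.  Build with
-levent.)

#include <event2/event.h>
#include <event2/util.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

static evutil_socket_t notify_fds[2]; /* [1]: handler writes, [0]: loop reads */

/* May run in whichever thread the kernel picked; write() is
 * async-signal-safe, so that's all we do here. */
static void sigusr1_handler(int sig)
{
    char byte = 0;
    (void)sig;
    (void)write(notify_fds[1], &byte, 1);
}

/* Runs in the event_base's thread once the loop sees the byte. */
static void on_notify(evutil_socket_t fd, short what, void *arg)
{
    struct event_base *base = arg;
    char byte;
    (void)what;
    (void)read(fd, &byte, 1);
    /* ... react to "a signal arrived" here ... */
    event_base_loopexit(base, NULL);  /* just so the demo terminates */
}

int main(void)
{
    struct event_base *base = event_base_new();
    struct event *notify_ev;

    evutil_socketpair(AF_UNIX, SOCK_STREAM, 0, notify_fds);
    evutil_make_socket_nonblocking(notify_fds[0]);
    evutil_make_socket_nonblocking(notify_fds[1]);

    notify_ev = event_new(base, notify_fds[0], EV_READ | EV_PERSIST,
                          on_notify, base);
    event_add(notify_ev, NULL);

    signal(SIGUSR1, sigusr1_handler);  /* send SIGUSR1 to this process to test */

    event_base_dispatch(base);

    event_free(notify_ev);
    event_base_free(base);
    return 0;
}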

In comparison, when using event_active() to tell the event_base to do
something, you're using the (more optimized)[***] "notification"
pathway, which doesn't need signals at all, and sometimes doesn't even
need to hit kernelspace.

So it's worth benchmarking, but I'd be surprised if you got a big
speed improvement there.
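
In case it helps, here's a minimal sketch of the event_active()
approach, assuming libevent 2 with pthreads support turned on (link
with -levent -levent_pthreads).  The do_work callback and the
throwaway worker thread are just for illustration:

#include <event2/event.h>
#include <event2/thread.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

/* Runs in the thread that is running the event_base's loop, no matter
 * which thread called event_active(). */
static void do_work(evutil_socket_t fd, short what, void *arg)
{
    struct event_base *base = arg;
    (void)fd; (void)what;
    printf("woken up in the event_base thread\n");
    event_base_loopbreak(base);   /* just so the demo terminates */
}

static void *worker(void *arg)
{
    struct event *notify_ev = arg;
    event_active(notify_ev, 0, 0);  /* safe from another thread once threading is on */
    return NULL;
}

int main(void)
{
    struct event_base *base;
    struct event *notify_ev;
    struct timeval ten_sec = { 10, 0 };
    pthread_t thr;

    evthread_use_pthreads();      /* turn on threading support before anything else */
    base = event_base_new();

    /* fd of -1, no EV_READ/EV_WRITE: this event only fires via
     * event_active().  The long timeout just keeps the loop from
     * exiting before the other thread gets around to activating it. */
    notify_ev = event_new(base, -1, 0, do_work, base);
    event_add(notify_ev, &ten_sec);

    pthread_create(&thr, NULL, worker, notify_ev);
    event_base_dispatch(base);
    pthread_join(thr, NULL);

    event_free(notify_ev);
    event_base_free(base);
    return 0;
}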

[*] kqueue is an exception, since kqueue can handle signals on its
own.  I'd like to be using signalfd on Linux too, but there are
technical challenges there.  See the sourceforge ticket [**] for more
info.

[**] https://sourceforge.net/tracker/?func=detail&atid=461324&aid=3436038&group_id=50884

[***] The internal "evthread_notify_base()" function is used to tell
an event_base that it should wake up from its loop (if it's in one)
and handle changes to the list of active or pending events made by
another thread.  The code uses eventfd() where available, pipes when
eventfd() is missing (hm, the signal-handling path should probably
share that logic), and a socketpair as a final resort.  It also takes
pains to avoid redundant wakeups: if the base is already going to wake
up for some other reason, it doesn't poke the socketpair/pipe/eventfd
again.

> My (naive) assumption is that, since signals work at the process level, any of its pthreads will potentially become aware of a signal shortly after it is raised, because the process-to-pthread(s) mapping would act as an improvised mutex at the OS/scheduler level.
>
> Does that assumption make any sense, is it in use already by existing applications? Do Windows threads, using _beginthreadex, behave differently (like almost everything on Windows) compared to UNIX-like systems? Should I file for a software patent (j/k) or throw this idea in the trash bin?
>
> I'm sorry if I seem to be overengineering my application's design at the moment, but I really want to try to keep all inner loops lock free to maximize throughput. Although pthread mutex locking is fairly fast, skipping it altogether would be even faster, I think.

My understanding is that with a good implementation, pthread mutex
locking is blazingly fast *in the uncontended case*.  So what you
ought to be worrying about is not "how often do I lock/unlock", but
rather "is the lock contended?"

So maybe the right approach is to first check whether you're actually
hitting lock contention here, and only optimize further if you are.

Of course, as with all other performance discussions, testing is king.
 The "queue work for the main event_base thread" pattern is pretty
common in multithreaded libevent programming.  If we can come up with
some reasonable benchmarks for that, I'd like to try to find good ways
to optimize for it.
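
To give that pattern a concrete shape, here's one way it often looks:
a mutex-protected list that any thread can push onto, plus an event
that gets event_active()'d to make the base drain it.  This is only a
sketch under the same libevent 2 + pthreads assumptions as above;
work_queue, push_work, drain_queue, and producer are made-up names,
and error checking is omitted.

#include <event2/event.h>
#include <event2/thread.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

struct work_item {
    struct work_item *next;
    int payload;
};

struct work_queue {
    pthread_mutex_t lock;
    struct work_item *head, *tail;
    struct event *wakeup;          /* activated after each push */
};

/* Called from any thread.  The lock is held only long enough to link
 * the item in, so in the common case it should be uncontended. */
static void push_work(struct work_queue *q, int payload)
{
    struct work_item *item = malloc(sizeof(*item));
    item->payload = payload;
    item->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    pthread_mutex_unlock(&q->lock);

    event_active(q->wakeup, 0, 0);  /* wake the event_base thread */
}

/* Runs in the event_base thread: detach the whole list under the
 * lock, then process it with the lock released. */
static void drain_queue(evutil_socket_t fd, short what, void *arg)
{
    struct work_queue *q = arg;
    struct work_item *item, *next;
    (void)fd; (void)what;

    pthread_mutex_lock(&q->lock);
    item = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    for (; item; item = next) {
        next = item->next;
        printf("processing work item %d\n", item->payload);
        free(item);
    }
}

static void *producer(void *arg)
{
    int i;
    for (i = 0; i < 3; i++)
        push_work(arg, i);
    return NULL;
}

int main(void)
{
    struct work_queue q;
    struct event_base *base;
    struct timeval one_sec = { 1, 0 };
    pthread_t thr;

    memset(&q, 0, sizeof(q));
    pthread_mutex_init(&q.lock, NULL);

    evthread_use_pthreads();        /* before creating the base */
    base = event_base_new();

    q.wakeup = event_new(base, -1, EV_PERSIST, drain_queue, &q);
    event_add(q.wakeup, &one_sec);  /* timeout just keeps the loop alive */

    pthread_create(&thr, NULL, producer, &q);
    event_base_loopexit(base, &one_sec);  /* let the demo stop after a second */
    event_base_dispatch(base);
    pthread_join(thr, NULL);

    event_free(q.wakeup);
    event_base_free(base);
    return 0;
}

Note that drain_queue detaches the whole list in one short critical
section, which is part of what keeps the lock mostly uncontended, per
the point above.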

cheers,
-- 
Nick