
[tor-dev] gettimeofday() Syscall Issues



On 01/01/2015 03:47 PM, Nick Mathewson wrote:
> What part of tor is accounting for all these time/gettimeofday calls?

I set up an OpenBSD 5.6 VM and built Tor 0.2.5.10 on it. I couldn't get
chutney working because of a number of strange errors that seemed
platform-specific, so I went for a less thorough solution. I can't run a
Tor relay from home, so the only activity the Tor process had was some
Firefox traffic I created.

There are surprisingly few syscall analysis tools available on OpenBSD,
and I'm new to syscall/interrupt tuning, so I eventually settled on the
advanced "printf()" technique.
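
Concretely, the instrumentation amounted to something like the
following (illustrative only; the actual lines I dropped in were
ad-hoc):

    #include <stdio.h>

    /* Emits one line per call to stderr, so the output can be grepped
     * and counted afterwards.  Dropped at the top of each of the time
     * functions below. */
    #define TRACE_TIME_CALL() \
      fprintf(stderr, "TIMETRACE: %s (%s:%d)\n", __func__, __FILE__, __LINE__)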

As a quick refresher for those not familiar, there is a linear hierarchy
of time functions in the Tor code, each one calling the one to its left:

gettimeofday() < tor_gettimeofday() < tor_gettimeofday_cached() <
tor_gettimeofday_cached_monotonic()

The first two always make the syscall, while the latter two try the
cache first. I added print statements to all four.
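
For reference, the cache itself works roughly like this (simplified
sketch, not verbatim from the tree, so don't take the details as
gospel; the monotonic variant additionally clamps the result so it
never moves backwards):

    #include <string.h>
    #include <sys/time.h>

    void tor_gettimeofday(struct timeval *timeval);  /* wraps the syscall */

    /* A zeroed tv_sec marks the cache as empty. */
    static struct timeval cached_time_hires = {0, 0};

    void
    tor_gettimeofday_cached(struct timeval *tv)
    {
      if (cached_time_hires.tv_sec == 0) {
        tor_gettimeofday(&cached_time_hires);  /* cache miss: syscall */
      }
      *tv = cached_time_hires;
    }

    /* Called from the libevent read/write callbacks (see below). */
    void
    tor_gettimeofday_cache_clear(void)
    {
      memset(&cached_time_hires, 0, sizeof(cached_time_hires));
    }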

There was a steady stream of tor_gettimeofday() calls from
circuit_expire_building() - about two per second with a single idle TCP
connection.
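
For context, circuit_expire_building() grabs the wall clock
unconditionally near its top, roughly like this (paraphrased from
src/or/circuituse.c; the details may be off):

    /* Paraphrased: the time is fetched once per invocation, whether or
     * not any circuit is actually close to expiring. */
    struct timeval now;
    tor_gettimeofday(&now);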

When there was active traffic flow, there were many more gettimeofday()
calls that were almost exclusively the product of
tor_gettimeofday_cached_monotonic(). This function is called in three
places:

* once in src/or/relay.c
* once in src/or/buffers.c
* once in src/or/circuitlist.c

The first two account for the bulk of the calls, as they are in the core
data relaying logic.
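
To give a sense of the pattern, both of those call sites stamp data as
it passes through, roughly like the following (paraphrased;
"queued_item" and the exact millisecond conversion are stand-ins of
mine, not real names from the tree):

    /* Paraphrased: every cell (relay.c) or buffer chunk (buffers.c)
     * that gets queued is stamped with the cached time, so this runs
     * once per cell/chunk. */
    struct timeval now;
    tor_gettimeofday_cached_monotonic(&now);
    queued_item->inserted_time = (uint32_t)(now.tv_sec * 1000 +
                                            now.tv_usec / 1000);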

Ultimately, the problem seems to be that the caching is very weak. At
most, half of the calls to tor_gettimeofday_cached_monotonic() hit the
cache. Judging from the flood of print statements, loading a single
simple HTML page (http://www.openbsd.org/faq/ports/guide.html, to be
exact) causes >30 gettimeofday() syscalls. You can imagine how that
would add up for an exit carrying 800 KB/s if the caching doesn't
improve much with additional circuits.

If I understand correctly, this implies either a bug in the caching
logic, or simply that libevent callbacks fire (and clear the cache) too
often for the caching to be of much use.

The cache is only cleared in two places (both pretty self-explanatory,
and both triggered by libevent callbacks; see the sketch after the
list):

* connection_handle_read()
* connection_handle_write()
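
The clearing itself is a one-liner at the top of each handler, roughly
like this (paraphrased from src/or/connection.c; connection_t and
connection_handle_read_impl() are Tor's own types/functions):

    /* Paraphrased: each libevent read/write callback invalidates the
     * cache on entry, so all timestamps taken while servicing that one
     * event can share a single syscall. */
    int
    connection_handle_read(connection_t *conn)
    {
      int res;
      tor_gettimeofday_cache_clear();
      res = connection_handle_read_impl(conn);
      return res;
    }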

Apparently, the above functions are called >30 times when fetching a
JS-less non-HTTPS HTML page. I don't know enough about TCP to comment on
whether this makes sense.

Sorry for the magnum opus. I thought I'd try to clarify the context for
those interested, and typing this helped organize my thoughts.

Let me know if I've made any mistakes thus far. I'll start looking into
potential improvements. I'm eager to work with someone on this if
anyone's interested. Also, if you have any ideas or suggestions, please
share.

Thanks for reading!

Libertas
