RAM improvements in Tor 0.2.0.x and OpenSSL 0.9.9
Hi, folks. Roger asked me to type up a description of what we've been
doing recently to rein in memory usage. I figured that some of
or-dev might be interested too, so here goes:
--------------------
There have been a bunch of RAM improvements to Tor over the last few
months. Two of them are in 0.2.0.x; some others are in; and one big
one is in the current OpenSSL development series, which will
eventually become OpenSSL 0.9.9.
- We ship an improved allocator for Linux.
Some of Tor's memory woes on Linux come from the memory allocator in
GNU Libc, which handles memory fragmentation poorly. This matters
because fragmentation can make Tor processes grow without bound over
time, and because many of Tor's best-performing servers run Linux
(which uses GNU Libc). Servers that run out of memory annoy their
operators.
Tor now includes the so-called "OpenBSD malloc" allocator for use
on Linux-like systems. It is enabled by default in the Linux
packages we ship. Linux hosts using this allocator no longer seem
to grow in memory size without bound.
If the problem persists on other platforms, we may follow Firefox 3's
lead and adopt jemalloc as our allocator in 0.2.1.x.
- Tor's buffer implementation is greatly improved.
Much of Tor's allocated memory is used in I/O buffers.
Previously, each of Tor's buffers was a large block of memory,
potentially 4 times the size of the data it needed to store. (It's
necessary to allow some slack in the buffer size, since expanding
buffers to hold more information is expensive.) For large amounts
of data, this could be very wasteful.
As of 0.2.0.x, Tor ships with a new buffer implementation based on
the classic Unix kernel's mbuf strategy: now, each buffer is a
linked list of small chunks. The overhead on each buffer is now
on the order of 1-4k, no matter how large the buffer size is.
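To illustrate the mbuf-style layout, here is a minimal C sketch of a buffer built from a linked list of fixed-size chunks. It is not Tor's actual buffer code. The names (`chunk_t`, `buf_t`, `buf_add`, `buf_drain`, `buf_nchunks`) and the 4096-byte `CHUNK_SIZE` are hypothetical. The point is that unused space is confined to the partly used chunks at the two ends of the list. That keeps the slack to roughly two chunks, however much data the buffer holds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096 /* hypothetical chunk capacity in bytes */

/* One fixed-size chunk; a buffer is a singly linked list of these. */
typedef struct chunk_t {
  struct chunk_t *next;
  size_t off;             /* offset of the first unread byte in mem */
  size_t len;             /* number of unread bytes in mem */
  char mem[CHUNK_SIZE];
} chunk_t;

typedef struct {
  chunk_t *head;          /* oldest data is read from here */
  chunk_t *tail;          /* new data is appended here */
  size_t datalen;         /* total unread bytes across all chunks */
} buf_t;

/* Append n bytes, allocating a new chunk only when the tail is full.
 * Returns 0 on success, -1 if malloc fails. */
static int buf_add(buf_t *buf, const char *data, size_t n) {
  while (n > 0) {
    chunk_t *t = buf->tail;
    if (t == NULL || t->off + t->len == CHUNK_SIZE) {
      chunk_t *c = malloc(sizeof(chunk_t));
      if (c == NULL)
        return -1;
      c->next = NULL;
      c->off = c->len = 0;
      if (t != NULL)
        t->next = c;
      else
        buf->head = c;
      buf->tail = t = c;
    }
    size_t space = CHUNK_SIZE - (t->off + t->len);
    size_t k = n < space ? n : space;
    memcpy(t->mem + t->off + t->len, data, k);
    t->len += k;
    buf->datalen += k;
    data += k;
    n -= k;
  }
  return 0;
}

/* Remove up to n bytes from the front (copying them to out unless it is
 * NULL), freeing each chunk as soon as it is empty. Returns bytes removed. */
static size_t buf_drain(buf_t *buf, char *out, size_t n) {
  size_t got = 0;
  while (got < n && buf->head != NULL) {
    chunk_t *h = buf->head;
    size_t k = (n - got) < h->len ? (n - got) : h->len;
    if (out != NULL)
      memcpy(out + got, h->mem + h->off, k);
    h->off += k;
    h->len -= k;
    got += k;
    assert(buf->datalen >= k);
    buf->datalen -= k;
    if (h->len == 0) {
      buf->head = h->next;
      if (buf->head == NULL)
        buf->tail = NULL;
      free(h);
    }
  }
  return got;
}

/* Count the chunks currently allocated to the buffer. */
static size_t buf_nchunks(const buf_t *buf) {
  size_t count = 0;
  for (const chunk_t *c = buf->head; c != NULL; c = c->next)
    count++;
  return count;
}
```

For example, adding 10,000 bytes to an empty buffer allocates three chunks. Draining 9,000 bytes then frees the first two and leaves one chunk holding the remaining 1,000 bytes. Growing the buffer never requires copying existing data into a larger block.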
- We wrote an OpenSSL patch to improve OpenSSL's memory performance.
Ordinarily, OpenSSL keeps a read buffer and a write buffer for each
connection. Together, these buffers come to 34k per connection. On
a busy Tor server, this can add up to hundreds of megabytes. We
investigated the usage patterns of these buffers, and found that
most data buffered in OpenSSL tends to be very short-lived: at any
given time, on the servers we measured, well over 90% of the OpenSSL
buffers were completely empty. This amounts to hundreds of wasted
MB on a busy server.
We wrote a patch for OpenSSL to release not-in-use buffer memory
onto a freelist, and avoid allocating more memory than is necessary.
After 18 revisions, the patch was accepted by the OpenSSL
development team, and will be included in OpenSSL 0.9.9.
Preliminary testing shows that the patch does save about 34k per
connection; even on a small, short-lived server this amounts to over
30% savings in memory usage. On high-volume nodes, the savings seem
to be significantly higher.
Since OpenSSL 0.9.9 has not yet been released, we're not going to
recommend that average users run development snapshots of it.
Instead, once the patch has seen more testing, if OpenSSL 0.9.9
still isn't out, we may backport our patch to the 0.9.8 series of
OpenSSL, and encourage either the OpenSSL dev team to apply it there
or server operators to apply it on their own.
Serendipitously, it seems that servers using this feature do not
suffer much (or maybe at all) from the memory fragmentation problem
described above, even when they're using the GNU Libc allocator.
We're trying to figure out why this is, or whether it's some kind of
inexplicable measurement error.
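The freelist idea behind the OpenSSL patch can be sketched in C as follows. This is a hypothetical illustration, not the patch itself or any OpenSSL API. The names `conn_t`, `buf_get`, `buf_release`, `conn_on_data`, `conn_consume`, and the cap `FREELIST_MAX` are invented, and `CONN_BUF_SIZE` simply reuses the 34k per-connection figure quoted above. Here is how it works:

* A connection takes a buffer only while it has data pending.
* It returns the buffer as soon as that data drains.
* Released buffers are cached on a bounded freelist for reuse.
* Anything beyond the cap goes back to the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CONN_BUF_SIZE (34 * 1024) /* per-connection figure quoted above */
#define FREELIST_MAX 32           /* hypothetical cap on cached buffers */

/* A released buffer's first bytes hold the freelist link. */
typedef struct freebuf_t {
  struct freebuf_t *next;
} freebuf_t;

static freebuf_t *freelist = NULL;
static size_t freelist_len = 0;
static size_t bytes_allocated = 0; /* buffers in use plus buffers cached */

/* Take a buffer from the freelist if possible, else malloc a new one. */
static void *buf_get(void) {
  if (freelist != NULL) {
    freebuf_t *b = freelist;
    freelist = b->next;
    freelist_len--;
    return b;
  }
  void *p = malloc(CONN_BUF_SIZE);
  if (p != NULL)
    bytes_allocated += CONN_BUF_SIZE;
  return p;
}

/* Cache an empty buffer for reuse, or free it if the freelist is full. */
static void buf_release(void *p) {
  if (p == NULL)
    return;
  if (freelist_len < FREELIST_MAX) {
    freebuf_t *b = p;
    b->next = freelist;
    freelist = b;
    freelist_len++;
  } else {
    free(p);
    bytes_allocated -= CONN_BUF_SIZE;
  }
}

/* Hypothetical connection that holds a buffer only while data is pending.
 * For brevity this sketch tracks byte counts and never copies data. */
typedef struct {
  void *buf;
  size_t pending;
} conn_t;

/* Called when n bytes arrive; acquires a buffer lazily. */
static int conn_on_data(conn_t *c, size_t n) {
  if (c->pending + n > CONN_BUF_SIZE)
    return -1;
  if (c->buf == NULL && (c->buf = buf_get()) == NULL)
    return -1;
  c->pending += n;
  return 0;
}

/* Called when n bytes are consumed; gives the buffer back once empty. */
static void conn_consume(conn_t *c, size_t n) {
  assert(n <= c->pending);
  c->pending -= n;
  if (c->pending == 0) {
    buf_release(c->buf);
    c->buf = NULL;
  }
}
```

Here is an example. Suppose there are 1000 open connections and only 100 have data pending. In that case this sketch holds 100 buffers. By contrast, allocating a buffer per connection up front would hold 1000 × 34k, about 34 MB. After those 100 connections drain, only `FREELIST_MAX` buffers stay cached. The cap bounds how much idle memory the freelist can retain.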
--
Nick