
Re: [or-talk] Re: huge pages, was where are the exit nodes gone?

     On Wed, 14 Apr 2010 17:23:35 +0200 Olaf Selke <olaf.selke@xxxxxxxxxxxx> wrote:
>Scott Bennett wrote:
>>> It appears memory consumption with the wrapped Linux malloc() is still
>>> larger than with the openbsd-malloc I used before. Hugepages don't
>>> appear to work with openbsd-malloc.
>>      Okay, that looks like a problem, and it probably ought to be passed
>> along to the Linux developers to look into.
>yes, but I don't suppose this problem is related to the hugepages
>wrapper. Linking tor against the standard glibc malloc() never worked for
>me in the past. I always had the problem that memory leaked like hell and
>after a few days the tor process crashed with an out-of-memory error.
>Running the configure script with the --enable-openbsd-malloc flag solved
>this issue, but apparently it doesn't work with libhugetlbfs.so.

     Is tor statically linked?  If not, I wonder if it's a library-ordering
problem, where a version of malloc() in libhugetlbfs or in a library called
by a routine in libhugetlbfs gets linked in ahead of the OpenBSD version.
I don't know how much flexibility that LD_PRELOAD method gives you, but
perhaps you could try the -rpath trick we FreeBSD folks had to use to force 
use of openssl from ports rather than the broken one in the base system.
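     For what it's worth, the preload order itself can be steered: the
dynamic linker resolves malloc() from the first object in LD_PRELOAD that
defines it.  A minimal sketch follows -- the library paths are assumptions,
not actual install locations, so adjust to taste:

```shell
# Sketch only: listing the OpenBSD-malloc shim ahead of libhugetlbfs in
# LD_PRELOAD should make its malloc() win the symbol lookup.  Paths below
# are hypothetical placeholders.
PRELOAD_ORDER="/usr/local/lib/libobsdmalloc.so /usr/lib/libhugetlbfs.so"

# One would then start tor with that order, e.g.:
#   LD_PRELOAD="$PRELOAD_ORDER" tor -f /etc/tor/torrc

# The first (space-separated) entry is the one searched first:
FIRST=$(echo "$PRELOAD_ORDER" | cut -d' ' -f1)
echo "$FIRST"
```

If that ordering still loses to a malloc() pulled in at static link time,
the -rpath approach at build time is the heavier hammer.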
>After 17 hours of operation resident process size is 1 gig.

     How much was it typically using before?
>21716 debian-t  20   0 1943m 1.0g  24m R 79.4 26.9 927:51.27 1 tor
>On the other hand cpu load really seems to be reduced compared with
>standard page size.
     Holy Crapola!  79.4% is a *reduction*?!??  8-Q  What did it use
before?  100%?  1 GB is 512 hugepages.  I wonder whether getting the
malloc() issue resolved and lowering the working-set size would reduce
the CPU time still further, given that each TLB only holds 64 entries.
(I fail to see yet why the Linux developers picked a hugepage size that
is not supported by hardware, at least not for the data and stack
segments.)
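     The arithmetic there checks out for the usual 2 MiB x86-64 hugepage
size (on a live system, `grep Hugepagesize /proc/meminfo` confirms it); a
quick sanity check:

```shell
# 2048 KiB (2 MiB) is the common x86-64 hugepage size.
HUGEPAGE_KB=2048
RSS_KB=$((1024 * 1024))            # 1 GiB resident set, expressed in KiB
PAGES=$((RSS_KB / HUGEPAGE_KB))    # hugepages needed to cover that RSS
echo "$PAGES hugepages"            # 512 -- far more than a 64-entry TLB
```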
     A long time back, we tossed an idea around briefly to the effect
that you might get more balanced utilization of your machine by running
two copies of tor in parallel with their throughput capacities limited
to something more than half apiece of the current, single instance's
capacity.  That would bring the other core into play more of the time.
A configuration like that would still place both instances very high
in the list of relays ordered by throughput, but the reduction in each
one's advertised capacity would help to spread requests across the two
more evenly.  They would still be bounded by TCP/IP's design limit on
port numbers for the system as a whole, though you would likely never
hit that, because the kernel would probably just refuse connections
once all port numbers were in use.  Even so, it would probably let you
squeeze more total tor throughput through the machine than you get at
present, while leaving a moderate amount of idle time on each core
available for other processing.  Have you given any more thought to
this idea over the ensuing months?
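     In case it helps the thought along, the usual shape of such a setup
is two separate torrc files with split rate limits, one tor process each.
A rough sketch only -- the nicknames, paths, ports, and rate figures below
are made-up placeholders, not recommendations:

```
# /etc/tor/torrc.1  -- first instance (hypothetical values throughout)
Nickname      relayA
DataDirectory /var/lib/tor1
ORPort        9001
SocksPort     0
BandwidthRate 12 MB     # somewhat more than half the single-instance cap

# /etc/tor/torrc.2  -- second instance
Nickname      relayB
DataDirectory /var/lib/tor2
ORPort        9002
SocksPort     0
BandwidthRate 12 MB
```

Each instance would then be started with its own config, e.g.
`tor -f /etc/tor/torrc.1` and `tor -f /etc/tor/torrc.2`, and the scheduler
should spread the two processes across both cores.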

                                  Scott Bennett, Comm. ASMELG, CFIAG
* Internet:       bennett at cs.niu.edu                              *
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *