Re: hardware acceleration available for Tor ? On FreeBSD ?

     On Mon, 12 Oct 2009 16:58:37 -0400 Wyllys Ingersoll
<Wyllys.Ingersoll@xxxxxxx> wrote:
>Scott Bennett wrote:
>>> One caveat with the BCM5821 or the Sun Crypto 1000 is that not all of them 
>>> support AES - I can't tell for sure, but it looks like AES support was 
>>> added after the fact, and it depends on firmware version.  The BCM5825 is 
>>> a safer bet if you're buying off of ebay, etc., but is more expensive.
>>> Everything I have mentioned here appears to be pci64/pci-X, rather than 
>>> pci-e.
>>>> Any comments on the effectiveness of these parts, and the likelihood that 
>>>> they will actually allow a greater network throughput on the same underlying 
>>>> cpu(s) and memory, is appreciated.
>>>> I have been under the impression that memory is more of a limiting factor 
>>>> than cpu - with some estimates being 750-ish megabytes of ram per 10mbits/s. 
>>>> I am unsure whether hardware crypto acceleration will decrease this memory 
>>>> load, or simply decrease cpu load.
>>> I got these loose numbers off of an archived list discussion, but it 
>>> appears to be false.  Again from irc:
>>> "tor is actually cpu-bound rather than ram-bound on the fast relays i 
>>> think you should be able to push 10MB/s in 1G of ram"
>>> So crypto-acceleration appears to be useful.
>>      The symmetric-key processing is very fast and takes up little CPU time.
>> The apparent hangup on the high-rate relays is the asymmetric-key processing
>> (i.e., onion-skin encrypting/decrypting).  FWIW, when I was running a relay,
>> it could be running at rates over 300 KB/s while using less than 1% of the
>> CPU when it was simply passing cells back and forth among the various
>> connections.  When new onion skins came in to be decrypted was when tor would
>> suddenly use much more CPU time for a moment or two.
>>> Unanswered questions:
>>> - how painful is actual integration?  Just because the driver is there and 
>>> those options are available in Tor doesn't mean it will be a snap.  Word 
>>> on the street is that "coderman" has actually done this ... comments ?
>>> - Is the BCM5825 the most powerful solution that can be easily made to 
>>> work on FreeBSD ?  The soekris cards are much less powerful, the SafeNet 
>>> 1741 has a lower throughput and the 1742/1746 parts are not listed on the 
>>> FreeBSD HCL.  Not sure where the Sun Crypto 6000 lies on this continuum, 
>>> but it appears to NOT be a broadcom based card.
>>> - Is anyone _actually_ testing Tor, and more specifically, hardware crypto 
>>> acceleration of Tor, in high speed (gigabit) test environments ?
>I did testing with the Niagara 2 chips on some Sun systems running Solaris and got good results.
>The critical operation is not necessarily the SSL, but rather the AES CTR mode algorithms.
>I did not test this on a gigabit test network though.
>The problem I discovered was that just getting accelerated AES from hardware
>was not giving much improvement if the CTR mode operations had to be done
>in software.  The N2 supports AES CTR in hardware so you can pass
>the entire buffer to be encrypted at once instead of doing 16 bytes at a time
>and updating the counters in software.  
>As far as I could tell, OpenSSL had no support for getting AES CTR from
>hardware (0.9.8k), at least not without some heavy mods to the engine.
>I blogged about it here:
>Also posted a short paper describing the analysis (using DTrace) here:
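     To make the quoted point about CTR mode concrete, here is a minimal
sketch of counter mode with the counter handled in software, one 16-byte
block at a time.  It is an illustration only: block_primitive() is a
hypothetical stand-in built on SHA-256 so that the example runs with the
Python standard library, and it is NOT real AES.  The nonce/counter layout
(8-byte nonce, 8-byte big-endian counter) is likewise an assumption and is
not taken from tor or OpenSSL.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def block_primitive(key: bytes, counter_block: bytes) -> bytes:
    """Stand-in for one block encryption (assumption: NOT real AES)."""
    return hashlib.sha256(key + counter_block).digest()[:BLOCK]


def ctr_software_loop(key: bytes, nonce: bytes, data: bytes):
    """CTR mode with the counter maintained in software.

    Each 16-byte block costs one call into the block primitive plus a
    counter update; this is the per-block overhead the quoted message
    describes.  Returns (output, number_of_primitive_calls).
    """
    out = bytearray()
    calls = 0
    for i in range(0, len(data), BLOCK):
        # Build the counter block: 8-byte nonce + 8-byte block index.
        counter_block = nonce + (i // BLOCK).to_bytes(8, "big")
        keystream = block_primitive(key, counter_block)
        calls += 1
        chunk = data[i:i + BLOCK]
        # XOR the keystream into the data; the last chunk may be short.
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out), calls


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 8
    ciphertext, calls = ctr_software_loop(key, nonce, bytes(512))
    print(len(ciphertext), calls)
```

With this structure a 512-byte buffer takes 32 separate calls into the
block primitive.  Hardware that implements CTR itself, as the quoted
message says the N2 does, would instead take the whole buffer in a single
request.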
     Wow!  This is most interesting because it runs counter to what we have
been given to understand.  Usually, it is the asymmetric-key encryption and
decryption that are the costly operations, while the symmetric-key operations
are normally very fast.  tor's NumCPUs parameter in the torrc file allows
specification of the number of threads (defaulting to 1) to be used for
dishing out onion-skin decryption operations to processors, while (so I
understand) there is as yet no way to get OpenSSL to do the same for the
symmetric-key operations involved in the relaying of tor data cells.  The
asymmetric-key operations occur during onion-skin processing and during the
handshaking process for establishing an SSL connection, and these are the
ones that cause the spikes in CPU utilization of tor relays that do not
have "HardwareAccel 1" in their torrc files or invocation arguments.
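     For reference, the two torrc settings mentioned above would look
roughly like the fragment below.  The value 2 for NumCPUs is only an
illustrative assumption; the appropriate number depends on the machine.

```
# torrc fragment (illustrative; the NumCPUs value is an assumed example)
NumCPUs 2
HardwareAccel 1
```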
     I hope that someone on the tor development team will step in at this
point to clarify what is going on here.  Thanks much for posting your results!

                                  Scott Bennett, Comm. ASMELG, CFIAG
* Internet:       bennett at cs.niu.edu                              *
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *