On Mon, Feb 26, 2007 at 05:05:23PM -0800, Adam Langley wrote:
> On 2/26/07, Nick Mathewson <nickm@xxxxxxxxxxxxx> wrote:
> >METHODOLOGY: I wrote a stupid benchmark function in aes.c to encrypt a
> >million cell-sized chunks using our aes_crypt function, and timed it
> >with the unix "time" command. I did this twice for each
> >(computer,code) pair, I took the median of three runs.
>
> You have to be very careful of cache issues with micro-benchmarks like
> that. I think that you're ok because the cache profile of an AES
> function is probably pretty much fixed (it walks the input and the
> output and the tables are of fixed size I'm guessing). But if the
> faster impl uses different sized tables etc (or more code, looking at
> FULL_UNROLL) you might find that, when running with the rest of the
> Tor code, the results are rather different.

Right; I'm pretty confident of the 40% improvement from switching to
OpenSSL's assembly implementation where available, but less confident
of the other improvements.

A couple more developments on this front, BTW:

 * I tried OpenSSL 0.9.8e on an x86_64 machine, and found that either
   the i586 assembly code isn't used on x86_64, or it is used but
   offers no speed benefit over 0.9.7f.

 * It looks like OpenSSL 0.9.9 (or whatever they're calling the next
   one) will probably add assembly implementations for ARM, x86_64,
   and sparc. Neat!

 * We suffer a bit because our AES_CTR implementation has to work on
   unaligned data. I did an experiment using 508-byte cell payloads
   instead of 509-byte cell payloads, and xoring uint32_ts rather than
   chars: it knocked about 10% off my benchmark. This is probably
   something to look at when we redesign the cell format.

peace,
-- 
Nick Mathewson
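
For illustration only, the "xoring uint32_ts rather than chars" experiment
mentioned above could be sketched roughly as below. This is not the code
from Tor's aes.c; the function names xor_bytes and xor_words are
hypothetical, and the keystream buffer is assumed to have been produced
already by the AES counter step. The word version uses memcpy for its
loads and stores so that it stays well-defined on unaligned buffers;
whether it is actually faster depends on the compiler and platform, so
treat it as something to measure, not a guaranteed win.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time xor of a keystream into a buffer.
 * Works for any length and any alignment. */
static void xor_bytes(uint8_t *dst, const uint8_t *keystream, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= keystream[i];
}

/* Word-at-a-time xor: handles 4 bytes per iteration, then finishes any
 * trailing bytes one at a time. memcpy keeps the accesses legal even if
 * the pointers are not 4-byte aligned; compilers commonly reduce these
 * fixed-size copies to plain loads and stores. */
static void xor_words(uint8_t *dst, const uint8_t *keystream, size_t len)
{
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t d, k;
        memcpy(&d, dst + i, sizeof d);
        memcpy(&k, keystream + i, sizeof k);
        d ^= k;
        memcpy(dst + i, &d, sizeof d);
    }
    for (; i < len; i++)
        dst[i] ^= keystream[i];
}
```

With a 508-byte payload the word loop covers the whole buffer (508 is a
multiple of 4); with 509 bytes one trailing byte falls through to the
byte loop. Both functions give identical output for the same inputs.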