Nick Mathewson wrote:
> On Fri, Feb 16, 2007 at 05:35:50PM -0800, Christopher Layne wrote:
>> On Fri, Feb 16, 2007 at 02:00:00PM -0800, Christopher Layne wrote:
>>> Thought you guys might find this interesting. I did a couple of callgrind
>>> runs on 2 different tor builds, 1 using -Os and the other using -O3. The
>> So did a bit more research on spec'ing which cost models are default in
>> callgrind and now have it logging jumps, asm instructions, and l1/l2/dram
>> performance counters in the simulator. If anyone is interested on the
>> machine specifically it's a 2.1 ghz Celeron-D (Prescott) running under
>> Linux 2.6.20. I've rebuilt openssl, libz, and libevent with cranked up
>> optimization/debug on, so more interesting things to look at.
>
> Hi, Chris! This is pretty neat stuff! If you can do more of this, it
> could help the development team know how to improve speed.
>
> (Sorry about the delay in answering; compiling kcachegrind took me way
> longer than it should have.)
>
> A few questions.
>
> 1. What version of Tor is this? Performance data on 0.1.2.7-alpha
>    or on svn trunk would help a lot more than data for 0.1.1.x,
>    which I think this is. (I think this is the 0.1.1.x series
>    because all the compression seems to be happening in
>    tor_gzip_compress, whereas 0.1.2.x does compression
>    incrementally in tor_zlib_process.) There's already a lot of
>    performance improvements (I think) in 0.1.2.7-alpha, but there
>    might be possible regressions too, and I'd like to catch them
>    before we release... whereas it is not likely that we'll do
>    anything besides security and stability to 0.1.1.x, since it's
>    supposed to be a stable series.
>
> 2. How is this server configured? A complete torrc would help.
>
> 3. To what extent does -O3 help over -O2? Most users seem to
>    compile with -O2, so we should probably change our flags if the
>    difference is nontrivial.
>
> 4. Supposedly, KCachegrind can also visualize oprofile output.
>    If this is true, and you could get it working, it might give more
>    accurate information as to actual timing patterns, with fewer
>    Heisenberg effects. (Even raw oprofile output
>    would help, actually.)
>
> Now, some notes on the actual data. Again, I'm guessing this is for
> Tor 0.1.1.x, so some of the results could be quite different for the
> development series, especially if we fixed some stuff (which I think
> we did) and especially if we introduced some stupid stuff (which
> happens more than I'd like).
>
>  * It looks like most of our time is being spent, as an OR and
>    directory server, in compression, AES, and RSA. To improve
>    speed, our options are basically "make it faster" or "do it
>    less" for each of these.
>
>  * AES isn't going to get used much less: A relay server still
>    needs to AES-ctr-crypt each cell it gets three times: once for
>    TLS for link secrecy on the inbound link, once with a circuit
>    key for long-range secrecy, and once for TLS for link security
>    on the outbound link. This explains the pretty even breakdown
>    between rijndaelEncrypt, _X86_AES_decrypt, and _X86_AES_encrypt
>    in the results. (If you're not following me, read the design
>    paper, or just trust me. ;) )
>
>    [We could _maybe_ save the middle
>    encryption in some cases by a trick similar to what we use for
>    CREATE_FAST cells, but it would only get rid of 1/8 of the AES
>    done by servers in toto, thus reducing the average server's A]
>
>  * Making AES faster would be pretty neat; the right way to go
>    about it is probably to look hard at how OpenSSL is doing it,
>    and see whether it can't be improved. Then again, the OpenSSL
>    team is pretty clever, and it's not likely that there is a lot
>    of low-hanging fruit to exploit here.
>
>  * So here's how RSA is getting used on my server right now:
>
>        0 directory objects signed,
>     1643 directory objects verified,
>        8 routerdescs signed,
>    20554 routerdescs verified,
>       38 onionskins encrypted,
>    37631 onionskins decrypted,
>    35148 client-side TLS handshakes,
>    29866 server-side TLS handshakes,
>        0 rendezvous client operations,
>       70 rendezvous middle operations,
>        0 rendezvous server operations.
>
>    So it looks like verifying routers, decrypting onionskins, and
>    doing TLS handshakes are the big offenders for RSA. We've
>    already cut down onionskin decryption as much as we can except
>    by having clients build circuits less often. To cut down on
>    routerdesc verification, we need to have routers upload their
>    descriptors and have authorities replace descriptors less often,
>    and there's already a lot of work in that direction, but I don't
>    know if I've seen any numbers recently. We could cut down on
>    TLS handshakes by using sessions, but that could hurt forward
>    secrecy badly if we did it in a naive way. (We could be smarter
>    and use sessions with a very short expiration window, but it's
>    not clear whether that would actually help: somebody would need
>    to find out how frequent TLS disconnect/reconnects are in
>    comparison to ).

We also could eliminate the indirection in the TLS handshakes. Currently
the ORs make a temporary cert which they sign with a long-term one.
Verifying this is a pain, but ORs don't notice. We could also use a more
efficient algorithm than we do now for the authentication of the client
to the OP.

>  * Making RSA faster could also be fun for somebody. The core
>    multiplication functions in openssl (bn_mul_add_words and
>    bn_sq_comba8) are already in assembly, but it's conceivable that
>    somebody could squeeze a little more out of them, especially on
>    newer platforms. (Again, though, this is an area that smart
>    people have already spent a lot of time in.)
>
>  * Finally, compression.
>    Zlib is pretty tunable in how it makes
>    the CPU/compression tradeoff, so it wouldn't be so hard to
>    fine-tune the compression algorithm more thoroughly. Every
>    admin I've asked, though, has said that they'd rather spend CPU
>    to save bandwidth than vice versa. Another way to do less
>    compression would be to make directory objects smaller and have
>    them get fetched less often: there are some design proposals to
>    do that in the next series, and I hope that people help beat
>    them into some semblance of workability.
>
> Again, many thanks for this information; I hope we'll see more like it
> in the future!
>
> peace,
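
On the zlib point quoted above: here is a minimal sketch of the
CPU/compression tradeoff, assuming Python's standard zlib module rather
than Tor's own C code. The function name measure_levels and the sample
payload are made up for illustration (the payload is repetitive text
standing in for a directory object, not real Tor data), so the numbers
only show the shape of the tradeoff, not what a server would see.

```python
import time
import zlib


def measure_levels(payload: bytes, levels=(1, 6, 9)):
    """Compress payload at each zlib level; return {level: (size, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Round-trip check so a size is never reported for corrupt output.
        assert zlib.decompress(compressed) == payload
        results[level] = (len(compressed), elapsed)
    return results


# Hypothetical stand-in for a directory object: repetitive text, not Tor data.
sample = b"".join(
    b"router example%d 192.0.2.%d 9001 0 9030\n" % (i, i % 256)
    for i in range(5000)
)

if __name__ == "__main__":
    for level, (size, seconds) in measure_levels(sample).items():
        print(f"level {level}: {size} bytes, {seconds * 1000:.2f} ms")
```

Running it prints the compressed size and wall-clock time for each
level; timings vary by machine, so a real comparison would want actual
directory data and a profiler rather than a wall clock.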