On Sun, 29 Aug 2010 13:17:35 +0200 BjÃrn Scheuermann <scheuermann@xxxxxxxxxxxxxxxxxxxxx> wrote: > Steven J. Murdoch wrote: > > Another thing to do is to shrink the hash size. If we assume that > > there are going to be, say, no more than 1 million distinct IP > > addresses, we could use a 40 bit hash with only a small number of > > collisions (due to the Birthday Paradox). However, someone who tries > > to reverse the full 32 bit IPv4 space will get many collisions. > > with a 40 bit hash, the vast majority of 2^32 IP addresses will not > collide with any other IP address, despite the birthday paradox. So this > would not really be a significant additional level of protection. It > would make a difference only for the small fraction of the IP addresses > that actually do collide. > > However, you don't need that many bits anyway, as collisions are not a > big issue when your goal is to only count how many different addresses > you have seen. If you proceed as you suggested, you can, as detailed > below, do well with a 21-bit hash. With additional tweaks, you can > further reduce that to 12-bit hash values. And then it's more efficient to use FM sketches rather than a list of hash values. The *only* problem I had with your first message in this thread was your suggestion that the FM sketches could be exchanged between the operators; the FM sketches may still be quite sensitive (only $AGENCY knows exactly how sensitive they are; we really can't tell). Other than the fact that the FM sketches must remain secret, your approach is also safer than Dr. Murdoch's idea. There is a slight improvement, though: * Each instrumented Tor instance should generate an FM sketch in memory as it is contacted by directory clients. This loses most of the sensitive information instantly. * Every N minutes, the Tor instance should encrypt its current FM sketch with Dr. Loesing's public key for this purpose, base64-encode it, and write it to the log on a single line. As long as the encryption algorithm is randomized, this packages up the rest of the sensitive information so only one trusted person can recover it. The Tor instance then erases its FM sketch and starts accumulating a new one. * The Tor operators use one or more invocations of grep to extract *only* the encrypted FM sketches from the log. Sort the resulting list, and delete the original log files. This loses any potentially sensitive timing information, so that Dr. Loesing can't recover it. Each operator should then merge the encrypted-sketch lists produced by all of his/her/its Tor relays into one sorted list before sending it in for analysis. * Dr. Loesing decrypts all of the FM sketches, ORs them together, and publishes only the resulting count. Then he erases the decrypted FM sketches and sets his private key on fire. Robert Ransom
Attachment:
signature.asc
Description: PGP signature