On Sun, 29 Aug 2010 19:06:13 +0200 BjÃrn Scheuermann <scheuermann@xxxxxxxxxxxxxxxxxxxxx> wrote: > Variant #2a would be to craft a data structure in such a way that we can > be absolutely sure that it is no longer sensitive. In that case, the > operators could publish the data structure. I admit that this is most > likely impossible (even though I believe that sketches with additional > "noise" bits, as suggested in my second mail, come at least pretty > close). > > Variant #2b is to craft the data structure in such a way that we are > *reasonably* sure that the data are no longer sensitive. In that case, > the data structures should be encrypted by the operators so that only > Karsten Loesing can decrypt them, and all traces of them should be > eliminated once Karsten has finished the merging and evaluation process, > just as you suggest. I'm writing up a cryptosystem that will help with this right now; it allows untrusted parties to merge encrypted Bloom-filter-like objects, but one trusted party will be required to decrypt the merged encrypted blob and compute the estimated number of distinct IP addresses. The kind of homomorphic encryption we need is feasible, with about 160 ciphertext bits per plaintext-Bloom-filter bit. > There is one potential issue/limitation, though, in what you suggest. If > sketches are collected over N-minute intervals, and are conveyed to > Karsten without any indication of timing and order, then all Karsten can > do is to calculate the total number of distinct user IPs over the whole > period during which all these sketches have been generated. If this is > what is desired, it will be important a) that all operators collect > information during the same time interval, and only during that time > interval, and b) that this time interval is not too long, otherwise we > will end up with the number of distinct IP addresses during a whole week > or so, which might not be very helpful (due to dynamic IPs etc.). > Furthermore, if all sketches from this time period are combined by > Karsten anyway, then we might consider to let each operator OR all his > sketches locally and to send only the result to Karsten, so as to reveal > even less information to him. You're right about the limitation; with my approach in the message you're replying to, it would be best to include a sanitized time in the encrypted blob. With what I'm working on, we're stuck with (a) in your paragraph here (but *everyone* can participate in collecting data). Robert Ransom
Attachment:
signature.asc
Description: PGP signature