teor wrote:
>
>> On 3 Nov. 2016, at 10:37, s7r <s7r@xxxxxxxxxx> wrote:
>>
>> I am very happy with the torspec patch.
>>
>> Not quoting entirely, only want to add something wrt randomizing the
>> value for fake clients based on David's and teor's comments:
>>
>> David Goulet wrote:
>> [SNIP]
>>>
>>> - I think "superencrypted" -> "super-encrypted" would be nicer as everything
>>> in the descriptor as that separation of word. Or even "client-encrypted" if
>>> we want to add extra semantic. No strong opinion apart from the "-" :).
>>>
>>
>> I prefer super-encrypted vs. client-encrypted.
>>
>>> - [XXX consider randomization of the value 16]
>>>
>>> If it's fixed, we basically create bucket so a client can know that there
>>> are 0-16 clients or 16-32 clients and so on.
>>>
>>> If we randomize that value and let's say it's 7 then we have bucket of 7. If
>>> that value is randomized _every_ new descriptor, we create multiple size of
>>> buckets but over time someone could deduce (maybe) the low bound of clients
>>> by observing all random values and thus assume there are 0-<low bound>.
>>>
>>> I'm uncertain here what's best but seems that in any case, bucketing is
>>> happening as we pad with fake "auth-client". So I would assume here, out of
>>> my head to be safe, that we might want _all_ services to kind of look the
>>> same thus a fixed value would make sense following that train of thought.
>>>
>>> I'm liking the rest here! We'll have to think also on some padding in the
>>> INTRODUCE1 cell to avoid leaking client auth is being used.
>>>
>>
>> This is true, we create buckets no matter what, but I think it's better
>> if one has to watch a hidden service for a lot more time to determine
>> the probable number rather than being able to tell from the first
>> descriptor that there are 0-16 clients, 16-32 clients and so on.
>>
>> I fully agree that randomizing _every_ new descriptor does not help and
>> probably in short time someone could deduce a possible number, but I am
>> slightly uncomfortable with a global fixed value for this. One more
>> idea, if it's not helpful we can just go ahead with a fixed value of 16.
>>
>> I think it's better if we pick a random number between 8 and 32 fake
>> clients and remember the picked value so it will be used for every new
>> descriptor until something in our setup changes or enough time has
>> passed. In order to know when to reset it, we save it (in our state)
>> along with:
>> 1. The number of real authorized clients when the random value was picked.
>> 2. Timestamp when the random value was picked + an end of life for the
>> random value.
>>
>> We reset the random value of fake authorized clients and also its end of
>> life when:
>>
>> a) number of real authorized clients in torrc changes from what we have
>> in our state.
>> b) end of life for the random value is reached. End of life will be
>> timestamp + a random period between 30 and 90 days.
>> c) obvious case when Tor is re-installed and old state is lost.
>>
>> We call this function on every HUP and (re)start. We can tune the
>> numbers 8 - 32 and period 30 - 90 days as you like.
>>
>> This way there are a lot of buckets and significantly more time needed
>> for an observer to deduce a probable number. It is quite possible one
>> can never deduce a "probable enough" number.
>>
>> We combine this with faking extra if needed in the encrypted portion to
>> the next multiple of 10k bytes.
>>
>> It's true that it won't help if the hidden service operator changes the
>> number of authorized clients every hour for a long period but in
>> practice this doesn't happen - number of authorized clients changes
>> rarely. And even in this scenario it still makes things a lot more
>> confusing.
>>
>> Compared to other parts of prop 224, this is easy to code and should be
>> worth the effort.
>> What do you think?
>
> If you want to do it this way, with noise and buckets, ask someone who is
> good at differential privacy to do the numbers for you, rather than guessing.
>
> You'll need to know the level of activity you want to hide.
>
> T
>

As I said, the numbers can be changed; I was only illustrating with an example. I guessed some numbers that seemed reasonable to me so I could give an example, and also because this is not a critical part. We are only trying to hide the number of real authorized clients, or to make it as hard as possible for an observer to deduce a number close to the real one, that's all. Even numbers guessed without deep knowledge of differential privacy are, in my view, a lot better than a global fixed value of 16. But as I said, this doesn't need to be a debate: I am not against the fixed value, I am only saying it's better to randomize if a workable solution exists.
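
To make the scheme quoted above concrete, here is a rough Python sketch of the reset logic and the padding step. It is only an illustration, not Tor code: the names `FakeClientState`, `refresh_state` and `padded_length` are made up, the 8 - 32 and 30 - 90 day ranges are the example numbers from my earlier mail, the lifetime is drawn in whole days, and "10k bytes" is read as 10,000 bytes. All of those are assumptions that can be tuned.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60
MIN_FAKE, MAX_FAKE = 8, 32              # example range, tunable (assumption)
MIN_LIFE_DAYS, MAX_LIFE_DAYS = 30, 90   # example lifetime range (assumption)
PAD_BLOCK = 10_000                      # "10k bytes" read as 10,000 (assumption)


@dataclass
class FakeClientState:
    """Hypothetical record kept in the service's state file."""
    fake_clients: int    # random number of fake clients that was picked
    real_clients: int    # number of real authorized clients at pick time
    expires_at: float    # pick timestamp + random end of life, in seconds


def refresh_state(state: Optional[FakeClientState], real_clients: int,
                  now: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> FakeClientState:
    """Meant to run on every HUP and (re)start; re-picks only when needed."""
    now = time.time() if now is None else now
    rng = rng or random.SystemRandom()
    # Keep the stored value unless: (a) the real client count changed,
    # (b) the end of life was reached, or (c) there is no state at all.
    if (state is not None
            and state.real_clients == real_clients
            and now < state.expires_at):
        return state
    lifetime = rng.randint(MIN_LIFE_DAYS, MAX_LIFE_DAYS) * DAY
    return FakeClientState(
        fake_clients=rng.randint(MIN_FAKE, MAX_FAKE),
        real_clients=real_clients,
        expires_at=now + lifetime,
    )


def padded_length(n: int) -> int:
    """Round n up to the next multiple of PAD_BLOCK (ceiling division)."""
    return -(-n // PAD_BLOCK) * PAD_BLOCK
```

The point of storing the real client count and the expiry next to the picked value is that an observer sees the same bucket across many descriptors, so a single descriptor (or a short watch period) reveals less. Whether these ranges actually hide the level of activity we care about is exactly the question teor raises, and the sketch does not answer it.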
_______________________________________________ tor-dev mailing list tor-dev@xxxxxxxxxxxxxxxxxxxx https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev