
Re: [tor-dev] Feedback on obfuscating hidden-service statistics

To: "A. Johnson" <aaron.m.johnson@xxxxxxxxxxxx>
Subject: Re: [tor-dev] Feedback on obfuscating hidden-service statistics
From: George Kadianakis <desnacked@xxxxxxxxxx>
Date: Wed, 26 Nov 2014 12:45:59 +0000
Cc: tor-dev@xxxxxxxxxxxxxxxxxxxx
In-reply-to: <A9675800-FCC0-4114-B03F-E3186B245EF6@xxxxxxxxxxxx> (A. Johnson's message of "Wed, 26 Nov 2014 07:14:16 +0900")

"A. Johnson" <aaron.m.johnson@xxxxxxxxxxxx> writes:

> Hi George,
>
>> I posted an initial draft of the proposal here:
>> https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html
>> Any feedback would be awesome.
>
> OK, I'll have a chance to look at this in the next few days.
>
>> Specifically, I would be interested in understanding the concept of
>> additive noise a bit better. As you can see the proposal draft is
>> still using multiplicative noise, and if you think that additive is
>> better we should change it. Unfortunately, I couldn't find any good
>> resources on the Internet explaining the difference between additive
>> and multiplicative noise. Could you expand a bit on what you said
>> above? Or link to a paper that explains more? Or link to some other
>> system that is doing additive noise (or even better its implementation)?
>
> The technical argument for differential privacy is explained in
> <http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf>.
> The definition appears in Def. 2, the Laplace mechanism is given in
> Eq. 3 of Sec. 5, and Thm. 4 shows why that mechanism achieves
> differential privacy.
>
> But that stuff is pretty dry. The basic idea is that you're trying to
> cover the contribution of any one sensitive input (e.g. a single user's data
> or a single component of a single user's data). The noise that you
> need to cover that doesn't scale with the number of other users, and
> so you use additive noise.
>

Thanks for the resources!

I think I now get the general idea. I don't really understand why it
works or why Laplace is the best distribution for this job, but maybe
it doesn't matter too much for now.

The next problem is how to find the proper parameters for the Laplace
distribution. I guess the mean μ needs to be 0, but the hard part is
'b'. In a few papers I read, they set 'b' to (Δf/ε).

In the above, Δf is the "largest change a single participant could
have on the output" of the query. Trying to fit this database paradigm
to our use case, the largest change a single HS could cause to the
HSDir HS counting stats is to change the result by 1. So Δf is 1, and I
think that ε is some kind of privacy parameter (smaller ε means more
noise), let's set that to 0.3 or something.
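
As a short worked argument for why b = Δf/ε is the usual choice (a sketch of the standard Laplace-mechanism bound, not something taken from the proposal draft): write f for the query, D and D' for two inputs that differ in one participant, and p_D for the density of the noisy output. The symbols D, D', x and p_D are introduced here only for illustration.

```latex
% Noisy output: f(D) + Lap(0, b), with density
%   p_D(x) = \frac{1}{2b} \exp\!\left(-\frac{|x - f(D)|}{b}\right).
\[
\frac{p_D(x)}{p_{D'}(x)}
  = \exp\!\left(\frac{|x - f(D')| - |x - f(D)|}{b}\right)
  \le \exp\!\left(\frac{|f(D) - f(D')|}{b}\right)
  \le \exp\!\left(\frac{\Delta f}{b}\right)
  = e^{\varepsilon}
  \quad\text{when } b = \frac{\Delta f}{\varepsilon}.
\]
```

The first inequality is the triangle inequality, and the second uses the definition of Δf as the largest change one participant can cause. So no single output value becomes more than a factor e^ε more or less likely because of one participant.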

So this gives us b = 1/0.3 ≈ 3.3, or approx Laplace(0, 4) after rounding up, which is the blue curve here:
https://upload.wikimedia.org/wikipedia/commons/0/0a/Laplace_pdf_mod.svg
At the end of this post, I put a few samples from this distribution [0].
The generated noise seems reasonable for this job.
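
To make the parameter choice concrete, here is a minimal sketch of adding Laplace noise to an HS count. The function names and the example count of 150 are hypothetical, and it uses only the standard library (sampling Laplace(0, b) as the difference of two exponentials with mean b) instead of numpy.random.laplace.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace distribution: b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def sample_laplace(b, rng=random):
    """Draw from Laplace(0, b) as the difference of two exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def obfuscate_count(true_count, sensitivity=1.0, epsilon=0.3, rng=random):
    """Return true_count plus additive Laplace noise, rounded to an integer."""
    b = laplace_scale(sensitivity, epsilon)
    return int(round(true_count + sample_laplace(b, rng)))


if __name__ == "__main__":
    print(laplace_scale(1.0, 0.3))  # exact scale, before rounding up to 4
    print(obfuscate_count(150))     # hypothetical true count of 150
```

Note that the noisy result can go below zero when the true count is small; whether to clamp it is a separate design decision that this sketch leaves open.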

Now, I'm wondering how to do the same thing for the RP cell
statistics.  In this case, Δf would have to be the largest number of
cells we hope to obfuscate in an RP circuit. This is a chicken-and-egg
situation, since we don't really know how many cells we usually get
without doing these stats first.

Maybe we can use the preliminary stats from #13192, which contain both
RP and IP cells (but IP cells will probably be a minority). Or maybe
we can fit the distribution dynamically based on the amount of cells
we receive every day (does this even make sense)? Or what?
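
To get a feeling for the trade-off, here is a rough sketch of how the Laplace scale b = Δf/ε grows with whatever per-circuit cell cap we pick as Δf. The caps, the daily total of 1,000,000 cells and the function names are all made-up values for illustration; none of them come from real measurements.

```python
def laplace_scale_for_cap(cell_cap, epsilon):
    """Laplace scale b = cell_cap / epsilon, treating the cap as the sensitivity."""
    if cell_cap <= 0 or epsilon <= 0:
        raise ValueError("cell_cap and epsilon must be positive")
    return cell_cap / epsilon


def relative_noise(daily_total, cell_cap, epsilon):
    """Ratio of the noise scale b to the reported daily total of cells."""
    if daily_total <= 0:
        raise ValueError("daily_total must be positive")
    return laplace_scale_for_cap(cell_cap, epsilon) / daily_total


if __name__ == "__main__":
    # Hypothetical caps and a hypothetical daily total, for illustration only.
    for cap in (256, 2048, 16384):
        b = laplace_scale_for_cap(cap, 0.3)
        print(cap, b, relative_noise(1000000, cap, 0.3))
```

A larger cap hides bigger circuits but adds proportionally more noise, so the choice mostly matters when the daily total is small compared to the cap.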

BTW, I plan to start implementation of this early next week, so that
it's ready by mid-December. I hope we have a good solution to this by
then, otherwise I will have to do something else (round up the stats
to the nearest multiple or something) :/
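
For reference, the fallback of rounding up to the nearest multiple could look something like the sketch below. The function name and the bin size of 8 are placeholders, not values from the proposal.

```python
def round_up_to_multiple(value, bin_size):
    """Round an integer up to the nearest multiple of bin_size."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # Integer ceiling division, then scale back up to a multiple of bin_size.
    return ((value + bin_size - 1) // bin_size) * bin_size


if __name__ == "__main__":
    for count in (0, 1, 8, 9, 150):
        print(count, round_up_to_multiple(count, 8))
```

Unlike Laplace noise, this is deterministic: the same true value always maps to the same reported value, so it only hides differences that fall within one bin.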

Thanks!

[0]: import numpy
     # 100 draws from Laplace(mu=0, b=4)
     for i in range(100):
         print(numpy.random.laplace(0, 4))
3.75587440621
-4.28136229035
4.76311443928
4.05142557505
1.70198910055
-3.37374208295
1.12837234927
-0.905282823974
7.66083097188
0.246385660561
-3.52939581339
-1.3368353768
-1.7482807282
2.98489896819
2.87155179984
-2.72961210143
-3.04409210121
-1.1975804202
-0.34861261134
0.953918739146
-14.3586324803
0.272984575989
3.41377347603
6.48752681038
-4.74036696099
0.668294672995
3.15847434594
-1.58855932489
1.65921624515
-0.529373859224
1.1739048689
-2.2201602699
-0.510111160097
2.58474424973
-7.4773321899
-13.4406958005
-1.34083931335
0.34051030906
-1.09939649788
0.647560027442
2.05240761873
-0.275439053432
10.1238334205
9.0960448449
3.20236196087
-2.27093832694
-19.187310803
0.894898545361
3.62459774003
2.10979313978
-0.633823085078
3.32591049399
3.11206489604
6.52626921692
2.68590966921
1.64033470377
0.997911309606
2.39357922671
0.308907976786
-3.02768280735
3.07096999256
-0.907608650976
1.72587291595
-0.838153001361
-1.23764100768
-9.56662634071
7.89275256421
10.4346665539
-0.522605672578
1.88585708734
1.77708545023
-0.301420228241
8.69964251692
-3.35490635732
-3.14148766097
1.73070057195
0.0426008469217
-1.74373108092
4.18116416817
0.139266645962
-1.32024236062
-2.40639481448
-0.364266143555
-3.6882489347
5.79025078063
0.386467380832
2.5388775702
1.60630747885
3.53930934459
-3.2270856708
4.15611732496
6.53669582267
3.83838409062
-2.62835636891
-1.36484455975
5.02827935505
0.693370215176
1.91312352565
1.93007931702
3.24710666718
