Re: 27C3 on Tor



On Tue, Dec 28, 2010 at 11:29 PM, Roger Dingledine <arma@xxxxxxx> wrote:
[snip]
> What I'm really looking forward to is learning what modifications to Tor
> might slow down the attack. For example, what happens if we move to a 1024
> byte cell by default, or if we randomly add some extra cells periodically,
> or if we ask the entry node to add padding cells so the responses we get
> are multiples of 10KB? It would seem that there is a tradeoff between
> bandwidth overhead (wasted bytes) and protection against this attack,
> but I hope there are smart points in the tradeoff space. Alas, we're
> still not really to that point yet -- we don't know how well it actually
> works in practice against vanilla Tor, so it doesn't make sense to ask
> how well it would work in practice against a modified Tor design.


We can do some "back of the envelope" calculations so that we
can say _something_ useful about the rounding.

I spent a few minutes contemplating this just now, and I thought I'd
make the data I used available for anyone else interested in studying
this.

http://myrandomnode.dyndns.org:8080/~gmaxwell/wp_article_sizes.txt.gz

contains the uncompressed sizes of the wikitext for the 3.5 million
English Wikipedia articles (as of Wikimedia's 2010-10 dump).

Here is how we can use the data to reason about this attack:

Assume that the attacker knows the target is browsing Wikipedia, knows
the exact size of each page loaded, and wants to know which articles
the person is reading. Based on this data we can compute the entropy
of the page sizes to discover how much the attacker will learn from
each page load. We can then study how much quantizing the size reduces
that entropy.
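
To make the procedure concrete, here is a minimal Python sketch of the
quantize-then-measure step. It computes the plain empirical entropy (not
the shrinkage estimate used for the table further down), and it assumes
sizes are rounded up to whole units, which is my reading rather than
something taken from a script. The sizes in it are toy values, not the
real data.

```python
import math
from collections import Counter

def quantized_entropy(sizes, log2_unit):
    """Empirical entropy in bits of sizes rounded up to units of 2**log2_unit bytes."""
    unit = 1 << log2_unit
    # Ceiling division: any non-empty page costs at least one whole unit,
    # while a zero-byte page costs zero units.
    buckets = Counter(-(-s // unit) for s in sizes)
    n = sum(buckets.values())
    return sum((c / n) * math.log2(n / c) for c in buckets.values())

# Toy page sizes in bytes (hypothetical, NOT from wp_article_sizes.txt.gz).
toy_sizes = [100, 300, 700, 900, 1500, 1500, 4000, 9000]

for k in (0, 9, 10, 14):
    print(k, quantized_entropy(toy_sizes, k))
```

The entropy can only stay the same or shrink as the unit grows, since
coarser buckets merge sizes that were previously distinguishable.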

Of course, attackers have a number of additional avenues to increase
the usefulness of the data they obtain: they may have some assumptions
about the prior probabilities (other than "user is browsing
Wikipedia"), and they may also reason about the interlinkedness of
articles, e.g. a second page load is very likely a page linked from
the first load. You might conservatively estimate that each and every
request adds its full information to the attacker's aggregate knowledge.

There are a number of limits to this line of study: Wikipedia articles
are served in HTML form (not wikitext) and in the gzip encoding. I can
wave my arms and say that I don't expect the conversion to HTML and the
HTTPS transport to change the entropy much, and that I expect gzip to
decrease it (because smaller sizes have intrinsically less entropy).
Normally articles contain inline images; the sizes of these objects
probably increase the entropy enormously. These limits probably
aren't important compared to the fact that Wikipedia is not the whole
internet. :) Still, it's a starting point.

Here is some data.

Using the James-Stein shrinkage estimate of entropy (which gives
slightly larger results than the empirical entropy):

log2(Cell size)    Entropy in bits
0	13.48422
1	12.48014
2	11.47869
3	10.47837
4	9.478465
5	8.478762
6	7.480331
7	6.48253
8	5.493885
9	4.507543
10	3.526705
11	2.551070
12	1.599523
13	0.8287433
14	0.3627942
15	0.1329697
16	0.03448373
17	0.004374095
18	0.0002002991
19	1.336822e-05
20	6.684109e-06
(There is a single page of size zero; otherwise row 20 would have 0
entropy. Over a real transport the size would never be zero, so a unit
of 2^20 bytes would be sufficient to reduce the leakage to zero for this
data.)
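
For anyone who wants to reproduce numbers of this kind, here is a rough
Python sketch of a James-Stein-type shrinkage entropy estimate as I
understand it (the form that shrinks the observed frequencies toward the
uniform distribution, as in the R "entropy" package's entropy.shrink).
Treating only the observed buckets as the set of cells is an assumption
of this sketch, and the counts are toy values, so do not expect it to
match the table digit for digit.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def shrinkage_entropy(counts):
    """Entropy in bits after shrinking frequencies toward the uniform target."""
    n = sum(counts)              # number of observations (must be > 1)
    k = len(counts)              # number of buckets (assumed: observed ones only)
    ml = [c / n for c in counts]  # maximum-likelihood frequencies
    target = 1 / k
    denom = (n - 1) * sum((target - p) ** 2 for p in ml)
    if denom == 0:
        lam = 1.0                # already uniform: shrinking changes nothing
    else:
        # Estimated shrinkage intensity, clipped to the range [0, 1].
        lam = min(1.0, max(0.0, (1 - sum(p * p for p in ml)) / denom))
    shrunk = [lam * target + (1 - lam) * p for p in ml]
    return entropy_bits(shrunk)

# Toy bucket counts (hypothetical, not from the Wikipedia data).
toy_counts = [50, 30, 15, 5]
print(entropy_bits([c / sum(toy_counts) for c in toy_counts]))
print(shrinkage_entropy(toy_counts))
```

Because the uniform distribution has the maximum possible entropy,
pulling the frequencies toward it can never lower the result, which is
why this estimate comes out at or above the empirical one.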

So for this data, changing the transmission unit from 512 bytes (row 9)
to 1024 bytes (row 10) would decrease the information learned by an
unbiased attacker from one request by only about one bit. (Unsurprisingly,
the entropy of the page sizes is not concentrated in the least
significant bits.)
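
The one-bit figure can be checked straight from the table. The sketch
below copies the first rows of the table and prints how much entropy
each doubling of the unit removes. It reads the first column as log2 of
the unit in bytes, so 512 -> 1024 is the step from 9 to 10; that reading
is mine.

```python
# Values copied from the table: log2(unit in bytes) -> entropy in bits.
table = {
    0: 13.48422, 1: 12.48014, 2: 11.47869, 3: 10.47837, 4: 9.478465,
    5: 8.478762, 6: 7.480331, 7: 6.48253, 8: 5.493885, 9: 4.507543,
    10: 3.526705, 11: 2.551070, 12: 1.599523, 13: 0.8287433, 14: 0.3627942,
}

def drop(k):
    """Bits of entropy removed by doubling the unit from 2**k to 2**(k+1)."""
    return table[k] - table[k + 1]

for k in range(14):
    print(f"2^{k} -> 2^{k + 1}: {drop(k):.4f} bits")
```

Each doubling up to roughly 2^11 removes very close to one bit, and the
drop only shrinks once the entropy itself is nearly used up.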

If you make any assumption that the attacker accumulates data from
request to request (e.g. due to page linkage) then I think that a
change from 512 to 1024 does not effectively thwart this attack
against this data set. If the attacker does not have that ability then
the current transmission unit already provides a substantial, and
probably sufficient, reduction in information leaked.
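
As a rough illustration of the accumulation case, here is a small
Python sketch under the conservative model mentioned earlier: every
request contributes its full entropy, the prior over articles is
uniform, and there are exactly 3.5 million articles. All three are
simplifying assumptions, and the per-request figures are the table
entries for units of 2^9 and 2^10 bytes.

```python
import math

ARTICLES = 3_500_000  # approximate article count, assumed exact here

def requests_needed(bits_per_request, articles=ARTICLES):
    """Requests needed to accumulate log2(articles) bits, assuming each
    request independently adds its full entropy (a conservative model)."""
    return math.ceil(math.log2(articles) / bits_per_request)

print("512-byte unit: ", requests_needed(4.507543))
print("1024-byte unit:", requests_needed(3.526705))
```

Under these assumptions, doubling the unit only pushes the attacker
from about 5 requests to about 7, which is the sense in which the
change does not thwart an accumulating attacker.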