
Re: [tor-dev] GoSC - Website Fingerprinting project

To: tor-dev@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [tor-dev] GoSC - Website Fingerprinting project
From: Mike Perry <mikeperry@xxxxxxxxxxxxxx>
Date: Tue, 18 Mar 2014 19:30:34 -0700

Marc Juarez:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Lunar:
> > Have you read Mike Perry's long blog post on the topic?
> > https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks
> > 
> > It outlines future research work in evaluating the efficiency of
> > fingerprinting attacks, and also mentions a couple of promising defenses.
> 
> Yes, I am aware of it and I'm currently working on a study to evaluate
> the efficiency of these attacks.
> 
> As Mike Perry said in the post, most of the attacks give an unrealistic
> advantage to the adversary and probably countermeasures work much better
> than what has been shown so far.
> 
> However, some of the results of these articles suggest that there exist
> coarse-grained traffic features that are invariant to randomized
> pipelines (RP, SPDY) and thus can still identify web pages (Dyer et
> al.). Also, edit-distance based classifiers broke some old versions of
> the RP implemented in Tor Browser.
> 
> It's an open problem to see if these features actually uniquely identify
> web pages in larger worlds than the ones considered in the literature.
> In any case, link-padding strategies are specially designed to conceal
> these features with the minimal amount of cover traffic and are becoming
> affordable in terms of bandwidth.
> 
> The project I propose would be directed to address this bug ticket:
> 
> https://trac.torproject.org/projects/tor/ticket/7028
> 
> For example, I would like to implement the common building blocks for
> link-padding countermeasures (such as a "traffic generator controller"
> in the onion proxy and the entry guard).

This sounds like a good summer-sized amount of work. I agree with George
that pluggable transports are a good place to start for prototyping this
work. That way, you can experiment with custom padding protocols easily,
without needing to make invasive changes to tor-core for each revision.

For example, it would be neat to be able to transmit a set of statistics
to your bridge node during the connection handshake or with the circuit
setup, so that you don't always have to request downstream padding cells
with an upstream cell, and downstream padding can arrive asynchronously
according to some probability or histogram distribution you specify.
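
As a rough illustration of the histogram idea, here is a minimal Python
sketch. Everything in it is hypothetical: the PaddingHistogram name, the
choice of inter-cell delay in milliseconds as the histogram variable, and
the padding_schedule helper are assumptions made for the example, not
part of Tor or of any existing pluggable transport. The client would send
the bin edges and weights once, and the bridge would then sample delays
on its own.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PaddingHistogram:
    """Hypothetical padding spec a client could send to its bridge.

    bin_edges_ms has one more entry than weights; bin i covers the
    delay range [bin_edges_ms[i], bin_edges_ms[i + 1]].
    """
    bin_edges_ms: tuple
    weights: tuple

    def __post_init__(self):
        if len(self.bin_edges_ms) != len(self.weights) + 1:
            raise ValueError("need exactly one more bin edge than weights")
        if self.bin_edges_ms[0] < 0:
            raise ValueError("delays cannot be negative")
        pairs = zip(self.bin_edges_ms, self.bin_edges_ms[1:])
        if any(lo >= hi for lo, hi in pairs):
            raise ValueError("bin edges must be strictly increasing")
        if any(w < 0 for w in self.weights) or sum(self.weights) <= 0:
            raise ValueError("weights must be non-negative, not all zero")

    def sample_delay_ms(self, rng):
        """Pick a bin by weight, then a uniform delay inside that bin."""
        i = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return rng.uniform(self.bin_edges_ms[i], self.bin_edges_ms[i + 1])


def padding_schedule(hist, duration_ms, rng):
    """Return the times (ms) at which the bridge would emit padding cells."""
    times = []
    t = hist.sample_delay_ms(rng)
    while t <= duration_ms:
        times.append(t)
        t += hist.sample_delay_ms(rng)
    return times


if __name__ == "__main__":
    # Made-up distribution: mostly short gaps, occasionally longer ones.
    hist = PaddingHistogram(bin_edges_ms=(0, 10, 50), weights=(3, 1))
    print(padding_schedule(hist, duration_ms=200, rng=random.Random(0)))
```

A real transport would draw from a cryptographically stronger source
than random.Random and would need to bound the total padding volume.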

You could also obviously specify a number of cells to send in response
to a padding cell request (from 0..N, where N is some reasonable cap
similar to a largeish web object size). The current Tor link padding
protocol supports neither of these operations.
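
A counted padding request could look something like the Python sketch
below. The wire layout (a 2-byte big-endian count), the function names,
and the cap value of 2048 cells (roughly 1 MB at about 512 bytes per
cell) are all assumptions for illustration; none of this is an existing
Tor cell format.

```python
import struct

# Hypothetical cap N on how many padding cells one request may trigger.
MAX_PADDING_CELLS = 2048


def encode_padding_request(count, cap=MAX_PADDING_CELLS):
    """Client side: ask the peer for `count` padding cells (0..cap)."""
    if not 0 <= count <= cap:
        raise ValueError(f"count must be in 0..{cap}")
    return struct.pack("!H", count)


def handle_padding_request(payload, cap=MAX_PADDING_CELLS):
    """Peer side: return how many padding cells to send back.

    The cap is enforced again here, since the peer cannot trust the
    client to have respected it.
    """
    if len(payload) < 2:
        raise ValueError("padding request payload too short")
    (requested,) = struct.unpack("!H", payload[:2])
    return min(requested, cap)


if __name__ == "__main__":
    request = encode_padding_request(10)
    print(handle_padding_request(request))
```

Clamping on the receiving side matters because otherwise a single small
request could be used to make the bridge send unbounded traffic.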

More advanced padding protocols are also possible, but may also be
overkill. We can discuss those further if this sounds interesting. I'd
also like to hear any ideas you might have on the design and/or
implementation of such a protocol.


Related: Do you happen to have any existing classifier code working
already?

One of the ideas I've been considering is taking a closer look at the
nearest-neighbor edit distances between page class labels, for the edit
distance based classifiers. This distance provides us with an estimate
of the ideal minimum cover traffic we will need to make testing
instances jump from one nearest-neighbor label to another (causing a
false positive). This distance will also decrease as the world size
increases (more class labels in the same amount of N-dimensional space).
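
To make the measurement concrete, here is a minimal Python sketch. It
assumes each class label is represented by a single trace encoded as a
string of cell directions ("+" for outgoing, "-" for incoming) and uses
plain Levenshtein distance. Both are simplifying assumptions for the
example; the published edit-distance classifiers use their own trace
encodings and distance variants.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def nearest_neighbor_distances(labels):
    """Map each label to (distance, name) of its closest other label."""
    if len(labels) < 2:
        raise ValueError("need at least two class labels")
    result = {}
    for name, trace in labels.items():
        result[name] = min(
            (levenshtein(trace, other), other_name)
            for other_name, other in labels.items()
            if other_name != name
        )
    return result


if __name__ == "__main__":
    # Made-up traces, for illustration only.
    labels = {"page_a": "++--", "page_b": "++-+", "page_c": "--------"}
    for name, (dist, neighbor) in nearest_neighbor_distances(labels).items():
        print(f"{name}: nearest is {neighbor} at edit distance {dist}")
```

Running the same computation on progressively larger sets of labels
would show how quickly the nearest-neighbor distance shrinks with world
size.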

A successful defense should change the distribution of edit distances
of test instances around their class labels (it will increase the
intra-class variance), and this in turn will increase the size of the
threshold around class labels needed for a given accuracy rate, reducing
accuracy or increasing false positives.
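
One simple way to read "threshold for a given accuracy rate" is as the
smallest radius around a class label that still contains a target
fraction of that class's test instances. The Python sketch below takes
that interpretation, which is my assumption rather than a definition
from the literature, and the distance values in it are invented purely
to show the effect of higher intra-class variance.

```python
import math
import statistics


def threshold_for_coverage(distances, target):
    """Smallest radius containing at least `target` of the instances.

    `distances` are edit distances from test instances to their own
    class label; `target` is a fraction in (0, 1].
    """
    if not distances:
        raise ValueError("need at least one distance")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(distances)
    k = math.ceil(target * len(ordered))
    return ordered[k - 1]


if __name__ == "__main__":
    # Invented numbers: the defended traces are more spread out.
    undefended = [2, 3, 3, 4, 5, 5, 6, 7, 8, 10]
    defended = [1, 4, 6, 9, 12, 15, 19, 22, 27, 35]
    for name, dists in (("undefended", undefended), ("defended", defended)):
        print(name,
              "variance:", statistics.pvariance(dists),
              "radius for 50% coverage:", threshold_for_coverage(dists, 0.5))
```

If the radius needed for the target rate grows past the distance to the
nearest neighboring label, instances of other classes start falling
inside it, which is where the false positives come from.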

It may also be the case that low or no cost defenses (like a smarter use
of SPDY) do this, too, but we'll be able to see it for sure with
padding.

Does this make sense?


-- 
Mike Perry
