
Re: [tor-dev] Client simulation

On 6/10/13 4:40 AM, Karsten Loesing wrote:
On 6/6/13 7:32 PM, Norman Danner wrote:
I have two questions regarding a possible research project.

First, the research question:  can one use machine-learning techniques
to construct a model of Tor client behavior?  Or in a more general form:
can one use <fill-in-the-blank> to construct a model of Tor client
behavior?  A student of mine did some work on this over the last year,
and the results are encouraging, though not strong enough to do anything
with yet.

The intent is that each cluster (represented by a single hidden Markov
model) represents a "type" of client, even though we don't know for sure
what that client type does.  We can make some guesses about some:  the
"type" of steady high-volume cell counts is probably a bulk downloader;
the "type" of steady zero cell counts is probably an unused circuit;
etc.  But in some sense, I'm thinking that what counts is the behavior
of the client, not the reason for that behavior.  We don't have to
instrument clients for this.  Of course, then one has to ask whether
this kind of modeling is in fact useful.  It is somewhat different from
what you are envisioning, I think.
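
To make the "one hidden Markov model per client type" idea concrete, here is a minimal sketch of how a circuit's cell-count sequence could be assigned to the best-fitting model. It is not the code from the thesis work: the binning of cell counts into three symbols, the model names ("idle", "bulk"), and every probability below are made up purely for illustration, and the models are hand-written rather than learned by clustering.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then emit.
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

# Observation symbols (hypothetical binning of cells per time window):
# 0 = no cells, 1 = few cells, 2 = many cells.
# Each model is (start probabilities, transition matrix, emission matrix).
MODELS = {
    "idle": ([1.0], [[1.0]], [[0.98, 0.01, 0.01]]),
    "bulk": ([0.5, 0.5],
             [[0.9, 0.1], [0.1, 0.9]],
             [[0.05, 0.25, 0.70], [0.05, 0.70, 0.25]]),
}

def classify(obs, models=MODELS):
    """Return the name of the model under which obs is most likely."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

With these invented parameters, classify([0, 0, 0, 0, 0, 0]) picks "idle" and classify([2, 2, 2, 2, 2, 2]) picks "bulk". The label only names a behavior pattern; it does not say why the client behaves that way.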

There are about a billion variations (at last count) on this theme.  We
chose one particular one as a test case to play with the methodology.  I
think the methodology is mostly OK, though I'm not completely satisfied
with the results of the particular variation Julian worked on.  So now
I'm trying to figure out whether to push this forward and in particular
what directions and end goals would be useful.

Interesting stuff!  You're indeed taking a different approach than I
was envisioning by gathering data on a single guard rather than on a
set of volunteering clients.  Both approaches have their pros and cons,
but I think your approach leads to some interesting results and can be
done in a privacy-preserving fashion.

Two thoughts:

- I could imagine that your results are quite valuable for modeling
better Shadow/ExperimenTor clients or for deriving better client models
for Tor path simulators.  Maybe Julian's thesis already has some good
data for that, or maybe we'll have to repeat the experiment in a
slightly different setting.  I'm cc'ing Rob (the Shadow author) and
Aaron (working on a path simulator) to make sure they saw this thread.
I can help by reviewing code changes to Tor to make sure data is
gathered in a privacy-preserving way, and I'd appreciate it if those code
changes were made public together with the analysis results.

I'm in the process of rewriting the data collection code, and will e-mail later with some of the details. But maybe off-list initially, as I think the first few passes will be very special-purpose and hence not of general interest (though I'm happy to discuss it more publicly if that's more appropriate).

Right now I'm considering focusing on trying to get a reasonable (partial) answer to the following question: how well do various timing-analysis attacks actually work? That is, how well do they work when the client model is "accurate"? I'm not even sure exactly how to define "accurate," though I can think of at least a few different ways. But I'm hoping that by focusing on a relatively narrow question, we can break off manageable chunks of the broader questions: what kinds of data can reasonably be collected, and how we can use that data for other purposes.

- It might be interesting to observe how Tor usage changes over time.
Maybe the research experiment leads to a set of classifiers telling us
when a circuit is most likely used for bulk downloads, used for web
browsing, used for IRC, unused, or whatever.  We could then extend
circuit statistics to have all relays report aggregate data of how
circuits can be classified.  Requires a proposal and code, but I could
help with those.
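
As a rough illustration of what "aggregate data of how circuits can be classified" might look like on a relay, the sketch below counts circuits per class and rounds each count up to a multiple of a bin size, so that exact per-relay numbers are not published. The class labels, the function name, and the bin size of 8 are assumptions for the example, not an existing Tor statistic or a proposed format.

```python
from collections import Counter

def aggregate_report(labels, bin_size=8):
    """Count circuits per class, rounding each count up to a multiple of bin_size."""
    counts = Counter(labels)
    # -(-n // bin_size) is integer ceiling division.
    return {label: -(-n // bin_size) * bin_size
            for label, n in sorted(counts.items())}

# Hypothetical per-circuit labels produced by some classifier.
labels = ["bulk"] * 3 + ["web"] * 9 + ["unused"] * 16
print(aggregate_report(labels))
```

Only the binned totals leave the relay in this sketch; the per-circuit labels themselves would never be reported.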

Yes, I can see a number of longer-range applications like this. I'm not sure I want to think about proposals and code just yet.

	- Norman

--
Norman Danner - ndanner@xxxxxxxxxxxx - http://ndanner.web.wesleyan.edu
Department of Mathematics and Computer Science - Wesleyan University