Re: Questions about gathering information and statistics about the tor-network

Hi Sebastian,

Sebastian Schmidt wrote:
> Hi, I'm writing a tool right now to gather some longtime statistics
> about the tor-network.

That sounds like a fun project! :)

I'm in a bit of a hurry and cannot answer your posting in detail, sorry
for that. But let me give you some pointers now.

Well, first of all, I should say that your concerns about possibly
endangering anonymity of Tor users are very important. The data you
collect should not be usable to deanonymize Tor users.

For example, you mention collecting data on entry nodes (and that you
don't want to collect them, okay). What you should _not_ do is collect
precise data about who connected to your entry node at what time.
Someone else could collect similar data on their exit nodes about which
targets are requested at what time. Neither data set poses a risk on its
own, but put together... *ouch*  A better way to collect such data
would be to aggregate them over, say, 24 or 48 hours, aggregate them by
country instead of recording single IP addresses, and round the counts
up to multiples of 8 or 16. That's roughly how geolocations of directory
users can be collected right now.
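
To make the aggregation idea concrete, here is a minimal Python sketch.
It is only an illustration under stated assumptions, not the code Tor
actually uses: the function names, the (timestamp, IP) input format, and
the lookup_country callback (standing in for whatever GeoIP lookup you
have) are all hypothetical. The point is that IP addresses are mapped to
a country and discarded right away, counts are kept per 24-hour window,
and each count is rounded up to a multiple of 8 before it is stored.

```python
import math
from collections import Counter

BIN_SIZE = 8                    # round counts up to multiples of this
WINDOW_SECONDS = 24 * 60 * 60   # aggregate over 24 hours


def round_up(count, multiple=BIN_SIZE):
    """Round a count up to the next multiple; zero stays zero."""
    return int(math.ceil(count / multiple)) * multiple


def aggregate(observations, lookup_country,
              window=WINDOW_SECONDS, multiple=BIN_SIZE):
    """Aggregate (unix_timestamp, ip_address) pairs.

    lookup_country is a hypothetical callback that maps an IP address
    to a country code. Returns {(window_start, country): rounded_count}.
    The IP address itself is never stored.
    """
    counts = Counter()
    for timestamp, ip in observations:
        # Start of the time window this observation falls into.
        window_start = int(timestamp) - int(timestamp) % window
        counts[(window_start, lookup_country(ip))] += 1
    # Only the rounded-up totals leave this function.
    return {key: round_up(n, multiple) for key, n in counts.items()}
```

A usage example with a toy lookup table in place of a real GeoIP
database:

```python
toy_geoip = {"192.0.2.1": "de", "198.51.100.7": "us"}
observations = [(0, "192.0.2.1"), (10, "192.0.2.1"), (86400, "198.51.100.7")]
print(aggregate(observations, toy_geoip.get))
```

Rounding up rather than to the nearest multiple means a single
connection is never reported as an exact count of one.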

If you wanted to experience a few dozen enraged privacy researchers, you
should have been at the last PETS when a study on the Tor network,
'Shining Light in Dark Places: Understanding the Tor Network', was
presented. Apart from the authors' idea of making their data available
to the research community in an 'anonymized way' (I don't recall their
full plan for anonymizing them), that paper is a good read! ;)

So, the right way to collect data about an anonymity network is
certainly a hot topic. Prepare for a lively discussion here. ;)

Anyway, I wanted to give you some pointers. Did you know that gathering
good statistics of the Tor network is on the 3-year roadmap (Section 5.7)?


This should really not stop you from doing your own statistics! We have
just started with that and there's definitely enough fun work left to
do. :) But maybe some thoughts in that document are interesting for you.

Also, you might be interested in an analysis of bridge usage in Tor. The
bridge authority Tonga collects data about all bridges in the network in
order to give them out to bridge clients. These data are also archived
for later statistical analysis. The approach of evaluating these data
might be interesting for you. The data model is more or less the same as
for non-bridge data. Ah, and please keep in mind that this is only an
early draft of the analysis *cough*. If you want, you can find the
evaluation scripts in the parent directory of the same SVN repository:


There will be some more statistics on the Tor network within the next
few weeks. My plan is to evaluate archived network statuses, router
descriptors, and extra-info documents of the past 12 months to get a
better idea of the network growth and related facts. Further, I'd like
to evaluate geolocations of client requests to the directory authorities
and directory mirrors. And I want to finish that bridge data analysis.

So, to answer one of your questions: Yes, people are interested in such
statistics. :)

If you have ideas on what data should be collected (and how that can be
done in an anonymity-preserving way) or what statistics should be
performed with existing data, your input is most welcome!

And sorry again for ignoring most of your posting. I'll try to get to it
in the next few days.

--Karsten