Re: Questions about gathering information and statistics about the tor-network

Hi Karsten,
sorry for my late reply, but I have been really busy these days.

On Wed, 14 Jan 2009 23:10:10 +0100
Karsten Loesing <karsten.loesing@xxxxxxx> wrote:

> Well, first of all, I should say that your concerns about possibly
> endangering anonymity of Tor users are very important. The data you
> collect should not be usable to deanonymize Tor users.
> For example, you mention collection of data on entry nodes (and that you
> don't want to collect them, okay). What you should _not_ do is collect
> precise data about who connected to your entry node at what time.
> Someone else could collect similar data on their exit nodes what targets
> are requested at what time. Both data sets don't pose a risk on their
> own, but put together... *ouch*   
> A better way to collect such data
> would be to aggregate them over, say, 24 or 48 hours, aggregate them by
> country instead of memorizing single IP addresses, and round them up to
> multiples of 8 or 16. 

You're right about this! It was never my intention to set up such a special logging facility on my node; it was just an idea of what all entry nodes could do together to get some overall network stats. But you correctly reminded me that such data could be used to deanonymize people, so it looks like getting information about the current number of users of the Tor network without risking their anonymity is not possible.
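
Just to check that I understood the coarser aggregation you describe (24-hour windows, per country, rounded up to multiples of 8), here is a minimal Python sketch. The function name, the (timestamp, country code) input format and the defaults are my own assumptions, not anything Tor actually implements; I also assume the country lookup has already happened, so that no IP address is ever stored.

```python
from collections import Counter

WINDOW_SECONDS = 24 * 60 * 60  # assumed aggregation window: 24 hours
BUCKET = 8                     # assumed rounding step: multiples of 8


def aggregate_by_country(observations, window_seconds=WINDOW_SECONDS, bucket=BUCKET):
    """Count observations per (window start, country), rounded up to a multiple of bucket.

    observations: iterable of (unix_timestamp, country_code) pairs.
    Hypothetical helper for illustration only; it never sees IP addresses.
    """
    counts = Counter()
    for timestamp, country in observations:
        # Drop the precise time: keep only the start of the enclosing window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, country)] += 1

    # Round every count up to the next multiple of bucket (integer arithmetic).
    return {
        key: ((n + bucket - 1) // bucket) * bucket
        for key, n in counts.items()
    }


if __name__ == "__main__":
    sample = [(0, "de")] * 3 + [(10, "us")] * 9 + [(86400, "de")]
    for (window_start, country), rounded in sorted(aggregate_by_country(sample).items()):
        print(window_start, country, rounded)
```

With this, 3 connections from one country in a window are reported as 8 and 9 as 16, so small differences between individual relays and exact connection times are no longer visible in the published numbers.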

> That's about how geolocations of directory users
> can be collected right now.

This sounds interesting. Can this information be queried somehow from the dir-servers, or is it non-public?

> If you wanted to experience a few dozen enraged privacy researchers, you
> should have been at last PETS when a study on the Tor network, 'Shining
> Light in Dark Places: Understanding the Tor Network', was presented.
> Apart from the authors' consideration to make their data available to
> the research community in an 'anonymized way' (I don't recall their full
> plan for anonymizing them), that paper is a good read! ;)

Thanks for the hint, I found it online (http://www.freehaven.net/anonbib/cache/mccoy-pet2008.pdf).

> So, the right way to collect data about an anonymity network is for sure
> a hot topic. Prepare for a lively discussion here. ;)
> Anyway, I wanted to give you some pointers. Did you know that gathering
> good statistics of the Tor network is on the 3-year roadmap (Section 5.7)?
> https://svn.torproject.org/svn/tor/trunk/doc/roadmaps/2008-12-19-roadmap-full.pdf
> This should really not stop you from doing your own statistics!

It won't ;)

> Also, you might be interested in an analysis of bridge usage in Tor. The
> bridge authority Tonga collects data about all bridges in the network in
> order to give them out to bridge clients. These data are also archived
> for later statistical analysis. The approach of evaluating these data
> might be interesting for you. The data model is more or less the same as
> for non-bridge data. Ah, and please keep in mind that this is only an
> early draft of the analysis *cough*. If you want, you can find the
> evaluation scripts in the parent directory of the same SVN repository:
> https://svn.torproject.org/svn/projects/dir-stats/trunk/bridge-stats/report/bridge-stats-2008-12-25.pdf

Those stats you gathered about the bridges are really interesting! Since reading them I have been thinking about how to interpret them. It looks like we already have "many" bridges, given the short time they have been supported in stable Tor, but only a small amount of overall traffic on them (based on the bandwidth consumption). This could be interesting for people who want to support the network, because they don't need to set up a 4TB-root to run a bridge. Also, most users seem to be Germans/Americans, and not people from the countries one would expect to be number one. I wonder why? AFAIK no provider in Germany restricts access to Tor. Do people use bridges because they think this "extra hop" increases their anonymity, instead of leaving the bandwidth to the people who really need it?

> If you have ideas on what data should be collected (and how that can be
> done in an anonymity-preserving way) or what statistics should be
> performed with existing data, your input is most welcome!

Well, there is a lot of interesting information which could be gathered without touching users' anonymity at all. In contrast, there is also information which needs to be collected to protect their anonymity. Events like a sudden increase of nodes in a fascist country (e.g. Burma, China and so on) shouldn't happen without people noticing. To begin with, I want to get all the interesting information out of the service-descriptors and make it visible.

Have you already thought about a good way to present the data? I think the best would be a dynamic solution: one gathers all the information, and users can throw exactly the data series they want into one pot and see them joined in one graph, for a timeline they can choose. But I don't know any good framework which offers this. So far I have only found cewolf (http://cewolf.sourceforge.net/new/index.html), and I don't know if it fits the needs I have in mind. So I think gathering the already available information is easier than finding a good way to make it easily accessible to the average user.

I'll be really busy for the next month, but as soon as I have something to show I'll let you know!

