
Re: Safely collecting data to estimate the number of Tor users



On Thu, 26 Aug 2010 13:31:14 +0200,
Karsten Loesing <karsten.loesing@xxxxxxx> wrote:

> Hi everyone,
> 
> in the past year or so, we put some effort into finding out how many
> people use the Tor network every day. We expect that there are 500,000
> daily users, but we have no good data to support this expectation.
> We'd like to be more certain about the user count in order to
> understand the Tor network better and hopefully improve it.
> 
> We have started writing down the current state of counting users in a
> privacy-preserving way. Note that this is just a draft that is going
> to change over time:
> 
> 
> https://gitweb.torproject.org/karsten/metrics.git/blob_plain/refs/heads/counting-users:/report/counting-users/countingusers.pdf
> 
> One of the more promising approaches to count Tor users is to count
> unique client IP addresses on a fast directory mirror (see Section 3.2
> "Count unique IP addresses of connecting clients..."). We make use of
> the fact that clients send out 20 to 80 directory requests per day and
> very likely contact every fast directory mirror at least once. This is
> going to change with the directory guard design, though. We'll need to
> come up with a way to combine the findings of multiple directory
> guards.
> 
> So, here's my plan for researching this more: I'd like to run an
> experiment with multiple fast directory mirrors run by the same
> operator on the same host (like Jake's trusted and Pandora*, Olaf's
> blutmagie*, Moritz's torserversNet*, etc.). I'm going to write a
> patch for Tor to accept some key string in its torrc and extend
> SafeLogging to accept the value 'encrypt'. Tor will then pass all
> client IP addresses through a keyed hash function using the provided
> key string and write the result to its logs. I'm also going to
> implement #1668 to make log granularity configurable. The operators
> configure the same key string for all their relays and run them with
> the new SafeLogging option and logging granularity of 15 minutes for,
> say, a week. Operators then delete the key string and only keep the
> logs. The operators do not give out these logs to me or anyone else.
> I'm going to write Python scripts to analyze the logs and publish
> them for the operators and others to review. The operators will run
> these scripts and publish the results.
> 
> I hope to learn more about the overlap of unique IP address sets seen
> by fast directory mirrors and also about client uptime sessions. I'd
> like to try out different schemes to safely combine unique IP address
> sets to come up with a better user count.
> 
> Before writing code, what questions or concerns are there about this
> experiment? Are there better ways to achieve what I'm trying to
> achieve?
> 
> Thanks,
> --Karsten

Hi Karsten,

This seems like a great idea to me, and I hope it can also be used on a
non-exit relay with high bandwidth and all flags assigned (I mean Stable,
Named, HS, Guard, Fast, Dir).

If so, I will gladly contribute and help, as always.
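
To make the logging step in the quoted plan concrete, here is a minimal
Python sketch of what it could look like. It is not the actual Tor patch:
HMAC-SHA256 is assumed as the keyed hash function, the function names
(pseudonymize_ip, round_down, overlap) are hypothetical, and the Jaccard
ratio is only one possible way to measure the overlap of two relays'
address sets.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# 15-minute log granularity, as proposed in the quoted plan.
GRANULARITY_SECONDS = 15 * 60


def pseudonymize_ip(key: bytes, ip: str) -> str:
    """Keyed hash of a client IP address (assumption: HMAC-SHA256)."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def round_down(ts: datetime, granularity: int = GRANULARITY_SECONDS) -> datetime:
    """Round a timestamp down to the start of its logging interval."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % granularity, tz=timezone.utc)


def overlap(a: set, b: set) -> float:
    """Jaccard overlap of two sets of hashed addresses (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example: two relays of one operator sharing the same key string.
key = b"example key string"
relay_a = {pseudonymize_ip(key, ip) for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3")}
relay_b = {pseudonymize_ip(key, ip) for ip in ("192.0.2.2", "192.0.2.3", "198.51.100.7")}

shared = overlap(relay_a, relay_b)
stamp = round_down(datetime(2010, 8, 26, 13, 31, 14, tzinfo=timezone.utc))
```

Because both relays use the same key, the same client address yields the
same hash on each of them, so the sets can be compared without the logs
ever containing a plain IP address.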

Best Regards

SwissTorExit