Re: Publishing sanitized bridge pool assignments
On Tue, Jan 25, 2011 at 12:12:45PM +0100, Karsten Loesing wrote:
> Hi everyone,
>
> we're considering publishing the information about which distribution pool a
> bridge is assigned to. The distribution pool defines whether we're giving
> out bridges via HTTP, via email, or not at all (reserved pool). The plan
> is to remove all sensitive information from bridge pool assignments before
> making them available on https://metrics.torproject.org/data.html.
>
> For the long version see task 2372 and comments:
>
> https://trac.torproject.org/projects/tor/ticket/2372
>
> For the summary version read on:
>
> We want to make sanitized bridge pool assignments available, so that we
> can answer questions like these:
>
> - What's the correlation between which pool the bridge is in and whether
> that bridge sees a lot of use from a given country?
>
> - Is bridge uptime affected by the pool assignment, because operators of
> bridges in the reserved pool decide that their bridge is not useful?
>
> Here's a proposed data format for bridge pool assignments:
>
> bridge-pool-assignment 2011-01-10 01:41:14
> b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
> b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
> s IP ring 1 (port-443 subring)
> s IP ring 1 (stable subring)
> s IP ring 1
>
> The timestamp in the bridge-pool-assignment line is the time when the
> assignment is written to disk (twice an hour). Lines starting with b
> contain IP address, port, and fingerprint of a bridge. For sanitizing
> purposes, we replace bridge IP addresses with 127.0.0.1 and bridge
> identities with their SHA-1 hashes. That's the same approach that we take
> for sanitizing bridge descriptors. Lines starting with s contain the
> rings or subrings that a bridge is allocated to. If a bridge is not
> assigned to any pool, it doesn't have an s line.
>
> While this information is useful for analysis, we need to be aware that
> these lists can be misused by a censor to learn what fraction of bridges
> is contained in which pool and what percentage of bridges of a given pool
> they can block. So far, they can only tell how many bridges there are in
> total and what fraction of these bridges they know. We'll have to decide
> if the questions we expect to answer using these data are worth it.
Here's a sample bridge pool assignment from September 2010 that is
sanitized as described above (all IP addresses set to 127.0.0.1, contained
fingerprints are SHA-1 hashes of the original fingerprints):
http://freehaven.net/~karsten/volatile/bridge-pool-assignment-sample
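To make the sanitizing step concrete, here is a rough Python sketch of how a single b line could be rewritten. It is only an illustration, not the code that produced the sample: the function name sanitize_bridge_line is made up, and it assumes that the SHA-1 hash is taken over the 20 raw identity bytes rather than over the hex string.

```python
import hashlib


def sanitize_bridge_line(line: str) -> str:
    """Sanitize one 'b <ip>:<port> <fingerprint>' line (hypothetical helper)."""
    keyword, address, fingerprint = line.split()
    if keyword != "b":
        raise ValueError(f"not a bridge line: {line!r}")
    # Keep the port, drop the real IP address.
    _ip, _, port = address.rpartition(":")
    # Assumption: hash the decoded identity bytes, not the hex text.
    hashed = hashlib.sha1(bytes.fromhex(fingerprint)).hexdigest()
    return f"b 127.0.0.1:{port} {hashed}"
```

Calling it on a line such as "b 192.0.2.10:443 <40 hex characters>" would yield a line with the same port, the address 127.0.0.1, and a different 40-character hex fingerprint.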
This sample is there so that everyone gets a better idea of what is meant
by a bridge pool assignment. Does anyone object to publishing tarballs of
these sanitized bridge pool assignments on the metrics website, so that we
(and anyone else) can analyze them?
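For anyone who wants to start analyzing, a minimal Python sketch of a parser for the proposed format might look like the following. The names Bridge and parse_assignment are made up for illustration, and the sketch assumes that every s line belongs to the closest preceding b line, which is how I read the example in the quoted mail.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Bridge:
    address: str       # sanitized "127.0.0.1:<port>"
    fingerprint: str   # SHA-1 hash of the original fingerprint
    rings: list = field(default_factory=list)  # empty if not in any pool


def parse_assignment(text: str):
    """Return (timestamp, bridges) for one bridge pool assignment."""
    timestamp = None
    bridges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "bridge-pool-assignment":
            timestamp = datetime.strptime(rest, "%Y-%m-%d %H:%M:%S")
        elif keyword == "b":
            address, fingerprint = rest.split()
            bridges.append(Bridge(address, fingerprint))
        elif keyword == "s":
            # Assumption: an s line refers to the most recent b line.
            if not bridges:
                raise ValueError("s line before any b line")
            bridges[-1].rings.append(rest)
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return timestamp, bridges


SAMPLE = """\
bridge-pool-assignment 2011-01-10 01:41:14
b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
s IP ring 1 (port-443 subring)
s IP ring 1 (stable subring)
s IP ring 1
"""

timestamp, bridges = parse_assignment(SAMPLE)
```

Under that assumption, the first bridge in the example ends up with no rings (it is not assigned to any pool) and the second one with three.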
Best,
Karsten