
Re: Publishing sanitized bridge pool assignments

On Tue, Jan 25, 2011 at 12:12:45PM +0100, Karsten Loesing wrote:
> Hi everyone,
> we're considering publishing which distribution pool each bridge is
> assigned to.  The distribution pool defines whether we're giving
> out bridges via HTTP, via email, or not at all (reserved pool).  The plan
> is to remove all sensitive information from bridge pool assignments before
> making them available on https://metrics.torproject.org/data.html.
> For the long version see task 2372 and comments:
>   https://trac.torproject.org/projects/tor/ticket/2372
> For the summary version read on:
> We want to make sanitized bridge pool assignments available, so that we
> can answer questions like these:
>  - What's the correlation between which pool the bridge is in and whether
>    that bridge sees a lot of use from a given country?
>  - Is bridge uptime affected by the pool assignment, because operators of
>    bridges in the reserved pool decide that their bridge is not useful?
> Here's a proposed data format for bridge pool assignments:
>   bridge-pool-assignment 2011-01-10 01:41:14
>   b abcdef0123456789abcdef0123456789abcdef01
>   b 0123456789abcdef0123456789abcdef01234567
>   s IP ring 1 (port-443 subring)
>   s IP ring 1 (stable subring)
>   s IP ring 1
> The timestamp in the bridge-pool-assignment line is the time when the
> assignment is written to disk (twice an hour).  Lines starting with b
> contain IP address, port, and fingerprint of a bridge.  For sanitizing
> purposes, we replace bridge IP addresses with a fixed placeholder address
> and bridge identities with their SHA-1 hashes.  That's the same approach
> that we take for sanitizing bridge descriptors.  Lines starting with s
> contain the rings or subrings that a bridge is allocated to.  If a bridge
> is not assigned to any pool, it doesn't have an s line.
> While this information is useful for analysis, we need to be aware that
> these lists can be misused by a censor to learn what fraction of bridges
> is contained in which pool and what percentage of bridges of a given pool
> they can block.  So far, they can only tell how many bridges there are in
> total and what fraction of these bridges they know.  We'll have to decide
> if the questions we expect to answer using these data are worth it.
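
To make the sanitizing step from the quoted message concrete, here is a
rough Python sketch.  It is not the code we actually run.  It assumes that
an unsanitized b line looks like "b <ip>:<port> <fingerprint>", that the
port is kept, and that the SHA-1 hash is taken over the 20 raw identity
bytes rather than over the hex string.  The function names and the
placeholder_ip parameter are made up for illustration.

```python
import hashlib


def sanitize_fingerprint(fingerprint_hex):
    """Return the hex-encoded SHA-1 hash of a bridge fingerprint.

    Assumption: the hash covers the 20 raw identity bytes, not the
    40-character hex string.
    """
    raw = bytes.fromhex(fingerprint_hex)
    return hashlib.sha1(raw).hexdigest()


def sanitize_b_line(line, placeholder_ip):
    """Sanitize one 'b' line of an unsanitized assignment.

    Assumption: the input layout is 'b <ip>:<port> <fingerprint>';
    the real layout may differ.
    """
    keyword, address, fingerprint = line.split()
    if keyword != "b":
        raise ValueError("not a b line: %r" % line)
    # Drop the real IP address, keep the port.
    _ip, _sep, port = address.rpartition(":")
    return "b %s:%s %s" % (placeholder_ip, port,
                           sanitize_fingerprint(fingerprint))
```

The caller passes in whatever placeholder address is used for sanitized
bridge descriptors, so the two data sets stay consistent.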

Here's a sample bridge pool assignment from September 2010 that is
sanitized as described above (all IP addresses are set to a fixed
placeholder address, and the contained fingerprints are SHA-1 hashes of the
original fingerprints):


This sample should give everyone a better idea of what a bridge pool
assignment looks like.  Does anyone object to publishing tarballs of
these sanitized bridge pool assignments on the metrics website, so that we
(and anyone else) can analyze them?
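
For anyone who wants to try such an analysis, a minimal Python sketch of a
parser for the proposed format might look like the following.  It assumes
that s lines belong to the most recent b line above them and that the
fingerprint is the last token on a b line.  The function names and the
"(unassigned)" label are hypothetical.

```python
from collections import Counter


def parse_assignment(text):
    """Parse one assignment into (timestamp, {fingerprint: [pools]})."""
    timestamp = None
    bridges = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _sep, rest = line.partition(" ")
        if keyword == "bridge-pool-assignment":
            timestamp = rest
        elif keyword == "b":
            tokens = rest.split()
            if not tokens:
                continue
            # Assumption: the fingerprint is the last token.
            current = tokens[-1]
            bridges[current] = []
        elif keyword == "s" and current is not None:
            # Assumption: an s line refers to the preceding b line.
            bridges[current].append(rest)
    return timestamp, bridges


def count_by_pool(bridges):
    """Count bridges per ring or subring; bridges without s lines are
    counted under the made-up label '(unassigned)'."""
    counts = Counter()
    for pools in bridges.values():
        if pools:
            counts.update(pools)
        else:
            counts["(unassigned)"] += 1
    return counts
```

Running count_by_pool over a series of assignments would give the
fraction of bridges per pool over time, which is also what a censor
could compute from the published data.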