Publishing sanitized bridge pool assignments

Hi everyone,

We're considering publishing which distribution pool each bridge is
assigned to.  The distribution pool defines whether we're giving
out bridges via HTTP, via email, or not at all (reserved pool).  The plan
is to remove all sensitive information from bridge pool assignments before
making them available on https://metrics.torproject.org/data.html.

For the long version, see task 2372 and its comments.

For the summary version, read on:

We want to make sanitized bridge pool assignments available, so that we
can answer questions like these:

 - What's the correlation between which pool the bridge is in and whether
   that bridge sees a lot of use from a given country?

 - Is bridge uptime affected by the pool assignment, because operators of
   bridges in the reserved pool decide that their bridge is not useful?

Here's a proposed data format for bridge pool assignments:

  bridge-pool-assignment 2011-01-10 01:41:14
  b abcdef0123456789abcdef0123456789abcdef01
  b 0123456789abcdef0123456789abcdef01234567
  s IP ring 1 (port-443 subring)
  s IP ring 1 (stable subring)
  s IP ring 1

The timestamp in the bridge-pool-assignment line is the time when the
assignment is written to disk (twice an hour).  Lines starting with b
contain IP address, port, and fingerprint of a bridge.  For sanitizing
purposes, we replace bridge IP addresses with a placeholder and bridge
identities with their SHA-1 hashes.  That's the same approach that we take
for sanitizing bridge descriptors.  Lines starting with s contain the
rings or subrings that a bridge is allocated to.  If a bridge is not
assigned to any pool, it doesn't have an s line.
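
To make the fingerprint sanitizing step concrete, here is a minimal
sketch in Python.  It assumes the hash is computed over the 20 raw bytes
of the identity fingerprint rather than over its hex text; that detail
and the function name sanitize_fingerprint are my assumptions for
illustration, not something the proposal above specifies.

```python
import hashlib


def sanitize_fingerprint(fingerprint_hex: str) -> str:
    """Return the hex-encoded SHA-1 digest of a bridge identity fingerprint.

    Assumption: the digest is taken over the decoded 20-byte fingerprint,
    not over its 40-character hex representation.
    """
    # bytes.fromhex raises ValueError if the input is not valid hex.
    raw = bytes.fromhex(fingerprint_hex)
    if len(raw) != 20:
        raise ValueError("expected a 40-character hex fingerprint")
    return hashlib.sha1(raw).hexdigest()


if __name__ == "__main__":
    # Example fingerprint taken from the proposed format above.
    print("b", sanitize_fingerprint("abcdef0123456789abcdef0123456789abcdef01"))
```

Anyone who already knows a bridge's fingerprint can recompute the same
hash, so this hides identities only from people who don't know the
bridge yet.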

While this information is useful for analysis, we need to be aware that
these lists can be misused by a censor to learn what fraction of bridges
is contained in which pool and what percentage of bridges of a given pool
they can block.  So far, they can only tell how many bridges there are in
total and what fraction of these bridges they know.  We'll have to decide
whether the questions we expect to answer with these data are worth that
added exposure.
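
To illustrate how little work that computation takes, here is a rough
sketch in Python.  It assumes the published assignments have already been
loaded into a mapping from hashed fingerprint to a single pool label, and
that the party doing the analysis has a set of hashed fingerprints of
bridges it already knows.  The function name, the data layout, and the
pool labels are hypothetical and only serve the example.

```python
from collections import Counter


def pool_fractions(assignments, known):
    """Compute per-pool shares from sanitized assignments.

    assignments: dict mapping hashed fingerprint -> pool label (hypothetical layout)
    known: set of hashed fingerprints of bridges already known to the analyst
    Returns a dict mapping pool label ->
        (share of all bridges in that pool, share of that pool that is known).
    """
    totals = Counter(assignments.values())
    hits = Counter(pool for fp, pool in assignments.items() if fp in known)
    n = len(assignments)
    # totals only contains pools with at least one bridge, so no division by zero.
    return {pool: (totals[pool] / n, hits[pool] / totals[pool]) for pool in totals}


if __name__ == "__main__":
    # Made-up toy data, not real measurements.
    toy = {"a": "http", "b": "http", "c": "email", "d": "reserved"}
    print(pool_fractions(toy, {"a", "c"}))
```

Without the pool column, the same data would only yield the overall
known fraction, which is the situation described above.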