
Re: Anonymity-preserving collection of usage data of a hidden service authoritative directory



On Fri, Apr 13, 2007 at 10:53:40PM +0200, Karsten Loesing wrote:
> How can we collect data on the usage of a hidden service authoritative
> directory in an anonymity-preserving way? Or, the question that comes
> before: Can we?
> 
> This data could be vital for designing a decentralized store of hidden
> service descriptors. Are there 10 or 1000 hidden services running at a
> time? Are fetch requests distributed equally over all hidden services or
> are there hot spots? Those questions cannot be answered without some
> real data.
> 
> Obviously, such a collection needs to be done in an anonymity-preserving
> way. Though the anonymity of hidden services does not rely primarily on
> the integrity of the directory operator, that integrity still plays a
> role. The operator can find out which hidden services are online or
> attack their introduction points (see my other posting on encrypting
> descriptors).
> 
> Here are two ideas of how to collect this data:
> 
> 1. One of the directory server operators could temporarily add a handful
> of logging statements that write publish and fetch requests for hidden
> service descriptors, together with the requested onion addresses, to a
> log file. He could then anonymize the onion addresses by consistently
> replacing them with something like hash(onion address + "some random
> number to forget afterwards") and publish the log on the Tor homepage.
> Everyone could compute statistics from the data, but nobody would be
> able to identify a particular hidden service.

This is vulnerable to some kinds of attacks.  For instance, if I
wanted to see the statistics for foo.onion, I could make a bunch of
requests for foo.onion at 12:03, then a bunch at 12:19, then a bunch
at 12:42, and then look through the published statistics to see which
"anonymized" address had a lot of requests at those times.  Not so
good.
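To sketch the attack (the data below is hypothetical; assume the
published log lets you see, per pseudonym, which minutes were busy):

    # Hypothetical published data: pseudonym -> minutes (since midnight)
    # in which that pseudonym saw a burst of fetch requests.
    published = {
        "d1a0...": {130, 258, 604},
        "9f3c...": {723, 739, 762},
    }

    # I probed foo.onion at 12:03, 12:19, and 12:42:
    probe_minutes = {723, 739, 762}

    # Whichever pseudonym was busy at exactly those minutes is almost
    # certainly foo.onion.
    suspects = [p for p, busy in published.items() if probe_minutes <= busy]
    print(suspects)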

> 2. We extend the code permanently to create a new status page containing
> hidden service directory activity. This could include _aggregated_
> values, e.g. number of fetches in 15-minute intervals of the last 24
> hours (comparable to bandwidth measurement).

I think the second-safest thing would be to do a combination of the
two approaches: collect information in RAM, and dump totals on a
12-hour basis, with actual addresses hidden by hashing them
concatenated with a random value that the code never discloses.
This is still vulnerable to statistical attacks like the one above,
but less so.
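Something like this, in Python terms (the 15-minute bucketing follows
your suggestion above; the names, and rotating the secret at each dump,
are my own additions):

    import hashlib, os, time
    from collections import defaultdict

    secret = os.urandom(20)       # lives only in RAM, never written out
    counts = defaultdict(int)     # (pseudonym, interval) -> fetch count

    def record_fetch(onion_address):
        pseudonym = hashlib.sha1(secret + onion_address.encode()).hexdigest()
        interval = int(time.time()) // 900    # 15-minute buckets
        counts[(pseudonym, interval)] += 1

    def dump_and_reset():
        # Every 12 hours: publish the aggregated totals, then discard
        # state. Rotating the secret makes pseudonyms unlinkable across
        # dumps; that's an extra precaution, not part of the proposal.
        global secret, counts
        for (pseudonym, interval), n in sorted(counts.items()):
            print(pseudonym, interval, n)
        secret = os.urandom(20)
        counts = defaultdict(int)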

The single-safest thing would be not to collect this information at
all.  Maybe we should look carefully at what we hope to learn from it,
and try to collect the statistics we want, rather than the actual raw
data.  (In other words, if you want to know how many hidden services
are active at a given time, it's far safer to have the code output the
total, rather than having the code output a list of hidden services
which you then proceed to count.)
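In code terms, the difference is just where the counting happens
(continuing the sketches above):

    # Safer: compute the statistic inside the process and reveal only
    # the final number, never the underlying list of services.
    active_services = set()

    def note_publish(onion_address):
        active_services.add(onion_address)

    def report():
        print("active hidden services:", len(active_services))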

So, I'll start by asking: what statistics about hidden services are
you hoping to collect?

yrs,
-- 
Nick Mathewson
