
Please help with measuring network statistics on your relay



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello everyone,

two months ago, I wrote a blog post describing plans to extend network
measurements:

https://blog.torproject.org/blog/performance-measurements-and-blockingresistance-analysis-tor-network

In brief, the plan is as follows: entry guards count the number of
clients per country per day; relays collect statistics on the number
of cells waiting in their local queues; exit nodes count the number of
bytes and streams per exit port per day. All these statistics are
aggregated so that none of the network data can be used to de-anonymize
users. The aggregation consists of counting users by country, counting
events per day, and rounding up to a multiple of 4 or 8. We need these
network data to make Tor faster and/or more useful for circumvention.
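
To illustrate the rounding step, here is a minimal sketch in Python. The
function name and the choice to leave a count of zero at zero are
assumptions made for illustration; the actual code in the tor repository
may work differently.

```python
def round_up_to_multiple(count, multiple=8):
    """Round a non-negative count up to the next multiple of `multiple`.

    Assumption: a count of zero stays zero, as suggested by the "=0"
    entries in the example files in this message.
    """
    if count <= 0:
        return 0
    # Ceiling division using integer arithmetic only.
    return -(-count // multiple) * multiple


if __name__ == "__main__":
    for n in (0, 1, 8, 9, 70):
        print(n, "->", round_up_to_multiple(n))
```

With this kind of rounding, a reported value of 8 only says that between
1 and 8 events were observed.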

As of today, the necessary code changes to gather these statistics are
ready, including improved statistics for directory requests that have
been in the code before. For now, statistics are only written to local
files and not to extra-info documents: before changing the extra-info
document format, I want to be sure that the gathered network data are
useful.

The only missing piece is a dozen or more people who configure their
nodes to gather these statistics for two weeks or longer. During and at
the end of this time I'll need the new files ending in -stats that
contain the gathered statistics. Stable and fast nodes are preferred.
For the exit port statistics we'll need some exit nodes permitting
exiting to _all_ ports. It doesn't matter for the statistics if a node
does not permit exiting or doesn't have the Guard flag. If there are no
results to report, the affected -stats file is omitted.

If you want to help out with gathering these statistics on your node (or
just want to know how statistics are measured), please do the following:

- - Run "git clone git://git.torproject.org/git/tor/".
- - Check that you have commit b71bbdc69a56 or later in your branch.
- - Run "./autogen.sh && ./configure --enable-dirreq-stats
  --enable-entry-stats --enable-buffer-stats --enable-exit-stats && make".
- - Possibly run "make install", or use the executable in src/or/tor.
- - Add four config options to your torrc: "DirReqStatistics 1",
"EntryStatistics 1", "CellStatistics 1", "ExitPortStatistics 1"; if you
only want to gather some of the statistics, only set those config
options to 1.
- - Add another config option to your torrc, saying where Tor can find
your GeoIP database; if you cloned the tor repository to ~/tor/, the
config option would be: "GeoIPFile ~/tor/src/config/geoip".
- - Start your node and look out for notice-level logs saying that your
node is gathering statistics.
- - Wait for 24 hours for your node to write files called
dirreq-stats, entry-stats, buffer-stats, and exit-stats to its data
directory; these files are extended every 24 hours.
- - Make the content of these -stats files available to me once after 24
hours, after 1 week, and after 2 weeks; I plan to make all files public
together with their analysis by mid-August.
- - Let me know about some basic bandwidth information of your node:
fingerprint, configured BandwidthRate, BandwidthBurst, and
MaxAdvertisedBandwidth if used; also let me know if you are okay with
being mentioned with your real name in PDFs based on these data.
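
To check which of the -stats files your node has written so far,
something like the following Python sketch could be used. The function
name is made up for illustration, and the data directory path depends on
your setup. The name buffer-stats for the cell statistics file is taken
from the description further down in this message.

```python
from pathlib import Path

# File names as described in this message; "buffer-stats" is the name
# used in the cell statistics section below.
STATS_FILES = ("dirreq-stats", "entry-stats", "buffer-stats", "exit-stats")


def found_stats_files(data_dir):
    """Return the names of the -stats files that exist in data_dir."""
    base = Path(data_dir)
    return [name for name in STATS_FILES if (base / name).is_file()]
```

For example, found_stats_files("/path/to/your/datadir") returns a list
such as ["dirreq-stats", "exit-stats"]. A file that is missing simply
means that there were no results to report for that kind of statistics,
or that the first 24 hours are not over yet.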

Be aware that this code might contain bugs that break your node! You
should be comfortable running bleeding-edge software versions.

Here is a brief description of what kind of data the four -stats files
will contain (examples show fewer data and shorter measurement intervals):

1. Directory request statistics
(dirreq-stats file, --enable-dirreq-stats, DirReqStatistics 1)

written 2009-07-19 18:12:08
started-at 2009-07-19 17:41:54
ns-ips ca=8,de=8,hk=8,ir=8,my=8,ro=8,us=8
ns-v2-ips au=8,ca=8,cn=8,de=8,es=8,gb=8,il=8,it=8,kw=8,ru=8,se=8,us=8
requests-start 2009-07-19 17:41:54
n-ns-reqs ca=8,de=8,hk=8,ir=8,my=8,ro=8,us=8
n-v2-ns-reqs au=8,ca=8,cn=8,de=8,es=8,gb=8,il=8,it=8,kw=8,ru=8,se=8,us=8
n-ns-resp ok=16,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=0,busy=0
n-v2-ns-resp ok=32,unavailable=0,not-found=8,not-modified=0,busy=0
v2-ns-share 0.05%
v3-ns-share 0.05%
ns-direct-dl complete=0,timeout=0,running=0
ns-v2-direct-dl complete=28,timeout=0,running=0,min=5744,d1=15039,d2=60333,q1=76680,d3=79327,d4=101335,md=120688,d6=137291,d7=180365,q3=198729,d8=209279,d9=272368,max=3198322541
ns-tunneled-dl complete=12,timeout=0,running=0
ns-v2-tunneled-dl complete=0,timeout=0,running=0

The dirreq-stats file counts the directory requests coming from clients
asking for network statuses. The ns-ips and ns-v2-ips lines list the
number of unique IPs per country for v3 and v2 statuses, and n-ns-reqs
and n-v2-ns-reqs list the number of requests per country. n-ns-resp and
n-v2-ns-resp list the number of responses per response code, or rather
per reason for sending it. v2-ns-share and v3-ns-share are estimates of
the share of requests that a directory should see. ns-direct-dl and
ns-v2-direct-dl list the number of complete downloads, timeouts, and
still running downloads for direct requests. ns-tunneled-dl and
ns-v2-tunneled-dl show the same numbers for tunneled requests. When
one of the latter four lines has more than 16 complete downloads,
statistics are given about the client bandwidths in B/s, including
minimum/maximum, deciles, quartiles, and median.
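
Most lines in this file consist of a keyword followed by a
comma-separated list of key=value pairs. A minimal Python sketch for
reading such a line is shown here; the function name is made up for
illustration and is not part of Tor.

```python
def parse_counts(line):
    """Split a line like "ns-ips ca=8,de=8" into a keyword and a dict."""
    keyword, _, value = line.strip().partition(" ")
    counts = {}
    for item in value.split(","):
        if not item:
            continue  # tolerate an empty value list
        key, _, number = item.partition("=")
        counts[key] = int(number)
    return keyword, counts


if __name__ == "__main__":
    keyword, counts = parse_counts(
        "n-v2-ns-resp ok=32,unavailable=0,not-found=8,not-modified=0,busy=0")
    # Total number of (rounded) v2 responses in the example above.
    print(keyword, sum(counts.values()))
```

Keep in mind that the values are rounded up, so sums over them are
upper bounds rather than exact totals.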

2. Cell statistics
(buffer-stats file, --enable-buffer-stats, CellStatistics 1)

written 2009-07-19 18:11:55 (1800 s)
processed-cells 350,133,131,130,128,123,110,61,12,2
queued-cells 1.61,0.15,0.11,0.31,0.06,0.38,0.12,0.00,0.00,0.00
time-in-queue 3392,585,638,1562,348,1294,145,24,6,117
number-of-circuits-per-share 45

The buffer-stats file contains statistics about the time that cells
spend in circuit queues. processed-cells is the mean number of
processed cells per circuit, with circuits divided into 10 classes from
loudest to quietest. queued-cells is the mean number of queued cells
over time per circuit class. time-in-queue is the mean time in
milliseconds that a cell spends in a queue, again per circuit class.
number-of-circuits-per-share is the number of circuits per circuit class.
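
As an example of what can be read from these lines, the following
Python sketch estimates which share of all processed cells belongs to
each circuit class. It assumes that every class contains the same
number of circuits (number-of-circuits-per-share), so that the per-class
means can be compared directly. The function names are made up for
illustration.

```python
def parse_classes(line):
    """Split a line like "processed-cells 350,133" into keyword and floats."""
    keyword, _, value = line.strip().partition(" ")
    return keyword, [float(x) for x in value.split(",")]


def cell_shares(processed):
    """Return each class's fraction of all processed cells.

    Assumption: all classes contain the same number of circuits, so the
    per-class means are proportional to the per-class totals.
    """
    total = sum(processed)
    if total == 0:
        return [0.0] * len(processed)
    return [p / total for p in processed]


if __name__ == "__main__":
    _, processed = parse_classes(
        "processed-cells 350,133,131,130,128,123,110,61,12,2")
    for number, share in enumerate(cell_shares(processed), start=1):
        print("class %2d: %5.1f%%" % (number, 100 * share))
```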

3. Entry statistics
(entry-stats file, --enable-entry-stats, EntryStatistics 1)

written 2009-07-19 18:12:08
started-at 2009-07-19 17:41:54
ips de=72,us=72,fr=24,gb=24,it=24,ru=24,cn=16,ir=16,pl=16,??=8,ae=8,ar=8,at=8,az=8,be=8,bg=8,br=8,ca=8,ch=8,co=8,cz=8,es=8,fi=8,gr=8,hk=8,hu=8,id=8,ie=8,il=8,in=8,jp=8,kr=8,kw=8,mx=8,my=8,nl=8,ph=8,qa=8,ro=8,sa=8,se=8,sk=8,tr=8,ua=8,vn=8,ye=8

The entry-stats file contains the number of clients connecting to an
entry node per country per 24 hours. These numbers are listed in the
ips line.
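
Because the numbers are rounded up, a reported value only gives a range
for the real number of clients. The following Python sketch shows that
range, assuming rounding up to a multiple of 8 as the values in the
example suggest, and assuming that zero stays zero. The function name is
made up for illustration.

```python
def true_count_range(reported, multiple=8):
    """Return (lowest, highest) real count that is reported as `reported`.

    Assumption: counts are rounded up to the next multiple of `multiple`.
    """
    if reported == 0:
        return (0, 0)
    if reported < 0 or reported % multiple != 0:
        raise ValueError("not a rounded-up count: %r" % (reported,))
    return (reported - multiple + 1, reported)


if __name__ == "__main__":
    # de=72 in the example above
    print(true_count_range(72))
```

So the many "=8" entries in the example stand for anything between 1
and 8 clients from that country.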

4. Exit port statistics
(exit-stats file, --enable-exit-stats, ExitPortStatistics 1)

written 2009-07-06 12:32:03 (86400 s)
kibibytes-written 80=784877,443=184575,27619=528,38230=1079,46060=520055,53456=632231,63032=996797,other=13048442
kibibytes-read 80=19296747,443=394341,27619=505020,38230=1029286,46060=67253,53456=112665,63032=64583,other=11429424
streams-opened 80=792612,443=43324,27619=4,38230=4,46060=4,53456=4,63032=4,other=244212

The exit-stats file contains the number of KiB and opened streams per
exit port per 24 hours. The lines show tuples of the exit port number
and the number of KiB or opened streams.
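
One thing these tuples allow is computing the share of a single port in
the total. A minimal Python sketch follows; the function names are made
up for illustration, and the input is only the list of tuples without
the leading keyword.

```python
def parse_port_counts(value):
    """Turn "80=6,443=2,other=2" into {"80": 6, "443": 2, "other": 2}.

    Ports are kept as strings because of the "other" entry.
    """
    counts = {}
    for item in value.split(","):
        port, _, number = item.partition("=")
        counts[port] = int(number)
    return counts


def port_share(counts, port):
    """Return the fraction of the total that belongs to `port`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.get(port, 0) / total


if __name__ == "__main__":
    streams = parse_port_counts(
        "80=792612,443=43324,27619=4,38230=4,46060=4,53456=4,"
        "63032=4,other=244212")
    print("share of streams to port 80: %.3f" % port_share(streams, "80"))
```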


If you have any questions regarding these measurements, or find a bug in
the measurement code, please let me know, here or off-list!

Thanks!
- --Karsten
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkplBIkACgkQ0M+WPffBEmXvOwCfQnH3F2cZpoPgXrMHS9cVFo3t
rH8AoKrU3CYaLCugJ3Sk8DV9LnKSGzZ7
=obKt
-----END PGP SIGNATURE-----