Re: Please help with measuring network statistics on your relay
Hi everyone,
<netiquette-discussion>First off, sorry for top-posting my own message.
I find it easier for others to have the original context available if
required, though I'm not referring to any specific part of my earlier
message.</netiquette-discussion>
I'm looking for help from relay operators who are running fast and stable
relays (not only in the sense of the network status flags) and who want
to participate in gathering statistics about the Tor network.
To give you an idea of what your statistics are used for, see the graphs here:
http://metrics.torproject.org/graphs.html
If people want to help gather statistics on their relays, they'll
need to:
- run Tor 0.2.2.4-alpha or higher,
- enable CellStatistics, DirReqStatistics, EntryStatistics, and/or
ExitPortStatistics (depending on the statistics to gather), and
- enable ExtraInfoStatistics to include statistics in the extra-info
descriptors that relays upload to the directory authorities (important).
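If you want to double-check your torrc before restarting Tor, a small Python sketch like the following can report which of these options are set. The helper names are made up for this example, and it only looks for explicit `1` values: it does not know Tor's defaults and does not understand every torrc feature.

```python
# Statistics options from the list above. Tor treats option names
# case-insensitively, so this sketch lower-cases them as well.
STATS_OPTIONS = ("CellStatistics", "DirReqStatistics",
                 "EntryStatistics", "ExitPortStatistics")


def parse_torrc(text):
    """Tiny torrc reader: 'Keyword value' lines and '#' comments only.

    Later lines override earlier ones. Line continuations and included
    files are not handled.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return options


def check_stats_config(text):
    """Return (statistics options set to 1, whether ExtraInfoStatistics is 1)."""
    opts = parse_torrc(text)
    enabled = [name for name in STATS_OPTIONS if opts.get(name.lower()) == "1"]
    publishes = opts.get("extrainfostatistics") == "1"
    return enabled, publishes
```

If the second value comes back false, your statistics stay in the data directory and never reach the extra-info descriptors.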
Any further steps, like sending me the contents of files from Tor's data
directory, are NOT required anymore. I'm extracting the relevant lines
from the extra-info descriptor archives (as can everybody else).
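Since the statistics now end up in published extra-info descriptors, anyone can pull them out of the descriptor archives. Here is a hedged Python sketch that groups statistics lines by relay fingerprint. The keyword prefixes are an assumption (check dir-spec.txt for the exact keywords), and the function name is hypothetical.

```python
# Assumption: statistics keywords in extra-info descriptors start with
# these prefixes; verify against dir-spec.txt before relying on it.
STATS_PREFIXES = ("dirreq-", "entry-", "cell-", "exit-")


def extract_stats_lines(archive_text, prefixes=STATS_PREFIXES):
    """Map each relay fingerprint to the statistics lines it published.

    Lines from several descriptors of the same relay are appended in order.
    """
    result = {}
    fingerprint = None
    for line in archive_text.splitlines():
        if line.startswith("extra-info "):
            # "extra-info <nickname> <fingerprint>" starts a new descriptor
            parts = line.split()
            fingerprint = parts[2] if len(parts) > 2 else None
            continue
        if fingerprint is not None and line.startswith(prefixes):
            result.setdefault(fingerprint, []).append(line)
    return result
```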
For further information see my earlier mail below. The data format has
slightly changed, and the configuration has become much easier.
Thanks,
--Karsten
On 07/21/2009 01:58 AM, Karsten Loesing wrote:
> Hello everyone,
>
> two months ago, I wrote a blog post describing plans to extend network
> measurements:
>
> https://blog.torproject.org/blog/performance-measurements-and-blockingresistance-analysis-tor-network
>
> In brief, the plan is as follows: entry guards count the number of
> clients per country per day; relays determine statistics on the number
> of cells waiting in their local queues; exit nodes count the number of
> bytes and streams per exit port per day. All these statistics are
> aggregated, so that none of the network data can be used to de-anonymize
> users. These aggregations include counting users by country, counting
> events per day, and rounding up to a multiple of 4 or 8. We need these
> network data to make Tor faster and/or more useful for circumvention.
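As an illustration of the aggregation just described, here is a minimal Python sketch that counts unique IP addresses per country for one day and rounds each count up to a multiple of 8. It assumes the day's observations arrive as hypothetical (IP address, country code) pairs; it is not Tor's actual C implementation.

```python
from collections import defaultdict


def round_up(count, multiple=8):
    """Round a non-negative count up to the next multiple of `multiple`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return -(-count // multiple) * multiple  # ceiling division, then scale


def daily_country_counts(observations, multiple=8):
    """Unique IPs per country, each rounded up to a multiple of `multiple`.

    `observations` is an iterable of (ip, country_code) pairs seen during
    one day (an assumed input format for this sketch).
    """
    ips_by_country = defaultdict(set)
    for ip, country in observations:
        ips_by_country[country].add(ip)  # a set counts each IP only once
    return {country: round_up(len(ips), multiple)
            for country, ips in ips_by_country.items()}
```

With this rounding, a single address is reported as 8 and nine distinct addresses as 16, which is consistent with every value in the entry-stats example below being a multiple of 8.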
>
> As of today, the necessary code changes to gather these statistics are
> ready, including improved versions of the directory request statistics
> that were already in the code. For now, statistics are only written to local
> files and not to extra-info documents. But before changing the
> extra-info document format, I want to be sure that the gathered network
> data are useful.
>
> The only missing piece is a dozen or more people who configure their
> nodes to gather these statistics for two weeks or longer. During and at
> the end of this time I'll need the new files ending in -stats that
> contain the gathered statistics. Stable and fast nodes are preferred.
> For the exit port statistics we'll need some exit nodes permitting
> exiting to _all_ ports. It doesn't matter for the statistics if a node
> does not permit exiting or doesn't have the Guard flag. If there are no
> results to report, the affected -stats file is omitted.
>
> If you want to help out with gathering these statistics on your node (or
> just want to know how statistics are measured), please do the following:
>
> - Run "git clone git://git.torproject.org/git/tor/".
> - Check that you have commit b71bbdc69a56 or later in your branch.
> - Run "./autogen.sh && ./configure --enable-dirreq-stats
> --enable-entry-stats --enable-buffer-stats --enable-exit-stats && make".
> - Possibly run "make install", or use the executable in src/or/tor.
> - Add four config options to your torrc: "DirReqStatistics 1",
> "EntryStatistics 1", "CellStatistics 1", "ExitPortStatistics 1"; if you
> only want to gather some of the statistics, only set those config
> options to 1.
> - Add another config option to your torrc, saying where Tor can find
> your GeoIP database; if you cloned the tor repository to ~/tor/, the
> config option would be: "GeoIPFile ~/tor/src/config/geoip".
> - Start your node and look out for notice-level logs saying that your
> node is gathering statistics.
> - Wait 24 hours for your node to write files called
> dirreq-stats, entry-stats, buffer-stats, and exit-stats to its data
> directory; these files are extended every 24 hours.
> - Make the content of these -stats files available to me once after 24
> hours, after 1 week, and after 2 weeks; I plan to make all files public
> together with their analysis by mid-August.
> - Send me some basic bandwidth information about your node:
> fingerprint, configured BandwidthRate, BandwidthBurst, and
> MaxAdvertisedBandwidth if used; also let me know if you are okay with
> being mentioned with your real name in PDFs based on these data.
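For the step of sending the -stats files, a short Python sketch can collect whichever files exist in the data directory. The file names follow the section descriptions below; `collect_stats_files` is a hypothetical helper, and `data_dir` is whatever your torrc's DataDirectory points to.

```python
from pathlib import Path

# File names as given in the section descriptions of this mail.
STATS_FILES = ("dirreq-stats", "entry-stats", "buffer-stats", "exit-stats")


def collect_stats_files(data_dir):
    """Return {file name: contents} for the -stats files present in data_dir.

    A missing file just means Tor had no results of that kind to report.
    """
    found = {}
    for name in STATS_FILES:
        path = Path(data_dir) / name
        if path.is_file():
            found[name] = path.read_text()
    return found
```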
>
> Be aware that this code might contain bugs that break your node! You
> should be comfortable running bleeding-edge software versions.
>
> Here is a brief description of what kind of data the four -stats files will
> contain (examples show less data and shorter measurement intervals):
>
> 1. Directory request statistics
> (dirreq-stats file, --enable-dirreq-stats, DirReqStatistics 1)
>
> written 2009-07-19 18:12:08
> started-at 2009-07-19 17:41:54
> ns-ips ca=8,de=8,hk=8,ir=8,my=8,ro=8,us=8
> ns-v2-ips au=8,ca=8,cn=8,de=8,es=8,gb=8,il=8,it=8,kw=8,ru=8,se=8,us=8
> requests-start 2009-07-19 17:41:54
> n-ns-reqs ca=8,de=8,hk=8,ir=8,my=8,ro=8,us=8
> n-v2-ns-reqs au=8,ca=8,cn=8,de=8,es=8,gb=8,il=8,it=8,kw=8,ru=8,se=8,us=8
> n-ns-resp ok=16,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=0,busy=0
> n-v2-ns-resp ok=32,unavailable=0,not-found=8,not-modified=0,busy=0
> v2-ns-share 0.05%
> v3-ns-share 0.05%
> ns-direct-dl complete=0,timeout=0,running=0
> ns-v2-direct-dl complete=28,timeout=0,running=0,min=5744,d1=15039,d2=60333,q1=76680,d3=79327,d4=101335,md=120688,d6=137291,d7=180365,q3=198729,d8=209279,d9=272368,max=3198322541
> ns-tunneled-dl complete=12,timeout=0,running=0
> ns-v2-tunneled-dl complete=0,timeout=0,running=0
>
> The dirreq-stats file counts the number of directory requests coming
> from clients asking for network statuses. The ns-ips and ns-v2-ips lines
> list the number of unique IPs per country for v3 and v2 statuses,
> n-ns-reqs and n-v2-ns-reqs the number of requests per country. n-ns-resp
> and n-v2-ns-resp list the number of response codes, or rather reasons
> for sending them. v2-ns-share and v3-ns-share are estimates of the share
> of requests that a directory should see. ns-direct-dl and
> ns-v2-direct-dl list the number of complete downloads, timeouts, and
> still running downloads for direct requests. ns-tunneled-dl and
> ns-v2-tunneled-dl show the same numbers for tunneled requests. When
> there are more than 16 complete downloads in the latter four lines,
> statistics are given about the client bandwidths in B/s, including
> minimum/maximum, deciles, quartiles, and median.
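To work with these lines programmatically, the comma-separated key=value format can be split as in the following Python sketch (function names are hypothetical). It separates the download counters from the bandwidth fields, which only show up once there are more than 16 complete downloads.

```python
# Bandwidth fields (in B/s) as they appear in the example above.
BANDWIDTH_KEYS = ("min", "d1", "d2", "q1", "d3", "d4", "md",
                  "d6", "d7", "q3", "d8", "d9", "max")


def parse_kv_line(line):
    """Split 'keyword k1=v1,k2=v2,...' into (keyword, {k: v}) with str values."""
    keyword, _, rest = line.partition(" ")
    values = {}
    for item in rest.split(","):
        if item:
            key, _, value = item.partition("=")
            values[key] = value
    return keyword, values


def parse_download_line(line):
    """Parse an ns-*-dl line into (keyword, counters, bandwidth fields)."""
    keyword, raw = parse_kv_line(line)
    counts = {key: int(value) for key, value in raw.items()}
    # Move min/deciles/quartiles/median/max out of the counters, if present.
    bandwidths = {key: counts.pop(key) for key in BANDWIDTH_KEYS if key in counts}
    return keyword, counts, bandwidths
```

The same `parse_kv_line` also handles the per-country and response-code lines, since they share the key=value format.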
>
> 2. Cell statistics
> (buffer-stats file, --enable-buffer-stats, CellStatistics 1)
>
> written 2009-07-19 18:11:55 (1800 s)
> processed-cells 350,133,131,130,128,123,110,61,12,2
> queued-cells 1.61,0.15,0.11,0.31,0.06,0.38,0.12,0.00,0.00,0.00
> time-in-queue 3392,585,638,1562,348,1294,145,24,6,117
> number-of-circuits-per-share 45
>
> The buffer-stats file contains some statistics about the time that cells
> spend in circuit queues. processed-cells is the mean number of processed
> cells per circuit, with circuits divided into 10 classes from loudest to
> quietest. queued-cells is the mean number of cells queued over time per
> circuit class. time-in-queue is the mean time
> in milliseconds that a cell spends in a queue.
> number-of-circuits-per-share is the number of circuits per circuit class.
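The division into circuit classes can be sketched in Python as follows, assuming you already have the number of processed cells for each circuit. How Tor splits circuits when their number is not a multiple of 10 is an assumption here (integer boundaries), the function name is hypothetical, and the sketch only covers processed-cells, not the time-weighted queued-cells value.

```python
def processed_cells_by_class(cells_per_circuit, classes=10):
    """Mean processed cells per circuit class, loudest class first."""
    ordered = sorted(cells_per_circuit, reverse=True)  # loudest to quietest
    n = len(ordered)
    means = []
    for i in range(classes):
        # Integer boundaries give classes of (almost) equal size.
        chunk = ordered[i * n // classes:(i + 1) * n // classes]
        means.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return means
```

With 450 circuits, each class would hold 45 circuits, which is what number-of-circuits-per-share reports in the example above.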
>
> 3. Entry statistics
> (entry-stats file, --enable-entry-stats, EntryStatistics 1)
>
> written 2009-07-19 18:12:08
> started-at 2009-07-19 17:41:54
> ips de=72,us=72,fr=24,gb=24,it=24,ru=24,cn=16,ir=16,pl=16,??=8,ae=8,ar=8,at=8,az=8,be=8,bg=8,br=8,ca=8,ch=8,co=8,cz=8,es=8,fi=8,gr=8,hk=8,hu=8,id=8,ie=8,il=8,in=8,jp=8,kr=8,kw=8,mx=8,my=8,nl=8,ph=8,qa=8,ro=8,sa=8,se=8,sk=8,tr=8,ua=8,vn=8,ye=8
>
> The entry-stats file contains the number of clients connecting to an
> entry node per country and per 24 hours. These numbers are contained in
> the ips line.
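Because every value in the ips line was rounded up, the exact number of clients is unknown. Assuming each country value v was rounded up to a multiple of 8 from a true count of at least 1, that count lies between v - 7 and v, and a small Python sketch can turn this into bounds on the total (the helper names are hypothetical):

```python
def parse_ips_line(line):
    """Split 'ips cc=n,cc=n,...' into {country_code: int}."""
    _, _, rest = line.partition(" ")
    counts = {}
    for item in rest.split(","):
        if item:
            country, _, value = item.partition("=")
            counts[country] = int(value)  # "??" (unknown country) is kept as-is
    return counts


def total_client_bounds(counts, multiple=8):
    """(lower, upper) bounds on the total, assuming each value was rounded
    up to a multiple of `multiple` from a true count of at least 1."""
    upper = sum(counts.values())
    lower = sum(value - multiple + 1 for value in counts.values())
    return lower, upper
```

Keep in mind that these are IP addresses seen within 24 hours, not people.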
>
> 4. Exit port statistics
> (exit-stats file, --enable-exit-stats, ExitPortStatistics 1)
>
> written 2009-07-06 12:32:03 (86400 s)
> kibibytes-written 80=784877,443=184575,27619=528,38230=1079,46060=520055,53456=632231,63032=996797,other=13048442
> kibibytes-read 80=19296747,443=394341,27619=505020,38230=1029286,46060=67253,53456=112665,63032=64583,other=11429424
> streams-opened 80=792612,443=43324,27619=4,38230=4,46060=4,53456=4,63032=4,other=244212
>
> The exit-stats file contains the number of KiB and opened streams per
> exit port per 24 hours. The lines show tuples of the exit port number
> and the number of KiB or opened streams.
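Each port's share of the exit traffic can be computed from these lines with a Python sketch like this one; `parse_port_line` and `port_shares` are hypothetical names, and `other` is kept as an ordinary key.

```python
def parse_port_line(line):
    """Split 'keyword port=value,...' into (keyword, {port: int})."""
    keyword, _, rest = line.partition(" ")
    values = {}
    for item in rest.split(","):
        if item:
            port, _, value = item.partition("=")
            values[port] = int(value)  # ports stay strings because of "other"
    return keyword, values


def port_shares(values):
    """Fraction of the line's total that each port accounts for."""
    total = sum(values.values())
    if total == 0:
        return {}
    return {port: value / total for port, value in values.items()}
```

Applied to the kibibytes-read line in the example above, port 80 accounts for roughly 59% of the KiB read.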
>
>
> If you have any questions regarding these measurements, or find a bug in
> the measurement code, please let me know, here or off-list!
>
> Thanks!
> --Karsten