[tor-bugs] #3036 [Torperf]: Tweak Torperf's .mergedata format and make it the new default
#3036: Tweak Torperf's .mergedata format and make it the new default
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Torperf | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Right now, we have three Torperf data formats: the .data files containing
the output of trivsocks-client.c, the .extradata files containing the
output of the Python script attached to Tor's control port, and the
.mergedata files containing the consolidation of the two formats.
I'd like to tweak the .mergedata format to make it easier to process, and
I want to make it the new default Torperf output format.
Here's what I'd like to change:
- Every data point in the new .mergedata format should contain the meta
data that is necessary to generate Torperf graphs. This meta data
contains the file size, the source (moria, siv, ferrinii, etc.), and
possibly a custom guard choice and/or custom circuit build timeout. I
could imagine adding these meta data as `FILESIZE=51200, SOURCE=ferrinii,
GUARDS=slowratio, CBT=75`.
One motivation for this change is to remove the dependency on the
filename, which is how we currently encode these meta data, e.g.,
`slowratio75cbt-50kb.mergedata`.
Also, I'd like to be able to concatenate multiple Torperf files and have
a single file for a) the standard Torperf runs of a given month and b) the
Torperf runs from a given experiment. This makes it easier for people to
download and process our Torperf data.
- We should combine the SEC and USEC fields and simply write timestamps
as floats with a precision of, say, two decimal places, like we do in
`LAUNCH=1302523261.18`. For example, `STARTSEC=1302523501
STARTUSEC=703442` would become `START=1302523501.70`. This saves a lot of
bytes and maybe even a few CPU cycles when parsing the individual fields of
a data point.
- When measuring hidden service performance as in #1944, we should add
custom fields for the various hidden service substeps, e.g.,
`START_RENDCIRC`, `GOT_INTROCIRC`, etc.
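The first two changes could look roughly like this in Python, the language
of `consolidate_stats.py`. This is a hypothetical sketch, not the actual
script. It assumes each data point is one line of space-separated
`KEY=VALUE` pairs (as the `STARTSEC=1302523501 STARTUSEC=703442` example
suggests), and the helper names `combine_timestamps` and `convert_line` are
made up. Rounding uses integer microseconds to avoid float artifacts.

```python
def combine_timestamps(fields):
    """Replace each <NAME>SEC/<NAME>USEC pair with a single <NAME> field in
    seconds with two decimal places; other fields pass through unchanged."""
    combined = {}
    for key, value in fields.items():
        if key.endswith("USEC") and key[:-4] + "SEC" in fields:
            continue  # handled together with its matching SEC field
        if key.endswith("SEC") and key[:-3] + "USEC" in fields:
            name = key[:-3]
            micros = int(value) * 1000000 + int(fields[name + "USEC"])
            centis = (micros + 5000) // 10000  # round half up to 1/100 s
            combined[name] = "%d.%02d" % (centis // 100, centis % 100)
        else:
            combined[key] = value
    return combined


def convert_line(line, filesize, source, guards=None, cbt=None):
    """Hypothetical helper: convert one old-style data point into the
    proposed format by combining timestamps and adding the meta data that
    is currently encoded in the file name."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    fields = combine_timestamps(fields)
    fields["FILESIZE"] = str(filesize)
    fields["SOURCE"] = source
    if guards is not None:
        fields["GUARDS"] = guards
    if cbt is not None:
        fields["CBT"] = str(cbt)
    # Sorted keys keep the output stable from run to run.
    return " ".join("%s=%s" % (k, fields[k]) for k in sorted(fields))


print(convert_line("STARTSEC=1302523501 STARTUSEC=703442 LAUNCH=1302523261.18",
                   51200, "ferrinii", guards="slowratio", cbt=75))
# CBT=75 FILESIZE=51200 GUARDS=slowratio LAUNCH=1302523261.18 SOURCE=ferrinii START=1302523501.70
```

Extra fields, such as hidden service substeps, pass through unchanged
unless they come as SEC/USEC pairs, in which case they are combined the
same way.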
What do you think? Do these changes make sense? If so, here are the next
steps:
- The first step in this endeavor is to wait for the results of #2687
where we try to implement an efficient .mergedata parser in R.
- The next step would be to change `consolidate_stats.py` to add the new
meta data fields and combine SEC and USEC fields for us.
- As soon as we have the new .mergedata format, I'll update metrics-db to
aggregate the various Torperf files and prepare them for the metrics
website. I'll also update metrics-web to parse the .mergedata format
instead of the .data format. And of course, I'll update the
[https://metrics.torproject.org/papers/data-2011-03-14.pdf Overview of
Statistical Data in the Tor Network] to describe the new format.
- Once we start working on #2565, we might want to dump the .data and
.extradata formats entirely and have Torperf only output the .mergedata
format.
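Once every line carries its own meta data, a consumer like metrics-db or
metrics-web could process concatenated files without looking at file names.
Here is a minimal Python sketch under the same space-separated `KEY=VALUE`
assumption. `group_by_setup` and `median_download_times` are hypothetical
names, and `DATACOMPLETE` is an assumed field name for the combined form of a
`DATACOMPLETESEC`/`DATACOMPLETEUSEC` pair.

```python
from collections import defaultdict
from statistics import median


def group_by_setup(lines):
    """Group data points from one or more concatenated files by measurement
    setup; GUARDS and CBT are None for standard runs."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between concatenated files
        point = dict(pair.split("=", 1) for pair in line.split())
        setup = (point.get("SOURCE"), point.get("FILESIZE"),
                 point.get("GUARDS"), point.get("CBT"))
        groups[setup].append(point)
    return groups


def median_download_times(groups):
    """Median seconds from START to DATACOMPLETE for each setup
    (DATACOMPLETE is an assumed field name)."""
    return {setup: median([float(p["DATACOMPLETE"]) - float(p["START"])
                           for p in points])
            for setup, points in groups.items()}
```

Because the grouping key comes from the data itself, a monthly file of
standard runs and a file from a custom-guard experiment can be fed through
the same code.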
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3036>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs