Re: [tor-bugs] #3036 [Torperf]: Tweak Torperf's .mergedata format and make it the new default
#3036: Tweak Torperf's .mergedata format and make it the new default
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Torperf | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Changes (by karsten):
* cc: mikeperry, Sebastian, rransom, arma (added)
Comment:
I'm picking up this ticket again because I learned a few days ago that we
were not archiving Torperf data correctly. It looks like we lost 2 to 4
months of siv's data. Oops.
While looking into the archiving problem, I decided to work on the new
Torperf data format, which will be a lot easier to archive than the current
format. As a positive side effect, the new format will be much easier to
understand for non-core Torperf developers. In the future, I'm planning to
archive only the new format and no longer the current formats, so the new
format should contain all relevant information.
I realize that the Torperf rewrite won't happen anytime soon, so I'm going
to implement the new Torperf format in metrics-db. Torperf will still
generate the old formats, but metrics-db will convert the output to the
new format. Whenever the Torperf rewrite happens, it can output the new
format itself.
The suggested new format is pretty much as described in this ticket. The
basic idea is that there is a single line per Torperf run which is
sufficient to learn about 1) the Tor and Torperf configuration, 2)
measurement results, and 3) additional information that might help explain
the results.
1. Configuration
- SOURCE: Configured name of the data source; required.
- FILESIZE: Configured file size in bytes; required.
- Other metadata describing the Tor or Torperf configuration, e.g.,
GUARD for a custom guard choice; optional.
2. Measurement results
- START: Time when the connection process starts; required.
- SOCKET: Time when the socket was created; required.
- CONNECT: Time when the socket was connected; required.
- NEGOTIATE: Time when SOCKS 5 authentication methods have been
negotiated; required.
- REQUEST: Time when the SOCKS request was sent; required.
- RESPONSE: Time when the SOCKS response was received; required.
- DATAREQUEST: Time when the HTTP request was written; required.
- DATARESPONSE: Time when the first response was received; required.
- DATACOMPLETE: Time when the payload was complete; required.
- WRITEBYTES: Total number of bytes written; required.
- READBYTES: Total number of bytes read; required.
- DIDTIMEOUT: 1 if the request timed out, 0 otherwise; optional.
- Other measurement results, e.g., START_RENDCIRC, GOT_INTROCIRC, etc.
for hidden-service measurements.
3. Additional information
- LAUNCH: Time when the circuit was launched; optional.
- USED_AT: Time when this circuit was used; optional.
- PATH: List of relays in the circuit, separated by commas; optional.
- BUILDTIMES: List of times when circuit hops were built, separated by
commas; optional.
- TIMEOUT: Circuit build timeout that the Tor client used when building
this circuit; optional.
- QUANTILE: Circuit build time quantile that the Tor client uses to
determine its circuit-build timeout; optional.
- CIRC_ID: Circuit identifier of the circuit used for this measurement;
optional.
- USED_BY: Stream identifier of the stream used for this measurement;
optional.
- Other fields containing additional information; optional.
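To make the field list above more concrete, here is a minimal sketch of how
one line could be parsed and checked for the required fields. It assumes
that fields are written as space-separated KEY=VALUE pairs on a single
line; the text above does not spell out the exact syntax, so treat that as
an assumption. The function names and the sample values are made up for
illustration.

```python
# Assumption: one measurement per line, fields as space-separated KEY=VALUE.
REQUIRED_FIELDS = (
    "SOURCE", "FILESIZE", "START", "SOCKET", "CONNECT", "NEGOTIATE",
    "REQUEST", "RESPONSE", "DATAREQUEST", "DATARESPONSE", "DATACOMPLETE",
    "WRITEBYTES", "READBYTES",
)


def parse_torperf_line(line):
    """Split one measurement line into a dict of field name -> string value."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError("malformed token: %r" % token)
        fields[key] = value
    return fields


def missing_required(fields):
    """Return the required field names that are absent, in listed order."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


# Made-up sample values, not taken from real measurements.
sample = (
    "SOURCE=example FILESIZE=51200 START=100.00 SOCKET=100.01 "
    "CONNECT=100.02 NEGOTIATE=100.03 REQUEST=100.04 RESPONSE=100.50 "
    "DATAREQUEST=100.51 DATARESPONSE=101.00 DATACOMPLETE=102.25 "
    "WRITEBYTES=75 READBYTES=51416 DIDTIMEOUT=0 PATH=relayA,relayB,relayC"
)
record = parse_torperf_line(sample)
print(missing_required(record))    # [] because all required fields are present
print(record["PATH"].split(","))   # ['relayA', 'relayB', 'relayC']
```

Optional and unknown fields are kept in the dict as they are, so extra
metadata like GUARD would not need any parser changes.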
Note that two pieces of information from the current .extradata files are
not included in the new Torperf data format:
- Build timeout details: The current .extradata files contain the full
BUILDTIMEOUT_SET events that were sent by Tor via its control port. They
are not part of the new format, because they mostly explain why Tor picked
a given circuit build timeout, and the timeout itself is already part of
the new format. In theory, it would be possible to include some details
of the last BUILDTIMEOUT_SET event that was received before a Torperf run
was finished and written to the .extradata file.
- Unused circuits: The .extradata files also contain information about
circuits that were not used by Torperf. There's hardly any relation to
the Torperf measurements, so they're left out. In theory, one could
include aggregate information about the number of failed circuits before a
Torperf run was finished and written to the .extradata file.
I understand that people may find the information that was left out here
important, and I could imagine that people find other information
important, too. But we can't put all data that was generated while
performing Torperf measurements in this format, or we'd end up adding Tor's
debug logs to it. Instead, we should identify relevant information that is
sufficient for most analyses. For example, I can be convinced to add single
fields or aggregated data from the build timeout events or unused circuits.
But if someone wants to analyze a specific aspect of Tor's performance,
they'll need to keep Tor's logs or controller events in addition to the
new Torperf data format.
Please find siv's 5 MiB Torperf data in the new format attached to this
ticket as an example.
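As a rough sketch of how such a file could be used in an analysis, the
following computes the median time to complete a download. It assumes
space-separated KEY=VALUE pairs per line and timestamps given as decimal
seconds; neither is stated explicitly in this comment. The sample lines are
made up and leave out most required fields for brevity.

```python
import statistics


def parse_line(line):
    """Assumed syntax: space-separated KEY=VALUE pairs on one line."""
    return dict(token.split("=", 1) for token in line.split())


def download_seconds(fields):
    """Seconds from START to DATACOMPLETE, or None if the run timed out."""
    if fields.get("DIDTIMEOUT") == "1":
        return None
    return float(fields["DATACOMPLETE"]) - float(fields["START"])


# Made-up lines; a real file would carry all required fields.
lines = [
    "START=100.0 DATACOMPLETE=102.25 DIDTIMEOUT=0",
    "START=200.0 DATACOMPLETE=203.5",
    "START=300.0 DATACOMPLETE=0.0 DIDTIMEOUT=1",
]
durations = [
    d for d in (download_seconds(parse_line(l)) for l in lines)
    if d is not None
]
print(statistics.median(durations))  # 2.875
```

Runs with DIDTIMEOUT=1 are skipped here; whether to exclude them or count
them separately depends on the analysis.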
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3036#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online