[or-cvs] r19328: {torflow} Document the new ratios speedracer kicks out. (torflow/trunk)
Author: mikeperry
Date: 2009-04-15 02:00:38 -0400 (Wed, 15 Apr 2009)
New Revision: 19328
Modified:
torflow/trunk/README.PerfMeasurements
Log:
Document the new ratios speedracer kicks out.
Modified: torflow/trunk/README.PerfMeasurements
===================================================================
--- torflow/trunk/README.PerfMeasurements 2009-04-15 05:55:56 UTC (rev 19327)
+++ torflow/trunk/README.PerfMeasurements 2009-04-15 06:00:38 UTC (rev 19328)
@@ -18,6 +18,7 @@
similar advertised capacity and then fetching the same URL over and over
again via 2-hop circuits consisting of nodes in that group.
+
A. Configuring SpeedRacer
At the time of this writing, it has the following configuration
@@ -47,6 +48,7 @@
convergence. Results will be saved after each multiple of 'save_every'
fetches. The incremental results are cumulative.
+
B. Running SpeedRacer
Like soat, speedracer should be given its own Tor that is not performing
@@ -71,6 +73,8 @@
SpeedRacer outputs a lot of statistics in aggregate form in
./NetworkScanners/data/speedraces/stats-<pct_start>-<pct_end>-<n>-<time>
+and
+./NetworkScanners/data/speedraces/ratios-<pct_start>-<pct_end>-<n>-<time>
pct_start and pct_end denote the range of the slice. N denotes the
number of fetches so far, and time is the timestamp of that run. The
@@ -80,18 +84,135 @@
The statistics stored with each node are indicated in the key at the top
of each stat file.
-For the purposes of speedracer, the interesting statistics are the EB
-stat and the BR stat. The EB stat is the average stream capacity we
-observe for a node, and the BR stat is the ratio of a node's advertised
-bandwidth to its average stream capacity.
+For the purposes of speedracer, the interesting statistics are actually
+in the ratio files. The stats files are more auxiliary in nature, describing
+failure and attempt counts.
+
+1. Ratio files
+
+The ratio files are the initial set of options created for consideration
+for reweighting nodes' advertised bandwidths. They contain a set of ratios
+that can be multiplied by an advertised bandwidth to produce a new value
+to be voted on by participating authorities for use in NS documents and
+client node selection. This means that faster and more reliable nodes have
+higher ratio values.
+
+They are described succinctly in the key for the file:
+
+Metatroller Ratio Statistics:
+ SR=Stream avg ratio AR=Advertised bw ratio BRR=Adv. bw avg ratio
+ CSR=Circ suspect ratio CFR=Circ Fail Ratio SSR=Stream suspect ratio
+ SFR=Stream fail ratio CC=Circuit Count SC=Stream Count
+ P=Percentile Rank U=Uptime (h)
+
+In detail:
+
+a. SR=Stream avg ratio
+
+This is the ratio of the node's observed average stream capacity to the
+average observed stream capacity for the entire slice. It is candidate
+#1 for reweighting, and may be the only one we eventually use. The ratio
+file itself is sorted by this number.
+
+b. AR=Advertised bw ratio
+
+This value is provided only for reference. It is the ratio of the
+advertised bandwidth of the router to the average advertised bandwidth
+of the slice.
+
+c. BRR=Adv. bw avg ratio
+
+This ratio is actually a ratio of ratios. First, the ratio of the node's
+observed stream capacity to its advertised bandwidth is taken. Then this
+value is averaged across all nodes in the slice, and each node is given a
+value that is the ratio of its own stream-capacity-to-advertised-bandwidth
+ratio to that slice average.
+
+This was originally my first choice for ratio usage. I initially thought
+it would be ideal to use for penalizing nodes lying about their
+bandwidth. But upon reflection it seems to double-penalize these nodes:
+Nodes that lie will naturally attract more traffic than they can handle,
+which decreases their observed stream capacity proportionally. Taking
+the ratio of this to their already inflated advertised bandwidth
+amount would double-count the discrepancy.
+
+d. CSR=Circ suspect ratio
+
+This value is the ratio of the node's circuit suspected failure rate
+to the average circuit success rate for the slice. A "suspected failure"
+is attributed to every member node currently present in a circuit at the
+time of failure, plus the next hop if an extend was in progress. Nodes
+beyond this position in the path are not blamed for the failure.
+
+This is my second choice for a reweighting ratio. However, we currently
+don't (and probably can't) differentiate between failures because of
+hibernation or shutdown versus actual connectivity issues.
+
+e. CFR=Circ Fail Ratio
+
+This value is similar to the Circ suspect ratio except that in TorFlow
+parlance, a "failure" is only counted against the extender and extendee
+nodes. Earlier nodes are not counted.
+
+This would be a nice ratio to use, but it would probably be more useful
+if we could get separate stats on extender vs extendee, which is
+currently not supported. Also, given the relative
+frequency of timeout failures, and the fact that timeout failures can be
+caused by or contributed to by earlier hops in the circuit, we would
+probably want to treat those specially for this stat as well.
+
+f. SSR=Stream Suspect Ratio
+
+The stream suspect ratio is counted similarly to the circuit suspect
+ratio, except that stream suspects are only attributed to pre-exit nodes
+if the failure reason is one of "TIMEOUT", "INTERNAL", "TORPROTOCOL",
+or "DESTROY".
+
+g. SFR=Stream Fail ratio
+
+The stream fail ratio records only the success vs failure of exit nodes;
+as such, non-exit nodes will never have a value for this stat.
+
+Both the stream stats are better dealt with by the SoaT exit scanner, as
+the values for these tend to be pretty binary: either the exit is able
+to make external connections or it isn't. In the few cases where they are
+not binary, usually the same reliability information is represented in the
+Circuit Suspect Ratio, and more accurately at that.
+
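The slice-relative bandwidth ratios described above (SR, AR and BRR) can be sketched as follows. This is only an illustration of the arithmetic, not TorFlow's actual code: the function name, the input layout and the node names are hypothetical, and the BRR computation reflects one reading of the description (the node's stream-capacity-to-advertised-bandwidth ratio divided by the slice average of that same ratio).

```python
def slice_ratios(nodes):
    """Compute slice-relative SR, AR and BRR for each node.

    `nodes` maps a node name to (stream_bw, adv_bw). Both the layout and
    the names are hypothetical, chosen only for this illustration.
    """
    n = len(nodes)
    # Slice averages of stream capacity, advertised bandwidth, and of the
    # per-node stream/advertised ratio.
    avg_stream = sum(s for s, _ in nodes.values()) / n
    avg_adv = sum(a for _, a in nodes.values()) / n
    avg_br = sum(s / a for s, a in nodes.values()) / n

    ratios = {}
    for name, (s, a) in nodes.items():
        ratios[name] = {
            "SR": s / avg_stream,      # stream capacity vs slice average
            "AR": a / avg_adv,         # advertised bw vs slice average
            "BRR": (s / a) / avg_br,   # ratio of ratios (assumed reading)
        }
    return ratios


# Two nodes advertising the same bandwidth but delivering different
# stream capacity.
example = slice_ratios({"nodeA": (100.0, 200.0), "nodeB": (300.0, 200.0)})
```

In this toy slice both nodes get the same AR, while the node that actually delivers more stream capacity ends up with the higher SR and BRR.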
+
+Rationale and suggestions for usage:
+
+The ratios are computed relative to the average values of that stat for
+the slice as opposed to the network as a whole primarily because this
+will enable us to use Steven Murdoch's queuing theory load-optimum
+selection weighting in tandem with these reweighting
+ratios. His ratios are based on queuing theory effects of node
+selection on faster vs slower nodes for various network loads. If we
+both are correcting the load of individual nodes with respect to the
+network as a whole, we will end up over-compensating.
+
+My current thinking is that we should combine the circuit fail ratio
+and the stream weighting ratio linearly, with 50% of the ratio changes
+coming from each. There's no formal justification for this of course.
+Typically the circuit failure ratios are pretty close to 1 for
+most nodes except those with serious issues, so this primarily has the
+effect of dampening the change by the observed stream ratios, except
+in cases of really unreliable nodes.
+
+So the formula would be something like:
+ NewBandwidth = Advertised*(0.5*SR+0.5*CSR)
+
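A minimal sketch of the formula above, assuming SR and CSR have already been read from a ratio file. The function and parameter names are hypothetical, and the 50/50 split is exposed as a parameter only because the text says there is no formal justification for it.

```python
def reweight_bandwidth(advertised_bw, sr, csr, sr_weight=0.5):
    """Scale an advertised bandwidth by a linear blend of SR and CSR.

    With sr_weight=0.5 this is Advertised*(0.5*SR+0.5*CSR). All names
    here are hypothetical and not part of TorFlow.
    """
    csr_weight = 1.0 - sr_weight
    return advertised_bw * (sr_weight * sr + csr_weight * csr)


# A node with below-average stream capacity (SR=0.8) but an average
# circuit ratio (CSR=1.0): the blend is 0.9, so the bandwidth drops by
# 10% instead of the 20% that SR alone would give.
new_bw = reweight_bandwidth(1000.0, sr=0.8, csr=1.0)
```

This shows the dampening effect mentioned above: when CSR stays near 1, only half of the change suggested by SR reaches the new bandwidth value.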
+
+2. Stats files
+
For ease of review, the nodes are sorted and printed in lists according
to a few different metrics. For speedracer, the most useful list is the
first one, but the others are useful for buildtimes, where these same
stat files are also available. The data being displayed is the same, it
is just reordered in each list. These lists are:
-1. Bandwidth Ratios
+a. Bandwidth Ratios
This list is sorted by the ratio of advertised bandwidth to average
stream capacity (the BR stat). Nodes at the top of this list advertise a
@@ -99,7 +220,7 @@
actually were seen to carry over streams used to fetch the URL (the EB
stat).
-2. Failed Counts
+b. Failed Counts
This list is less interesting for speedracer. In it, the nodes are
sorted by the sum of stream and circuit failures (SF and CF,
@@ -107,7 +228,7 @@
whereas circuit failures are attributed to the extender and the
extendee at the time of failure.
-3. Suspected Counts
+c. Suspected Counts
This list is sorted by 'suspected' failure counts (SS and CS). Suspected
failure counts are attributed to each node that was a member of the
@@ -117,21 +238,21 @@
all nodes in the path, and as such do not show up in the 'failed'
counts for nodes.
-4. Fail Rates
+d. Fail Rates
This list is sorted by the rate of failures per hour of node uptime.
-5. Suspect Rates
+e. Suspect Rates
This list is sorted by the rate of suspected failures per hour of
node uptime.
-6. Failed Reasons
+f. Failed Reasons
This list groups nodes by their failure reason, and sorts the reasons by
most prevalent, and sorts the nodes within these lists.
-7. Suspect Reasons
+g. Suspect Reasons
This is the same as the failed reasons, except it is sorted by
'suspected' counts.
@@ -145,6 +266,7 @@
creating circuits over and over again through percentile slices of the
network, similar to speedracer.
+
A. Running Buildtimes
Buildtimes can actually be run concurrently with one of either