
Re: Path-spec - fast circuits

     On Sat, 13 Feb 2010 11:18:33 -0500 Nick Mathewson <nickm@xxxxxxxxxxxxx>
>On Sat, Feb 13, 2010 at 5:33 AM, Scott Bennett <bennett@xxxxxxxxxx> wrote:
>>      I've withheld comment on the above for a long time, mainly because
>> I had intended to include it in a write-up that I still haven't found
>> the time to do, but I really think it cannot be avoided any longer.
>> I would greatly appreciate a justification for the presumption that
>> any process other than the tor node in question can possibly provide
>> a more accurate measurement of its data rate capacities.  Any other
>> process, *even on the same computer*--much less anywhere else, can only
>> measure the performance of the TCP connections between itself and the
>> tor node in question, whereas the tor node in question has a complete
>> picture of all of its simultaneous connections to all processes, wherever
>> they may exist around the planet.
>Right.  If I'm a Tor node, I have a better picture of my own actual
>usage than any other process anywhere in the network.
>But one big problem is that you have no guarantee whatsoever that I'm
>telling you the truth about my measurements.  See for example Kevin
>Bauer et al's "Low Resource Routing Attacks Against Tor."

     Yes, I've understood that from the outset, but I haven't seen any
evidence that such abuse is actually happening.
>As a hackish workaround, we had clipped the largest believable
>self-reported bandwidth, so that a hostile or broken server couldn't
>trivially claim to have infinite capacity and attack or DOS the
>network.  But this meant that genuinely high-capacity nodes got

     That was also apparent and created an *actual* problem of wasted
>Neither of the above points is imaginary; Bauer et al demonstrated
>their attacks on planetlab, and the underutilized capacity really

     I disagree.  They showed by experiment how such a problem would
work, an exercise that struck me as kind of silly because it was clear
from theory that such an attack would have to work like that.  But IIRC
they did *not* show that such abuse was actually occurring in the wild,
and AFAIK, no one else has shown that either.  Meanwhile, the measurement-
from-afar method is creating *actual* misallocation of resources, so a
problem has been intentionally introduced *by the tor developers* to
counteract a "problem" that had *not* been shown to be occurring in the
actual global tor network and should therefore have been considered
imaginary pro tempore.
>(A smaller problem was that nodes were reporting their observed
>bandwidth _usage_, whereas clients really care about the expected
>performance of their circuits.)

     Now you've finally touched upon the worst problem of resource
allocation that existed prior to introduction of the measurement-from-afar
method.  The number being reported in descriptors is *past peak data rate
utilization*, which is being used for two incompatible purposes.  One of
those purposes is as grist for statistical analysis of tor network
performance by the developers and other interested parties.  The other use
is by client code as a proxy for data rate *capacity* for route selection.
Using historical (i.e., since the node was initialized) data rate usage
information, which is recorded in a manner useful for statistical analysis,
as an approximation for capacity information can be a reasonable approach,
but it stumbles badly if it is updated quasi-periodically but not as
*successive* approximations that converge upon the true capacity.  What is
needed is to detect the actual capacity as closely as possible for the
benefit of clients, which requires that successive approximations converge
upon reality.  Reported usage will normally be less than or equal to
the actual limits on capacity, so it is harmful to ever report a value in a
descriptor that is less than any value in descriptors published previously
since the most recent initialization because a lesser value than previously
reported is a *less accurate* (i.e., diverging) approximation than previously
reported approximations.  What the client needs from the reported values,
then, is incompatible with the needs of researchers like the tor developers
and others engaged in statistical studies of tor network loading.
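
     To make the convergence requirement concrete, here is a minimal Python
sketch of a reported value that never decreases between initializations.  It
is an illustration only, not tor code: the class name, the method name, and
the sample figures are all hypothetical.

```python
class CapacityEstimate:
    """Hypothetical helper: largest observed peak since initialization."""

    def __init__(self):
        # Bytes per second; starts over only when the node is reinitialized.
        self.reported = 0

    def update(self, observed_peak):
        # Observed usage is a lower bound on capacity, so a value smaller
        # than one already reported would be a worse approximation.
        self.reported = max(self.reported, observed_peak)
        return self.reported


est = CapacityEstimate()
# Made-up peak rates from five successive reporting intervals.
history = [est.update(p) for p in [300_000, 450_000, 200_000, 500_000, 420_000]]
print(history)
```

The printed sequence never steps downward, which is the property a client
needs from a capacity estimate.  The raw per-interval peaks would still have
to be kept separately for anyone doing statistical analysis.
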
     There are other aspects of the current data reporting/collection that
are problematical from the perspective of sampling theory and time-series
analysis, both for network performance analysis and for capacity detection
for use by clients.  For example, the value reported is not a binned value,
but rather is the maximum value of the moving minimum value in a 10-sample-
wide moving window, where the base sampling rate at this level is 1 Hz.  For
the purpose of statistical analysis of network performance, such a value
gives a rather distorted picture of what is going on in the network.  For
client use, it is more helpful because it more closely approximates the
capacity than simple averages or totals, although it would likely wreak
havoc with any attempt that might someday be made for client recognition of
periodic network loads and compensation for such periodicity for the same
reasons that it distorts any statistical analysis.
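
     The statistic described above (the maximum of the moving minimum over a
10-sample window of 1 Hz samples) can be sketched in Python as follows.  This
follows the description in this message rather than the tor source, and the
function name and sample data are invented for the illustration.

```python
from collections import deque


def max_of_moving_min(samples, width=10):
    """Largest value taken by the minimum over each full window of samples.

    Returns None if there are fewer than `width` samples.
    """
    window = deque(maxlen=width)  # the oldest sample drops out automatically
    best = None
    for s in samples:
        window.append(s)
        if len(window) == width:
            m = min(window)
            best = m if best is None else max(best, m)
    return best


# Made-up per-second rates: a 3-second burst at 900 and a 12-second
# sustained run at 400, against a background of 100.
samples = [100] * 5 + [900] * 3 + [100] * 5 + [400] * 12 + [100] * 5
print(max_of_moving_min(samples))
```

The short burst never fills a whole window, so it does not count.  The result
is the highest rate sustained for at least ten consecutive samples, which is
why such a value sits closer to capacity than an average would, and also why
it is a poor input for statistical analysis.
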
     Another problem w.r.t. sampling theory for time-series analysis is that
the value reported in a descriptor covers the past 24 hours, but the
reporting interval is typically ~18 hours, giving a six-hour *overlap* in
the time periods represented by each published value.  Further, whenever the
reporting interval is less than 18 hours, that overlap is correspondingly
greater than six hours.  In any case, six hours' worth of measuring gets
representation *twice* in the values published in every consecutive pair of
descriptors.
     There are theoretical reasons to suspect that the tor network's total
data flow fluctuates on periodic bases.  Consider the likelihood of diurnal
fluctuations due to the fact that there are very few clients in the time
zones spanned by much of the Pacific and Atlantic Oceans.  Consider the
likelihood of weekly fluctuations due to the business/work week portion of
each calendar week (five days vs. seven days).  There may well be some low-
amplitude fluctuations with other periods, e.g., monthly, seasonal, annual,
and so forth.  The diurnal and weekly periods seem likely to have the highest
amplitudes.  However, diurnal periodicity in the data flow through a node
cannot be determined because the sampling period via descriptor publication
is .75 day, which puts the period corresponding to the Nyquist frequency
(~.67 cycles/day) at 1.5 days.  In other words, any periodic fluctuations
that cycle more often than once every day and a half are too fast to be
detected and their amplitudes calculated.  Worse still is that fluctuations 
with periods shorter than 1.5 days will have their amplitudes aliased onto
the amplitudes of slower periodic fluctuations.  So although there are very
probably periodic changes in the tor network's loading, the ones most likely
to have the greatest impact are the ones we cannot measure in order to find
some way to compensate for them, and they will also corrupt analysis of
slower periodic changes in loading because the sampling rate (descriptor
updates) is too low.
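
     The arithmetic behind the Nyquist argument can be checked with a short
Python sketch.  It assumes an idealized, perfectly regular 0.75-day sampling
interval, which real descriptor publication is not, and the function names
are hypothetical.

```python
def nyquist_period(sample_interval_days):
    """Shortest period (in days) resolvable at the given sampling interval."""
    return 2.0 * sample_interval_days


def alias_frequency(signal_freq, sample_interval_days):
    """Apparent frequency (cycles/day) of a signal after uniform sampling."""
    fs = 1.0 / sample_interval_days  # sampling rate in samples/day
    f = signal_freq % fs             # fold into the range [0, fs)
    return min(f, fs - f)            # reflect into the range [0, fs/2]


interval = 0.75                                 # days between descriptors
diurnal = alias_frequency(1.0, interval)        # a once-a-day cycle
weekly = alias_frequency(1.0 / 7.0, interval)   # a once-a-week cycle

print("Nyquist period: %.2f days" % nyquist_period(interval))
print("diurnal cycle appears with period %.2f days" % (1.0 / diurnal))
print("weekly cycle appears with period %.2f days" % (1.0 / weekly))
```

Under these assumptions the weekly cycle is sampled faithfully, but the
diurnal cycle folds down to about one third of a cycle per day.  It therefore
shows up as a spurious fluctuation with a period of about three days,
contaminating the slower part of the spectrum.
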
     Although one can envision modification of the route selection procedure
to take advantage of load frequency and phase spectra for each node as they
may apply at the time the route selection takes place, any attempt to develop
means of compensating for amplitudes and phases of periodic network loads is
pointless until the data collection process has been thoroughly revamped.  If
this discussion proceeds far enough, I do have some initial thoughts on
remodeling the data collection process, but they are not yet well developed
and would benefit from a wider discussion first.
>Mike and others can probably talk more about the other issues here.
     That would be great.  I would certainly welcome a broader discussion of
this topic than has occurred here to date.  There are other, related issues
to address here, too, such as the use of "bandwidth" testing circuits upon
node initialization (and, apparently, under a few other circumstances, too)
that are longer than one hop, the number of circuits used in such testing,
the number of cells sent through each such circuit as part of the test,
testing all of those circuits simultaneously, delays exceeding one hour in
getting a newly initialized node listed in the consensus, and so on.  I have
reasons, however, to consider each of these to be of lesser impact than the
issues discussed in earlier paragraphs.  The most urgent matter, I believe,
is to separate the dual uses of the peak observed data rate value published
in each descriptor, so that the conflicts engendered by conflating them can
be eliminated.

                                  Scott Bennett, Comm. ASMELG, CFIAG
* Internet:       bennett at cs.niu.edu                              *
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *