
Re: [tor-dev] Understanding bwauth data in Stem?

On 06/12/14 00:26, Anna Kornfeld Simpson wrote:
> Thanks all for the responses!
> On Fri, Nov 21, 2014 at 4:53 PM, Sebastian Hahn
> <sebastian@xxxxxxxxxxxxxx> wrote:
>> Hi there,
>> On 21 Nov 2014, at 23:44, Damian Johnson <atagar@xxxxxxxxxxxxxx>
>> wrote:
>>>> In other words, if I sorted the descriptors by "measured"
>>>> value, what would that order mean?
>>> I *think* that would be the ordering of 'relays who receive the
>>> most tor client traffic due to having a more highly weighted
>>> heuristic for relay selection'.
>> that would be accurate, is my understanding
> Is there documentation of why this "heuristic for relay selection"
> does not correlate that well with "bandwidth" in the descriptor?
> I've attached a couple of scatter plots pulled from moria1's
> "measured" and "bandwidth" values for each descriptor a couple
> hours ago (and the plots look similar from the other bwauths).  One
> shows all values, the other shows the bottom 75% of values (sorted
> by measurements), and neither shows as much of a correlation as I
> would expect.  Are there factors other than bandwidth that 
> contribute to this "heuristic for relay selection"?

Hi Anna,

I don't have answers, but maybe ideas for further investigations:

 - Not sure if this was mentioned before, but did you take a look at
the spec?

 - Maybe try removing bandwidth values close to 10000, or just values
exactly at 10000.  IIRC, values are capped at that value.  (Removing
just those values may be more accurate than removing the top 25%.)

 - Very small bandwidth values might result from newly started
or restarted relays.  (Advertised) bandwidth values are "the volume of
traffic, both incoming and outgoing, that a relay is willing to
sustain, as configured by the operator and claimed to be observed from
recent data transfers."  If a relay didn't observe larger data
transfers, the reported bandwidth value will be small, but still the
(past) measurements might be large.  Maybe compare this for single
relays over time.

 - There's an interesting pattern at 1024 (?) kB/s.  Maybe there are
more at 512 kB/s and other values.  Can you reduce the amount of
overplotting in the graph?  In R/ggplot2, you'd set the "alpha" value
to something smaller than 1, so that dots become somewhat transparent.
It could be that these patterns are normal, because operators tend to
pick certain bandwidth rates more often than others.
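
To make the filtering and transparency suggestions above concrete, here is a
rough Python sketch.  It assumes you already have a list of
(bandwidth, measured) pairs pulled from the descriptors; the function names
and the sample numbers are made up for illustration, and the cap of 10000 is
only what I recall, not something I checked.  The plotting helper assumes
matplotlib is installed and is only a counterpart to the ggplot2 "alpha"
idea.

```python
import math

CAP = 10000  # assumed cap, as recalled in this message; not verified


def drop_capped(pairs, cap=CAP):
    """Keep only (bandwidth, measured) pairs whose bandwidth is below the cap."""
    return [(bw, m) for bw, m in pairs if bw < cap]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def plot_pairs(pairs, path="scatter.png"):
    """Scatter plot with translucent dots to reduce overplotting."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([bw for bw, _ in pairs], [m for _, m in pairs], alpha=0.2, s=4)
    ax.set_xlabel("bandwidth")
    ax.set_ylabel("measured")
    fig.savefig(path)


# Made-up sample values, only to show the calls.
sample = [(100, 120), (200, 210), (10000, 50), (300, 330)]
kept = drop_capped(sample)
print(len(sample) - len(kept), "capped pairs removed")
print("correlation before:", pearson(sample))
print("correlation after: ", pearson(kept))
```

Comparing the coefficient before and after dropping the capped values
should show whether the cap alone explains the weak correlation, or whether
something else is going on.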

All the best,
