
Re: URGENT: patch needed ASAP for authority bug

To: tor-relays@xxxxxxxxxxxxxx
Subject: Re: URGENT: patch needed ASAP for authority bug
From: Sebastian Hahn <mail@xxxxxxxxxxxxxxxxx>
Date: Thu, 15 Apr 2010 14:59:41 +0200
Cc: Mike Perry <mikeperry@xxxxxxxxxx>, Nick Mathewson <nickm@xxxxxxxxxxxxx>
Delivered-to: archiver@xxxxxxxx
Delivered-to: tor-relays-outgoing@xxxxxxxx
Delivered-to: tor-relays@xxxxxxxxxxxxxx
Delivery-date: Thu, 15 Apr 2010 08:59:52 -0400
In-reply-to: <201004151242.o3FCgku1022658@xxxxxxxxxxxxx>
References: <201004151242.o3FCgku1022658@xxxxxxxxxxxxx>
Reply-to: tor-relays@xxxxxxxxxxxxxx
Sender: owner-tor-relays@xxxxxxxxxxxxxx

Hi Scott,

no reason to panic currently. I've cc'ed Mike and Nick here, in case they
can better explain what is going on.

On Apr 15, 2010, at 2:42 PM, Scott Bennett wrote:

I believe I spotted an authority bug with pretty severe consequences this
a.m. It is having a seriously bad effect on the star heavyweight node of
the tor network, Olaf Selke's blutmagie. I can't submit a PR for it due
to the flyspray web page's problems with letting me log in, and Olaf
wrote me that he's at work at the moment and can't submit a PR until he
gets home after work.  So please read on, and if someone would please
submit an urgent PR for this, we (and probably others) would appreciate
it. If you do, please shoot a note off to Olaf <olaf.selke@xxxxxxxxxxxx>
to let him know about it, so he won't submit a duplicate PR. I don't
think a fix for this one should wait for the next release. Instead,
patches for both "stable" and "alpha" branches should be made available
to authority operators as soon as someone can come up with them. (Only
the authorities need to be fixed right away because the bug is somewhere
in the authority code for generating consensus entries.)
    Here's what I found.  blutmagie's torrc is set up for a target
throughput rate of 18000 KB/s and a maximum burst rate of 24000 KB/s.
Olaf noticed that blutmagie was being swamped by a horrendous load of
incoming connections nearly all the time, so he tried using
MaxAdvertisedBandwidth to reduce the frequency of inbound connections.
He repeatedly lowered the maximum advertised rate, and blutmagie's
descriptor correctly reflects that, now showing a target rate of 2000 KB/s,
but the connection rate showed no apparent change.  He recently began
reporting this trouble on OR-TALK, IIRC, but no one seemed to know why
the limit on the advertised target rate didn't reduce the load, even when
set so low compared to the actual rate and also compared to the rates
published by other heavyweight nodes.
    The problem lies in the consensus document, where it shows (or did
an hour or so ago),

w Bandwidth=27900
Note that 27900 KB/s is considerably higher than the maximum burst rate
in the descriptor and is 13.95 times the supposed maximum advertised rate.
That means that, while old client versions that use the values in the
descriptors in their route selection process will probably honor the
maximum advertised rate of 2000 KB/s, newer clients use the rate in the
consensus, 27900 KB/s, in theirs, thus continuing to drown blutmagie in
an ongoing flood of incoming connections.
The authorities are currently disregarding the limit published in every
node's descriptor and instead are conjuring up their own numbers. This
needs to stop, and right away.

The value in the consensus is not an actual bandwidth, but rather it is a
bandwidth weight, used by clients to do load balancing. This value is
automatically determined by directory authorities doing active
measurements of nodes' capacity, to more evenly distribute the load.
Blutmagie, due to having huge capacity, gets a big share of the network
because it has a lot of unused bandwidth. I have warned that this might
lead to sad consequences, as available bandwidth is not the only factor
that determines how much traffic a node can handle. There are other
things to take into account: the number of circuits you need to
establish, the higher memory requirements of servicing lots of
connections compared to the single connection that the bandwidth scanner
uses, and the higher overhead when more connections need to be handled.
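To make the load-balancing point concrete, here is a minimal sketch of bandwidth-weighted relay choice, assuming a client simply picks each relay with probability proportional to its consensus weight. It deliberately ignores flags, guard/exit position weighting and everything else in Tor's real path selection. Only blutmagie's 27900 comes from this thread; the other relay names and weights are hypothetical.

```python
import random
from collections import Counter

# Hypothetical consensus weights ("w Bandwidth=" values). Only the
# blutmagie figure is from the thread; relayA/B/C are made up.
WEIGHTS = {
    "blutmagie": 27900,
    "relayA": 5000,
    "relayB": 2000,
    "relayC": 100,
}


def selection_probability(weights: dict[str, int]) -> dict[str, float]:
    """Chance each relay is picked when choice is proportional to weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_relay(weights: dict[str, int], rng: random.Random) -> str:
    """Pick one relay with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    for name, p in selection_probability(WEIGHTS).items():
        print(f"{name}: {p:.1%}")
    rng = random.Random(0)
    counts = Counter(pick_relay(WEIGHTS, rng) for _ in range(10_000))
    print(counts.most_common())
```

With these made-up numbers blutmagie ends up with roughly 80% of all picks, which is the kind of concentration that shows up as a flood of connections, whatever the node's other limits are.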

Another side-effect is that limiting your bandwidth via the MaxAdvertised*
options is no longer viable, because the active measurements, not the
passively advertised values, are what affect circuit building. This has
bad consequences for everyone who has lots of bandwidth but tries to
attract only a few clients (we're seeing the problem on a few vservers as
well).
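A rough sketch of why lowering MaxAdvertisedBandwidth stops helping once clients use measured weights. It assumes a toy model in which older clients weight relays by the descriptor's advertised value and 0.2.2.x clients by the consensus weight, and in which the cap changes only the descriptor value. blutmagie's 2000 and 27900 are from the thread; relayA and relayB and their numbers are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Relay:
    name: str
    advertised: int  # descriptor value, already capped by MaxAdvertisedBandwidth
    measured: int    # consensus "w Bandwidth=" weight from active measurements


def share(relays: list[Relay], name: str, use_measured: bool) -> float:
    """Fraction of picks going to `name` if clients choose proportionally."""
    def weight(r: Relay) -> int:
        return r.measured if use_measured else r.advertised

    total = sum(weight(r) for r in relays)
    return sum(weight(r) for r in relays if r.name == name) / total


def cap_advertised(relay: Relay, cap: int) -> Relay:
    """Toy model of lowering MaxAdvertisedBandwidth: only the descriptor changes."""
    return replace(relay, advertised=min(relay.advertised, cap))


# blutmagie numbers from the thread; relayA/relayB are hypothetical.
RELAYS = [
    Relay("blutmagie", advertised=2000, measured=27900),
    Relay("relayA", advertised=5000, measured=5000),
    Relay("relayB", advertised=2000, measured=2000),
]

if __name__ == "__main__":
    capped = [cap_advertised(r, 500) if r.name == "blutmagie" else r for r in RELAYS]
    for label, relays in (("cap 2000", RELAYS), ("cap 500", capped)):
        old = share(relays, "blutmagie", use_measured=False)
        new = share(relays, "blutmagie", use_measured=True)
        print(f"{label}: descriptor-based clients {old:.1%}, weight-based clients {new:.1%}")
```

In this toy setup, lowering the cap from 2000 to 500 cuts blutmagie's share among descriptor-based clients from about 22% to about 7%, while its share among weight-based clients stays near 80%.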

I'm not sure what can be done about this, because measuring
bandwidth is easy and has led to dramatic speed increases in the
network for people running the 0.2.2.x versions (only those use the
bandwidth weights currently, afaik); whereas measuring a node's
capacity to deal with massive amounts of connections is not trivial.

Something that might or might not figure into this is that newly started
Tor clients do active speed tests, building test circuits for the first
~hour and a half to find a good value for timing out slow circuits. These
additional circuits might explain a generally higher load on the relays,
but I'm not sure about this here.
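For illustration only, a simplified sketch of learning a circuit timeout from test-circuit build times. It assumes a plain empirical quantile with an assumed 80% cutoff as a stand-in; Tor's real estimator is more involved. The `simulate_test_circuits` helper and its lognormal build times are hypothetical.

```python
import math
import random


def learn_timeout(build_times_ms: list[float], quantile: float = 0.8) -> float:
    """Return the build time that `quantile` of observed circuits stayed within.

    Simplified stand-in using the nearest-rank empirical quantile; the 0.8
    default is an assumption, not taken from Tor's code.
    """
    if not build_times_ms:
        raise ValueError("need at least one observed build time")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(build_times_ms)
    rank = math.ceil(quantile * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def simulate_test_circuits(n: int, rng: random.Random) -> list[float]:
    """Hypothetical build times in ms: mostly around a second, with a slow tail."""
    return [rng.lognormvariate(7.0, 0.6) for _ in range(n)]


if __name__ == "__main__":
    times = simulate_test_circuits(200, random.Random(1))
    timeout = learn_timeout(times)
    slow = sum(t > timeout for t in times)
    print(f"learned timeout about {timeout:.0f} ms; {slow} of {len(times)} exceed it")
```

Each of those observations is an extra circuit through some relay, which is why a wave of newly started clients could plausibly add load on top of normal traffic.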

So, to summarize: there is currently no bug in the authority code; the
authorities are working as intended. I'm waiting for Mike's further input
here to see whether we need to, or can, do something about the trouble
this seems to create for blutmagie.

Sebastian
