On 01 Jan (21:12:38), s7r wrote:
One of my relays (guard, not exit) started to report being overloaded about
one week ago, for the first time in its life.
The consensus weight and advertised bandwidth are what they should be,
considering the relay's configuration. Moreover, they have not changed for
years. So, I started to look at it more closely.
Apparently the overload is triggered every 5-6 days by a flood of circuit
creation requests. All I can see in tor.log is:
[warn] Your computer is too slow to handle this many circuit creation
requests! Please consider using the MaxAdvertisedBandwidth config option or
choosing a more restricted exit policy. [68382 similar message(s) suppressed
in last 482700 seconds]
[warn] Your computer is too slow to handle this many circuit creation
requests! Please consider using the MaxAdvertisedBandwidth config option or
choosing a more restricted exit policy. [7882 similar message(s) suppressed
in last 60 seconds]
This message is logged 4 to 6 times, with 1 minute (60 sec) between each
warn entry.
After that, the relay is back to normal. So it feels like it is being probed
or something like this. CPU usage is at 65%, RAM is at under 45%, SSD no
problem, bandwidth no problem.
A very plausible theory, especially given such a "burst" of traffic. We can
rule out that your relay has all of a sudden become the guard for
facebook.onion.
Metrics port says:
tor_relay_load_tcp_exhaustion_total 0
tor_relay_load_onionskins_total{type="tap",action="processed"} 52073
tor_relay_load_onionskins_total{type="tap",action="dropped"} 0
tor_relay_load_onionskins_total{type="fast",action="processed"} 0
tor_relay_load_onionskins_total{type="fast",action="dropped"} 0
tor_relay_load_onionskins_total{type="ntor",action="processed"} 8069522
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 273275
So if we compare the dropped ntor circuits with the processed ntor circuits
we end up with a reasonable percentage (it's >8 million processed vs <300k
dropped).
Yeah, so this is a ~3.4% drop rate, which immediately triggers the overload
signal.
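For anyone who wants to reproduce that number from their own MetricsPort
output, here is a minimal sketch. It assumes the metrics lines look exactly
like the ones quoted above and it uses the dropped/processed ratio discussed in
this thread; the helper names (parse_onionskins, drop_ratio) are made up for
illustration and are not part of tor.

```python
import re

# Metrics lines copied from the message above.
METRICS = """\
tor_relay_load_onionskins_total{type="tap",action="processed"} 52073
tor_relay_load_onionskins_total{type="tap",action="dropped"} 0
tor_relay_load_onionskins_total{type="fast",action="processed"} 0
tor_relay_load_onionskins_total{type="fast",action="dropped"} 0
tor_relay_load_onionskins_total{type="ntor",action="processed"} 8069522
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 273275
"""

LINE_RE = re.compile(
    r'^tor_relay_load_onionskins_total\{type="(\w+)",action="(\w+)"\}\s+(\d+)$'
)


def parse_onionskins(text):
    """Return {type: {action: count}} for every onionskin metric line."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kind, action, value = match.groups()
            counts.setdefault(kind, {})[action] = int(value)
    return counts


def drop_ratio(processed, dropped):
    """Dropped divided by processed; 0.0 if nothing was processed."""
    return dropped / processed if processed else 0.0


counts = parse_onionskins(METRICS)
ntor = counts["ntor"]
ratio = drop_ratio(ntor["processed"], ntor["dropped"])
print(f"ntor drop ratio: {ratio:.2%}")
```

Dividing by processed + dropped instead gives a slightly lower figure (about
3.3%), so which denominator is used matters little at this scale.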
So the question here is: does the computed consensus weight of a relay
change if that relay keeps sending reports to directory authorities that it
is being overloaded? If yes, could this be triggered by an attacker, in
order to arbitrarily decrease a relay's consensus weight even when it's not
really overloaded (to maybe increase the consensus weights of other
malicious relays that we don't know about)?
Correct, this is a possibility indeed. I'm not entirely certain that this is
the case at the moment as sbws (bandwidth authority software) might not be
downgrading the bandwidth weights just yet.
But regardless, the point is that this is where we are heading. We do have
control over this, though, so now is a good time to notice these problems and
act.
I'll try to get back to you asap after talking with the network team.
Also, as a side note, I think that if the dropped/processed ratio is not
over 15% or 20% a relay should not consider itself overloaded. Would this be
a good idea?
It is plausible that this would be a better idea! It is unclear what the
optimal percentage is but, personally, I'm leaning towards a higher threshold
so that it is not triggered under normal circumstances.
But I think that if we raise this to, let's say, 20%, it might not stop an
attacker from triggering it. It might just take them a bit longer.
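To put a rough number on "a bit longer", here is a back-of-the-envelope sketch.
It assumes the processed ntor count stays at the value reported above and that
the threshold is applied to the dropped/processed ratio; both are
simplifications, and drops_needed is a made-up helper, not something from tor
or sbws.

```python
PROCESSED_NTOR = 8069522  # from the metrics quoted earlier in this thread
OBSERVED_DROPS = 273275   # from the metrics quoted earlier in this thread


def drops_needed(processed, threshold_percent):
    """Smallest dropped count with dropped/processed >= threshold_percent %.

    Integer arithmetic (ceiling division) avoids float rounding surprises.
    """
    return (processed * threshold_percent + 99) // 100


for pct in (15, 20):
    needed = drops_needed(PROCESSED_NTOR, pct)
    factor = needed / OBSERVED_DROPS
    print(f"{pct}% threshold: {needed} drops, {factor:.1f}x the observed count")
```

Under those assumptions, a 20% threshold would need roughly six times the
drops seen here, so it raises the cost of triggering the signal without
removing the possibility.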