
Re: [tor-relays] Your system clock just jumped

Gordon Morehouse wrote:
> Jobiwan Kenobi:
>> I've been running a relay for about  months now. It runs on an 1.6 

That should have been: about 3 months.

>> Ghz single core Atom with hyperthreading, 1GB of RAM. It's on my 
>> home connection; I advertise only 176 KB/sec.
>> Under normal conditions, it pumps around 100KB/sec, it has around 
>> 600 connections, it uses around 20% CPU. Everything looks very 
>> healthy.
>> Lately I have seen a couple of incidents where the number of 
>> connections suddenly goes up to over 3000, traffic increases 
>> heavily, CPU usage goes well over 150% (out of possible 200). 
>> Traffic can go up to between 500 and 1000 KB/sec for long periods 
>> of time. Sometimes it seems that my relay just can't take it 
>> anymore. In the log, the ratio of TAP handshakes goes wild, and I 
>> get clock jump warnings. My clock does not jump. This is Tor 
>> hanging while allocating memory.
> Welcome to the world of the Raspberry Pi / BeagleBone / CubieBoard
> operator, except normally we'd have crashed (without some defensive
> measures) before the clock jump thing - the Pi in particular has a
> known dodgy "clock."
>> Today I had a really bad episode, where the box started thrashing. 
>> When it became responsive again, Tor was left in a state where it 
>> was constantly downloading about 400KB/sec more than it was 
>> uploading. Normally I have a little bit more up than down, because 
>> I'm a directory server as well. I can not explain having a lot
>> more down than up. I can fantasize that those hangs/clock jumps 
>> ("assuming established circuits no longer work") could leave
>> 'half' circuits.
> As best I can tell, probably that's a flood of incoming TAP requests
> or TLS handshakes.
>> Finally I restarted my relay, (which I really don't like to do)
>> and after a while it stabilized. At this point, my router shows a
>> peak of almost 8000 NAT sessions.
>> Is this normal behavior of the network (esp. the sudden increase
>> in connections) or is this another kind of attack/probe like what 
>> we've seen in early September? Is this because this machine is
>> just too underpowered? Should I collect/provide any diagnostics?
>> Have others seen similar events?
> Have a look at the Raspberry Pi threads and search for "circuit
> creation storms."  I'm slowly developing a set of defensive iptables
> rules for low-power relays which you might want to have a look at, but
> as your machine is far more capable than a Pi, you'll need to adjust
> accordingly (and then, I hope, contribute back!)
> https://github.com/gordon-morehouse/cipollini/tree/master/contrib/90_slowboards/tcp_syn_limit
> (Ignore the fail2ban stuff for now, I found a more efficient way to
> handle the problem with the help of a list reader.)

Thanks Gordon, 

I'm not sure I can get iptables up and working on this box. It is 
more of an appliance. (Though I did get a build environment on it to 
build Tor.)

Throttling incoming connections is probably not the answer in 
this case, as the connection count can still build up to a large 
number. Throttling handshakes might help, but that can't be done at 
the network level. 

Anyway, this 'attack' (if that's what it is: millions of TAP 
handshakes per hour) doesn't kill my relay, but after those clock 
jump messages, it is left in a state where it downloads way more 
data than it uploads. As if the circuits it assumed to be no longer 
working are still sending data that doesn't get relayed. 

If this is the case, would it be possible to detect this and either 
block those circuits, close those connections, or make no 
assumptions in the first place? 
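
To make "detect this" a bit more concrete from the operator's side, 
here is a rough Python sketch. It is not how Tor works internally and 
not a proposed patch; it only watches per-interval byte counts (from 
whatever source is at hand) and flags a sustained state where far 
more comes in than goes out. The function name, the 2.0 ratio 
threshold, the window length and the sample numbers are all made up 
for illustration.

```python
from collections import deque


def make_imbalance_detector(ratio_threshold=2.0, window=5):
    """Build an observer for (bytes_read, bytes_written) samples.

    The observer returns True once the read/written ratio has been above
    `ratio_threshold` for `window` consecutive samples. Both defaults are
    arbitrary illustration values, not tuned recommendations.
    """
    recent = deque(maxlen=window)

    def observe(bytes_read, bytes_written):
        # max(..., 1) avoids dividing by zero on an idle interval.
        ratio = bytes_read / max(bytes_written, 1)
        recent.append(ratio > ratio_threshold)
        # Only report once the window is full and every sample exceeded it.
        return len(recent) == window and all(recent)

    return observe


# Illustrative KB-per-second samples: one healthy interval, then three
# intervals with roughly 400 KB/sec more down than up.
observe = make_imbalance_detector(ratio_threshold=2.0, window=3)
samples = [(100, 110), (550, 150), (560, 150), (540, 150)]
flags = [observe(read, written) for read, written in samples]
```

With these samples only the last interval is flagged, because the 
window has to fill with three imbalanced readings first. That avoids 
reacting to a single noisy second.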

Another one of these is going right now as I write. 

When I set my bandwidth rate to 256 KB/sec, download fills that up 
while upload is at only 150 KB/sec or so. When I set it to 1 MB/sec, 
download fills that up while upload stays at roughly 150 KB/sec. 
CPU is well over 100%.
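
As a quick back-of-the-envelope check on those numbers, the small 
Python sketch below estimates what share of the incoming traffic has 
no matching outgoing traffic. It assumes download exactly fills the 
configured rate, upload is a flat 150 (the "or so" figure above), and 
1 MB means 1024 KB; the function name is just for illustration.

```python
def unrelayed_fraction(download_kb_s, upload_kb_s):
    """Fraction of inbound traffic with no matching outbound traffic."""
    if download_kb_s <= 0:
        return 0.0
    # Clamp at zero so a normal "slightly more up than down" state reads as 0.
    return max(download_kb_s - upload_kb_s, 0) / download_kb_s


# Rates from the two experiments above, upload assumed flat at 150.
for rate in (256, 1024):
    frac = unrelayed_fraction(rate, 150)
    print(f"rate {rate}: about {frac:.0%} of download is not relayed")
```

Under those assumptions that works out to roughly 41% at the lower 
rate and 85% at the higher one, so raising the rate mostly feeds the 
part that never gets relayed.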

Normally I have the rate at 10 Mbit/sec, a bit less than my actual 
bandwidth, but advertise much less. When I don't have 3000+ 
connections, I sometimes see it do high volume for long periods with 
relatively low CPU load. 

This time I'm not going to relaunch it but let it recover on its 
own, with a lowered bandwidth rate, since most of the traffic is 
going into a sinkhole anyway.


tor-relays mailing list