Re: [tor-bugs] #25347 [Core Tor/Tor]: Tor stops building circuits, and doesn't start when it has enough directory information
#25347: Tor stops building circuits, and doesn't start when it has enough directory
information
-------------------------------------------------+-------------------------
Reporter: teor                                   | Owner: asn
Type: defect                                     | Status: needs_revision
Priority: Medium                                 | Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor                          | Version: Tor: 0.3.0.6
Severity: Normal                                 | Resolution:
Keywords: 031-backport, 032-backport,            | Actual Points:
  033-must, tor-guard, tor-client,               |
  tbb-usability-website, tbb-needs,              |
  033-triage-20180320, 033-included-20180320     |
Parent ID: #21969                                | Points: 1
Reviewer:                                        | Sponsor:
-------------------------------------------------+-------------------------
Comment (by s7r):
Replying to [comment:10 asn]:
> Looking at your logs, it seems like your guard rejected about 230 new
> circuit creations in 15 minutes with the excuse of `RESOURCELIMIT`. And
> your client just kept making more and more circuits to the same guard that
> were getting rejected... I've also noticed this exact same behavior on a
> client of mine recently.
>
I see that as well, but it happens fairly often, and Tor normally has no
problem switching to guard 2/3 or even guard 3/3 to maintain functionality.
This time (which happens rarely) it stayed stuck in this useless state.
> My theory on why `RESOURCELIMIT` was used by your guard (given that you
> say that DoS patch was disabled) is that `assign_onionskin_to_cpuworker()`
> failed because `onion_pending_add()` failed because
> `have_room_for_onionskin()` failed. That means that the relay was
> overworked and had way too many cells to process at that time.
> Unfortunately, I can't see whether you are sending NTOR or TAP cells given
> your logs.
>
I know for sure that the DoS patch is not related, because I triple-checked
all 3 primary guards and none of them was running a Tor version that
includes the DoS patch we merged. I think I was using only NTOR cells,
because I was only trying to reach the check.tpo and duckduckgo clearnet
websites.
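
To make the theory quoted above easier to reason about, here is a toy
model of a relay that refuses new circuits once its pending handshake
queue is full. It is only a sketch and not Tor's actual code: the class,
the method names, and the fixed `capacity` are hypothetical and stand in
for whatever limit the relay really applies.

```python
from collections import deque

# Label for the DESTROY reason discussed in this ticket.
RESOURCELIMIT = "RESOURCELIMIT"


class ToyRelay:
    """Toy model of a relay's pending-handshake queue (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed fixed limit, for illustration only
        self.pending = deque()

    def have_room(self):
        # Stand-in for the "is there room for another onionskin?" check.
        return len(self.pending) < self.capacity

    def pending_add(self, circ_id):
        # Queue the handshake only if there is room.
        if not self.have_room():
            return False
        self.pending.append(circ_id)
        return True

    def handle_create(self, circ_id):
        """Return None if queued, or the DESTROY reason if refused."""
        if not self.pending_add(circ_id):
            return RESOURCELIMIT
        return None

    def process_one(self):
        """A worker finishes one handshake, which frees one slot."""
        if self.pending:
            return self.pending.popleft()
        return None


def simulate(n_creates, capacity, process_every):
    """Count refusals when one handshake completes every
    `process_every` creation attempts (0 means none ever complete)."""
    relay = ToyRelay(capacity)
    rejected = 0
    for i in range(n_creates):
        if relay.handle_create(i) == RESOURCELIMIT:
            rejected += 1
        if process_every and (i + 1) % process_every == 0:
            relay.process_one()
    return rejected
```

In this model a client that keeps sending creation attempts faster than
the relay drains its queue is refused again and again, which is at least
consistent with the pattern in the logs.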
> Like you said, I think the most obvious misbehavior here is that you
> keep on hassling your guard even tho it's telling you to relax by sending
> your `RESOURCELIMIT` `DESTROY` cells. Perhaps one approach here would be
> to choose a different guard after a guard has sent us `RESOURCELIMIT`
> cells, in an attempt to unclog the guard and to get better service.
> '''Let's think about this some more:'''
>
> What's the best behavior here? Should we mark the guard as down after
> receiving a single `RESOURCELIMIT` cell, or should we hassle the guard a
> bit before giving up?
>
This is the most important part we need to take care of. I dislike the
idea of removing the guard after receiving a single `RESOURCELIMIT` cell.
At the very least we should retry it after some time, using exponential
backoff exactly as we do when one of our primary guards is not running or
not listed, and keep the same logic, timing and behavior so we don't have
to maintain more branches.
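
A rough sketch of what I mean, combining "hassle the guard a bit" with an
exponential retry delay. Everything in it is an assumption made for
illustration: the class, the threshold of 3, and the delays are made up
and do not reproduce Tor's real guard retry schedule.

```python
class GuardRetryState:
    """Hypothetical per-guard bookkeeping for RESOURCELIMIT rejections."""

    def __init__(self, threshold=3, base_delay=60, max_delay=3600):
        self.threshold = threshold    # rejections tolerated before backing off
        self.base_delay = base_delay  # first retry delay, in seconds
        self.max_delay = max_delay    # cap on the retry delay, in seconds
        self.rejections = 0           # rejections since the last backoff
        self.backoffs = 0             # how many times we have backed off
        self.retry_at = 0.0           # earliest time we may use the guard

    def usable(self, now):
        return now >= self.retry_at

    def on_resourcelimit(self, now):
        """Record one RESOURCELIMIT DESTROY received from this guard."""
        self.rejections += 1
        if self.rejections >= self.threshold:
            # Double the delay on every backoff, up to max_delay.
            delay = min(self.base_delay * 2 ** self.backoffs, self.max_delay)
            self.retry_at = now + delay
            self.backoffs += 1
            self.rejections = 0

    def on_success(self):
        """A circuit through this guard succeeded: forget past trouble."""
        self.rejections = 0
        self.backoffs = 0
        self.retry_at = 0.0


def pick_guard(guards, now):
    """Return the first usable guard name in priority order, or None."""
    for name, state in guards:
        if state.usable(now):
            return name
    return None
```

With `threshold=1` this degenerates into "mark the guard as down after a
single cell", so the same code path covers both options and only the
number changes.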
> Most importantly, can we make sure that the `DESTROY` cell came from the
> guard and not from some other node in the path? If we can make sure that
> the `DESTROY` cell came from the guard, this seem to me like a pretty safe
> countermeasure since we should trust the guard to tell us whether it's
> overworked or not.
>
As far as I understand from arma's comment, the `DESTROY` cell can only
come from the guard.
> WRT timeline here, I think working on this countermeasure (mark guard as
> down when overworked to get better service) seems like a plausible goal
> for 033, but anything more involved will probably need to wait for 034.
>
> Would appreciate feedback from Nick or Tim here :)
>
> ----
>
> I still can't explain why you managed to bootstrap after hacking your
> state file tho. Perhaps a coincidence? Perhaps you were overworking your
> guard and when you stopped, it relaxed? Perhaps the hack worked
> differently than you imagine? Not sure.
I sincerely hope so. But it makes me wonder: the guard appears overworked
for many hours, yet when I delete my state file, restart, and then edit
the new state file to put back the same 3/3 primary guards that were not
allowing me to connect, it connects fine. I don't have any evidence that
something was wrong with the state file, and I can't see what could have
been wrong with it; it does not make any sense. This bug is very hard to
reproduce or catch in the wild.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25347#comment:23>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs