Re: [tor-bugs] #25347 [Core Tor/Tor]: Tor keeps on trying the same overloaded guard over and over
#25347: Tor keeps on trying the same overloaded guard over and over
-------------------------------------------------+-------------------------
Reporter: teor
Owner: asn
Type: defect
Status: needs_revision
Priority: Medium
Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor
Version: Tor: 0.3.0.6
Severity: Normal
Resolution:
Keywords: 031-backport, 032-backport, 033-must, tor-guard, tor-client,
  tbb-usability-website, tbb-needs, 033-triage-20180320,
  033-included-20180320
Actual Points:
Parent ID: #21969
Points: 1
Reviewer:
Sponsor:
-------------------------------------------------+-------------------------
Comment (by arma):
More thoughts while pondering this discussion:
(1) It is really surprising to me that s7r could have been experiencing
this bug (as currently described) for 7 hours. I think it was probably
some other bug if it was really that long. For example, one of the ones
where we mark things down and stop trying them, or one of the ones like
#21969. asn says he only looked at a tiny fraction of the logs of the 7
hours, so let's be careful jumping to conclusions about what bug (or
combination of bugs) he actually experienced.
(2) If a relay sends you a destroy cell with reason resourcelimit, it
means that the relay has so many create cells on its queue that it thinks
it won't get to this one in the next 1.75 seconds (MaxOnionQueueDelay
default). So that's some real overload right there -- especially since
even if you send another one and you don't get a destroy back, it means
you squeaked into the queue, but you still have all those other creates
ahead of you.
(2b) Do we have any reason to believe that the calculation in
have_room_for_onionskin() is at all accurate? That is, are we sometimes
sending this response when there are only 0.25 seconds worth of create
cells in our queue? Or are we sometimes not sending them even though there
are 5 seconds of cells queued?
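To make the question in (2b) concrete, here is a minimal sketch of how an admission check of this kind can drift from reality. It is not the code in have_room_for_onionskin(); the function names, the per-cell cost figures, and the single-number cost model are all assumptions made up for illustration. The only value taken from the discussion above is the 1.75 second MaxOnionQueueDelay default.

```python
MAX_ONION_QUEUE_DELAY = 1.75  # seconds; the MaxOnionQueueDelay default cited above


def estimated_delay(queued_cells, assumed_cost, workers=1):
    """Seconds the relay *thinks* a new create cell would wait in the queue.

    assumed_cost is the relay's own guess at seconds of CPU per create cell
    (a hypothetical parameter, not a real Tor option).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return queued_cells * assumed_cost / workers


def has_room(queued_cells, assumed_cost, workers=1):
    """Hypothetical admission check: accept only if the estimate fits the limit."""
    return estimated_delay(queued_cells, assumed_cost, workers) <= MAX_ONION_QUEUE_DELAY


def real_backlog_at_refusal(assumed_cost, true_cost):
    """Real queue delay (seconds) at the point where the relay starts refusing.

    The relay refuses once queued * assumed_cost / workers exceeds the limit,
    so the real backlog at that point is the limit scaled by true/assumed.
    """
    return MAX_ONION_QUEUE_DELAY * true_cost / assumed_cost
```

Under these made-up numbers, a relay that assumes 7 ms per cell when the real cost is 1 ms starts sending destroys with only about 0.25 seconds of real work queued (real_backlog_at_refusal(0.007, 0.001)), while one that assumes 7 ms when the real cost is 20 ms keeps accepting until about 5 seconds are queued (real_backlog_at_refusal(0.007, 0.020)). Whether the real estimate is off by anything like that is exactly what would need measuring.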
(3) It would be nice to find a way for the dir auths to scale back the
consensus weights of relays that are overloaded like this. That is, it
would sure be swell if we could make this something that the dir auths
solve for all the users, not something that each user has to encounter and
then adapt to. But while I see why we want that, we should be realistic
and realize that we won't get it: the dir auths act on a time schedule of
hours, so they will catch perennially overloaded relays (say, relays that
genuinely have a wildly wrong weighting or are just simply broken), but
they won't be able to catch transient hotspots (including hotspots induced
by bad people).
(4) I think we really need to figure out how often this happens in
practice. That means scanning relays over time. Now, it happens that
pastly's sbws might be able to collect this data for us. Also, the
torperfs and onionperfs of the world could have this data already too, if
they collect it. Do they? Noticing it in sbws has the slight added
advantage that if we can figure out how to use it in computing weights,
it's all ready to be used.
(5) I would be a fan of a feature where we track the
destroy-resource-limit responses we receive over time, and if there have
been (say) 30 different seconds recently where we got at least one
destroy-resource-limit, and none of our attempts worked, we call the guard down. We
shouldn't call it down in response to just one hotspot though (e.g. "I
sent twenty create cells and I got twenty destroy-resource-limit
responses"), since they're correlated with each other, that is, if you got
one then it's not surprising that you'll get a second right after. And we
might want to retry a guard that we mark down this way sooner than the
30-minute-later default from prop271.
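A rough sketch of the bookkeeping that (5) describes, to show why counting distinct seconds (rather than raw responses) keeps one burst from marking the guard down. This is an illustration only, not a proposed patch: the class name, the method names, and the 600-second meaning of "recently" are assumptions; only the threshold of 30 different seconds comes from the text above.

```python
class ResourceLimitTracker:
    """Per-guard record of seconds in which a destroy-resource-limit arrived."""

    def __init__(self, threshold_seconds=30, window=600):
        self.threshold_seconds = threshold_seconds  # "(say) 30 different seconds"
        self.window = window                        # assumed meaning of "recently"
        self._seconds = set()                       # distinct whole seconds seen

    def _expire(self, now):
        # Forget seconds that have fallen out of the recent window.
        cutoff = now - self.window
        self._seconds = {s for s in self._seconds if s >= cutoff}

    def note_destroy(self, now):
        # Twenty destroys in the same second collapse into one set entry,
        # so a single correlated burst counts once.
        self._seconds.add(int(now))
        self._expire(now)

    def note_success(self, now):
        # "None of our attempts worked" no longer holds, so start over.
        self._seconds.clear()

    def should_mark_down(self, now):
        self._expire(now)
        return len(self._seconds) >= self.threshold_seconds
```

With this shape, twenty destroys received within one second leave the count at 1, thirty destroys spread over thirty different seconds reach the threshold, and any successful circuit in between resets the count. How soon to retry a guard marked down this way is left out of the sketch.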
(6) I agree with asn that making it too easy to force a client to rotate
guards is scary. The pool of 60-or-so guards from prop271 is a huge pool,
and the only way to use that design securely imo is to have it be the case
that some of those 60 guards are very hard to push clients away from.
(7) I agree with Mike that the confirmation attack ("send a bunch of
create cells to each guard one at a time and see when your target onion
service stops responding") is worrisome. But would a bandwidth congestion
attack work there too? I guess it would be more expensive to pull off with
the same level of reliability.
(7b) In a two guard design, I wonder if we should be even more reluctant
to abandon a guard due to a transient problem like this. After all, if we
do abandon one, we're increasing our surface area past two. And if we
don't, in theory we still have one that's working.
(8) Remember CREATE_FAST? If your guard is otherwise fine but it's too
busy to process your create cell... and you were about to do something
foolish to your anonymity like move to another guard or go offline in
response... hm. :)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25347#comment:34>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online