
Re: Proposal 137: Keep controllers informed as Tor bootstraps



On Sun, Jun 15, 2008 at 01:41:24PM -0400, Nick Mathewson wrote:
> >   So in this case we send
> >   650 STATUS_CLIENT NOTICE/WARN BOOTSTRAP \
> >   PROGRESS=num TAG=string SUMMARY=string WARNING=string REASON=string
>
> Are the strings quoted?  They should be.

SUMMARY and WARNING are quoted. The rest aren't, because they can only
be a word, not arbitrary strings.
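
As a rough sketch of what that quoting rule means on the controller side,
here is how a controller might split such an event into its keyword
arguments. This is only an illustration: the function name is made up,
and the backslash-escape handling inside quoted strings is my assumption
rather than something the proposal spells out.

```python
import re

# KEY=value pairs: the value is either a double-quoted string
# (SUMMARY, WARNING) or a bare word (PROGRESS, TAG, REASON).
_PAIR = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def parse_bootstrap_event(line):
    """Return (severity, args) for a BOOTSTRAP status event line.

    Hypothetical helper for illustration; not part of Tor or Vidalia.
    """
    head, sep, rest = line.partition(" BOOTSTRAP ")
    if not sep:
        raise ValueError("not a BOOTSTRAP status event")
    severity = head.split()[-1]  # NOTICE or WARN
    args = {}
    for key, quoted, bare in _PAIR.findall(rest):
        if bare:
            args[key] = bare
        else:
            # Assumption: quoted strings use backslash escapes.
            args[key] = re.sub(r"\\(.)", r"\1", quoted)
    return severity, args
```

A controller would then read args["PROGRESS"] and args["TAG"] as plain
words and treat args["SUMMARY"] and args["WARNING"] as free-form text.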

>  But if this is so, I think
> maybe REASON should be an identifier-like thing rather than a string:
> using an identifier will discourage us from making cosmetic
> compatibility-breaking improvements to reasons in the future.

Yes, REASON is an identifier, not an arbitrary string. It is one of the
same set of words that REASON can be in the ORCONN failure status events.

> Question: are bootstrapping events meant to be sent only during the
> initial setup phase, or might they appear later if Tor needs to
> bootstrap again?  In other words, should controllers expect to see
> only a monotonically increasing series of phase numbers, or should
> they be ready for a Tor to tell them, "I am no longer a bootstrapped
> Tor; I'm back at phase X"?

Currently they only appear at initial bootstrap, and they only increase.

We might change either of those in the future, though. I haven't really
figured out an intuitive way to let the bootstrapping progress bar move
backward; I've heard many complaints about Microsoft progress bars that
zip around so much that you can't make any guesses about how much
progress you've made.

> >   Phase 0:
> >   tag=starting summary="starting"
> > 
> >   Tor starts out in this phase. It doesn't actually send a status event
> >   to say so.
> 
> What's it for, then?  

It's basically just an internal state. We won't ever actually send the
event in practice, because there will never be a controller attached at
the point when the event happens.

In any case, now that I've added "getinfo status/bootstrap-phase", which
the controller can use to get up to speed on Tor's current situation when
it connects, we initialize to the Starting state and the controller can
learn about it with the getinfo.
http://archives.seul.org/or/cvs/Jun-2008/msg00438.html

> >   Tor will stay at this phase until it has successfully established
> >   a TCP connection with some directory mirror. Problems in this phase
> >   generally happen because Tor doesn't have a network connection, or
> >   because the local firewall is dropping SYN packets.
> > 
> >   Phase 10
> >   tag=handshake_dir summary="Finishing handshake with directory mirror"
> > 
> >   This event occurs when Tor establishes a TCP connection with a relay
> >   (or its https proxy if it's using one). Tor remains in this phase until
> >   the TLS handshake with the relay is finished.
> 
> Do you mean to say "relay" here?  I know all directories are relays,
> but it might be more reasonable to call the server in question a
> "directory".

Ah. This is a terminology question that goes broader than this proposal.
To me, a directory is a signed string. There are directory servers that
serve the directory.

In any case, here I really do mean relay, since we're connecting to
a relay's ORPort and doing a Tor handshake with it. We could imagine
having Tors that allow Tor handshakes but don't actually relay traffic;
but we don't have those currently so it seems a moot point.

(Bootstrapping phases for folks not using PreferTunneledDirConns are not
well-defined yet. It seems like an edge case that isn't urgent to tackle.)

>  [...]
> >   Phase 25:
> >   tag=loading_status summary="Loading networkstatus consensus"
> > 
> >   Once we've established a directory connection, we will start fetching
> >   the networkstatus consensus document. This could take a while; this
> >   phase is a good opportunity for using the "progress" keyword to indicate
> >   partial progress.
> > 
> >   This phase could stall if the directory mirror we picked doesn't
> >   have a copy of the networkstatus consensus so we have to ask another,
> >   or it does give us a copy but we don't find it valid.
> 
> If we fall back to another directory, do we go back to phase 5?  It
> would seem reasonable...

Not currently. I've been treating the bootstrapping phases more as
"things Tor managed to successfully do" than as what Tor is actually up
to at the time. That way, when it's been sitting in the same phase for
a while, you can start being concerned that Tor is having a hard time
getting past whatever phase it's stuck on.
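
To make the "sitting in the same phase for a while" idea concrete, a
controller could track it roughly like this. This is a sketch only; the
class and method names are invented for illustration, and it deliberately
just reports the elapsed time rather than deciding what counts as "a
while".

```python
import time


class BootstrapWatcher:
    """Remembers the furthest phase reached and when it was reached.

    Hypothetical controller-side helper; nothing here is Tor's own code.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.progress = 0        # Tor starts out in phase 0
        self.tag = "starting"
        self._since = clock()

    def update(self, progress, tag):
        # Phases are "things Tor managed to do", so only forward
        # movement resets the timer; repeats are ignored.
        if progress > self.progress:
            self.progress = progress
            self.tag = tag
            self._since = self._clock()

    def seconds_in_phase(self):
        """How long Tor has gone without completing another phase."""
        return self._clock() - self._since
```

The controller would call update() on every bootstrap event and could
show seconds_in_phase() next to the progress bar.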

> We should also say what the scope of reason strings are.  Does a
> reason always mean the same thing with tag X as it does with tag Y?

I don't think we've thought that far ahead. Currently I think the answer
is yes, insofar as the reasons are things like "NOROUTE", which has a
pretty general meaning. But I'd like to leave their exact meaning vague
for now, until we have a better sense of what we want to do with them.

> >   Currently Tor ignores the first nine bootstrap problem reports for
> >   a given phase, reports the tenth to the controller, and then ignores
> >   further problems at that phase. Hopefully this is a good balance between
> >   tolerating occasional errors and reporting serious problems quickly. (We
> >   will want to revisit this approach if there are many different 'reason'
> >   values being reported among the first ten problem reports, since in
> >   this case the controller will only hear one of them.)
> 
> This is ugly and fragile.  Instead of suppressing N failures and
> reporting the N+1th, we should report all the failures, perhaps with a
> count of how many have occurred so far or an indication of when we'll
> next retry.  We should also offer advice to controller developers
> about when they should report failures.  (Perhaps: "Wait for X
> failures in any phase before deciding the phase is hopeless."  Or
> perhaps: "After a failure, wait X seconds to see if Tor recovers and
> the phase advances before deciding the phase is stalled.  Wait Y
> seconds longer before deciding the phase is hopeless."  Or perhaps:
> "If the retry time on any failure is 'now' or under 30 seconds from
> now, Tor believes it can fix the problem itself.")

Actually, it turns out that it's more complex than this: while waiting
for 10 connect failures before complaining may be reasonable for a normal
relay, a bridge user may be totally screwed after just one failure,
e.g. if he only has one bridge. Similarly, a Tor client in a private
network may not have 10 different relays to try. So I think you're right
that we should report every potential problem, but I think we also need
to give the controller some hint about which problem we think is the
final straw, i.e. when it's time to get the user involved.

Maybe we should report them all, and include a
RECOMMENDATION=WARN when we hit a threshold (and the threshold will vary
depending on Tor's internal config).
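
Very roughly, the Tor side of that could look like the sketch below. It
is Python rather than Tor's C, and everything in it is hypothetical: the
class name, the COUNT field (borrowing Nick's idea of a running count),
and the choice to omit RECOMMENDATION entirely below the threshold are
all assumptions for illustration.

```python
class BootstrapProblems:
    """Reports every problem; adds RECOMMENDATION=WARN past a threshold."""

    def __init__(self, warn_threshold):
        # The threshold depends on configuration: e.g. 1 for a client
        # with a single bridge, 10 for an ordinary client.
        self.warn_threshold = warn_threshold
        self._counts = {}

    def report(self, tag, reason):
        """Return the keyword arguments for one problem report."""
        count = self._counts.get(tag, 0) + 1
        self._counts[tag] = count
        fields = ["TAG=%s" % tag, "REASON=%s" % reason, "COUNT=%d" % count]
        if count >= self.warn_threshold:
            fields.append("RECOMMENDATION=WARN")
        return " ".join(fields)
```

With warn_threshold=1 the very first report already carries the
recommendation; with warn_threshold=10 the controller still sees the
first nine problems but is only told to involve the user on the tenth.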

The problem with giving advice to controller developers like "blindly
wait X seconds and then you'll know there's a problem" is that Tor takes
hugely different amounts of time to bootstrap, and thus to encounter a
threshold of problems, on a tiny modem vs a university connection. 30
seconds might be enough to try only a few connections on a really crappy
line. Tor is the one that knows what's going on internally, so it would
seem that it should be the one to call the shots, right?

> >   Controllers should also have some mechanism to alert their user when
> >   bootstrapping problems are reported. Perhaps we should gather a set of
> >   help texts and the controller can send the user to the right anchor in a
> >   "bootstrapping problems" help page?
> 
> This brings up an interesting issue.  There are lots of pieces of
> translatable text that could reasonably be used across different UIs.
> We'd prefer not to stick them in Tor, since Tor doesn't get
> internationalized.  But making each tool maintain their own lists is
> also a little sad, since it results in a bit much duplication of
> effort.  Thoughts?

No thoughts, other than that you're right that my Warning strings won't
get used much in practice. I had figured Vidalia would just grab them
and use them, but Matt needed to make his own Qt strings so the Vidalia
translators could translate them. So now the Warning strings are there to
help Tor debugging, and in case somebody wants to make a really trivial
controller and not have to worry about strings.

In the future it might be that the Warning strings are not simply the
output of strerror, but rather they speculate about what exact problem
your Tor is having. But even then, that should be resolved by making up
more Tags, not by trying to show a carefully crafted English string to a
user who only knows Italian. Should we just get rid of the Warning string?

--Roger