Re: Proposal 158: Clients download consensus + microdescriptors
On Wed, Jan 21, 2009 at 12:46:04AM -0500, Nick Mathewson wrote:
> > Descriptor elements that are small and frequently changing should go
> > in the consensus itself, and descriptor elements that are small and
> > relatively static should go in the microdescriptor. If we ever end up
> > with descriptor elements that aren't small yet clients need to know
> > them, we'll need to resume considering some design like the one in
> > proposal 141.
>
> This is a good breakdown, and clarifies our motivation decently well.
>
> Does this mean that the ports should (assuming it's possible) get
> moved into the microdescriptor? I think exit policies are relatively
> stable.
Yes, we could move ports, exit policies, and the version line into the
microdescriptor.
We could also dump the descriptor digest and timestamp-of-descriptor.
Both of these steps involve breaking the consensus for current clients,
though (you can't do them until all clients are using microdescriptors),
so I didn't discuss them in this proposal.
We might want to think at some point about teaching clients to handle
consensus status lines that are missing these entries. I think that
can be done in a separate (related) proposal. But we should still do
it reasonably soon, or that will be the reason why we can't dump these
elements from the consensus.
> [...]
> > 3.1.2. Computing consensus for microdescriptor-elements and "m" lines
> [...]
> > It would be nice to have a more foolproof way to agree on what
> > microdescriptor hash each authority should vote for, so we can avoid
> > missing "m" lines. Just switching to a new consensus-method each time
> > we change the set of microdescriptor-elements won't help though, since
> > each authority will still have to decide what hash to vote for before
> > knowing what consensus-method will be used.
> >
> > Here's one way we could do it. Each vote / consensus includes
> > the microdescriptor-elements that were used to compute the hashes,
> > and also a preferred-microdescriptor-elements set. If an authority
> > has a consensus from the previous period, then it should use the
> > consensus preferred-microdescriptor-elements when computing its votes
> > for microdescriptor-elements and the appropriate hashes in the upcoming
> > period. (If it has no previous consensus, then it just writes its
> > own preferences in both lines.)
>
> Here's a way that recovers a little more gracefully from
> desynchronization. The vote could include two sets at most: the one
> you would like to use, and the one that was used in the most recent
> consensus you have. You include m-lines for both. If either set
> wins, your m-lines influence the consensus.
>
> (If your favorite set is the one that the last consensus lists, you
> wouldn't include duplicate m-lines.)
Fine with me. I think desync will be rare, so either of the approaches
should be fine. Whatever is easiest to code and maintain.
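For concreteness, the two-set variant above could be sketched roughly like this. It is only an illustration: the function name and the choice to represent an element set as a sorted tuple of keywords are assumptions, not anything the proposal specifies.

```python
def element_sets_for_vote(preferred, last_consensus=None):
    """Return the element sets an authority would include m-lines for.

    Hypothetical helper: 'preferred' is the set the authority would like
    to use, 'last_consensus' is the set listed in the most recent
    consensus it has (None if it has no previous consensus).
    """
    # Canonicalize so that ordering differences do not look like
    # different sets.
    favorite = tuple(sorted(preferred))
    sets = [favorite]
    if last_consensus is not None:
        previous = tuple(sorted(last_consensus))
        # Skip the second set if it would only duplicate the first.
        if previous != favorite:
            sets.append(previous)
    return sets
```

With no previous consensus this yields one set; with a differing previous consensus it yields two, so the authority's m-lines count whichever set wins.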
> > 3.2. Directory mirrors serve microdescriptors
> >
> > Directory mirrors should then read the microdescriptor-elements line
> > from the consensus, and learn how to answer requests. (Directory mirrors
> > continue to serve normal relay descriptors too, a) to serve old clients
> > and b) to be able to construct microdescriptors on the fly.)
>
> Advantages for "Authorities build microdescriptors":
> + We have more flexibility about what the microdescriptors can
> contain. For instance, they can include the equivalent of the
> "p" lines in the current consensus format, even though those need
> to be calculated from exit policies, and are not simple copies.
> This is especially important if our goal is to shift stable info
> into the microdescriptors in order to keep consensuses small
> while making clients download descriptors less.
>
> That's 3 advantages for "Caches build", and only 1 for "Authorities
> build", but I think that the advantage of "authorities build" is much
> bigger. It lets us consider things like the exit-ports line, binary
> packing of onion keys [not actually a win, but the next thing could
> be], and so on. What do you think?
Agreed, I think that is better.
Can we just have authorities make microdescriptors available with the
same interface that mirrors make them available? I think so.
> > The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
> > http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
>
> This implies that unless the mirror knows the microdescriptors for
> every router in the last two or three consensuses, the client is out of
> luck. Thus, the mirror must have kept track of the fields listed for
> microdescriptors in all the live consensuses. So be it.
Yep. Mirrors will probably cache microdescriptors exactly as clients do.
If they don't change much, this will not be much of a burden.
Though the client isn't totally out of luck: we should use some 'retry'
schedule, just like we already do for the case where the mirror doesn't
have the descriptor we want.
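As a small illustration of the request format quoted above, here is a sketch of how a client might build the URL. The function name is made up, and how the hashes are encoded (hex, base64, length) is not settled in this thread, so the digests are treated as opaque strings.

```python
def micro_url(hostname, digests):
    """Build the URL for fetching several microdescriptors at once.

    Hypothetical helper: 'digests' is a list of already-encoded hash
    strings; the encoding itself is an open assumption here.
    """
    if not digests:
        raise ValueError("need at least one microdescriptor hash")
    # Hashes are joined with '+' and the '.z' suffix asks for the
    # compressed form, as in the proposal text.
    return "http://%s/tor/micro/d/%s.z" % (hostname, "+".join(digests))
```

For example, micro_url("mirror.example", ["D1", "D2", "D3"]) follows the pattern from the proposal with a placeholder hostname.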
> > The format of a microdescriptor is the header line
> > "microdescriptor-header"
> > followed by each element (keyword and body), alphabetically. There's
> > no need to mention what hash it's for, since it's self-identifying:
> > you can hash the elements to learn this.
>
> We should mention that the header line is semantically important. If
> you see:
> microdescriptor-header foo bar
> foo X
> then you know that the base descriptor has no bar element, whereas if
> you see:
> microdescriptor-header foo
> foo X
> then you know nothing about the bar element.
I don't understand this part. It seems like you're trying to add a new
feature or something. Why would clients care whether foo and bar are
both listed? Do authorities specify that the microdescriptor should list
"foo bar"? If so, it would seem that this header needs to be included
in what is hashed.
Also, you talk of elements in the "base descriptor". How does this play
with putting "p" lines (exit policies) into the microdescriptor, where
that isn't an element in the base descriptor?
Or is the suggestion just that when serving the microdescriptor, the
mirror will put the first keyword of each line of the microdescriptor
into the microdescriptor-header line? What does that buy us?
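To make sure I'm reading the quoted example the way it was intended, here is a sketch of the three-way distinction it seems to describe. The function name and the return values are made up for illustration, and it assumes each element occupies a line that starts with its keyword.

```python
def element_status(microdesc_text, keyword):
    """Classify a keyword as 'present', 'absent', or 'unknown'.

    Hypothetical reading of the header semantics: a keyword listed in
    the header but with no element line means the base descriptor lacks
    it; a keyword not listed in the header tells us nothing.
    """
    lines = [line for line in microdesc_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty microdescriptor")
    header = lines[0].split()
    if header[0] != "microdescriptor-header":
        raise ValueError("missing microdescriptor-header line")
    listed = set(header[1:])
    # First word of every element line that follows the header.
    present = {line.split()[0] for line in lines[1:]}
    if keyword in present:
        return "present"
    if keyword in listed:
        return "absent"
    return "unknown"
```

Under this reading, "bar" is "absent" in the first quoted example and "unknown" in the second.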
> What are clients supposed to do, btw, if they find that the
> microdescriptors that the authority lists do not contain some field
> they regard as essential? I assume the answer is, "This must never
> happen. Once a client version uses a field in microdescriptors, that
> field must be present in microdescriptors until all client versions
> requiring it are obsolete." Yes?
Sure.
> Otherwise clients that want that field need to fall back to descriptors.
Clients or relays falling back to needing a real descriptor is never an
option we want to accept. Otherwise we just lengthen the period of time
where every mirror must cache every descriptor.
> > The hash of the microdescriptor is simply the hash of the concatenated
> > elements -- not counting the header line or hypothetical footer line.
> > Unless you prefer that?
>
> Just the elements is fine.
See above.
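For reference, hashing just the elements could look something like the sketch below. The hash function (SHA-256 here), the "keyword body" line serialization, and the function name are all assumptions for illustration; the proposal text only says that the elements are concatenated alphabetically and that the header line is not counted.

```python
import hashlib


def microdesc_digest(elements):
    """Hash the concatenated elements, sorted by keyword.

    Hypothetical helper: 'elements' maps each keyword to its body.
    The header line is deliberately left out of the hash.
    """
    h = hashlib.sha256()  # assumed; the thread does not name a hash
    for keyword in sorted(elements):
        # Assumed serialization: one "keyword body" line per element.
        line = "%s %s\n" % (keyword, elements[keyword])
        h.update(line.encode("ascii"))
    return h.hexdigest()
```

Because the elements are sorted before hashing, two parties holding the same elements get the same digest regardless of the order they stored them in. If the header turns out to carry meaning, it would have to be fed into the hash as well.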
> > Is there a reasonable way to version these things? We could say that
> > the microdescriptor-header line can contain arguments which clients
> > must ignore if they don't understand them. Any better ways?
>
> If we go with the authorities-build-microdescriptors idea, let's have
> them numbered like the consensus version.
"Numbered like" meaning we start with version 6?
> > When a client gets a new consensus, it looks to see if there are any
> > microdescriptors it needs to learn. If it needs to learn more than
> > some threshold of the microdescriptors (half?), it requests 'all',
> > else it requests only the missing ones.
>
> The client should estimate the typical compressed microdescriptor size
> (CM). Requesting another microdescriptor costs 41 bytes in the HTTP
> request. If the client wants N microdescriptors, and 41*N > CM, it
> should request all.
Most clients have higher download than upload, so the math isn't quite
this simple. But sure, that's a fine start. We can tweak it later if we
come up with a better way, without affecting anything else.
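The two candidate rules from this part of the thread can be written down as follows. This is only a sketch: the function and parameter names are invented, and the second function transcribes the byte-cost comparison literally as stated in the quoted text, without trying to account for asymmetric upload and download speeds.

```python
def request_all_by_fraction(missing, total, threshold=0.5):
    """Proposal's rule: fetch 'all' if more than some fraction is missing.

    The default of one half is the tentative value from the proposal.
    """
    if total <= 0:
        return False
    return missing / total > threshold


def request_all_by_cost(missing, typical_compressed_size, per_request_cost=41):
    """Byte-cost rule as stated in the quoted reply: 41*N > CM.

    'typical_compressed_size' is the client's estimate of CM in bytes;
    'per_request_cost' is the extra request bytes per microdescriptor.
    """
    return per_request_cost * missing > typical_compressed_size
```

Either function only decides between "all" and a list of specific hashes, so swapping one for the other later should not affect anything else.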
> > 3.3.1. Information leaks from clients
> >
> > If a client asks you for a set of microdescs, then you know she didn't
> > have them cached before. How much does that leak? What about when
> > we're all using our entry guards as directory guards, and we've seen
> > that user make a bunch of circuits already?
> >
> > Fetching "all" when you need at least half is a good first order fix,
> > but might not be all there is to it.
> >
> > Another future option would be to fetch some of the microdescriptors
> > anonymously (via a Tor circuit).
>
> Are these leaks worse than leaks from descriptor downloading? If so,
> how?
Well, they are different. In the descriptor case, having cached info from
two days ago means you have no valid descriptors. In the microdescriptor
case, cached info from two days ago probably gives you a lot of the
microdescriptors already.
I'm not sure if one is worse. It feels like the partitioning opportunities
will be greater when the lifetimes of the blobs we cache are much longer
(and have more variance).
> > 4. Transition and deployment
> >
> > Phase one, the directory authorities should start voting on
> > microdescriptors and microdescriptor elements, and putting them in the
> > consensus. This should happen during the 0.2.1.x series, and should
> > be relatively easy to do.
>
> As we discussed on IRC, I believe this should wait till 0.2.2.x.
> Getting the authorities onto newer versions is comparatively easy, and
> 0.2.1.x is in feature freeze now.
Sounds good.
--Roger