
Re: draft proposal: download server descriptors on demand



Hi Christian,

Thanks for keeping this discussion moving.

On Tue, Nov 11, 2008 at 10:35:44PM +0100, Christian Fromme wrote:
> So, to sum this up:
> 
> 1) Everything a client needs to build a circuit will be found in the new
>    client consensus, except (maybe) the onion keys

Yep. Going forward, the more general version of this is
  "Everything a client needs to build a circuit should either be in the
  consensus networkstatus, or in a small per-relay document called a
  microdescriptor."

Currently the only thing we expect for the microdescriptor is the onion
key, but down the road we might imagine putting other things in it if
they're too big to put in the consensus.

> 2) The onion keys will be available to the client in one of the following
>    ways:
>    a) Part of the consensus, hoping that sizes won't increase too badly
>       over time. With consensus diffs in mind this means for a client to
>       fetch about 200KB (I think Roger came up with that number on IRC)
>       once and from then on only the diffs, which are naturally less
>       + No change in circuit-building
>       + No new relay cells
>       + Easier to migrate
>       - Mightn't scale well in the future

According to weasel's numbers, the compressed consensus size would change
from 90KB to 241KB if we add onion keys directly.

Here's how I would organize our options:

1) Add the onion key to the consensus, same as option 2a above.
   + No change in circuit-building
   + No new relay cells
   + Easy to migrate
   - Doesn't scale so well (the size of the consensus grows with number
     of relays).
   - consensus diffs get a lot bigger, since whenever a relay gets
     added to the consensus (e.g. if we mark it Running again) we add
     in its whole onion key.
   - if we find a second big thing we want clients to also know, we have
     to either add that to the consensus too (which could double its
     size again) or make clients start fetching descriptors again or
     move to one of these other plans.

2) Put a hash of the microdescriptor in the consensus, and have clients
   fetch and cache microdescriptors preemptively from dir mirrors
   when starting up, like they currently fetch descriptors. (Dir
   mirrors fetch them from the authorities and cache them.) This means
   a compressed consensus of about 100KB, and a microdescriptor set of
   about 128 bytes * 1500 relays, or roughly 200KB. After bootstrapping,
   clients only need to fetch
   the microdescriptors that have changed. If they have only an onion key
   in them, they will change once a week -- so after bootstrapping clients
   spend very little bandwidth to maintain the microdescriptor cache.
   + No change in circuit-building
   + No new relay cells
   + Still pretty easy to migrate
   + Consensus diffs won't be overly huge
   - Bootstrapping cost still somewhat high, and goes up as the network
     grows.

3) Put a hash of the microdescriptor in the consensus, and have clients
   fetch a copy of the next microdescriptor on every circuit-extend
   operation. So long as the microdescriptors stay smallish, they fit
   in oneish cell, and we don't have to worry about that awful "router
   descriptor padding" question.
   + Scales well for the client if network sizes grow.
   + Also scales well if the microdescriptor size grows. If we discover
     other big things we want the clients to know, this approach becomes
     increasingly appealing.
   - We have to modify circuit-building at the client side, since we're
     adding more steps.
   - We either add an extra round-trip of latency (if we use a separate
     RELAY_FETCH_MDESC cell to fetch it), or we do the even more complex
     approach of a hybrid CREATE_AND_FETCH_MDESC design.
   - We now force all relays to know microdescriptors for all relays,
     which may hurt potential future plans to get away from a clique
     topology.
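
To make the option 2 bookkeeping concrete, here is a rough sketch of
how a client might decide which microdescriptors to fetch after a new
consensus arrives. Everything named here is hypothetical (digest,
missing_digests, prune), and SHA-256 is only a placeholder since
nothing above settles on a hash function.

```python
import hashlib


def digest(microdesc: bytes) -> str:
    """Hash a microdescriptor; SHA-256 is an assumption for illustration."""
    return hashlib.sha256(microdesc).hexdigest()


def missing_digests(consensus_digests, cache):
    """Digests the consensus lists that the local cache lacks.

    consensus_digests: list of hex digests, one per relay in the consensus.
    cache: dict mapping hex digest -> microdescriptor bytes.
    Order follows the consensus so fetches can be batched predictably.
    """
    return [d for d in consensus_digests if d not in cache]


def prune(consensus_digests, cache):
    """Drop cached microdescriptors the consensus no longer references."""
    wanted = set(consensus_digests)
    return {d: md for d, md in cache.items() if d in wanted}
```

With weekly key rotation, missing_digests() should return a short list
on most consensus updates, which is where the bandwidth saving after
bootstrapping comes from.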

Now, note in all these variations that there's still extra stuff
in the consensus that clients don't need: we could take out the
hash-of-descriptor and the timestamp. Since these items change daily
for each relay, dropping them will make consensus diffs way smaller.
(Nobody's done the math yet on how much smaller, but I'm optimistic.)
For the sake of migration, we shouldn't drop them until 0.2.1.x is
obsolete. (Otherwise mirrors can't fetch descriptors for old clients,
and old clients can't know which ones to fetch.) (Sites like the
torstatus page can still fetch all the original signed server
descriptors, if we make the votes available that list the
hash-of-descriptor that each authority voted on.)

I think which choice we take depends on the properties of the
microdescriptor:
  Option 1 is the best choice if it stays small and changes often (at
    least daily), since caching it separately from the consensus doesn't
    save us much bandwidth.
  Option 2 is the best choice if it stays small and changes seldom.
  Option 3 is the best choice if it grows large.

For the current situation -- onion keys that rotate weekly -- I think
option 2 is the winner. But if we later add something that changes often,
then option 2 is going to feel like a hassle compared to the simpler
and just as efficient option 1. And if we ever add a lot more bytes,
then we're going to want to move to option 3.
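
The rule of thumb above can be written down as a tiny function. This
is only a sketch: pick_option is a made-up name, and the 512-byte
cutoff for "large" is an assumption standing in for "no longer fits in
oneish cell".

```python
def pick_option(md_bytes, change_interval_days, large_bytes=512):
    """Return 1, 2, or 3 following the rule of thumb in this mail.

    md_bytes: expected size of one microdescriptor.
    change_interval_days: how often its contents change.
    large_bytes: hypothetical threshold for "grows large".
    """
    if md_bytes > large_bytes:
        return 3  # too big to preload for every relay; fetch on extend
    if change_interval_days <= 1:
        return 1  # changes at least daily; a separate cache saves little
    return 2      # small and seldom-changing; cache it separately


# Today's case: a 128-byte onion key that rotates weekly.
current_choice = pick_option(128, 7)
```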

Does that mean we should go with option 2, and keep this discussion in
mind next time we want to add more items to the consensus?

While I'm at it, what's a good way to version the microdescriptor? We
could just stick a version byte at the front, but what does a client do
if it encounters a version it doesn't recognize? About the only plan
I can come up with is to put a version byte in front, and if we want
to add things just add them, and clients ignore stuff beyond what they
know to look for in a given version. If we need to change the version,
we teach clients about both while still using the old one, and once
all the clients have upgraded we switch to the new one (at which point
clients can then forget how to understand the old one). Slow process.
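
As a sketch of the "version byte in front, ignore what you don't know"
plan: the layout below (one version byte followed by a 128-byte onion
key) is an assumption for illustration, as are the names KNOWN_LAYOUTS
and parse_microdescriptor.

```python
# version byte -> number of body bytes this client understands.
# Hypothetical: version 1 carries a 128-byte onion key and nothing else.
KNOWN_LAYOUTS = {1: 128}


def parse_microdescriptor(blob: bytes):
    """Parse a version-prefixed microdescriptor.

    Returns None for a version this client does not recognize, and
    silently ignores any bytes beyond the fields it knows about.
    """
    if not blob:
        raise ValueError("empty microdescriptor")
    version = blob[0]
    known_len = KNOWN_LAYOUTS.get(version)
    if known_len is None:
        return None  # unknown version: caller must cope without it
    body = blob[1:1 + known_len]
    if len(body) < known_len:
        raise ValueError("truncated microdescriptor")
    return {"version": version, "onion_key": body}
```

The None return is the awkward part: a client that can't parse the
microdescriptor can't extend to that relay, which is why the new
version can only be switched on once clients have upgraded.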

Another point to ponder: nickm was talking about having relays generate
their own microdescriptors and sign them, and then maybe saving space
by not including the signature when clients fetch it. I think the
microdescriptor should be a straight transform from the 'real' descriptor,
and so it doesn't need any signature or timestamp. Relays don't even
need to know that microdescriptors exist, except as opaque blobs that
they mirror.
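
Here is a sketch of what a straight transform could look like, assuming
the microdescriptor is just the onion-key block copied out of the
server descriptor. make_microdescriptor is a hypothetical name, and the
key material in any example input would be placeholder text.

```python
def make_microdescriptor(descriptor: str) -> bytes:
    """Copy the onion-key block out of a server descriptor.

    The output depends only on the onion key, so anyone holding the
    descriptor derives the same bytes, and a hash of those bytes in the
    consensus is enough to authenticate them.
    """
    lines = descriptor.splitlines()
    try:
        start = lines.index("onion-key")
    except ValueError:
        raise ValueError("descriptor has no onion-key line") from None
    out = ["onion-key"]
    for line in lines[start + 1:]:
        out.append(line)
        if line.startswith("-----END "):
            return ("\n".join(out) + "\n").encode("ascii")
    raise ValueError("unterminated onion-key block")
```

Two descriptors from the same relay that differ only in their published
time or bandwidth lines produce identical output, so the
microdescriptor changes only when the onion key does.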

--Roger