Re: draft proposal: download server descriptors on demand



On Mon, Jun 16, 2008 at 12:06:44AM +0200, Peter Palfrader wrote:
 [...]

(By request, added as 141.)

Good start on a proposal.  It would be good to get the directory
bandwidth down on this one.  Also, if clients don't need to download
so many descriptors, they will save lots of RAM formerly used to hold
routerinfo_t objects.

I've made some initial comments below.

 [...]
>   Furthermore the server descriptor also contains the exact version of
>   the Tor software that the server is running and some decisions are
>   made based on the server version number (for instance a Tor client
>   will only make conditional consensus requests [proposal from 13 Apr
>   2008 that never got a number] when talking to Tor servers version
>   0.2.1.1-alpha or later).

This part doesn't need to come from the descriptor; the consensus also
reports each server's version.

 [...]
> 3. Doing away with the need for all SDs

I was confused by this heading until I realized that you meant, "Doing
away with the need to hold all SDs." rather than "Doing away with all
need for SDs."
 
> 3.1 Load balancing info in consensus documents
> 
>   One of the reasons why clients download all server descriptors is for
>   doing load proper load balancing as described in 2.1.  In order for
>   clients to not require all server descriptors this information will
>   have to move into the network status document.
> 
>   [XXX Two open questions here:
>    a) how do we arrive at a consensus weight?

Perhaps the vote could contain the node's bandwidth, and this could be
used to calculate the weights?  It's necessary that the consensus
remain a deterministic function of the votes.
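
A minimal sketch of one way to do that, assuming each vote lists an
integer bandwidth per router and that the consensus takes the low median
of the voted values.  The function names and the parts-per-thousand
scaling are illustrative assumptions, not part of the proposal.  It
sticks to integer arithmetic so that every authority computes the same
bytes from the same votes.

```python
from statistics import median_low


def consensus_bandwidth(voted):
    """Pick one bandwidth value from the authorities' votes for a router.

    median_low always returns one of the voted values and does not depend
    on the order in which the votes are listed.
    """
    if not voted:
        raise ValueError("no votes for this router")
    return median_low(voted)


def consensus_weights(votes_by_router):
    """Map router -> weight in parts per thousand of total bandwidth.

    Integer arithmetic only (an assumption of this sketch), so the result
    is a deterministic function of the votes.
    """
    bandwidth = {r: consensus_bandwidth(v) for r, v in votes_by_router.items()}
    total = sum(bandwidth.values())
    if total == 0:
        return {r: 0 for r in bandwidth}
    return {r: b * 1000 // total for r, b in bandwidth.items()}
```

Reordering the votes for a router leaves the output unchanged, which is
the property the consensus needs.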

>    b) how to represent weights in the consensus?
>       Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."

That would break backward compatibility.  Adding a new per-router line
instead would probably be better.  We should play with representations
here till we wind up with something compressible, and we should figure
out the space impact of doing this.
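
One rough way to measure the space impact is to build a synthetic,
consensus-like document with and without the extra line and compare the
zlib-compressed sizes.  Everything below is made up for the experiment:
the entry layout is only loosely consensus-shaped, and the
"w Bandwidth=" keyword is a hypothetical representation, not a decided
format.

```python
import hashlib
import zlib


def router_entry(i):
    """Build a synthetic, consensus-like entry for router number i."""
    digest = hashlib.sha1(str(i).encode()).hexdigest()
    return (f"r router{i} {digest} 10.0.{i // 256}.{i % 256} 9001 9030\n"
            f"s Fast Running Valid\n")


def weight_line(i):
    """Hypothetical extra per-router line carrying a bandwidth figure."""
    return f"w Bandwidth={(i * 37) % 5000}\n"


def size_impact(n_routers):
    """Return (raw_delta, compressed_delta) in bytes for adding weight lines."""
    base = "".join(router_entry(i) for i in range(n_routers))
    extended = "".join(router_entry(i) + weight_line(i)
                       for i in range(n_routers))
    raw_delta = len(extended) - len(base)
    compressed_delta = (len(zlib.compress(extended.encode(), 9))
                        - len(zlib.compress(base.encode(), 9)))
    return raw_delta, compressed_delta
```

Swapping in other candidate representations for weight_line() and
comparing the compressed deltas would give a first idea of which one is
cheapest on the wire.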

> ]
>
> 3.2 Fetching descriptors on demand
> 
>   As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
>   and the onion key for a server.
> 
>   A client already knows the IP address and the ports from the consensus
>   documents, but without the onion key it will not be able to send
>   CREATE/EXTEND cells for that server.  Since the client needs the onion
>   key it needs the descriptor.
> 
>   If a client only downloaded a few descriptors in an observable manner
>   then that would leak which nodes it was going to use.
> 
>   This proposal suggests the following:
> 
>   1) when connecting to a guard node for which the client does not
>      yet have a cached descriptor it requests the descriptor it
>      expects by hash.  (The consensus document that the client holds
>      has a hash for the descriptor of this server.  We want exactly
>      that descriptor, not a different one.)
>
>      [XXX: How?  We could either come up with a new cell type,
>       RELAY_REQUEST_SD that takes only a hash (of the SD), or use
>       RELAY_BEGIN_DIR.  The former is probably smarter since we will
>       want to use it later on as well, and there we will require
>       padding.]

My first thought was that I'd prefer to avoid multiplying machinery
here.  When we design RELAY_REQUEST_SD, let's keep checking whether we
can add a padding argument to RELAY_BEGIN_DIR rather than forcing a new
relay cell type.

But now I think that for nodes that don't want to be full-on directory
mirrors, a separate mechanism here might be a good idea.

>      A client MAY cache the descriptor of the guard node so that it does
>      not need to request it every single time it contacts the guard.
> 
>   2) when a client wants to extend a circuit that currently ends in
>      server B to a new next server C, the client will send a
>      RELAY_REQUEST_SD cell to server B.  This cell contains in its
>      payload the hash of a server descriptor the client would like
>      to obtain (C's server descriptor).  The server sends back the
>      descriptor and the client can now form a valid EXTEND/CREATE cell
>      encrypted to C's onion key.
> 
>      Clients MUST NOT cache such descriptors.  If they did they might
>      leak that they already extended to that server at least once
>      before.
>
>   Replies to RELAY_REQUEST_SD requests need to be padded to some
>   constant upper limit in order to conceal a client's destination
>   from anybody who might be counting cells/bytes.
> 
>   [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
>   [XXX: figure out a decent padding size]

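On the padding point, a minimal sketch of padding every reply to one
constant size, assuming a 4-byte length prefix followed by the descriptor
and zero fill.  The layout, the function names, and the limit value are
all assumptions; the real cell format and padding size are still open.

```python
import struct


def pad_reply(descriptor, limit):
    """Return exactly `limit` bytes: length prefix, descriptor, zero fill."""
    body = struct.pack("!I", len(descriptor)) + descriptor
    if len(body) > limit:
        raise ValueError("descriptor does not fit in the padded reply")
    return body + b"\0" * (limit - len(body))


def unpad_reply(padded):
    """Recover the descriptor from a reply built by pad_reply()."""
    if len(padded) < 4:
        raise ValueError("reply too short")
    (length,) = struct.unpack("!I", padded[:4])
    if length > len(padded) - 4:
        raise ValueError("length prefix exceeds reply size")
    return padded[4:4 + length]
```

With this shape, replies for a short and a long descriptor have the same
length, so counting bytes does not distinguish them.
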
Something else to figure out here is migration.  When the first cut of
this system is done, only new servers will support RELAY_REQUEST_SD.
This means that clients will still need to pre-download descriptors
under some circumstances.

In fact, the rules will be pretty weird here.  If extends are done by
first asking B for C's descriptor, then clients need to know whether B
supports RELAY_REQUEST_SD.  If it doesn't, they need to have C's
descriptor, which means they need to have downloaded it in advance.

In its final version, this proposal needs a migration plan.
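
As a starting point, the client-side rule described above could be
sketched like this.  The names are hypothetical, and how the client
learns whether B supports RELAY_REQUEST_SD (version number, protocol
line, or something else) is left as a plain boolean input here.

```python
from enum import Enum


class ExtendPlan(Enum):
    REQUEST_FROM_B = "ask B for C's descriptor with RELAY_REQUEST_SD"
    USE_PREDOWNLOADED = "use the copy of C's descriptor fetched in advance"
    CANNOT_EXTEND_YET = "C's descriptor must be downloaded first"


def plan_extend(b_supports_request_sd, predownloaded_digests, c_digest):
    """Decide how a client gets C's descriptor when extending from B to C.

    predownloaded_digests holds digests of descriptors fetched in advance
    through the existing directory mechanism; descriptors obtained via
    RELAY_REQUEST_SD are never cached, per the proposal.
    """
    if b_supports_request_sd:
        return ExtendPlan.REQUEST_FROM_B
    if c_digest in predownloaded_digests:
        return ExtendPlan.USE_PREDOWNLOADED
    return ExtendPlan.CANNOT_EXTEND_YET
```

The third outcome is the awkward one: as long as old servers can appear
in the B position, clients either pre-download descriptors for every
possible C or avoid such paths.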

> 3.3 Protocol versions
> 
>   [XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
>   information described in 2.3 above.  If we need it, it might have
>   to go into the consensus document.]

We don't use it much as-is, but the hope is for it to eventually take
the place of most calculations we currently do using version numbers.
On the bright side, adding it immediately after the version numbers
will cost us approximately nothing in the compressed document size.
 
>   [XXX: Similarly find out where we need the version number of a
>   remote tor server.  This information is in the consensus, but
>   maybe we use it in some place where having it signed by the
>   server in question is really important?]

I don't believe so.

> 3.4 Exit selection
> 
>   Currently finding an appropriate exit node for a user's request is
>   easy for a client because it has complete knowledge of all the exit
>   policies of all servers on the network.
> 
>   [XXX: I have no finished ideas here yet.
>     - if clients only rely on the current exit flag they will
>       a) never use servers for exit purposes that don't have it,
>       b) will have a hard time finding a suitable exit node for
>          their weird port that only a few servers allow.
>     - the authorities could create a new summary document that
>       lists all the exit policies and their nodes (by fingerprint).
>       I need to find out how large that document would be.
>     - can we make the "Exit" flag more useful?  can we come
>       up with some "standard policies" and have operators pick
>       one of the standards?

Generally, most policies should take the form of "Here are the ports I
allow.  Here are the addresses I disallow."  If we codify a few
port-sets, we might be in business.
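
To make that concrete, here is a small sketch of a policy expressed as
named port-sets plus disallowed address ranges.  The set names and their
contents are invented for illustration; the real sets would need to be
agreed on.

```python
import ipaddress

# Hypothetical named port-sets, for illustration only.
PORT_SETS = {
    "web": frozenset({80, 443}),
    "mail": frozenset({110, 143, 993, 995}),
}


def exit_allows(port_set_names, disallowed_nets, address, port):
    """Check a request against "ports I allow, addresses I disallow".

    port_set_names: names of the codified port-sets the operator picked.
    disallowed_nets: CIDR strings the operator refuses to exit to.
    """
    addr = ipaddress.ip_address(address)
    if any(addr in ipaddress.ip_network(net) for net in disallowed_nets):
        return False
    return any(port in PORT_SETS[name] for name in port_set_names)
```

A summary of a policy would then be a short list of set names plus the
disallowed ranges, rather than a full exit policy.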

> 4. Future possibilities
> 
>   This proposal still requires that all servers have the descriptors of
>   every other node in the network in order to answer RELAY_REQUEST_SD
>   cells.  These cells are sent when a circuit is extended from ending at
>   node B to a new node C.  In that case B would have to answer a
>   RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
>
>   In order to answer that request B obviously needs a copy of C's server
>   descriptor.  In the future we might amend RELAY_REQUEST_SD cells to
>   contain also the expected IP address and OR-port of the server C (the
>   client learns them from the network status document), so that B no
>   longer needs to know all the descriptors of the entire network but
>   instead can simply go and ask C for its descriptor before passing it
>   back to the client.

We might want to include this information in RELAY_REQUEST_SD anyway
now, so that when servers start supporting fetch-on-demand, clients
will already be sending them the info they need to do it.  I think it
should include an identity fingerprint digest too, so that B can open
an authenticated OR connection to C as needed.
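
A sketch of what such a payload could carry, assuming 20-byte digests,
an IPv4 address, and a 16-bit OR port in network byte order.  The field
order and sizes are assumptions for illustration; the cell has no spec
yet.

```python
import ipaddress
import struct

# Hypothetical layout: SD digest, IPv4 address, OR port, identity digest.
_FORMAT = "!20s4sH20s"
PAYLOAD_LEN = struct.calcsize(_FORMAT)


def pack_request_sd(sd_digest, address, or_port, identity_digest):
    """Build the request payload a client would send to B about C."""
    if len(sd_digest) != 20 or len(identity_digest) != 20:
        raise ValueError("digests must be 20 bytes")
    packed_addr = ipaddress.IPv4Address(address).packed
    return struct.pack(_FORMAT, sd_digest, packed_addr, or_port,
                       identity_digest)


def unpack_request_sd(payload):
    """Parse the payload back into (sd_digest, address, port, identity)."""
    sd_digest, packed_addr, or_port, identity_digest = struct.unpack(
        _FORMAT, payload)
    return (sd_digest, str(ipaddress.IPv4Address(packed_addr)), or_port,
            identity_digest)
```

A server that already holds C's descriptor can ignore everything but the
first field, while one that fetches on demand has what it needs to reach
and authenticate C.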

(These issues also complicate any eventual p2p-Tor designs, if every
B needs to know every C's descriptor.  We'd also need to keep the
client cache and the server cache separate, so that it's not so easy
to probe whether B already knows C.)

-- 
Nick