[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Proposal: Provide diffs between consensuses



Questions:

  - How old can a consensus be for us to try to fetch a diff.
    We could of course always try to fetch a diff, but how smart
    is it to let a dirserver know that we were last online 3.8 days
    ago?
  - Do we agree on using the ed patch format?
  - What to do when the client already has the latest consensus


Filename: xxx-consensus-diffs
Title: Provide diffs between consensuses
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 13-Jun-2008
Status: Open

1. Overview.

  Tor clients and servers need a list of which relays are on the
  network.  This list, the consensus, is created by authorities
  hourly and clients fetch a copy of it, with some delay, hourly.

  This proposal suggests that clients download diffs of consensuses
  once they have a consensus instead of hourly downloading a full
  consensus.

2. Numbers

  After implementing proposal 138 which removes nodes that are not
  running from the list a consensus document is about 92 kilobytes
  in size after compression.

  The diff between two consecutive consensus, in ed format, is on
  average 13 kilobytes compressed.

3. Proposal

3.1 Clients

  If a client has a consensus that is recent enough it SHOULD
  try to download a diff to get the latest consensus rather than
  fetching a full one.

  [XXX: what is recent enough?
	time delta in hours / size of compressed diff
	 0	20
	 1	9650
	 2	17011
	 3	23150
	 4	29813
	 5	36079
	 6	39455
	 7	43903
	 8	48907
	 9	54549
	10	60057
	11	67810
	12	71171
	13	73863
	14	76048
	15	80031
	16	84686
	17	89862
	18	94760
	19	94868
	20	94223
	21	93921
	22	92144
	23	90228
	[ size of gzip compressed "diff -e" between the consensus on
	  2008-06-01-00:00:00 and the following consensuses that day.
	  Consensuses have been modified to exclude down routers per
	  proposal 138. ]

   Data suggests that for the first few hours diffs are very useful,
   saving about 60% for the first three hours, 30% for the first 10,
   and almost nothing once we are past 16 hours.
  ]

3.2 Servers

  Directory authorities and servers need to keep up to X [XXX: depends
  on how long clients try to download diffs per above] old consensus
  documents so they can build diffs.  They should offer a diff to the
  most recent consensus at the URL

  http://tor.noreply.org/tor/status-vote/current/consensus/diff/<HASH>/<FPRLIST>

  where hash is the full digest of the consensus the client currently
  has, and FPRLIST is a list of (abbreviated) fingerprints of
  authorities the client trusts.

  Servers will only return a consensus if more than half of the requested
  authorities have signed the document, otherwise a 404 error will be sent
  back.  The fingerprints can be shortened to a length of any multiple of
  two, using only the leftmost part of the encoded fingerprint.  Tor uses
  3 bytes (6 hex characters) of the fingerprint.  (This is just like the
  conditional consensus downloads that Tor supports starting with
  0.1.2.1-alpha.)

  If a server cannot offer a diff from the consensus identified by the
  hash but has a current consensus it MUST return the full consensus.

  [XXX: what should we do when the client already has the latest
  consensus?  I can think of the following options:
    - send back 3xx not modified
    - send back 200 ok and an empty diff
    - send back 404 nothing newer here.

    I currently lean towards the empty diff.]

4. Diff Format

  Diffs start with the token "network-status-diff-version" followed by a
  space and the version number, currently "1".

  If a document does not start with network-status-diff it is assumed
  to be a full consensus download and would therefore currently start
  with "network-status-version 3".

  Following the network-status-diff header line is a diff, or patch, in
  limited ed format.  We choose this format because it is easy to create
  and process with standard tools (patch, diff -e, ed).  This will help
  us in developing and testing this proposal and it should make future
  debugging easier.

  [ If at one point in the future we decide that the space benefits from
    a custom diff format outweighs these benefits we can always
    introduce a new diff format and offer it at for instance
    ../diff2/... ]

  We support the following ed commands, each on a line by itself:
   - "<n1>d"          Delete line n1
   - "<n1>,<n2>d"     Delete lines n1 through n2, including
   - "<n1>c"          Replace line n1 with the following block
   - "<n1>,<n2>c"     Replace lines n1 through n2, including, with the
                      following block.
   - "<n1>a"          Append the following block after line n1.
   - "a"              Append the following block after the current line.
   - "s/.//"          Remove the first character in the current line.

  Note that line numbers always apply to the file after all previous
  commands have already been applied.

  The "current line" is either the first line of the file, if this is
  the first command, the last line of a block we added in an append or
  change command, or the line immediate following a set of lines we just
  deleted (or the last line of the file if there are no lines after
  that).

  The replace and append command take blocks.  These blocks are simply
  appended to the diff after the line with the command.  A line with
  just a period (".") ends the block (and is not part of the lines
  to add).  Note that it is impossible to insert a line with just
  a single dot.  Recommended procedure is to insert a line with
  two dots, then remove the first character of that line using s/.//.