[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Glider: Automatic updates for Tor bundles



This one is in the repository now, at updater/trunk/spec/glider-spec.txt.

Please check it out and let me know what you think.  It build on ideas
from earlier specs from Jake and Matt.

				- = -

             Glider: Automatic updates for Tor bundles

0. Preliminaries

0.0. Scope

   This document describes a system for distributing Tor binary bundle
   updates.

0.1. Proposed code name

   Since "auto-update" is so generic, I've been thinking about going with
   "glider", based on the sugar glider you get when you search for "handy
   pocket creature".  I haven't yet done a search to find out whether
   somebody else is using the name, so we shouldn't get too attached to it
   before we see if it's taken.

0.2. Non-goals

   This is not meant to replace any existing download mechanism for
   users who prefer that mechanism.  For example, just downloading
   source will still work fine.

   Similarly, we're not trying to force users who do not want to use
   downloaded binaries to use them, or to force users who do not want
   automatic updates to get them.  {This should be obvious, but enough
   people have asked that I'm putting it in the document.}

   This is not a general-purpose package manager like yum or apt: it
   assumes that users will want to have one or more of a set of
   "bundles", not an arbitrary selection of packages dependant on one
   another.  (Rationale: these systems do what they do pretty well.)

   This is also not a general-purpose package format.  It assumes the
   existence of an external package format that can handle install,
   update, remove, and version query.

0.3. Goals

   Once Tor was a single executable that you could just run.  Then it
   required Privoxy.  Now, thanks to the Tor Browser Bundle and related
   projects, a full installation can contain Tor, Privoxy, Torbutton,
   Firefox, and more.

   We need to keep this software updated.  When we make security fixes,
   quick uptake helps narrow the window in which attackers can exploit
   them.

   We need updates to be easy.  Each additional step a user must take to
   get updated means that more users will stay with older insecure
   versions.

   We need updates to be secure.  We're supposed to be good at crypto;
   let's act like it.  There is no good reason in this day and age to
   subject users to rollback attacks or unsigned packages or whatever.

   We need administration to be simple.  Tor doesn't have a release
   engineering team, so we can't add too many hard steps to putting out
   a new release.

   The system should be easy to implement; we may need to do multiple
   implementations on the client side at least.

0.3.1. Goals for package formats and PKIs

   It should be possible to mirror a repository using only rsync and
   cron.

   Separate keys should be used for different people and different
   roles.

   Only a minimal set of keys should have to be kept online to keep
   the system running.

   The system should handle any single computer or system or person
   being unavailable.

   The formats and protocols should be pretty future-proof.

1. System overview

   The basic unit of updatability is a "bundle".  A bundle is a set of
   software components, or "packages", plus some rules about installing
   them.  Example bundles could be "Tor Browser, stable series" or
   "Basic Tor, development series".

   When Glider has responsibility for keeping a bundle up to date, we
   say that a user has "subscribed" to that bundle.

   Conceptually, there are four parts to keeping a bundle up to date:

      Polling:
        - Periodically, Glider asks a mirror whether there is a newer
          version of some bundle that a user has subscribed to.  If so,
          Glider determines what's in the bundle.

      Fetching:
        - If the bundle contains packages that Glider hasn't installed
          or hasn't cached, it needs to download them from a mirror.
          This can happen over any protocol; v1 should support at least
          http and https-over-Tor.  V1 should also support resuming
          partial downloads, since many users have unreliable
          connections.

          Later versions could support Bittorrent, or whatever.

      Validation:
        - Throughout the process, Glider must ensure that all the
          bundles are signed correctly, all the packages are signed
          correctly, and everything is up-to-date.

          We want to specify this so that users can't be tricked about
          the contents of a bundle, can't install a malicious package,
          and can't be fooled into believing that an old bundle is
          actually the latest.

      Installation:
        - Now Glider has a set of packages to install.  The format of
          these packages will be platform-dependent: they could be pkg
          files on OSX, MSI files on Win32, RPMs or DEBs on Linux, and
          so on.  Glider should query the user for permission to start
          installing packages, then install the packages.  (All other
          steps should generally happen automatically, in the
          background, without needing user intervention.)

1.1. The repository

   Each Glider instance knows about one or more "repositories".  A
   repository is a filesystem somewhere that contains the packages in a
   set of bundles, and some associated metadata.  A repository must
   exist at one or more canonical hosts, and may have a number of full
   or partial mirrors.

   In v1, each Glider instance will know about only one repository.

1.2. The PKI

   The trust root for the whole system is, necessarily, whatever users
   download when they first download a copy of Glider.  We need to make
   sure that the first download happens from a site we trust, using
   HTTPS.

   Glider ships with root keys, which in turn are used to verify the
   keys for all the other roles.  There are a few root keys, operated by
   trusted admins for the system.  If root keys ever need to be changed,
   we can just ship an update of Glider: it's supposed to be
   self-updating anyway.

   The root keys are only used to sign a 'key list' of all the other
   keys and their roles.  A key list is valid if it has been signed by a
   threshold of root keys.

   Each package is signed with the key of its authorized builder.  For
   example, one volunteer may be authorized to build the mac versions of
   several packages, and another may be authorized to build the windows
   version of just one.

   Each bundle is signed with the key of its maintainer.  It's assumed
   that the bundle maintainer might be the package maintainer for some
   but not all of the packages.

   The list of mirrors is also signed.  If the mirror list is
   automatically updated, this key must be kept online; otherwise, it
   can be offline.

   To prevent an adversary from replaying an out-of-date signed
   document, an automated process periodically signs a timestamped
   statement containing the hashes of the mirror list, the latest
   bundles, and the key list, using yet another special-purpose key.
   This key must be kept online.

1.3. Threat Model And Analysis

   We assume an adversary who can operate compromised mirrors, and who
   can possibly compromise the main repository.  At worst, such an
   adversary can DOS users in a way that they can detect.

   We're assuming for the moment an OSX/Win32-like execution model,
   where all packages will run equal privilege, but occasionally
   installation will require higher privilege.  This means that once a
   hostile package is installed, it can basically do whatever it
   wants.  As rootkit writers demonstrate, compromise is really
   tenuous: any attacker who can induce a user to install a hostile
   piece of code has, in effect, permanently compromised that user
   until they reinstall.

   Thus, if an adversary compromises enough keys to sign a compromised
   package, or tricks a packager into signing a compromised package,
   and manages to get that package into a signed bundle, the best we
   can do is to limit the number of users who are affected.  We do
   this by compartmentalizing signing keys so that only the package
   and bundle in question are at risk.

   (If we had replicated build processes and a bit-by-bit reliable
   build process, we could have multiple packagers test that a binary
   was built properly, and multiply sign it.  This would be effective
   against an adversary compromising a single packaging key, but not
   against one compromising a source repository.)

2. The repository layout

   The filesystem layout in the repository is used for two purposes:
     - To give mirrors an easy way to mirror only some of the repository.
     - To specify which parts of the repository a given key has the
       authority to sign.

   The following files exist in all repositories and mirrors:

    /meta/keys.txt

         Signed by the root keys; indicates keys and roles.
         [???? I'm using the txt extension here.  Is that smart?]

    /meta/mirrors.txt

         Signed by the mirror key; indicates which parts of the
         repository are mirrored at what mirrors.

    /meta/timestamp.txt

         Signed by the timestamp key; indicates hashes and timestamps
         for the latest versions of keys.txt and mirrors.txt.  Also
         indicates the latest version of each bundle for each os/arch.

         This is the only file that needs to be downloaded for polling.

    /bundleinfo/bundlename/os-arch/bundlename-os-arch-bundleversion.txt

         Signed by the appropriate bundle key.  Describes what
         packages make up a bundle, and what order to install,
         uninstall, and upgrade them in.

    /pkginfo/packagename/os-arch/version/packagename-os-arch-packageversion.txt

         Signed by the appropriate package key.  Tells the name of the
         file that makes up a package, its hash, and what procedure
         is used to install it.

    /packages/packagename/os-arch/version/(some filename)

         The actual package file.  Its naming convention will depend
         on the underlying packaging system.

3. Document formats

3.1. Metaformat

   All documents use Rivest's SEXP meta-format as documented at
     http://people.csail.mit.edu/rivest/sexp.html
   with the restriction that no "display hint" fields are to be used,
   and the base64 transit encoding isn't used either.

   (We use SEXP because it's really easy to parse, really portable,
   and unlike most other tagged data formats, has a
   trivially-specified canonical format suitable for hashing.)

   In descriptions of syntax below, we use regex-style qualifiers, so
   that in
        (sofa slipcover? occupant* leg+)
   the sofa will have an optional slipcover, zero or more occupants,
   and one or more legs.  This pattern matches (sofa leg) and (sofa
   slipcover occupant occupant leg leg leg leg) but not (sofa leg
   slipcover).

   We also use a braces notation to indicate elements that can occur
   in any order.  For example,
        (bread {flour+ eggs? yeast})
   matches a list starting with "bread", and then containing one or
   more  of flours, zero or one occurrences of eggs, and one
   occurrence of yeast, in any order.  This pattern matches (bread eggs
   yeast flour) but not (bread yeast) or (bread flour eggs yeast
   macadamias).

3.2. File formats: general principles

   We use tagged lists (lists whose first element is a string) to
   indicate typed objects.  Tags are generally lower-case, with
   hyphens used for separation.  Think Lispy.

   We use attrlists [lists of (key value) lists] to indicate a
   multimap from keys to values.  Clients MUST accept unrecognized
   keys in these attrlists.  The syntax for an attrlist with two
   recognized and required keys is typically given as ({(key1 val1)
   (key2 val2) (ATTR VAL)*}), indicating that the keys can occur in
   any order, intermixed with other attributes.

   Timestamp files will be downloaded very frequently; all other files
   will be much smaller in size than package files.  Thus,
   size-optimization for timestamp files makes sense and most other
   other space optimizations don't.

   Versions are represented as lists of the form (v I1 I2 I3 I4 ...)
   where each item is a number or alphanumeric version component.  For
   example, the version "0.2.1.5-alpha" is represented as (v 0 2 1 5
   alpha).

   All signed files are of the format:

       (signed
          X
          (signature ({(keyid K) (method M) (ATTR VAL)*}) SIG)+
       )

   where: X is a list whose first element describes the signed object.
          K is the identifier of a key signing the document
          M is the method to be used to make the signature
          (ATTR VAL) is an arbitrary list whose first element is a
             string.
          SIG is a signature of the canonical encoding of X using the
          identified key.

   We define two signing methods at present:
       sha256-oaep : A signature of the SHA256 hash of the canonical
         encoding of X, using OAEP+ padding. [XXXX say more about mgf]

   All times are given as strings of the format "YYYY-MM-DD HH:MM:SS",
   in UTC.

   All keys are of the format:
      (pubkey ({(type TYPE) (ATTR VAL)*}) KEYVAL)
   where TYPE is a string describing the type of the key and how it's
   used to sign documents.  The type determines the interpretation of
   KEYVAL.

   The ID of a key is the type field concatenated with the SHA-256
   hash of the canonical encoding of the KEYVAL field.

   We define one keytype at present: 'rsa'.  The KEYVAL in this case
   is a 2-element list of (e n), with both values given in big-endian
   binary format.  [This makes keys 45-60% more compact than using
   decimal integers.]

   All RSA keys must be at least 2048 bits long.


   Every role in the system is associated with a key.  Replacing
   anything but a root key is supposed to be relatively easy.

   Root-keys sign other keys, and certify them as belonging to roles.
   Clients are configured to know the root keys.

   Bundle keys certify the contents of a bundle.

   Package keys certify packages for a given program or set of
   programs.

   Mirror keys certify a list of mirrors.  We expect this to be an
   automated process.

   Timestamp keys certify that given versions of other metadata
   documents are up-to-date.  They are the only keys that absolutely
   need to be kept online.  (If they are not, timestamps won't be
   generated.)

3.3. File formats: key list

   The key list file is signed by multiple root keys.  It indicates
   which keys are authorized to sign which parts of the repository.

   (keylist
     (ts TIME)
     (keys
       ((key ({(roles (ROLE PATH)+) (ATTR VAL)*}) KEY)*)
     ...
   )

   The "ts" line describes when the keys file was updated.  Clients
   MUST NOT replace a file with an older one, and SHOULD NOT accept a
   file too far in the future.

   A ROLE is one of "timestamp" "mirrors" "bundle" or "package".

   PATH is a path relative to the top of the directory hierarchy.  It
   may contain "*" elements to indicate "any file", and may end with a
   "/**" element to indicate all files under a given point.

3.4. File formats: mirror list

   The mirror list is signed by a mirror key.  It indicates which
   mirrors are active and believed to be mirroring which parts of the
   repository.

   (mirrorlist
     (ts TIME)
     (mirrors
       ( (mirror ({(name N) (urlbase U) (contents PATH+) (weight W)
                   (official)?  (ATTR VAL)})) * )
     ...
  )

  Every mirror is a copy of some or all of the directory hierarchy
  containing at least the /meta, /bundles/, and /pkginfo directories.

  N is a descriptive name for the mirror; U is the URL of the mirror's
  base (i.e., the parent of the "meta" directory); and the PATH
  elements are the components describing how much of the packages
  directory is mirrored.  Their format is as in the keylist file.

  W is an integer used to weight mirrors when picking at random;
  mirrors with more bandwidth should have higher weigths.   The
  "official" element should only be present if the mirror is (one of
  the) official repositories operated by the Tor Project.

3.5. File formats: timestamp files

  The timestamp file is signed by a timestamp key.  It indicates the
  latest versions of other files, and contains a regularly updated
  timestamp to prevent rollback attacks.

  (ts
    ({(at TIME)
      (m TIME MIRRORLISTHASH)
      (k TIME KEYLISTHASH)
      (b NAME VERSION TIME PATH HASH)*})
  )

  TIME is when the timestamp was signed.  MIRRORLISTHASH is the digest
  of the mirror-list file; KEYLISTHASH is the digest of the key list
  file; and the 'b' entries are a list of the latest version of each
  bundles and their locations and hashes.

3.6. File formats: bundle files

  (bundle
    (at TIME)
    (os OS)
    [(arch ARCH)]
    (version V)
    (packages
      (NAME VERSION PATH HASH ({(order INST UPDATE REMOVE)
                                (optional)?
                                (gloss LANG TEXT)*
                                (longloss LANG TEXT)*
                                 (ATTR VAL)*})? )* )
  )

  Most elements are self-explanatory; the INST, UPDATE, and REMOVE
  elements of the order element are numbers defining the order in
  which the packages are installed, updated, and removed respectively.
  The "optional" element is present if the package is optional.
  "Gloss" is a short utf-8 human-readable string explaining what the
  package provides for the bundle; "longloss" is a longer such
  utf-8 string.

  (Note that the gloss strings are meant, not to describe the package,
  but to describe what the package provides for the bundle.  For
  example, "The Anonymous Email Bundle needs the Python Runtime to run
  Mixminion.")

  Multiple gloss strings are allowed; each should have a different
  language. The UI should display the must appropriate language to the
  user.

3.7. File formats: package files

  (package
    ({(name NAME)
     (version VERSION)
     (format FMT ((ATTR VAL)*)? )
     (path PATH)
     (ts TIME)
     (digest HASH)
     (shortdesc LANG TEXT)*
     (longdesc LANG TEXT)*
     (ATTR VAL)* })
  )

  Most elements are self-explanatory.  The "FMT" element describes the
  file format of the package, which should give enough information
  about how to install it.

  No two package files in the same repository should have the same
  name and version.  If a package needs to be changed, the version
  MUST be incremented.

  Descriptions are tagged with languages in the same way as glosses.

4. Detailed Workflows

4.1. The client application

  Periodically, the client updater fetches a timestamp file from a
  mirror.  If the timestamp in the file is up-to-date, the client
  first checks to see whether the keys file listed is one that the
  client has.  If not, the client fetches it, makes sure the hash of
  the keys file matches the hash in the timestamp file, makes sure its
  date is more recent than any keys file they have but not too far in
  the future, and that it is signed by enough root keys that the
  client recognizes.

       [If the timestamp file is not up-to-date, the client tries a
       few mirrors until it finds one with a good timestamp.]

       [If the keys file from a mirror does not match the timestamp
       file, the client tries a new mirror for both.]

       [If the keys file is not signed by enough root keys, the client
       warns the user and tries another mirror for both the timestamp
       file and the keys file.]

  Once the client has an up-to-date keys file, the client checks the
  signature on the timestamp file.  Assuming it checks out, the client
  refreshes the mirror list as needed, and refreshes any bundle files
  to which the user is subscribed if the client does not have
  the latest version of those files.  The client checks signatures on
  these files, and fetches package metadata for any packages listed in
  the bundle file that the client does not have, checks signatures on
  these, and fetches binaries for packages that might need to be
  installed or updated.  As the packages arrive, clients check their
  hashes.

  Once the client has gotten enough packages, it informs the user that
  new packages have arrived, and asks them if they want to update.

  Clients SHOULD cache at least the latest versions they have received
  of all files.

4.1.1. Download preferences

  Users should be able to specify that packages must be only
  downloaded over Tor, or must only be downloaded over encrypted
  protocols, or both.  Users should also be able to express preference
  for Tor vs non-Tor and encrypted vs non-encrypted, even if they
  allow both.

4.2. Mirrors

  Periodically, mirrors do an rsync or equivalent to fetch the latest
  version of whatever parts of the repository have changed since the
  version they currently hold.  Mirrors SHOULD replace older versions
  of the repository idempotently, so that clients are less likely to
  see inconsistent state.  Mirrors SHOULD validate the information
  they receive, and not serve partial or inconsistent files.

4.3. Workflow: Packagers

  When a new binary package is done, the person making the package
  runs a tool to generate and sign a package file, and sends both the
  package and the package file to a repository admin.  Typically, the
  base package file will be generated by inserting a version into a
  template.

  Packages MAY have as part of their build process a script to
  generate the appropriately versioned package file.  This script
  should at a minimum demand a build version, or use a timestamp in
  place of a build version, to prevent two packages with the same
  version from being created.

4.4. Workflow: bundlers

  When the packages in a bundle are done, the bundler runs a tool on
  the package files to generate and sign a bundle file.  Typically,
  this tool uses a template bundle file.

4.5. Workflow: repository administrators

  Repository administrators use a tool to validate signed files into the
  repository.  The repository should not be altered manually.

  This tool acts as follows:
     - Package files may be added, but never replaced.
     - Bundle files may be added, but never replaced.
     - No file may be added unless it is syntactically valid and
       signed by a key in the keys file authorized to sign files of
       this type in this file's location location.

     - A package file may not be added unless all of its binary
       packages match their hashes.

     - A bundle file may not be added unless all of its package files
       are present and match their hashes.

     - When adding a new keylist, bundle, or mirrors list, the
       timestamp file must be regenerated immediately.

5. Parameter setting and corner cases.

5.1. Timing

  The timestamp file SHOULD be regenerated every 15 minutes.  Mirrors
  SHOULD attempt to update every hour.  Clients SHOULD accept a
  timestamp file up to 6 hours old.

5.2. Format versioning and forward-compatibility:

  All of the above formats include the ability to add more
  attribute-value fields for backwards-compatible format changes.  If
  we need to make a backwards incompatible format change, we create a
  new filename for the new format.

5.3. Key management and migration:

  Root keys should be kept offline.  All keys except timestamp and
  mirror keys should be stored encrypted.

  All the formats above allow for multiple keys to sign a single
  document.  To replace a compromised root key, it suffices to sign
  keylist documents with both the compromised key and its replacement
  until all clients have updated to a new version of the autoupdater.

  To replace another key, it suffices to authorize the new key in the
  keylist.  Note that a new package or bundle key must re-sign and
  issue new versions of all packages or bundles it has generated.



F. Future directions and open questions

F.1. Package decomposition

  It would be neat to decouple existing packages.  Right now, we'd
  never want a windows user to have to fetch an openssl dll and Tor
  separately.  But if they're using an auto-update tool, it'd be
  pretty keen to have them not need to fetch a new openssl every time
  Tor has a bugfix.

F.2. Caching at Tor servers.

  See Tor Proposal number 127.

F.3. Support for more download methods

  Ozymandns, chunked downloads, and bittorrent would all be neat
  ideas.

F.4. Support for bogus clocks.

  Glider should have a user configurable "no, my clock is _supposed_
  to be wrong" mode, since lots of users seem to _like_ having their
  clocks in 1970 forever.

R. Ideas I'm rejecting for the moment

R.1. Considering recommended versions from Tor consensus directory documents

  This requires a working Tor to update Tor; that's not necessarily a
  great idea.

R.2. Integration with existing GPG signatures

  The OpenPGP signature and key format is so complicated that you'd
  have to be mad to touch it.