[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[freehaven-cvs] Adjust pynchon-gate message-encrypting-and-packing f...



Update of /home/freehaven/cvsroot/doc/pynchon-gate
In directory moria.mit.edu:/tmp/cvs-serv1935

Modified Files:
	pynchon-spec.txt 
Log Message:
Adjust pynchon-gate message-encrypting-and-packing format to explain how to summarize undelivered messages for receipt/decryption in later cycles.  Still not perfect.

Index: pynchon-spec.txt
===================================================================
RCS file: /home/freehaven/cvsroot/doc/pynchon-gate/pynchon-spec.txt,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -d -r1.5 -r1.6
--- pynchon-spec.txt	15 Aug 2004 20:43:29 -0000	1.5
+++ pynchon-spec.txt	1 Sep 2004 19:47:22 -0000	1.6
@@ -5,6 +5,32 @@
                original design by Bram Cohen and Len Sassaman)
 
 
+
+Table of Contents
+
+   1.       Overview
+   1.1.     Notation and terminology
+   1.2.     Cryptographic functions
+   2.       Encrypting and packaging messages
+   2.1.     Key maintenance
+   2.2.     Packaging messages
+   2.2.1.   Message synopses
+   2.3.     Managing pending messages
+   2.4.     Decrypting packaged messages
+   3.       The bucket pool
+   3.1.     Bucket pool formats
+   3.2.     Constructing the bucket pool
+   3.3.     Distributing the bucket pool
+   4.       Retrieving the bucket pool
+   4.1.     PIR protocol
+   4.1.1.   Error codes
+   4.1.2.   Optimizing the distributor implementation
+   4.2.     Retrieving a bucket
+   4.3.     Retrieving a cycle's worth of messages
+   5.       Account administration
+   6.       System information
+
+
 1. Overview
 
    The Pynchon Gate Protocol (PynGP) uses Private Information Retrieval to
@@ -54,21 +80,41 @@
    R(n) -- n octets of random data.
    INT(v,n) -- The nonnegative integer "v", expressed as a n-octet value
        in network (big-endian) order.
-   PRNG(s,n) -- A cryptographically string pseudorandom sequence of n octets,
+   PRNG(s,n) -- A cryptographically strong pseudorandom sequence of n octets,
        generated using the seed s.
    Z(n) -- A sequence of n zero octets.
    ENC(s,k) -- Encrypt the string s using a stream cipher, with the key k.
    DEC(s,k) -- Decrypt the string s using a stream cipher, with the key k.
+   COMPRESS(s) -- Compress the string s using the "deflate" algorithm and the
+       zlib message format as defined in RFCs 1950 and 1951.
+
+   "abc" -- A three-byte string, containing the ASCII values for the
+        characters "a", "b", and "c".  Unless otherwise noted, strings are
+        _not_ NUL-terminated.
+
+   [61 62 63] -- A three-byte string, containing bytes with the hexadecimal
+        values 61, 62, and 63.
+
+   x[a:b] -- A substring of the string x, starting on the a'th character of
+        x, and ending with the b-1'th character of x.  (Thus, x[0:LEN(x)] =
+        x.)
 
    HASH_LEN -- The length of the output of our cryptographic digest function.
    SEED_LEN -- The length of the seed for our PRNG.
-   K -- A client-selected security parameter: how many distributors will the
-        client connect to?
+   KEY_LEN -- The length of the keys used by our encryption functions.
 
-   "abc" -- A three-character string.  Unless otherwise noted, strings are
-            _not_ NUL-terminated.
+   K -- A client-selected security parameter determining how many
+        distributors the client will connect to.
 
-2. Packaging messages
+   (We require that HASH_LEN >= KEY_LEN and HASH_LEN >= SEED_LEN)
+
+1.2. Cryptographic functions
+
+   To implement PynGP, we recommend AES-128 in counter mode for encryption
+   (so that ENC == DEC); SHA-256 for our hash function; and ENC(Z(n),s) as
+   our PRNG.
+
+2. Encrypting and packaging messages
 
 2.1. Key maintenance
 
@@ -85,6 +131,9 @@
       SUBKEY(j+1,i) = H(SUBKEY(j,i) | "NEXT SECRET")
       SUBKEY(0,i) = H(S[i] | "NEXT SECRET")
 
+   {Rationale: We use a separate chain of keys for each cycle so that it is
+   easier for a user to resynchronize after missing a few cycles.}
+
    Also, the nymserver derives an opaque UserID for each recipient each
    cycle:
       UserID[i] = H(S[i] | "USER ID")
@@ -102,27 +151,46 @@
    nymserver.  These messages can be emails, padding, or other status
    messages.
 
-   The messages in a cycle i are encrypted in order, with the j'th message on
-   i encrypted using SUBKEY(j,i).  Encryption uses a stream cipher, so that
-   encrypted message length is the same as plain-text length.  Since keys are
-   not re-used, no IV is necessary.
+   The messages in a cycle i are encrypted in order, with the j'th message
+   received on i encrypted and identified using derivatives of SUBKEY(j,i) as
+   follows:
+
+         MsgID(j,i) = H(SUBKEY(j,i) | "MESSAGE ID")
+         MsgKey(j,i) = H(SUBKEY(j,i) | "MESSAGE KEY")
+
+   MsgKey(j,i) is used as a key to encrypt the message, and MsgID(j,i) is
+   prepended to the encrypted message in order to identify it to the user
+   (Message IDs are necessary because messages may be packaged out of order.)
+   Encryption uses a stream cipher, so that encrypted message length is the
+   same as plain-text length.  Since keys are not re-used, no IV is
+   necessary.
+
+   {Rationale: We derive MsgKey from SUBKEY rather than using SUBKEY
+   directly, so that a user can store MsgKey(j,i) without worrying that a
+   compromise will expose MsgKey(j+1,i).}
+
+   [XXXX Should we just have MsgID(j,i) = H(MsgKey(j,i)) ? - NM ]
+
+   The 0'th subkey is reserved for the INDEX message.  The 1'st subkey is
+   reserved for the SUMMARY message.
 
    All messages begin with a single-octet 'TYPE' field.  The values for TYPE
    are:
 
-         PAD_REST  [00]
-         MAIL      [01]
-         ACK       [02]
+         INDEX     [00]
+         PAD_REST  [01]
+         MAIL      [02]
+         ACK       [03]
+         SUMMARY   [04]
          ERROR     [FF]
 
    All message types except for PAD_REST use in the following format:
 
          TYPE     Message type    [1 octet]
-         LEN      Message length  [4 octets]
          DATA     Message body    [LEN octets]
          HASH     Hash of message [HASH_LEN octets]
 
-   HASH is equal to H(TYPE|LEN|DATA).  It is redundant.
+   HASH is equal to H(TYPE|DATA).  It is redundant.
 
    PAD_REST uses a variable length format:
 
@@ -130,38 +198,120 @@
          PADDING  Random data     [Up to end of the current bucket]
 
    PAD_REST can only appear as the last of a user's messages in a given
-   cycle; thus, it does not need a length field.  (In fact, it had better
-   *not* have a length field, since it needs to be able to fit in one octet.)
+   cycle; thus, it does not need a length field.
 
    Other message types have bodies as follows:
 
-         TYPE:  MAIL [01]
+         TYPE: INDEX [00]
+         DATA: A description of the contents of all messages in today's
+               buckets.  It begins with the number of messages in today's
+               buckets, encoded as a 4-byte big-endian integer.  Then,
+               for each message, in includes:
+                    ID   -- The MsgID for the message  [HASH_LEN octets]
+                    LEN  -- The length of the encrypted message [4 octets]
+         NOTES: There MUST be exactly one index message in each cycle's worth
+                of buckets, and it MUST appear first.  The index message
+                itself is not included in the count of messages.
+
+                The index message is always encrypted with MsgKey(0,i).
+
+         TYPE:  MAIL [02]
          DATA: One or more email messages, concatenated, then compressed.
-               Each message M is packaged as INT(LEN(M),4)|M .
+               Each message M is prefixed with a length field, as
+               INT(LEN(M),4)|M .
          NOTES: Nymservers SHOULD generate MAIL messages for incoming email
                 as greedily as possible, to minimize the amount of time that
                 unencrypted mail is stored.
 
-         TYPE:  ACK [02]
+         TYPE:  ACK [03]
          DATA:  C         Cookie         [HASH_LEN octets]
          NOTES: Acknowledges that the nymserver has received a control
                 message from the user and processed it.  The 'cookie'
                 must match the cookie sent in the control message being
                 acknowledged.
 
+         TYPE:  SUMMARY [04]
+         DATA:  Message synopses for messages not included in this cycle's
+                bucket.  See 2.2.1 below.
+
          TYPE:  ERROR [FF]
          DATA:  ERR      Error code     [2 octets]
                 MSG      Message        [Variable]
 
+2.2.1. Message synopses
 
-   {XXXX Add more message types, including 'summary of pending mail'.}
+   When it is not possible for the nymserver to package all of a user's
+   pending mail, it instead packages as much as it can, and uses the rest of
+   the user's space to include a synopsis for each undelivered message it
+   received in that cycle.
 
-   {XXXX There's a problem with our encryption scheme and our long-term
-    goals.  We'd like to be able to queue pending mail if there's too much
-    mail, or set rule for delivering mail according to user-set priorities.
-    But this seems to conflict with the idea of chaining encryption keys and
-    greedily encrypting everything as soon as it comes in.  This needs an
-    answer!}
+   The synopsis for an email consists of one or more of the email's headers,
+   in RFC822 format.  The following headers MUST be included in synopsis when
+   present: "From", "To", "Cc", "In-Reply-To", "Message-ID", "Subject".
+   Other headers MAY be included or omitted.  Nymservers MAY add additional
+   headers for user convenience, for example by running SpamAssassin or
+   similar tools to mark suspicious messages as probably spam.
+
+   The nymservers SHOULD compress and encrypt each synopsis at the same time
+   as it encrypts its corresponding messages.  To encrypt the synopsis for
+   message j, received on cycle i, the nymserver uses the key: SynopKey(j,i)
+   = H(SUBKEY(j,i) | "SYNOPSIS KEY")
+
+   To pack multiple synopses into a SUMMARY message, each is first prefixed
+   with its corresponding message ID, then a 4-octet integer size field.
+
+2.3. Managing pending messages
+
+   When a nymserver receives an email for a user, it encrypts that message
+   and its synopsis as described in 2.2 and 2.2.1. above, and stores them,
+   indexed by the user, by the message ID (described in 2.2) and by the cycle
+   in which they were received.
+
+   When building a user's buckets for a given cycle, the nymserver tries
+   to include all the pending messages for that user.  If this is not
+   possible, the nymserver instead includes:
+
+      - As many pending messages as possible.
+      - All ACK and ERR messages.
+      - A SUMMARY message listing all pending messages *not* included in
+        the current cycle.
+
+        [XXXX Should we list everything that's pending, or should we list
+        everything that's pending and new? -NM]
+
+   When choosing which messages to include, the nymserver SHOULD by default
+   include oldest first, but MAY allow users to configure other preferences,
+   and MUST allow users to specifically override its choice of messages by
+   requesting particular messages to be delivered or deleted (see section 5
+   below).
+
+   The nymserver SHOULD delete pending messages that it has not been able
+   to deliver for a long time, and MAY delete messages to force disk quota
+   compliance.
+
+2.4. Decrypting packaged messages
+
+   Upon receiving their messages for a cycle, clients decrypt the messages
+   as follows.
+
+       1.  First, decrypt the INDEX message with MsgKey(0,i).  Use the
+           INDEX message to split the bucket into messages; associate each
+           message with its message ID.
+
+       2.  For each message ID in the INDEX message, check to see whether we
+           have stored that message ID from a previous cycle.  If so, use the
+           corresponding message key to decrypt the message.
+
+       3.  See whether we have a message with MsgID(1,i).  If so, this is
+           the SUMMARY message.  Decrypt it.
+
+       4.  Generate MsgID(2,i) ...  MsgID(N,i) until a message ID is found
+           which is not listed in the SUMMARY message or the INDEX message.
+           Decrypt all messages listed in the INDEX message.  For each
+           MsgID in the summary, store it and its corresponding key until
+           the message is received in a later cycle, or until the user tells
+           the nymserver to delete the message, or until the message is "too
+           old"
 
 3. The bucket pool
 
@@ -563,7 +713,8 @@
 
 5. Account administration
 
-   {XXXX Write me.}
+   {XXXX Write me.  Clients need a way to set preferences, create accounts,
+   request messages to be included, request message to be deleted.}
 
 6. System information
 

***********************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe freehaven-cvs       in the body. http://freehaven.net/