[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

End to end Mixminion issues, with proposed solutions



Hello, everyone!

I wrote this piece up to try to resolve a bunch of nagging points in the
current spec.  I propose the following combined solution.  (Comments
would be really neat.)

PART I: END-TO-END MIXMINION: Or, "Weren't there supposed to be clients?"

Right now, we've got several aspects of the system in a half-specified
state, and it turns out that they all cover a similar area: end-to-end
communication. These half-specified aspects are:
    1) End-to-end encryption on forward messages 
        [Alice wants to send a message to Bob that only Bob can read. Alice
         knows Bob's identity, but wants to remain anonymous herself.]

    2) 'Stateless' reply blocks
        [Alice wants to generate a bunch of SURBs, but not have to store
         separate keys for each one.]

    3) Recognizing message sizes
        [Bob has just received a message from Alice.  How does he know where
         it ends?]

    4) Recognizing forward/reply messages as a recipient
        [Bob has just received a message from Alice.  Must he decode it?
         Should he try his public key, or is the message a reply, or is it
         unencrypted?]

    5) The ad-hoc use of the TAG field.


(To abbreviate, I'm going to refer to the following 'message types':
     FWD (unencrypted forward message)
     EFWD (encrypted forward message)
     REPL (Reply block delivery)
     SREPL ('Stateless' reply: requires minimal client-side storage.)
     JUNK (Any other message type, with payload tagged after crossover.)
 )

Here are some design goals:
    1) A recipient should be able to decrypt all incoming messages
       with a minimum of effort.
    2) Although it will always be more secure for a recipient to run
       her own exit node, a recipient should be able to receive FWD
       and S?REPL messages with without running her own exit node and
       still get acceptable anonymity.  [This is desirable because
       some recipients are likely to have unreliable network
       connections or no fixed IP address.]
    3) Minimal exit distinguishability: as much as possible, an
       adversary observing an exit node should not be able to tell the
       different 'message types' apart. (Obviously, because FWD
       messages are delivered as plaintext, they can't be mistaken for
       EFWD,REPL,SREPL, or JUNK.  The others, though, should be
       indistinguishable to anybody but the recipient.) [This is
       desirable to keep an edge-watching adversary from partitioning
       the traffic more than necessary.]
    4) FWD recipients must still be able to read their messages
       without any client support.
    5) All message types must be deliverable by SMTP.
    6) As much as possible, recipients should be able to tell whether
       they have decrypted a message correctly.

And here's the design.  It's got some weaknesses, but I think it's a
good start.  Please let me know what's wrong, and how to fix it.

 _META_
  A) All exit types have their Routing-Info field begin with a 20-byte
     Tag.  In all cases, the contents of a Tag field should be a 20-byte
     random number, with MSB=0.  (I.e., 0 <= Tag < 2**(159): Tag is really
     only 159 bits.)
 
     [See C below to find out why the MSB=0.]

     (Remember, the Tag is in the header; the sender generates it for
     forward messages, and the recipient generates it for replies.)

 SENDER:
  B) For FWD messages, the sender generates a 159-bit random number for
     the Tag.  For a text message, the sender pads the payload with 00
     bytes.  For a binary message, the sender sends 2 size bytes,
     the payload, and random padding up to 28K.

  C) For EFWD messages, the sender generates a 128-bit temporary key K at
     random, and RSA-OAEP encrypts it with the recipient's public key.
     If the encrypted key EK has its MSB=1, the sender generates a new
     key and tries again.  Otherwise, the sender uses the first 20
     bytes of EK as the Tag.  The payload contains: The remaining bytes
     of EK, followed by the LIONESS encryption (with K) of:
                   2 size bytes, 
                   the message
                   Random padding up to 28K.

     {{This is the iffy part.  I know that RSA-encrypted data is
     distinguishable from random numbers, since the encrypted message
     is never greater than the public key modulus.  I _think_ that
     discarding results with MSB=1 forces the encrypted data to 'look 
     random', but I've really got no clue.

     Does this work at all?  How well?  If this doesn't work, what might?

     Efficiency shouldn't be too bad:  On average, a sender only has
     to do 1 extra RSA public-key encryptions per EFWD message.}}

  D) REPL/SREPL messages: The sender does nothing to the Tag field;
     that's part of the SURB header, and under the recipient's
     control.  The recipient can't tell whether the SURB is 'stateless'
     or not.

     Each SURB includes an encryption key K.  The sender generates a
     payload containing the string 'Mix Reply', two bytes of size
     data, the message, and random padding up to 28K.  The sender then
     encrypts this payload with K and sends it.

  SMTP DELIVERY:
  E) If the message is to be delivered over a text medium such as
     SMTP, the exit node checks whether the message is contains only
     'printing' characters (hex 07..0D, 20..FF), and 
     ends with a string of 00 characters.  If so, the exit node strips
     the 00 characters at the end, and delivers the message as-is.
 
     Otherwise, the exit node sends the entire payload, base-64
     encoded.

     In either case, the exit node includes the tag as part of the
     SMTP body, base-64 encoded.

     {This way, senders can send text-only FWD messages to 
     recipients without special software.  The problem is, I think
     this fails in the presence of UTF-8 and friends.  Is this okay?}

   SURB GENERATION:

   F) Generating a 'stateful' reply block:
      The SURB generator generates a set of shared secrets: one for
      each hop of the SURB, and one for end-to-end encryption.  The
      generator also creates a 159-bit identifier, and stores the keys
      on disk, indexed by this identifier.  The generator uses the
      identifier as the Tag field.

   G) Generating a 'stateless' reply block:
      The SURB generator must remember a secure password, or have a
      long-term secret stored on disk.  We call this secret SEC.
 
      The SURB generator searches for a 159-bit seed S such that
      H(S|SEC|"Validate") ends with a 0 byte.  This seed becomes the
      Tag.  The SURB generator then uses H(S|SEC|"Generate") to seed a
      PRNG to generate shared secrets as above.
  
      {See J below to learn why we search for H(S|SEC|"Validate")
       ending with 0.}

   RECEIVING A MESSAGE:

    H) First, if the message is a text FWD message, the sender will
       probably know what to do with it; it's obviously text.

       (Else go to step I.)
 
    I) Second, if a recipient uses 'stateful' reply blocks, she checks
       whether the messages Tag field matches the index of any of her
       stored key sets.  If so, she uses that key set to decrypt the
       message.  If the end result does not start with 'Mix Reply',
       then the payload has been junked.

       (If the Tag field is *not* the index of a stored keyset, go to
       step J.)

    J) If a recipient uses 'stateless' reply blocks, she checks
       whether H(Tag|SEC|"Validate") ends with 0.  If so, she uses
       H(Tag|SEC|"Generate") to reconstruct the key set.  If the end
       result does not start with 'Mix Reply', then the payload has
       been junked, _OR_ the message was not actually generated with a
       reply block.

       {Why the ends-with-0 step?  Performance. Decrypting a 16-hop
        message takes as much as 95 ms on my laptop, because of the 17
        LIONESS steps.  On the other hand, trying 256 SHA-1 hashes to
        find one ending with 0 takes 1 ms.  By spending 1 ms per SURB
        generation, we save save an average of 94.6 ms every time we
        receive a message _not_ send with SREPL.}

       (If this step fails, go to step K.)

    K) If the recipient has a public key of size N, she appends the
       tag to the first N-20 bytes of the payload, and tries to
       decrypt the result.  (If the OAEP padding doesn't check, the
       message is junk, or was not encrypted: go to step L.)  The
       result is a key; the recipient uses it to decrypt the rest of
       the message.

    L) The message is either unencrypted data, or junk.

Open questions:
 1) We should figure out compression at some point.
 2) Likewise for fragments.
 3) Likewise for k-of-n and friends.
 4) Did I really get indistinguishability for everything but FWD?
 5) Will my scary hack really make RSA-encrypted data hard to distinguish
    from random numbers?  If not, do we have any other choices?


Part II: WHAT'S AN ADDRESS?

Another end-to-end issue is: how can clients specify addresses?  

To send a forward message, or to generate a SURB, the program must know:
    1) The final routing type.
    2) The non-Tag portion of the routing info.
    3) If the sender wants to send a EFWD message, the recipient's
       public key.
    4) If the routing type depends on a particular node (as does
       MBOX), a valid sever descriptor for that node.

If the sender knows (3) and (4), we're okay.  But -- as George asks --
how is a sender supposed to find them out?  If the sender just gets them
off a keyserver, an observer (or the keyserver) can tell who wants to
talk to whom.  

Possible solution: just use the Mix system to query the keyserver! The
would-be recipient uses Minion to sent the keyserver a list of users and
a SURB to receive the reply.  (Alternatively, just send a non-encrypted
anonymous mail to the user and ask for his key.)

Of course, we still have some pretty severe usability problems.  When I
want to use this thing from the command line, what do I need to tell the
client?  For now, I'm going to make users specify public keys and
'final' nodes on the command line, but that'll probably need to change. 
Otherwise, it'll be too easy to accidentally send a reply to the wrong
server.