[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
[minion-cvs] Move all specification files into a separate directory ...
Update of /home/minion/cvsroot/doc/spec
In directory moria.mit.edu:/tmp/cvs-serv6508/spec
Added Files:
E2E-spec.txt dir-spec.txt minion-spec.txt path-spec.txt
Log Message:
Move all specification files into a separate directory to keep them
together; rework the specification to be more RFCish; admit that it
hasn't been LaTeX for a long time.
I haven't really touched E2E spec, other than to move some of the open
issues there and indent it 3 spaces.
We still need a lot more exposition and reworking on the specification
before we can call it readable. For instance, it seems to assume you
know how crossover points work, and why we have our 2-header
structure.
I've broken the serverdesc and directory parts into a new dir-spec,
and added a new path-spec. I'm leaving nym-spec in place, since I
haven't looked at it in ages.
--- NEW FILE: E2E-spec.txt ---
END TO END MIXMINION
This document attempts to motivate a set of requirements for
end-to-end message transport on Type III (Mixminion) remailers, and
to provide a design meeting those requirements. The first design[1]
was sent to mixminion-dev on on September 20; the second design[2] on
October 11 2002.
This is a minor revision of the second design. It should be folded
into the main spec.
[XXXX Or, we should aggressively compartmentalize the main spec. This
part is only needed if you're writing a client or an exit module for
a server. Relay-only servers don't need this. -NM]
[1] http://archives.seul.org/mixminion/dev/msg000013.html
[2] http://archives.seul.org/mixminion/dev/Oct-2002/msg00000.html
=== Introduction ===
STATUS: All of this specification, except for K-of-N fragmentation, is
implemented in the codebase today. (I have a prototype of the K-of-N
backend.
OVERVIEW
The current specification describes only how to send single Mixminion
packets from one user to another, either via forward-anonymous paths
or via single-use reply blocks. It lacks support for the following
desirable features:
1) End-to-end encryption on forward messages. (Alice wants to send
a message to Bob. Bob does not run a Mixminion node of his
own, but *does* have a long-term public key. Alice encrypts
a packet with Bob's public key, and sends it along a path
ending with an exit node that delivers the message to Bob. Bob
decrypts the contents of the packet with his public key.)
2) "State-carrying" reply blocks. (Formerly called 'Stateless'
reply blocks because they require no stored state on the
client side.) (Bob wants to generate many SURBs without
having to store a separate set of keys for each SURB.)
3) Message sizes. (When Bob receives the contents of a packet
from Alice, he should be able to tell how much of the packet is
random padding and how much is the real message.)
4) Multi-packet messages. (Alice wants to send a message to Bob,
but the message is too long to fit into the payload of a single
packet. Alice breaks the message into N packets, such that any
K of them are sufficient to reconstruct the original message.
IF Bob has the software to reassemble the original message,
Alice simply sends all the packets to Bob. If, however, Bob is
not running Mixminion software, Alice sends all of the packets
to a single exit node, and the exit node reconstructs the
original message and delivers it to Bob.)
5) In-system message compression. (Alice wants to send a long
text message to Bob. By compressing it, she can fit it into
far fewer packets.)
6) A complete decoding algorithm. (Bob receives a message. He
must be able to determine its contents without knowing whether
it plaintext forward, encrypted forward, or a reply.
Additionally, if the packet is corrupt, Bob should be able to
tell.)
My last message addressed 1, 2, 3, and 6. This document simplifies
that design, and adds 4 and 5.
The following requirements complicate the design:
1) Exit indistinguishability between message types. (If the exit
node is not operated by the recipient, it should learn as
little as possible about the messages it relays. Obviously, it
can distinguish plaintext forward messages from all other
messages. It should, however, be impossible for such an exit
node to tell whether a message is a reply, an end-to-end
encrypted forward message, or junk.)
2) Format homogeneity. (All senders must compress and fragment
their messages in the same way, or else recipients will be able
to partition incoming messages based on the format used to send
them.)
3) Message recipients who don't run any Mixminion software. (We've
always said that users who don't get any benefit of anonymity
should not need to run any special software.)
4) Message recipients who run Mixminion software, but who do not
run their own nodes. (Such recipients may decline to run nodes
because they do not have long-term IP addresses, because they
do not have reliable network connections, or because they do
not want to run a 24/7 server for other reasons.)
5) No patent-encumbered algorithms.
TERMINOLOGY
Last time I got in a bit of trouble by not defining my terms clearly.
Here's another try.
**Roles** The following roles describe various users' and servers'
relationships to a given packet. These roles are not entirely
static: a user may fill different roles for different packet.
Sender: The user that creates a given message, and decides to send it
via the Mixnet.
Encoder: The user or server that converts a message into Mixminion
packets. A user who runs Mixminion software acts as her own
encoder; one who uses an email-gateway to introduce packets into
the system does not.
Relay: An intermediary node in a Mixminion chain.
Exit node: The last node in the relay chain. An exit node is "observable"
when it relays packets to recipients over an untrusted network,
and "private" when it delivers messages locally to recipients that
trust it.
Decoder: The user or server that parses mixminion packets, and
decrypts or reconstructs the original message. Except in the case
of plaintext forward-anonymous messages, the decoder is identical
with the recipient.
Recipient: The intended target of a message.
SURB generator: The user who creates a SURB.
**Data Structures**
Message: An arbitrary sequence of bytes which a sender wants to send
to a decoder. It may be too long to fit into a single packet.
Packet: A 32K data chunk, containing two headers and a payload
as described in the current Mixminion spec.
Tag: A 20 byte sequence contained in the final subheader of every
packet. Every routing-info field associated with an exit
routing-type has a Tag subfield as its first 20 bytes.
Payload: The 28K contents of a packet.
Tagged Payload:The 28K contents of a packet, plus the Tag from the
packet's header.
**Kinds of packets**
Forward-anonymous: The sender is anonymous, and knows the receiver's
identity. The sender creates all the headers.
Plaintext forward-anonymous: A forward-anonymous packet on which the
sender does not use end-to-end public-key encryption. The exit
node sees the plaintext contents of a plaintext forward-anonymous
packet. If the exit node is observable, any eavesdropper
between the exit node and the recipient can see the plaintext.
Encrypted forward-anonymous: A forward-anonymous packet encrypted by
the encoder using the recipient's public key.
Reply: a packet sent using a SURB.
Junk: a packet whose payload is garbled between the crossover point
and the recipient.
**Kinds of SURBs**
Stateless SURB: A mode of SURB generation in which the SURB generator
only needs to remember a single medium-to-long-term secret to
decrypt packets sent to any number of SURBs.
Note that a client must use a different secret for each
pseudonymous identity it maintains; otherwise, it could not
distinguish among messages sent to its different pseudonymous
identities. (An attacker could then confirm that "Alice" and
"Bob" are the same person's pseudonym by sending the "Bob"
pseudonym a message addressed to Alice.)
Note also that long-term secrets must be very hard to guess;
otherwise, exit nodes can mount an offline secret-guessing attack
to trivially break a client's SURBs. Passwords are too easy to
guess; 20-byte random keys are more suitable.
=== The Design ===
BUILDING BLOCK: COMPRESSION
We define a compression primitive based on ZLIB, as defined in
RFC-1950 and RFC-1951. Because these standards describe only a
message format, and do not mandate a single compression algorithm, we
must restrict the allowable means of compression further. (If we
allowed encoders to choose among valid ZLIB compression algorithms,
they would become partitionable.)
Specifically, we define COMPRESS(M) to equal the result of
compressing M using zlib 1.1.4 (as available from www.gzip.org) with
the following parameters, not forcing any explicit flushes until the
message is compressed:
Compression level = 9 (maximum)
Compression method = DEFLATED (default)
Window size = 32K (default)
Memory level = 8 (default)
Compression strategy = DEFAULT (default)
Any software can be used, so long as it gives the same result as zlib
1.1.4 with these parameters. Mark Adler from the Gzip team has
averred that this is so with all zlib versions from 1.0.4 through
1.1.4, but *may* change in some future version.
If I recall correctly, the zlib website says that COMPRESS(M) is no
longer than 12 bytes + 1.02 * len(M), and at least as long as
len(M)/1000.
We also define DECOMPRESS as the inverse of COMPRESS: namely, ZLIB
decompression as described in RFCs 1950 and 1951. Note that not
DECOMPRESS is not defined for every sequence of bytes.
[XXXX Allowing message compression creates a DOS opportunity: with
highly repetitive messages, zlib can give up to 1000-fold
compression. This could allow an attacker to use a 28KB payload as
a mailbomb that uncompressed to 28MB.
The current code takes the following approach: when delivering a
message, if the uncompressed size is over 20K, and the compression
ratio is >20, we do not uncompress the message, but instead deliver
it as-is, with a warning that it may be a zlib bomb.
Does that sound reasonable? How about the parameters? -NM]
[XXXX As a first compromise, I'm going to change this requirement to
say that this checking happens *only when* using an exit method
(such as SMTP) that doesn't do its own bombing-prevention. -NM]
BUILDING BLOCK: ERASURE CORRECTING ENCODING
We define a primitive, FRAGMENT, that breaks a K-packet message into
N>K packets, such that any K of those packets are sufficient to
reconstruct the original message. FRAGMENT(M,K,N,i) is the i'th such
packet.
We also define a primitive RECONSTRUCT(K,N,P_i1, P_i2,...P_iK) that
reconstructs the original message.
Specifically, we use the XXXX algorithm as described at XXXX. This
algorithm has several advantages: first, it has freely-available
implementations in C and Java under a modified-BSD style license.
Second, it runs with acceptable performance for modest values of K
(less than 30 or so). Third, it does not seem to be patent encumbered.
Of course, this algorithm has disadvantages. First, it lacks a
complete, byte-level specification beyond that given at XXXX. Second,
it performs poorly with very large K. Sadly, it seems to be about the
best we can do.
To avoid large K, we do not fragment messages directly. Instead, we
first divide each large message into a set of chunks, with the size of
each chunk no larger than can fit in 16 packets. We do K-of-N
fragmentation of the chunks.
[XXXX Actually, fragmenting fragments may be a better approach; see
the mailing list. -NM]
PROCEDURE: Divide a message into packets. [DIVIDE(M,PS,EXF)]
ARGUMENTS
M: the message to send
PS: payload sized (fixed)
EXF: expansion factor (fixed: everyone must use the same EXF,
see below.)
Let M_SIZE = CEIL(LEN(M) / PS)
Let K = Min(16, 2**CEIL(Log2(M_SIZE)))
Let NUM_CHUNKS = CEIL(M_SIZE / K)
Let M = M | PRNG(Len(M) - NUM_CHUNKS*PS*K)
For i from 0 to NUM_CHUNKS - 1:
Let CHUNK_i = M[i*PS*K : (i+1)*PS*K]
End
Let N = Ceil(EXF*K)
For i from 0 to NUM_CHUNKS-1:
For j from 0 to N-1:
FRAGMENTS[i*N+j] = FRAGMENT(CHUNK_i, K, N, j)
End loop
End loop
Return FRAGMENTS.
Note that a message will arrive intact if and only if at least K
packets from each chunk arrive. We examine the chance for a message
to arrive under three sets of assumptions.
First, suppose we didn't support erasure correction: that is to
say, EXF=1.0. Suppose that messages are lost with probability P.
Thus, a message arrives with probability
P_msg = P ** N_PACKETS
(If P=.99, a 1MB message arrives with probability < 69%.)
Second, suppose that packets are lost independently and uniformly, such
that each packet arrives with probability P. Thus, a message arrives
with probability:
_N_ N_CHUNKS
P_msg = [ > COMB(N,i) P**i (1-P)**(N-i) ]
---
i=k
Where COMB(N,i) = the number of combinations of N elements, taken i
at a time.
(If P=.95 and EXF=1.3, P_msg > .9986)
Third, consider a different failure model. It's far more likely that
packets are lost due to entire paths failing. In this case, it makes
sense to "stripe" each chunk over a set of paths, such that Packet 1
from Chunk 1 goes along the same path as Packet 1 from Chunk 2, Packet
1 from Chunk 3, and so on. Thus, if we choose N different paths, we
can reconstruct the message so long as at least K paths are
functioning. If we choose Q paths for Q<N, at least K*Q/N paths must
be functioning. Assuming paths succeed with probability Ppath,
_Q_
P_msg = > COMB(Q,i) Ppath**i (1-Ppath)**(Q-i)
---
i=(k*Q/N)
[I'm leaving EXF and Q vague, since I have no idea of P or Ppath, and
since I don't actually plan to implement this for several versions. ]
THE DESIGN ITSELF
A) All exit routing types (such as SMTP and MBOX) have their Routing-Info
field begin with a 20-byte Tag. To all entities but the recipient,
the Tag field should look like a 20-byte random number, with
MSB=0. (I.e., 0<=Tag<2**159.) [See previous message for rationale.]
The payloads of plaintext forward-anonymous packets, encrypted
forward-anonymous packets, and reply packets are all encoded
slightly differently. To all entities except the recipient, the
contents of these packets look like 28KB of random data.
The recipient, however, can reconstruct the contents of a packet
given a payload and a tag.
B) Every tagged payload eventually decodes to a 'fragment' or
'single-packet-message'. The formats are:
SINGLE-PACKET-MESSAGE Payload: (overhead=22 bytes.)
(Single-packet message flag) [1 bit: 0]
(Length of packet contents) [15 bits]
(Hash of packet contents) [20 bytes]
(Packet contents) [up to capacity]
FRAGMENT Payload (overhead = 46 bytes)
(Multi-packet message flag) [1 bit: 1]
(Packet index) [15 bits]
(Hash of remaining packet fields) [20 bytes]
(Message ID) [20 bytes]
(Message size) [4 bytes]
(Packet contents) [capacity]
[Note: this format allows at messages at most 4 GB in size.
Decoders (exit nodes and recipients) may impose additional limits.]
[Note 2: Perhaps we should only use 8 or 10 bytes of he hash: after
all, we're not actually worried about collisions, and lioness
should prevent hostile alteration of the payload: we only need to
use it as a checksum. On the other hand, we'd only be saving 10
bytes per packet.]
C) The message encoder behaves as follows:
1. First, determine the message type. If we're using a SURB,
the message type is 'REPLY'. If we're sending to a private
exit node, or we're sending a plaintext message, the message
type is 'PLAIN'. If we're sending message via an observable
exit node to a sender with a known public key, the message
type is 'ENCRYPT'.
If message-type 'PLAIN' or 'REPLY', let SZ = 28K. Else let
SZ= 28K-38bytes. (SZ is the amount of data that fits
in a single payload. 'ENCRYPT' messages lose 42 bytes
to OAEP padding and 16 to encode the session key, and
gain 20 by spilling the encrypted data into the decoding
Tag.)
2. Compress the message. Call this CM.
3. If Len(CM) > (SZ-22 bytes)
Generate a random 20-byte number as a message ID.
Divide CM into fragments of length SZ-46 bytes according to
the K-of-N procedure given above.
Construct a FRAGMENT payload for each of the fragments.
Else,
Construct a SINGLE-PACKET-MESSAGE payload containing the
size of the message and a hash to verify correct decoding.
Endif.
Send each payload as described below.
4. To send a payload P:
Assert Len(P) == SZ.
If the message type is PLAIN:
Generate a random 159-bit number for the Tag field.
Use the payload P as it is.
End if
If the message type is ENCRYPT:
Let KS be the size of the recipient's public key.
do
Generate a 128-bit temporary key Ksession.
Let M = (Ksession | P)
Let M' = OAEP/RSA encrypt S[0:KS-38]
while the MSB of M[0] = 1.
Let Tag = M'[0:20].
Let Payload = M'[20:KS] | SPRP_ENC(Ksession,
"END-TO-END ENCRYPT",
M[KS-38 : SZ+16])
Endif
If Message type is 'reply':
(We don't generate Tag; it's part of the reply block.)
Use the payload P as it is.
Endif
D) The exit node behaves as follows:
If the payload begins with 0size, hash(X), X:
It is a short-payload, plaintext message. Send msg[0:size]
to the recipient as-is. If any characters in msg[0:size]
are not plaintext characters, send the message encoded in
base64.
Else if the payload begins with 1idx, hash(X), X:
Begin K-of-N reconstruction. When the message is
complete, send it to the recipient.
Else:
Send the tag and payload to the recipient as they are; the
recipient will need to decode them.
Endif
E) SURB generators behave as follows:
Let SEC be our long-term secret.
Do
Generate a random 159 bit seed S.
Until H(S|SEC|"Validate") ends with a 0 byte.
// (We check for the 0 byte so we can recognize stateless
// SURBs without doing many costly LIONESS operations.)
Generate the encryption keys from the seed H(S|SEC|"Generate").
Let Tag=S.
Generate a SURB using the encryption keys; use Tag as the
tag field in the final subheader.
F) The recipient behaves as follows for non-plaintext messages.
If we ever use stateless SURBs:
For every pseudonymous identity ID we maintain:
Let SEC = this identity's SURB secret.
If H(Tag|SEC|"Validate") ends with 0:
Reconstruct decryption keys from PRNG(Tag|SEC|"Generate").
Use the key set to decrypt the payload.
If the decrypted payload is of the form 0size, H(X), X:
This packet is addressed to this identity.
The message contents are X[0:size]. Return.
Else if decrypted payload is of the form 1idx, H(X), X:
This packet is addressed to this identity.
Begin K-of-N message reconstruction. Return.
Endif
// If the hash is incorrect, we may have incorrectly guessed
// the message to be a stateless SURB. Continue.
// (Implementations must inform users which identity
// has been used, in order to prevent a SURB-swapping attack)
Endif
Endloop
Endif.
If we have an end-to-end public key of size KS,
Let M' = Tag | Payload[0:KS-20]
Decrypt M' with the public key and check the OAEP padding;
call the decrypted value M.
If the padding is valid:
Let Ksession = M[0:20].
Let P' = SPRP_DEC(Ksession, "END-TO-END ENCRYPT",
Payload[KS-20:28K])
Let P = M[20:Len(M)] | P'
If P is of the form 0size, H(X), X:
The message contents are X[0:size]. Return.
Else if P is of the form 1idx, H(X), X:
Begin K-of-N reconstruction. Return.
Endif.
Endif.
Endif.
If we reach this point, the message is junk.
G) To implement K-of-N reconstruction, recipients and exit nodes that
support K-of-N reconstruction keep an internal, long-term map from
Message-ID to sets of fragments. They behave as follows:
Upon receiving a FRAGMENT payload, verify that it is of the
format 1Idx, H(X), X. From X retrieve Message-ID,
Message-Size, and Contents.
If have already reconstructed Message-ID, this packet is
superfluous. Discard it and return.
If we have no packets for Message-ID, reconstruct NUM_CHUNKS, K,
and N from Message-Size as in the DIVIDE procedure above.
Associate <NUM_CHUNKS, K, N, Message-Size> with Message-ID.
If Message-Size does not match the Message-Size associated with
Message-ID, the message is corrupt. Drop all payloads for Message-ID
and remember that Message-ID is finished.
If we already have a payload with this index stored for
Message-ID, check whether it is equal to Contents. If the
payloads are equal, return. If the payloads are unequal,
the message is corrupt: drop all payloads for Message-ID
and remember that Message-ID is finished.
Let CHUNK = Floor(Idx/N).
If we have already reconstructed chunk CHUNK,
return.
Else, if we have fewer than K-1 payloads with indices in the range
[N*CHUNK .. (N+1)*CHUNK - 1],
Store Contents under Message-ID with index Idx, and return.
Else,
Use Contents along with the payloads in the range
mentioned above to reconstruct Chunk, then discard all the
payloads in the range. Store Chunk with index CHUNK.
If all chunks are reconstructed:
Concatenate the chunks in order to form M.
The final payload is M[0:Message-Size]; uncompress it
and deliver it.
Remember that Message-ID is finished.
Endif.
Endif.
[Notes:
1) We can safely forget all stored packets and lists of
delivered messages every time we rotate our server key.
2) Exit nodes that support reconstruction are vulnerable to a
DOS attack: Until the message is reconstructed, they must
hold all the payloads they have received for each
Message-ID.]
H) Not all exit nodes support reconstruction. We add a new section to
server descriptor blocks, "Delivery/Fragments". It has the
following elements:
'Maximum-packets': The largest size of a message to be queued.
(NUM_CHUNKS * N) to be queued. Must be an even power of 2.
OPEN ISSUES:
1. Addresses. (See message of Sep 20.)
2. Preventing DOS attacks on K-of-N reconstruction.
3. Encoding for keys for encrypted-forward messages.
%%%%%% FOR REFERENCE, HERE ARE SOME SECTIONS FROM minion-spec.tex
%%%%%% THAT ARE NOW SUPERSEDED BY THIS DOCUMENT. SOME OF THE
%%%%%% COMMENTS ARE STILL APROPOS.
\section{Decoding of messages}
Messages that are received by a client can either be sent using the
forward path, or a SURB. They might either arrive in a mixminion
format, that includes all the headers, or stripped of the two headers
with only the a TAG field attached to them.
A client that receives a message that is ultimately destined to them
should perform the following operations to decode it:
[XXXX A note about the philosophical significance of the ``KEY'':
Here (and in the E2E document) the KEY that is used to test
whether a message can be decoded is really attached to a
particular pseudonym of the recipient. This, I think, must
be made more explicit than it is otherwise people implementing
protocols on top of MixMinion will get things wrong.
Maybe it would even be a good strategy if we use something like
NYMLOGINNAME|KEY(or PASSWORD) just to stress that after this step
any operation on the data is linkable to this pseudonym,
and no operation should be performed that link two different
pseudonyms together.
In fact a test as described below allows a particular pseudonym
to check if a message if for it. It is WRONG to give it a sequence
of KEYs and just get back a set of messages that is not annotated
with the identity of the decoder. -GD]
PROCEDURE: Decode a message.
Input: KEY is the long term key of one of the identities of the receipient.
TAG field of sub-header or header where
TAG = ( Encrypt(KEY, nHops | seed) | padding up to 44b).
M the body of the message.
Output: P, the plaintext of the message.
If the message was sent using the forward-path then
P = M; exit;
Otherwise, we regenerate all keys used to encrypt the payload:
(seed, nHops) = Decrypt(KEY,TAG)[0:17]
SK_1..SK_16 = PRNG(seed, nHops*16)[0:16*nhops]
For i = nHops to 0
M = SPRP_DEC(SK_i, ``PAYLOAD ENCRYPT'',M);
end
// We need here a convention for creating the
// Encryption key in the SURB.
M = SPRP_DEC(seed, "PRIVATE SURB KEY",M);
P = M; exit;
[XXXX We do not actually specify how the TAG field and the BODY are
found, prior to be given to this function. So the above works for
delivery via SMTP but not if the final hop is reached via FWD/IP4. The
options are the following:
- We require the final recipient to have a public/private key
and decode the header to extract the TAG/BODY messages.
- We use symmetric encryption for the final header and extract
the TAG field using a secret key.
Any of the two can be used, and it does not have to be standardized
since this operation only affects the final client, but we can give
some advice. -GD]
[I am a bit uneasy that we have not defined any redundancy, or marker,
that would allow the final recipient to decide if this is a forward
path message or a SURBed message. Also the size of valid bytes has
been scrapped, which does not help in extracting the valid bits from
the junk. -GD I guess this comment is now covered by the E2E document -GD]
%%%%%%
[XXXX This prevents the crossover point from seeing an
unencrypted payload. However, using symmetric crypto requires the
SURB generator to keep SURBs confidential from everyone but their
users. George has suggested that we use PK instead, but generating
a fresh RSA key for each SURB slows SURB generation down by a factor
of 30-70. Perhaps a two-mode system is in order. Hmm. -NM]
[XXXX A bit more discussion on this topic: In fact there is, from an
anonymity point of view, very little to gain from using asymetric
crypto. The original point I had in mind had to do with
observability. If the key is symetric then crooked nodes can do
the following:
- observe all users noting which SURB's they are receiving.
They can then note down the keys used.
- When a message is seen at a cross over point, all keys are
tried on it, making the forward path useless, since it can
be linked to a SURB received by someone specific.
My original idea was to use a key-private cryptosystem, to hide
which key was used to encrypt the payload.
I just realised that the above is rubish. If the SURB is not
delivered to receipient anonymously and privately, then it cannot
be used by the recipient anonymously, since it appears as it is in
the header of the packet at the crossover point. Therefore
symetric cryptography is as good as asymetric, key-privat in that
scenario. But this is a point we need to emphasise, that is
obvious when mentioned:
``Do not use SURB's that have been delivered non anonymously
for establishing sender-anonymous communications with others.''
-GD]
%%%%%%%%%%
\subsection{'Stateless' reply blocks}
4. Stateless replies and SMTP (depends on 2 and 3, if I understand correctly)
If a client does not wish to remember all of her outstanding
reply blocks, she may generate them in 'stateless' mode. She
does so by using an SMTP or MBOX delivery type, and setting
the TAG field to
( Encrypt(KEY, nHops | seed) | padding up to 44b)
[nHops: 1 byte; seed: 16 bytes.]
She uses PRNG(seed, nHops*16) to form the up-to-16 SK's for the reply.
She also uses the seed to create the symetric key present in the SURB
as KEYX = HASH(seed| ``PRIVATE SURB KEY'')[0:16];
To understand a message later, the client need only remember (or be
able to reconstruct) KEY.
[Note 1: It would be best for deniability if KEY were the SHA1 hash of
some secure password. On the other hand, since an adversary could
then mount an off-line passing attack on KEY, and since most people
can't construct or remember a good password, it would probably be
safest to store KEY on disk, password-encrypted. All implementations
of stateless replies must support at least this latter mode.]
[Note 2: 'Encrypt' here is not AES in counter mode; that would be
madness. Instead, we use AES in CBC mode.]
[ XXXX if we use CBC mode we will need to use a different IV everytime -GD]
[ XXXX we now use a hash and therefore this is not relevent any more -GD]
[ XXXX It seems that the TAG generated takes up all the space that the TAG
is allowed to use. Would it be a good idea if we allow the constructor
of the SURB to embed more (even propriatery) information in the TAG?
That would require us to use ENCRYPTION instead of H(.) as in the new
E2E document. -GD]
======================
\section{Appendix: MBOX delivery}
Servers that want to support MBOX delivery have an internal list of
users they accept messages to, and an internal mapping from those
users to some delivery mechanism for each one. Typically, this is a
mapping from 'username' to 'username@localhost', and delivery defaults
to local delivery via sendmail. Servers are free to provide other
implementations for MBOX delivery.
MBOX delivery differs from SMTP delivery in that it is not intended
for addressing messages to arbitrary SMTP addresses.
Servers that support MBOX delivery MAY include a [Delivery/MBOX]
section, containing only the entry "Version: 1.0".
The MBOX routing type is used for messages to be delivered to a local
user. The USER field must be NUL-terminated; the TAG field is
free-form.
\section{Appendix: SMTP delivery}
[XXXX This paragraph is no longer accurate. -NM]
At the final hop, when the delivery mechanism is SMTP, we proceed as
follows. If the message is a series of printable characters followed
by some number of NULs, assume we're delivering in ASCII an ISO-XXXX
character set, and send the text portion of the message as an email.
(Where printable == {all characters but hexadecimal 00-06,0E-1F}).
Otherwise, ASCII-armor the message as in 'email transport exchange
format' below.
[This way, plaintext forward messages are delivered as plaintext,
and tagged messages, reply messages, and non-plaintext messages are
all delivered as junk.]
Servers supporting SMTP MAY include a [Outgoing/SMTP] section,
containing only the entry "Version: 1.0".
Servers SHOULD include a note with every SMTP, explaining that the
message is delivered anonymously, and providing an opt-out address and
an abuse contact.
The EMAIL field in the SMTP routing type should be a valid mailbox
[RFC2821]. A mailbox is the canonical form of the "user@domain"
part of an e-mail address. Mixminion uses only mailboxes, because the
display name and comment parts of an e-mail address could potentially be
different for senders who have obtained an address from different
sources (leading to smaller anonymity sets). The EMAIL field must be
NUL-terminated.
[XXXX Be more specific about which mailboxes we allow -- e.g., forbid
supremor@"whitehouse".gov and friends. -NM]
The TAG field is appended to the message in an X-Remailer-Tag header.
[XXXX Not now it isn't. I need to rewrite this section to reference
the E2E spec. -NM]
[XXXX Until we have better answers about abuse prevention, nobody should
actually implement an SMTP module. :) -NM]
[XXXX Actually, I already did. Oops! Let's say that an SMTP module
should support, at least, the ability for the server operator to
blacklist specific hosts, domains, usernames, and addresses. -NM]
[XXXX We need to reference the E2E spec here to explain how to
encapsulate nonprinting messages consistently. -NM]
===============
X. Open Questions.
[All of these are mentioned in more detail below.]
1. Mail gateways. We should specify these.
[Should go into appendix]
[Agreed. We should make an actual list of appendices that we
need, and start writing them all.-NM]
--- NEW FILE: dir-spec.txt ---
Mix Information Exchange format
In order to automate and standardize directory servers, we provide
a standardized extensible server descriptor format.
All server descriptors and statistics blocks follow a simple
section-based key/value format, with items loosely based on RFC822.
[Section1]
Key: Value
Key: Value
Key: Value
Key: Value
Key: Value
[Section2]
Key: Value
Syntax
(Notation: X*: 0 or more occurrences of X.
X+: 1 or more occurrences of X.
X?: 0 or 1 occurrences of X.
X Y: An occurrence of X followed by an occurrence of Y.
X*{Y}: 0 or more occurrences of X separated by occurrences
of Y.
X|Y: Either an occurrence of X, or an occurrence of Y.)
Descriptor = NL Section+
Doctype = (<any printable character but '-'>)+
Section = SectionLine EntryLine*
SectionLine = '[' Word ']' NL+
EntryLine = Word ':' ' ' Data NL+
Word = (<Any printable, non-space character but ':'>)+
Data = (<Any printable character but NL>)*
[Note: For compatibility across different platforms, implementations must
accept all of CR, LF, and CR-LF style newlines.]
Mixminion descriptor blocks
This section describes the format of server descriptors, as uploaded
to and downloaded from directory servers. A server descriptor is a
promise, by a mix's administrators, to provide a given set of
services, keys, and exit policies over a set period of time.
The first section must be a 'Server' section. This section includes
the entries:
'Descriptor-Version': the string "1.0"
'Nickname': A human-readable identifier for this server. If it
contains any periods, it must be a fully qualified DNS name
which resolves to the provided IP for the entire lifetime of
this Descriptor block. It must be no more than 128
characters. It may contain only the characters
[A-Za-z0-9_@] and '-'. It should be treated as
case-insensitive.
'Identity': The modulus of this Mix node's long-term signing key,
represented in ASN.1, and encoded in BASE64. The modulus of
this key should be at least 2048 bits long and no more than
4096 bits long. The exponent of this key must be 65537.
Clients should at least give a warning if the identity key of
any server should ever change.
'Digest': The digest of this block. See below.
'Signature': The signed digest of this block. See below.
'Published': A date/time, in the form 'YYYY/MM/DD HH:MM:SS',
for when this block was generated.
'Valid-After': A date, in the form 'YYYY/MM/DD'. After midnight GMT
on this date, this server must support the operations listed
in this descriptor.
'Valid-Until': A date, in the form 'YYYY/MM/DD'. Until midnight
GMT on this date, this server must support the operations listed
in this descriptor.
'Contact': An email address that may be used to contact the
administrator of this server. Optional field. Must be no
more than 256 characters.
'Contact-Fingerprint': Fingerprint of the server administrator's
PGP key. Optional field. Must be no more than 128
characters.
'Comments': Human-readable information about this server. Must
be <1024 bytes long. It *must not* be necessary to read this
information to use the server properly.
'Packet-Key': The public key used to encode encode subheaders for
this server, encoded in ASN.1, represented in BASE64.
'Packet-Versions': A comma-separated list of allowable major.minor
versions for packets this server will process. In a
production network, only one value should be used for this
field. [Added in Mixminion 0.0.3 -NM]
'Software': A string description of the software this server is
running. Optional; for debugging purposes only. Must be
less than 256 characters. [Added in Mixminion 0.0.3 -NM]
'Secure-Configuration': "yes" or "no". If true, the server must
not be running in an insecure operating mode. [XXXX list
these modes. Added in Mixminion 0.0.4]
The digest of a descriptor block is computed by removing the contents of the
digest and signature fields, and computing the SHA-1 digest of the resulting
ASCII string. (That is, "Digest: DATADATADATA..." is replaced with
"Digest:".) The signed digest is the OAEP/PKCS1 signature of the digest
with the server's identity key. This value is represented in BASE64.
[Note: before computing the digest, all implementations must
normalize CR and CR-LF style newlines to a single NL, and remove
any spaces and tabs that may have been introduced at the ends of
lines.]
If this server accepts incoming MMTP connections, it MAY have an
'Incoming/MMTP' section, with the following entries:
'Version': The string '1.0'
'IP': An IPv4 address, in dotted-quad format.
'Port': A port at which IP accepts incoming MMTP connections.
'Key-Digest': The KEYID of this server, encoded in BASE64.
'Protocols': A comma-separated list of the protocols this
server accepts.
and any number of entries of the form:
'Allow': Address Pattern
'Deny': Address Pattern
If this server supports outgoing MMTP connections, it MAY have a
'Outgoing/MMTP' section, with one entry each of the form:
'Version': The string '1.0'
'Protocols': A comma-separated list of the protocols this server
supports for outgoing connections.
and any number of entries of the form:
'Allow': Address Pattern
'Deny': Address Pattern
The Address Pattern tokens are of the form:
AddressPattern = (IP ('/' Mask)? | '*') (Port ('-' MaxPort)?)?
'*' is a synonym for '0.0.0.0/0.0.0.0'.
An omitted mask defaults to 255.255.255.255. An omitted portrange
defaults to 48099 on ALLOW and 0-65535 on DENY.
The entries are order-significant; the first one to match wins.
The default policy is 'Deny: *'
If this server supports outgoing delivery for a module ABCD, it will
have a [Delivery/ABCD] section. See appendices for more detail on
specific modules, including SMTP and MBOX.
Other services provided by this server should each have their own section.
(Note: A server need not advertise all of its capabilities; it is
permissible (for example) for a server that supports incoming MMTP
connections to omit the Incoming/MMTP section.)
A client should ignore any keys or sections it does not recognize, but
should not use any service whose sections have an unrecognized
descriptor version.
Directories and Directory servers
A directory is a list of Mixminion servers which are believed to
be operational at a given time.
A directory server provides an HTTP URL for uploading server
descriptors, an HTTP URL for downloading a directory, and a long-term
public key (2048-4096 bits).
To upload a descriptor block, a client performs an HTTP POST request
to the upload URL, with the server block as enclosed entity.
To retrieve the directory, a client performs an HTTP GET request on
the directory URL.
A directory takes the following form:
[Directory]
Version: 1.0
Published: YYYY/MM/DD HH:MM:SS
Valid-After: YYYY/MM/DD
Valid-Until: YYYY/MM/DD
Recommended-Software: Mixminion 0.0.2,Mixminion 0.0.3
[Signature]
DirectoryIdentity: Base64-encoded public key, in ASN.1
DirectoryDigest: Digest of this document.
DirectorySignature: Base64-encoded OAEP/PKCS1 signature of this document, with
the contents of this field removed.
[Recommended-Software]
... (see below)
[Server]
(Server descriptor block)
[Server]
(Server descriptor block)
The 'Recommended-Software' section lists versions of Type-III clients
and servers that are currently recommended. Because the version
numbering scheme will be different for each implementation, lines
within 'Recommended-Software' are version specific. The "Mixminion"
program uses the entries 'MixminionClient' and 'MixminionServer'; each
contains a space-separated list of acceptable version numbers, in
order of decreasing preference. If a client is running an
unrecommended version, it must warn the user. [Policy: A development
version of Mixminion (pre 1.0) will only be declared obsolete when it
is either too insecure or too buggy to use, when backward
compatibility is broken, or when a new stable release comes out.
Stable releases will be taken off the list only for security or
privacy reasons.]
Directory servers change their directories only at midnight GMT. Any
client which has not downloaded a directory since before midnight GMT,
must download a fresh directory before generating any packets.
A directory includes all the servers that were uploaded to the
directory before some cutoff time the previous day, and which proved
upon some random number of tests and probings to have a real Mixminion
server running on them. A directory server periodically re-tests
the servers in its directory to make sure they have not gone down.
Because of possible partitioning attacks related to accidentally or
maliciously unsynchronized servers, the presence of multiple directory
servers presents sever security issues. Since solving these issues is
an active research project, we leave them for a later draft.
[XXXX Issues include: How do directory servers synchronize?
What happens when they disagree? How many servers must a client
contact before he/she has enough information? How do we catch
dishonest directory servers? -NM]
[XXXX We should also specify, perhaps, how directories are to order
the server descriptors; what uniqueness constraints there are, and so
on. -NM]
[XXXX Also: statistics information. Also: we should think about
avoiding catastrophic failure modes if directories _do_ fail or
change. -NM]
X. Open questions
2. Description of mixing algorithm should go in descriptor blocks. -NM
[XXXX Unless the mixing method requires special packaging of the
message we could require the servers to specify the amount
of anonymity that they expect to give to each message. The
information theoretic one presented by myself and Andrei
could do the job quite well. -GD]
1. Statistics Information Exchange format
[Not for first cut]
2. Specify: verification for directories.
[not for first cut]
[Actually, we may need these first two pieces for Mixminion 1.0;
else we won't be deployable. -NM]
5. Automatic retrieval of Server Information
[XXXX I think it is important to have a standard way to query a
server given an IP and a port. -GD]
[Why? What's wrong with having the server upload its information to a
directory server? (Not that I disagree, but I want to know the
application for this. Clients can't use it without leaking which
servers they're interested in, and giving servers the opportunity to
lie to clients. What's the upside?) -NM]
[I believe that this will make it more easy to construct Directory
servers. For some reason I have the feeling that it will scale
better if directory servers know about mixes (and can query them
automatically) rather than the other way around (mixes knowing
about directory servers). This way one can run independently a
directory server, without any collaboration from the mix network
(other than the ability to request info).
Let's not forget that the mixes *sign* their information with a
long term key, therefore after you establish that you trust a
signing key to belong to an honest server, the operation of
querying a directory server for updates is simply a question of
transport and not of trust. Of course you still trust them to give
you a information on a complete set of servers, but this can also
be checked. It is also true that the a client requesting only the
information on the servers it is about the use will ruin its
anonymity. On the other hand if key updates are not frequent, then
the client can slowly update its database in the background.
Even more possibilities open up if each mix server give on request
not only their information but also what they think the state of
other servers, they have contacted in the past, is. This way each
server you might contact will give you a set of other servers,
that can be used by clients to construct a complete picture. Which
ones are to be trusted is of course an orthogonal issue, but once
it is decided the updated information could flow very
quickly. (this is in fact a gossip protocol)
These are the reasons why I think it might be a good idea to have
automatic on request information from servers. -GD]
[Hm. You may have a point. I'm still going to suggest that we do
not do this yet, for a few reasons.
First, we need a way for servers to publish their descriptors to
directory servers. (Otherwise, a directory server couldn't learn
about new servers for the first time.) In other words, a push
mechanism is needed no matter what. On the other hand, directory
servers may or may not need a pull mechanism: we do not yet have a
design that requires this. Let's not build it till we have a use
for it.
Second, it's a bit tricky to specify *which* descriptor a server
should return. Today's? Tomorrow's? All the ones it knows about?
Third, (this is a variant of 'First') because client use of this
feature is prone to misuse, we should only provide it to clients
wrapped in some safe mechanism. That safe mechanism has yet to be
specified.
Therefore, I'm going to suggest that we call this feature possibly
desirable... but that we should first design directories or whatever
other node discovery mechanisms *may* want this feature before we
decide that to implement the feature itself. -NM]
--- NEW FILE: minion-spec.txt ---
(This appears to be a binary file; contents omitted.)
--- NEW FILE: path-spec.txt ---
Choosing Paths in Type-III Remailers
====================================
Overview: This document is an appendix to the Type III remailer spec
(``minion-spec.tex'') that describes how to pick paths. All software
that does any form of path selection must[*] implement this
specification.
This spec reflects the current state of the Mixminion implementation.
It may be grievously wrong. Please critique it.
[*] This spec is not optional; if client A and client B pick paths
differently, an attacker can use this information to distinguish
between them.
NOTATION:
SD_U -- the set of all server descriptors
SD_dir -- the set of all server descriptors in the most recently
downloaded directory.
SD_relay -- the set of all server descriptors for servers that can relay
packets. A server can relay messages if it contains both an
Incoming/MMTP and an Outgoing/MMTP section.
SD_SMTP -- the set of all server descriptors for servers that support SMTP
delivery. A server support SMTP delivery if it has both a
[Delivery/SMTP] and an [Incoming/MMTP] section.
SD_MBOX -- the set of all server descriptors for servers that support MBOX
delivery. A server support MBOX delivery if it has both a
[Delivery/MBOX] and an [Incoming/MMTP] section.
Start(s) -- the time at which server descriptor s becomes valid.
Per the spec, this is rounded to midnight GMT.
End(s) -- the time at which server descriptor s becomes invalid.
Per the spec, this is rounded to midnight GMT.
Nickname(s) -- the nickname of the server descriptor s.
INPUTS:
ExitType: one of 'SMTP', 'MBOX', 'DROP', or 'OTHER'.
PathLength: An integer between 2 and 32 inclusive.
InitialServers: A (possibly empty) list of nicknames of servers to
use at the start of the path.
FinalServers: A (possibly empty) list of nicknames of servers to use
at the end of the path.
SwapPoint: Optionally, an integer between 1 and PathLength.
SendTime: The time at which we expect to send the message.
Defaults to 'now'.
ReceiveTime: The time at which we expect the message to be delivered.
Defaults to SendTime plus 3 hours.
[XXXX This is very dumb. It lets an attacker distinguish sharply
between messages sent before and after 9 GMT. -NM]
ALGORITHM:
1. If we have not downloaded a new directory since before the most
recent midnight, GMT, download a new directory.
2. Find all currently valid server descriptors:
Let SD_current = { s | s is in SD_U and Start(s) < SendTime
and ReceiveTime < End(s) }.
3. Find all _distinct_ currently valid servers.
Let N_current = { Nickname(s) | s in SD_current }
[We assume for now, that distinct nicknames imply distinct servers.]
Find the best server for each nickname. For each nickname N, choose
SD_N from { s | SD_current and Nickname(s)=N }, picking the s with
the latest End(s).
Let Ncurrent_RELAY = { n | n in N_current and SD_n in SD_RELAY }
Let Ncurrrent_SMTP = { n | n in N_current and SD_n in SD_SMTP }
4. Now, resolve all provided servers.
For all N[i] in InitialServers
Let InitialPath[i] = SD_{N[i]}. If InitialPath[i] does not exist,
or is not a member of SD_relay, exit with an error.
For all N[i] in FinalServers
- Let FinalPath[i] = SD_{N[i]}. If FinalPath[i] does not exist,
exit with an error.
- If N[i] is other than the last node in the path, and FinalPath[i]
is not a member of SD_relay, exit with an error.
5. Make sure the final server supports the correct exit type.
If FinalPath is not empty,
- Let sd_Exit be the last element of FinalPath.
- If ExitType is SMTP, and sd_Exit is not in SD_SMTP, exit with
an error.
- If ExitType is MBOX, and sd_Exit is not in SD_MBOX, exit with
an error.
(- If ExitType is DROP, don't worry: all servers support DROP.)
(- If ExitType is OTHER, we have no way of telling whether the
server supports it; carry on.)
If FinalPath is not empty, choose a final node.
- If ExitType is SMTP, let FinalPath[1] = a random member of
Ncurrent_SMTP.
- If ExitType is MBOX, exit with an error; MBOX addresses are
server-specific, and require the user to specify a final server.
- Otherwise, we're fine with relay nodes, or we've already
caught the error. Do nothing.
6. Let N_Used = { Nickname(s) | s in InitialPath OR s in FinalPath }.
Let N_Unused = Ncurrent_RELAY - N_used.
Let PathLeft = PathLength - Len(InitialServers) - Len(FinalServers)
This is how many hops we still have to generate.
7. If PathLeft < #N_Unused,
Let MidServers = PathLeft members of N_unused, chosen at random
without replacement.
8. ELSE, if N_Unused >= 3,
Let MidServers = PathLeft members of N_unused, chosen at random
with replacement such that no two consecutive members are equal;
and the first member of MidServers is not the Nickname of the last
server in InitialServers, and the last member of MidServers is not
the nickname of the first server in FinalServers.
9. ELSE, if N_Unused = 2,
Let MidServers = PathLeft members of N_unused, chosen at random
with replacement such that no two consecutive members are equal.
(If N_Unused = { N1, N2 }, this will be either N1,N2,N1,N2... or
N2,N1,N2,N1...).
10. ELSE, if N_Unused = 1 or 0,
Let MidServers = N_Unused. Warn the user that the path will be
too short.
11. Let Path = InitialServers | MidServers | FinalServers. If
Len(Path) < 4, warn the user. If Len(Path) < 2, exit with an
error.
12. If the swap point is provided, then split the path along the swap
point. Else, swap at server CEIL(Len(Path)/2).
X. Open Questions
7. Path selection algorithm.