[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[freehaven-cvs] Cut, trim, fix



Update of /home/freehaven/cvsroot/doc/pynchon-gate
In directory moria.mit.edu:/tmp/cvs-serv20490

Modified Files:
	pynchon.tex 
Log Message:
Cut, trim, fix

Index: pynchon.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/pynchon-gate/pynchon.tex,v
retrieving revision 1.49
retrieving revision 1.50
diff -u -d -r1.49 -r1.50
--- pynchon.tex	18 Sep 2004 01:27:26 -0000	1.49
+++ pynchon.tex	18 Sep 2004 02:08:11 -0000	1.50
@@ -64,11 +64,11 @@
 \maketitle
 
 \begin{abstract}
-We present the Pynchon Gate, a practical pseudonymous message retrieval
+We argue for the Pynchon Gate, a practical pseudonymous message retrieval
 system.  Our design uses a simple distributed-trust private information
 retrieval protocol to prevent adversaries from linking recipients to their
-pseudonyms, even when some of the infrastructure has been compromised.  The
-Pynchon Gate design resists global traffic analysis significantly better than
+pseudonyms, even when some of the infrastructure has been compromised.  This
+approach resists global traffic analysis significantly better than
 existing deployed pseudonymous email solutions, at the cost of additional
 bandwidth---although unlike other high-bandwidth pseudonymity designs,
 the Pynchon Gate allows the costs to be distributed over many servers without
@@ -93,16 +93,15 @@
 an attacker to deduce which users are associated with which pseudonyms.
 These systems can be used specifically to provide a mechanism for a user
 to communicate without revealing her identity, or can be used as a
-building-block for other systems which need a bi-directional communication
+building-block for other systems that need a bi-directional communication
 channel, such as Free Haven~\cite{freehaven-berk}. But, as we will argue
 below, most existing deployed solutions are either vulnerable to traffic
 analysis, or require unacceptably large amounts of bandwidth and storage
 as the number of users and volume of traffic increase.
 
-We propose the Pynchon Gate, a novel design that uses distributed-trust
+We propose the Pynchon Gate, a design that uses
 private information retrieval (PIR)~\cite{pir} primitives to build a secure,
 fault-tolerant pseudonymous mail retrieval system.
-
 In our system, pseudonymous users (or ``nym holders'') use an existing
 anonymous email network (such as Mixmaster~\cite{mixmaster-spec} or
 Mixminion~\cite{mixminion}) to send authenticated requests to a {\it nym
@@ -110,13 +109,13 @@
 administrative commands.  The nym server also receives incoming messages and
 passes them to a {\it collator} component, which encrypts the messages and
 periodically packages them into regular batches.  These batches are
-then replicated at a number of \emph{distributor} servers, which,
-through the use of the \emph{distributed-trust private information retrieval
-protocol}, allow nym owners to receive mail while
+then replicated at a number of \emph{distributor} servers, which
+use a  private information retrieval
+protocol to allow nym owners to receive mail while
 maintaining unlinkability between a message and its recipient.
 
 \subsubsection{Goals}
-First of all, our design must be {\it secure}: we want the Pynchon Gate to
+First, our design must be {\it secure}: we want the Pynchon Gate to
 resist active and passive attacks at least as well as the state of the art
 for forward message anonymity.  Thus, we should protect users' identities
 from a global eavesdropper for as long as possible; should hinder active
@@ -124,14 +123,13 @@
 attacker who has compromised some (but not all) of the servers on the
 network.
 
-In order to provide real security, however, we must ensure that the system is
+In order to provide security, however, we must ensure that the system is
 {\it deployable} and {\it usable}---since anonymity and pseudonymity systems
 hide users among each other, fewer users means less
-protection~\cite{econymics}.  This implies that we should gracefully handle
+protection~\cite{econymics}.  This implies that we should handle
 node failure without loss of mail; that we must not require more bandwidth
-than volunteer servers are able to provide or users are willing to use; and
-that we should not require a complicated interface or special knowledge from
-our users.
+than volunteer servers can provide or users are willing to use; and
+that we should not require a complicated interface.
 
 \subsubsection{In this paper}
 We begin in Section~\ref{sec:background} with a discussion of related work,
@@ -140,11 +138,11 @@
 on the effectiveness of passive traffic analysis against current reply-block
 based nym servers.)  Section~\ref{sec:design} presents the Pynchon Gate in
 more detail, describing its organization, design rationales, and network
-formats.  We describe our simple distributed trust PIR protocol in
+formats.  We describe our simple PIR protocol in
 subsection~\ref{subsec:protocol-design}.  In Section~\ref{sec:security} we
 analyze security, and in Section~\ref{sec:performance} we discuss
 optimizations and compare our performance to that of other pseudonymous
-message systems.)  We close with an evaluation of our success in
+message systems.)  We close with an evaluation of our design in
 Section~\ref{sec:conclusions}.
 
 \section{Background and attacks}
@@ -164,8 +162,8 @@
 Mixminion~\cite{mixminion} anonymous remailer networks.  It is trivial to use
 these systems to {\it send} pseudonymous messages: the sender can make an
 anonymous message pseudonymous by signing it with a public
-key associated with her pseudonym.  Thus, the designs below focus on the
-problem of {\it receiving} messages sent to a pseudonymous address.
+key associated with her pseudonym.  Thus, these designs focus on how to
+{\it receive} messages sent to a pseudonymous address.
 
 \subsubsection{Reply blocks and return addresses.}
 In 1981, Chaum~\cite{chaum-mix} described a method of using \emph{return
@@ -189,44 +187,45 @@
 first generation implementation of Chaum's mix-nets (Type I
 remailers~\cite{hal-remailer}), such as {\tt
 alpha.c2.net}~\cite{alpha-faq} and {\tt
-nym.alias.net}~\cite{nym-alias-net}, implemented a central
+nym.alias.net}~\cite{nym-alias-net}, implement a central
 reply-block repository which allowed pseudonym holders to receive messages
-delivered to a traditional email address. Unfortunately, Type I remailers
+delivered to a email address. Unfortunately, Type I remailers
 allow multiple uses of their reply blocks, which are vulnerable to replay and
 flooding attacks as discussed in~\cite{remailer-attacks,tcmay}
-The Type I anonymous remailer system suffers
-greatly because of this. Type II (Mixmaster) and Type III
+%The Type I anonymous remailer system suffers
+%greatly because of this. 
+Type II (Mixmaster) and Type III
 (Mixminion~\cite{mixminion}) systems do not permit multiple-use reply
-blocks, and contain replay-attack protection mechanisms~\cite{replay}.
+blocks, and prevent replay attacks~\cite{replay}.
 
 \subsubsection{Single-use reply blocks.} 
-While the Type II system does not have any means of support for anonymous
+While the Type II system does not support anonymous
 reply blocks, the Type III (Mixminion) system introduces single-use reply
 blocks
-(SURBs)~\cite{surb} as a means of avoiding the replay attack issues. The
+(SURBs)~\cite{surb} to avoid replay attacks. The
 Type III protocol requires the recipient to create a large number of reply
-blocks to be used by those who wish to send her mail. In practice, this is
+blocks for senders to use. In practice, this is
 likely to be automated by a nym server~\cite{pop-mix} that
 stores a number of SURBs and uses them to deliver pseudonymous mail to
 to the recipient---one such design is Underhill~\cite{underhill-spec}.
-The technique used in Type III also has
+Type III also has
 the property that the forward and reply messages share the same anonymity
-set, which is a significant security improvement over Type I, and recent
+set, and recent
 work has been done by Danezis and Laurie on attack-resistant anonymous
 packet formats suitable for reply messages~\cite{minx}. However, since
-reply blocks are still being used, reliability issues remain.\footnote{If
+reply blocks are still being used, reliability issues remain. (If
 any given node in the pre-selected SURB is defunct at the time mail is
-set to be delivered, the mail will be lost.} Reply block systems are also
-susceptible to intersection attacks~\cite{disad-free-routes}. A global
+set to be delivered, the mail will be lost.) Reply block systems are also
+susceptible to intersection attacks~\cite{disad-free-routes}: A global
 observer can collect data on who is sending and receiving mail, and given
 enough time and data, will be able to reliably determine who is talking to
-whom via statistical correlation~\cite{statistical-disclosure}.
+whom~\cite{statistical-disclosure}.
 
 \subsubsection {Network-level client anonymity.}
-The ZKS Freedom Network~\cite{freedom2-arch} provided anonymous IP access
+The ZKS Freedom Network~\cite{freedom2-arch} provided anonymous access
 to a POP3 server~\cite{freedom2-mail}, enabling its users to maintain
 pseudonyms using standard email protocols. Freedom was discontinued due to
-high operating expenses, especially in bandwidth costs.
+high operating expenses, especially in bandwidth.
 Other network-level anonymity systems, such as
 Pipenet~\cite{pipenet}, Onion Routing~\cite{goldschlag96}, the Java Anon
 Proxy~\cite{jap}, or Tor~\cite{tor-design}, could be used
@@ -268,7 +267,7 @@
 Chaum discusses a traffic-analysis prevention method wherein all reply
 mail in the anonymous mail system is sent to all possible recipients. A
 less invasive optimization has already been implemented in the form of
-Usenet mail drops~\cite{aam}: an anonymous remailer can be instructed to
+Usenet mail drops~\cite{aam}: an anonymous remailer can
 deliver mail to a newsgroup, rather than to an email recipient. Such mail
 can be encrypted to a recipient's private key, and left for her to
 collect from the newsgroup. If recipients use the same
@@ -301,22 +300,23 @@
 The Pynchon Gate is a group of servers that provide anonymous message
 retrieval capabilities (see figure~\ref{fig:pyn-arch}). A nym server receives
 messages for different
-pseudonym accounts via email\footnote{The servers could also receive
+pseudonym accounts via email.\footnote{The servers could also receive
 messages through any suitable medium for message transfer, such as
-``instant message'' systems~\cite{rfc-2779}. We require a
+``instant message'' systems~\cite{rfc-2779}.
+We require a
 forward anonymity protocol to allow the nym holder to communicate
 with the nym server, so at a minimum the nym server must be able to
-receive email in addition to any optional support for other protocols.
-Future developments in forward anonymity protocols may alleviate this
-reliance on email. In our system, the nym server may communicate directly
-with Mixminion nodes via the direct communication mechanism in Mixminion.}.
+receive email in addition to any optional support for other protocols.}
+%Future developments in forward anonymity protocols may alleviate this
+%reliance on email. In our system, the nym server may communicate directly
+%with Mixminion nodes via the direct communication mechanism in Mixminion.
 Once every ``cycle'' (e.g., 24 hours), the nym server passes these messages to
 a collator, which
 batches them into an indexed ``bucket pool,'' and
 and passes these pools to each independently operated distributor node in
-the network.  A Pseudonym holder then makes a series
+the network.  Each pseudonym holder then makes a series
 of requests to $k$ chosen distributor nodes, enabling her to receive her
-messages without the individual distributors determining the identity of the
+messages without the individual distributors determining the
 pseudonym being requested. The protocol used is resistant to collusion:
 even if the adversary controls $(k-1)$ of the chosen distributors, 
 the adversary
@@ -331,9 +331,8 @@
 \end{figure}
 
 This distributed-trust PIR-based message retrieval system lets us keep the
-convenience,
 reliability, and security of the ``send everything everywhere'' method,
-while bringing the system back into the realm of feasibility for users
+while bringing the system into the realm of feasibility for users
 with limited resources.
 %The information theory-based PIR primitive allows
 %us to avoid the use of cryptography wherever possible. A basic analysis of
@@ -347,18 +346,18 @@
 The public-facing side of The Pynchon Gate consists of a nym server that
 sends and receives pseudonymous email. The nym server itself provides no
 sender anonymity; rather, it relies on existing mix
-networks~\cite{mixminion,mixmaster-spec} for forward message anonymity.
+networks~\cite{mixminion,mixmaster-spec}.
 The nym server is visible to external email correspondents, and receives
 messages for the nym owners at their specified email addresses.
 
-Nym servers manage email accounts for pseudonyms (\emph{nym accounts}).
-For each pseudonym, the nym server stores a \emph{long-term
-public key,} used by the nym holder to
+Nym servers manage email accounts for pseudonyms.
+For each pseudonym, the nym server stores a long-term
+public key used by the nym holder to
 encrypt and authenticate outgoing email and administrative messages.
 Similarly, nym server stores a \emph{short-term shared secret}
-associated with each account, used to encrypt messages to the nym holder.
-These keys are created at account creation time; the shared secret can be
-reset by the nym holder after creation.
+for each account, used to encrypt messages to the nym holder.
+This secret can be
+reset by the nym holder after account creation.
 
 The shared secret is updated every cycle, such that, if $S[i]$ is the shared
 secret in cycle $i$, then $S[i+1] = H(S[i]|\mbox{\tt "NEXT CYCLE"})$, where
@@ -371,8 +370,8 @@
   $\SUBKEY(0,i) = H(S[i] | \mbox{\tt "NEXT SECRET"})$.
 
 Once it no longer needs a shared secret or a given subkey, the nym server
-drops it immediately, to limit the impact of key compromise (at the nym
-server or at the client) and help achieve forward security.  We use a
+drops it immediately, to limit the impact of key compromise (at the server or
+client) and improve forward security.  We use a
 separate chain of keys for each cycle so that it is easier for a user to
 resynchronize after missing a few cycles.
 
@@ -386,7 +385,7 @@
 cycle.
 
 Finally, the nymserver also generates an different independent identifier for
-each user, every cycle: $\UserID{}[i] = H(S[i] | \mbox{\tt "USER ID"})$.
+each user every cycle: $\UserID{}[i] = H(S[i] | \mbox{\tt "USER ID"})$.
 
 \subsection{The Collator}
 
@@ -400,11 +399,11 @@
 ordered by $\UserID$.  The index buckets contain, for each $\UserID$ (in
 order), the first message bucket containing that user's messages, and a
 digest of that bucket.  Finally, the meta-index lists, for each index bucket,
-the first and last $\UserID$ in that bucket, and a digest of that bucket.
-(See figure~\ref{fig:tree}.)
+the first and last $\UserID$ in that bucket, and a digest of that bucket
+(ee figure~\ref{fig:tree}).
 The index buckets and the message buckets together comprise the cycle's
-"bucket pool."  To ensure integrity, each bucket contains a hash of the
-next bucket in the pool.
+``bucket pool.''  To ensure integrity, each bucket contains a hash of the
+next.
 
 \begin{figure}
 \begin{center}
@@ -420,24 +419,24 @@
 To prevent an attacker from flooding a nym and observing which user receives
 a large volume of traffic, each nym has a maximum number of message buckets
 that may be filled in a given cycle.  If there are more pending messages than
-will fit in the nym's buckets, the collator defers some messages for a later
-cycle\footnote{As an extension to save bandwidth and prevent denial of
+will fit in the nym's buckets, the collator defers some for a later
+cycle.\footnote{As an extension to save bandwidth and prevent denial of
   service attacks, the
   nym server can build a special ``summary message'' containing the headers of
   pending emails, and their opaque identifiers, and include this message in
   the user's message bucket.  The user can then send a signed control message
   to the nym server requesting that unwanted emails be deleted and desired
-  ones given priority.}.  Because the encrypted messages are prefixed with
+  ones given priority.}  Because the encrypted messages are prefixed with
 $H(\SUBKEY(j,i) | \mbox{\tt "ID"})$, the user can tell which key to use for
-messages that are delivered out of order, or in different cycles.
+messages that are delivered out of order.
 
 \subsection{Distributors and clients}
 Once the collator is done, it relays the signed meta-index, and the entire
 bucket pool, to a set of
 independently operated \emph{distributor nodes}. (This data should be
 transmitted using a bandwidth-sparing protocol such as
-BitTorrent~\cite{bittorrent}, so that the collator does not need to transmit
-the entire set of buckets to each distributor individually.)
+BitTorrent~\cite{bittorrent}, so that the collator does not need to send
+the entire pool to each distributor.)
 
 At this point, clients can download their messages for the cycle.  First, a
 given client downloads the meta-index from a randomly chosen distributor, and
@@ -445,7 +444,7 @@
 and uses the meta-index to tell which index bucket will contain an entry for
 that $\UserID$.
 
-The client then uses the \emph{Pynchon Gate PIR Protocol} (described below)
+The client then uses the PIR protocol described below
 to retrieve the correct index bucket, checks that the bucket's digest is as
 expected, and uses the index bucket to learn which message buckets will
 contain the client's messages.  The client downloads these buckets with PIR,
@@ -458,23 +457,23 @@
 bucket pools for a reasonable window of time, to be sure that all clients
 have time to download their messages.
 
-\subsubsection{The Pynchon Gate PIR Protocol}
+\subsubsection{The PIR Protocol}
 \label{subsec:protocol-design}
-The Pynchon Gate distributed-trust PIR protocol allows a client to retrieve a
+This simple PIR protocol allows a client to retrieve a
 bucket from $k$ chosen distributors, so that an attacker cannot tell which
 bucket the client is retrieving without compromising or controlling all $k$
 of the servers.
 
 The protocol runs as follows: after choosing distributors, the client
-establishes an encrypted session to each
+establishes an encrypted connection to each
 (e.g., using TLS).  These connections must be unidirectionally authenticated
 to prevent man-in-the-middle attacks, and can be made sequentially, or in
 parallel.
 
 The client the client sends a different ``random-looking'' bit vector
 $v_{sb}$ to each distributor $s$, for each bucket $b$ to be retrieved.  Each
-bit vector has a length equal to the number of buckets in the pool.  In
-response, each distributor $s$ performs a linear scan across all buckets, and
+bit vector has a length equal to the number of buckets in the pool.  Each
+distributor $s$ then
 computes $R(v_{sb})$ as the XOR of all buckets whose positions is set to $1$
 in $v_{sb}$.  The resulting value is then returned to the client.
 
@@ -486,8 +485,7 @@
 
 As an optimization, a client may send $k-1$ of the distributors a key for a
 stream cipher instead of a full bit vector. The distributors can use the
-stream in place of the vector.  This reduces the size of the request from
-linear on the number of buckets to fixed~\cite{prng-back}.  One distributor,
+stream in place of the vector~\cite{prng-back}.  One distributor,
 however, will still need to receive a full vector.
 
 %To prevent man-in-the-middle attacks, TLS is used as the protocol's
@@ -605,7 +603,7 @@
 from distributors and respond to garbled data without leaking information
 about which bucket it was requesting.
 
-\subsubsection{"{\it Who am I?}" attack.} 
+\subsubsection{``{\it Who am I?}'' attack.} 
 An attacker may send messages intended for nym Alice to nym Bob instead,
 to confirm that Alice and Bob are the same nym holder~\cite{gd-thesis}.
 
@@ -691,25 +689,24 @@
 but not for other users, and deduce that Alice is likely associated with {\tt
   nym33}.
 
-Recent work~\cite{statistical-disclosure,e2e-traffic} has studied the use of an
-implementation of these intersection attacks called ``Statistical
-Disclosure,'' where an attacker compares network behavior when Alice has sent
+Recent work~\cite{statistical-disclosure,e2e-traffic} has studied an
+implementation of these intersection attacks called statistical
+Disclosure, where an attacker compares network behavior when Alice has sent
 to network when she is absent in order to link an {\it anonymous} sender
 Alice to her regular
 recipients $\mbox{Bob}_1 ...  \mbox{Bob}_N$.   Against {\it pseudonymous}
 recipients, however, these attacks are significantly easier: in the anonymity
 case we assume that many senders may send to any given recipient
-$\mbox{Bob}_i$, but with pseudonymous delivery, we only one
-user sends or receives messages for a given pseudonym.  This additional
-linkability makes the attack converge more quickly.
+$\mbox{Bob}_i$, but with pseudonymous delivery, only one
+user sends or receives messages for a given pseudonym.
 
-To examine this effect, we ran a modified version of the attack simulations
+To examine this effect, we ran a version of the attack simulations
 described in~\cite{e2e-traffic}, assuming a single target pseudonym and $N$
-non-target pseudonyms providing cover.  In order to make the attacker's job
+non-target pseudonyms providing cover.  In order to make the attack
 as difficult as possible (and thus establish an upper bound for security), we
-assume that all users behave identically: they receive messages with equal
+assume that users behave identically: they receive messages with equal
 probability according to independent geometric distributions in each unit of
-time (receiving no messages with probability $1-P_M$); they each use
+time (receiving no messages with probability $1-P_M$); they use
 identical reply blocks with path length $\ell$ through mixes in a steady
 state that delay each message each round with probability $P_D$.
 
@@ -726,9 +723,9 @@
 messages before the attacker guessed correctly.  For an active user, this is
 well within a month's expected traffic.
 
-Although there are various proposals to use dummy traffic to resist
+Although there are ways to use dummy traffic to resist
 statistical disclosure attacks, these difficult to implement perfectly in
-practice (due to outages and so forth) and even slight imperfections render
+practice (due to outages) and even slight imperfections render
 users vulnerable to attack~\cite{e2e-traffic}.
 
 \subsection{Other security concerns}
@@ -766,7 +763,7 @@
 sets would remain separate.)
 
 The PIR algorithm suggested in this paper does
-not have optimal asymptotic performance, especially in terms of bandwidth.
+not have optimal asymptotic performance, especially in bandwidth.
 We present it nonetheless because it
 is easy to explain, implement, and analyze, and offers sufficient
 scalability to be useful. A reasonable engineering plan is to implement
@@ -814,12 +811,12 @@
 %
 %Describe the derivation of each value.
 
-We have evaluated the resource requirements of various nym server systems
+We have evaluated the resource requirements of various pseudonymity systems
 described in Section~\ref{subsec:related-work}, and compare their
 respective performance in figure~\ref{fig:performance}. Bandwidth
 requirements for the independent
 components of the pseudonym system are averages per cycle. We use the term
-"infrastructure" to denote mix nodes in the Type I (Cypherpunk) and Type
+``infrastructure'' to denote mix nodes in the Type I (Cypherpunk) and Type
 III (Underhill~\cite{underhill-spec}) nym server systems, NNTP
 servers~\cite{rfc-1036} for the Usenet news drop, and distributors in
 Pynchon Gate. $N$ is the total number of users in the system. $\Vol_i$ is
@@ -934,7 +931,7 @@
 stronger anonymity assurance and greater robustness than any other
 high-latency pseudonymous message retrieval system theorized or deployed.
 This system, when properly utilized, resists traffic analysis better than
-existing systems, and offers convenient scalability options.
+existing deployed systems, and offers convenient scalability options.
 
 We have proposed a client protocol that does not rely upon an obtrusive
 user interface. Much work remains in the field of effective user interface

***********************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe freehaven-cvs       in the body. http://freehaven.net/