[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
[freehaven-cvs] more edits after finishing the first read-through

To: freehaven-cvs@freehaven.net
Subject: [freehaven-cvs] more edits after finishing the first read-through
From: arma@seul.org (Roger Dingledine)
Date: Wed, 6 Aug 2003 01:22:26 -0400 (EDT)
Delivered-to: archiver@seul.org
Delivered-to: freehaven-cvs-outgoing@seul.org
Delivered-to: freehaven-cvs@seul.org
Delivery-date: Wed, 06 Aug 2003 01:23:18 -0400
Reply-to: freehaven-dev@freehaven.net
Sender: owner-freehaven-cvs@freehaven.net
Update of /home/freehaven/cvsroot/doc/e2e-traffic
In directory moria.mit.edu:/home2/arma/work/freehaven/doc/e2e-traffic

Modified Files:
	e2e-traffic.tex 
Log Message:
more edits after finishing the first read-through


Index: e2e-traffic.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/e2e-traffic/e2e-traffic.tex,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- e2e-traffic.tex	31 Jul 2003 00:53:21 -0000	1.2
+++ e2e-traffic.tex	6 Aug 2003 05:22:24 -0000	1.3
@@ -45,7 +45,7 @@
 Since the introduction of mix-nets \cite{chaum-mix} in 1981, a variety
 of attacks against such anonymity systems has been proposed and
 discussed.  While some of these attacks have been partially addressed
-by advancements in mix-net designs\cite{???}, broad classes of
+by advancements in mix-net designs \cite{???}, broad classes of
 attacks remain effective against current mix-net designs.
 
 One such class of attacks are the {\it long-term intersection attacks}
@@ -127,13 +127,14 @@
 \item The sender's behavior is not consistent over time.  If the
   sender does not maintain a group of regular recipients for the
   required duration, the attacker cannot learn the sender's behavior.
-\item The attacker cannot observe the network's cover behavior.  If
+\item The attacker cannot observe the network's steady-state (cover)
+  behavior.  If
   the sender always sends the same number of messages, the attacker
   may not be able to get a view of the network's exit behavior in the
   sender's absence. % Awkwardly phrased!
 \item The attacker cannot tell when the sender is originating
   messages.  This may be the case if the sender is running her own mix
-  node, and injecting messages to it directly.
+  node and injecting messages to it directly.
 \end{itemize}
 
 We begin in section \ref{previous-work} by presenting a brief
@@ -157,8 +158,8 @@
 [Write an overview of the general history of traffic analysis attacks.]
 
 \subsection{The disclosure attack}
-In 200?, Agrawal, Kesdogan, and Pena presented the disclosure
-attack\cite{limits-open}, a traffic-analysis attack against a single
+In 2002, Agrawal, Kesdogan, and Pena presented the disclosure
+attack \cite{limits-open}, a traffic-analysis attack against a single
 sender on a single batch mix.
 
 The disclosure attack assumes that the target sender (``Alice'') sends
@@ -192,7 +193,7 @@
 be $0$.  We model the behavior of the cover traffic as a known
 vector $\vec{u}$ each of whose $N$ elements is $1/N$.
 
-The attacker derives from each output round an observation vector
+The attacker derives from each output round $i$ an observation vector
 $\vec{o_i}$, each of whose elements corresponds to the probability of
 Alice's having sent a message to a particular recipient in that round.
 Taking the arithmetic mean $\bar{O}$ of a large set of $vec{o_i}$
@@ -222,10 +223,11 @@
 First, we relax the requirements related to sender behavior.  In this
 model, we allow Alice to send to her recipients with non-uniform
 probability, and to send multiple messages in a single batch.  We also
-remove the assumption that the attacker has full knowledge the
+remove the assumption that the attacker has full knowledge of the
 distribution $\vec{u}$ of cover traffic.
 
 We model Alice's behavior as a probability distribution $p_n(n)$ of
+% p_i(n)? Why _n? For that matter, why (n)? -RD
 the number of messages she sends each round, and a non-uniform
 behavior vector $\vec{v}$ of the recipients to which Alice sends.
 Thus, Alice's expected contribution to each round is $p_n(n)\vec{v}$.
@@ -234,13 +236,13 @@
 estimate of $\vec{u}$ by observing a large number $t'$ of batches to
 which Alice has {\it not} contributed any messages.  For each such
 batch $i$, the attacker constructs a vector $\vec{u_i}$ containing the
-$1/b$ for recipients that has received a message in that batch, and
+$1/b$ for recipients that have received a message in that batch, and
 $0$ for recipients that have not.  The attacker then estimates
 $\vec{u}$ as:
 \[\vec{u} \approx \bar{U} = \frac{1}{t'}\sum_{i=1}^{t'}\vec{u_i} \]
 
 The attacker then observes, for each round in which Alice {\it does}
-send a message the number of messages $m_i$ sent by Alice, and
+send a message, the number of messages $m_i$ sent by Alice, and
 computes $\vec{o_i}$ as before.  Computing the arithmetic mean
 gives us
 \[\bar{O} = \frac{\vec{o_i}}{\sum t}
@@ -252,13 +254,13 @@
 \[\vec{v} \approx \frac{b}{\bar{m}} 
             \left[ \frac{\vec{o_i}}{t} - (b-\bar{m})\bar{U} \right] \]
 
-\subsubsection{Complex mix behavior}
+\subsubsection{Complex mixing behavior}
 \label{complex-mix}
 Most mix-net designers have already abandoned fixed-batch mixes in
 favor of other mixing algorithms that better obscure the relation
 between senders and recipients.  Such improved algorithms include
 timed dynamic-pool mixes, generalized mixes, and randomized versions of
-each\cite{trickle02}\cite{pet2003-diaz}.
+each \cite{trickle02}\cite{pet2003-diaz}.
 
 We treat these batching algorithms as follows\footnote{this approach
   is based on D\'iaz and Serjantov's work in \cite{pet2003-diaz}}: a
@@ -267,12 +269,11 @@
 the mix's pool at the end of a round have an equal probability of
 being included in that round's batch.  Thus, we can characterize the
 mix's batching algorithm as a probability distribution $P_{MIX}(b|s)$
--- the probability that the mix relays $b$ messages when it has $s$
+--- the probability that the mix relays $b$ messages when it has $s$
 messages in the pool.  Clearly, $\sum_{b=0}^{s}P_{MIX}(b|s) = 1$, for
 all values of s.
 
-  In the case of a timed dynamic-pool mix, this
-distribution is:
+In the case of a timed dynamic-pool mix, this distribution is:
 \[ P_{TDP}(b|s) = 
   \left\{ \begin{array}{l}
     1 \mbox{\ for\ } if s>\mbox{min-pool}, 
@@ -294,15 +295,16 @@
 rounds after it has arrived.  In future discussion, we assume for the
 sake of simplicity that the mix has reached a steady state, so that
 $P_R(r) = P_D^r$ where $P_D$ is the probability of a message in the
-mix's pool being deferred in any given round.
-\footnote{The attacker can estimate $\bar{P_R}$ by
+mix's pool being deferred in any given round.\footnote{The
+attacker can estimate $\bar{P_R}$ by
 sending occasion test messages through the mix, or by counting the
 messages entering and leaving the mix and deducing the pool size.
 Dummy messages can impair this attack technique to an extent discussed
 in \cite{dummy-msgs}.}
 Now, when Alice sends a message in round
 $i$, the attacker observes rounds $i$ through some later round $i+k$
-with negligible $P_R(i+k)$.  The attacker then weights each of these
+(choose $k$ so that $P_R(i+k)$ is negligible).  The attacker then
+weights each of these
 rounds by the expected number of messages from Alice that will exit
 that round, includes them in an adjusted $\bar{O_w}$, and estimates
 $\vec{v}$ as before. \XXXX{Include an equation?}
@@ -314,7 +316,7 @@
 round contains (on average) a small fraction $p_a>0$ of messages from
 Alice, the attacker will obtain 
 \[\bar{U}' \approx p_a \vec{v} + (1-p_a) \vec{u} \]
-and than thus substitute
+and can thus substitute
 \[\vec{u} \approx \frac{1}{1-p_a}\left[ \vec{v} - \bar{U}' \right] \]
 into the earlier equation for $\bar{O}$ and again solve for $\vec{v}$.
 
@@ -328,12 +330,13 @@
 in section \ref{complex-mix}.
 
 Assume for the sake of simplicity that all mixes share a single
-$P_R$, and that Alice chooses a path of length $l_0$.  The chance of
+$P_R$, and that Alice chooses a path of length $\ell_0$.  The chance of
 the message being delayed by a further $d$ rounds is now
-\[  P_R'(l_0+d) = \binom{l_0+d-1}{d} (1-P_D)^{l_0} P_D^d \]
+\[  P_R'(\ell_0+d) = \binom{\ell_0+d-1}{d} (1-P_D)^{\ell_0} P_D^d \]
 If Alice chooses her path length probabilistically according to
-$P_L(l)$, we have 
-\[ P_R'(r) = \sum_{l=1}^r P_L(l) \binom{r-1}{r-l} (1-P_D)^l P_d^{r-l} \]
+$P_L(\ell)$, we have 
+\[ P_R'(r) =
+   \sum_{\ell=1}^r P_L(\ell) \binom{r-1}{r-\ell} (1-P_D)^\ell P_d^{r-\ell} \]
 
 \XXXX{ How can the attacker estimate $P_L$ for Alice?  What if he can't?}
 
@@ -344,12 +347,12 @@
 recipient; or to periodically send messages into the network that
 arrive at recipients other than those she wants to communicate with.
 
-In the first case, the change in trivial: Alice's behavior vector
+In the first case, the change is trivial: Alice's behavior vector
 $\vec{v}$ no longer adds to $1$, since there is now a probability that
 a message from Alice will not reach any recipient.
 
 On the other hand, suppose that some of Alice's dummy traffic reaches
-recipients other than those she ordinarily communicates\footnote{This
+recipients other than those with whom she ordinarily communicates\footnote{This
   discussion ignores the question of how Alice is to generate the text
   of these cover messages.  If some messages in the mix-net are
   unencrypted, it will be difficult or impossible for Alice to compose
@@ -364,10 +367,10 @@
 $\vec{v}$, the attacker instead learns
 \[ \vec{v'} = (1-P_c)\vec{v} + P_c\vec{v_c} \].
 In other words, although the attacker cannot deduce which recipients
-receive messages from Alice, he can instead learns a superset of
+receive messages from Alice, he can instead learn a superset of
 Alice's recipients.
 
-Clearly, if Alice chooses $\vec{v_c}$ so as to send messages with
+Clearly, if Alice chooses $\vec{v_c}$ such that she sends messages with
 equal probability to all recipients, she can completely hide the
 actual recipients of her messages.  In practice, however, this
 requirement in unrealistic:  doing so will require that she send
@@ -394,7 +397,7 @@
 non-global attacker has different characteristics.  If an attacker
 eavesdrops on a fraction of the {\it mixes} in the system, the
 attacker receives a random sample\footnote{But possibly a biased
-  sample, depending Alice's path selection algorithm} of the
+  sample, depending on Alice's path selection algorithm} of the
 messages entering or leaving the system.  An attacker in this case
 will still converge on the same $\bar{O}$ and thus the same
 estimation of Alice's behavior, but will require more rounds of
@@ -402,7 +405,7 @@
 
 Alternatively, an attacker who eavesdrops on a fraction of the {\it
   users} in the system receives {\it all} of the messages sent to or
-from those users, but no messages to send to other users.  So long as
+from those users, but no messages sent to or from other users.  So long as
 one of these users is Alice, the attack can still proceed: To such an
 attacker, the network is as if the messages sent by Alice to
 unobserved recipients were dummy messages.  Now the attack converges
@@ -419,7 +422,7 @@
 on a $\bar{v}$ for any of them.
 
 On the other hand, if Alice's behavior $\vec{v}$ remains consistent
-while the behavior of the cover traffic $\vec{u}$ changes slowly, the
+while the behavior of the steady-state traffic $\vec{u}$ changes slowly, the
 attacker has some hope.  Rather than estimating a single $\bar{U}$
 from observations to which Alice does not contribute, the attacker
 estimates a series of successive $\bar{u_i}$ values based on the
@@ -446,14 +449,14 @@
 probably receives messages with behavior in rounds from which Bob
 probably doesn't receive messages.  The only complication is that the
 attacker cannot tell in advance when Bob will receive a message.
-Therefore, the attacker maintain accumulate a window of observations at
-all times, in Bob later receives a message that may have come from
-one of the rounds tin that window.
+Therefore, the attacker maintains a window of observations at all times,
+such that if Bob later receives a message, the probability that the
+message was sent before the first round in the window is negligible.
 
 \subsection{Strengthening the attack}
 Our previous extensions to the original statistical disclosure attack
 have broadened the original method in order to reveal sender-recipient
-links in a broader and increasingly complex ranger of circumstances.
+links in a broader and increasingly complex range of circumstances.
 In order to compensate for these circumstances, the attacker has been
 forced to observe an increasingly large number of rounds of traffic.
 
@@ -465,7 +468,7 @@
 The attacker's work can be greatly simplified if some messages leaving
 the system are {\it fully linkable}.  Two messages are said to be {\it
   fully linkable} if an observer can tell, with certainty, that they
-originate from the same sender. One examples of a system will fully
+originate from the same sender. One example of a system with fully
 linkable messages would be a pseudonymity service in which each sender
 has one or more pseudonyms, and all delivered messages can be linked
 to a pseudonym.
@@ -479,7 +482,7 @@
 (e.g. pseudonyms) that receive in round $i$.  Since two distinct
 senders don't produce messages in the same linkability class, the
 elements of Alice's $\vec{v}$ and the background $\vec{u}$ are now
-disjoint, easier for the attacker to separate.
+disjoint, and thus easier for the attacker to separate.
 
 \XXXX{Need to analyze below the circumstances under which this works;
   and to what extent it helps.  It's fine for Nyms, but not for
@@ -487,13 +490,14 @@
 
 \subsubsection{Exploiting partial message linkability}
 % subsumes \subsubsection{Exploiting partitioning attacks}
-In the linkability case above, it's possible that the linkage between
-messages may not be full.  In other words, the attacker may be unable
-to deduce that two messages are likelier than average to have the same
+%In the linkability case above,
+It's possible that the linkage between
+messages may not be full.  In other words, the attacker may be able
+to deduce that two messages are more likely than average to have the same
 sender, without being certain that the senders are the same.  For
 example, two binary documents written in the same version of MS Word
-are likelier to be written by the same sender than two messages
-selected at random.  These linkages may be more abstract: an
+are more likely to be written by the same sender than two messages
+selected at random.  These linkages may be more abstract: a
 sophisticated attacker could check for the presence of certain
 keywords and try to link messages based on their textual content.
 
@@ -511,12 +515,15 @@
 each $\left<\mbox{recipient},\mbox{class}\right>$ tuple.  (If a
 message might belong to multiple classes, the attacker sets the
 corresponding element of each possible class to the possibility of the
-message's being in that class.)  The statistical disclosure attack
+message's being in that class.)
+% i don't understand. you mean just add the tuple to all the classes
+% it could possibly be in? -RD
+The statistical disclosure attack
 proceeds as before, but messages sent to different linkability
 classes no longer provide cover for one another.
 
 \XXXX{ This attack seems not to exploit the fact that if Alice sends
-  to $<$Word6, Bob$>$, she is likelier to send to $<$Word6,
+  to $<$Word6, Bob$>$, she is more likely to send to $<$Word6,
   Carol$<$ than to $<$Word2K, Carol$>$.}
 
 Other opportunities for partitioning include:
@@ -526,7 +533,7 @@
 
 \subsubsection{Exploiting {\it a priori} suspicion}
 Finally, the attacker may have reason to believe that some messages
-are likelier to have been sent by the target user than others.  For
+are more likely to have been sent by the target user than others.  For
 example, if we believe that Alice speaks English but not Arabic, or
 that Alice knows psychology but not astrophysics, then we will
 naturally suspect that an English-language message about psychology
@@ -534,7 +541,7 @@
 about astrophysics.
 
 To exploit this knowledge, an attacker can (as suggested in the
-original Statistical Disclosure paper\cite{statistical-disclosure}),
+original Statistical Disclosure paper \cite{statistical-disclosure}),
 modify the estimated probabilities in $\vec{o_i}$ of Alice having sent
 each delivered message.  For example, if 100 messages are sent in a
 round, and the attacker judges that 50 of them are twice as likely to

***********************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe freehaven-cvs       in the body. http://freehaven.net/
Prev by Author: [freehaven-cvs] Partial mixnet observation
Next by thread: [freehaven-cvs] Clarify a few points
Index(es):
- Author
- Thread