
[freehaven-cvs] Edit paper; fix some math



Update of /home/freehaven/cvsroot/doc/e2e-traffic
In directory moria.mit.edu:/tmp/cvs-serv12779

Modified Files:
	e2e-traffic.tex 
Log Message:
Edit paper; fix some math

Index: e2e-traffic.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/e2e-traffic/e2e-traffic.tex,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -d -r1.4 -r1.5
--- e2e-traffic.tex	6 Aug 2003 19:30:42 -0000	1.4
+++ e2e-traffic.tex	2 Dec 2003 20:14:48 -0000	1.5
@@ -9,10 +9,22 @@
 \newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
 \newcommand\XXXX[1]{{\small\bf [XXXX #1]}}
 
+\newcommand\PMIX{P_{\mbox{MIX}}}
+
+\newenvironment{tightlist}{\begin{list}{$\bullet$}{
+  \setlength{\itemsep}{0mm}
+    \setlength{\parsep}{0mm}
+    %  \setlength{\labelsep}{0mm}
+    %  \setlength{\labelwidth}{0mm}
+    %  \setlength{\topsep}{0mm}
+    }}{\end{list}}
+
 \begin{document}
-\title{Practical Intersection Attacks: \\
-       Extending statistical disclosure attacks to real-world mix-nets}
-  
+\title{Practical Traffic Analysis: \\
+       Extending statistical disclosure attacks toward real-world mix-nets}
+% XXXX Okay; maybe these mix-nets aren't quite real-world.  Any suggestion
+% XXXX for a better title?
+
 \author{Nick Mathewson \\ The Free Haven Project
   \\ \emailaddr{nickm@freehaven.net} }
 
@@ -20,28 +32,26 @@
 \centerline{\LARGE\bf *DRAFT* --- not for publication}
 %======================================================================
 \begin{abstract}
-We extend earlier work on long-term end-to-end traffic analysis.
-Whereas earlier work addressed cases in which a global eavesdropper
-was trying to learn sender-receiver connections at a single batch mix
-whose senders had known behavior patterns, this paper describes how an
-analogous attack can reveal sender-receiver connections to a
-non-global eavesdropper without prior knowledge of sender behavior
-patterns on network of timed dynamic-pool mixes with dummy messages.
-Each weakening of earlier assumptions increases the time over which
-the attacker must observe the mix-net, but has little effect on the
-attacker's storage or computational requirements.  Additionally, we
-describe how such an attacker can exploit extra information about
-message linkability in order to reduce the number of rounds required
-to link senders to recipients.  We present analysis of the amount of
-information needed to mount each of these attacks along with
-experimental results from mounting these attacks on a simulated
-mix-net.  Finally, we propose a few mechanisms for resisting these
-attacks.
+We extend earlier work on passive long-term end-to-end traffic analysis.
+Whereas earlier work addressed cases in which a global passive eavesdropper
+was trying to learn sender-receiver connections at a single batch mix whose
+senders had known behavior patterns, this paper describes how an analogous
+attack can reveal sender-receiver connections, even when the eavesdropper is
+non-global, the sender behavior is unknown, the substrate is a network of
+timed dynamic-pool mixes, and senders use padding messages.  Each weakening
+of earlier assumptions increases the time over which the attacker must
+observe the mix-net, but has little effect on the attacker's storage or
+computational requirements.  Additionally, we describe how such an attacker
+can exploit extra information about message linkability in order to reduce
+the number of rounds required to link senders to recipients.  We present
+simulated results for each of our attacks, focusing on the amount of
+information needed to link senders and recipients.  Finally, we discuss a
+few mechanisms for partially resisting these attacks.
 \end{abstract}
 
 %======================================================================
 \section{Introduction}
-\label{intro}
+\label{sec:intro}
 Since the introduction of mix-nets \cite{chaum-mix} in 1981, many
 attacks against such anonymity systems have been proposed and
 discussed.  While some of these attacks have been partially addressed
@@ -49,12 +59,17 @@
 attacks remain effective against current mix-net designs.
 
 One such class of attacks are {\it long-term intersection attacks}
-wherein an eavesdropper observes a large volume of network traffic,
+wherein a passive eavesdropper observes a large volume of network traffic,
 and notices over time that certain recipients are more likely to
-receive messages when given senders are transmitting messages.
-Categorical defenses against this attack tend to require either an
-impractically large amount of cover traffic or a set of senders with
-perfect uptimes.
+receive messages after given senders have transmitted messages.  Although
+these correlations tend to be slight, given enough time, an attacker can
+use them to deduce which senders are communicating with which recipients.
+Categorical defenses against long-term intersection attacks tend to require
+either an impractically large amount of cover traffic (by requiring senders
+to send dummy messages to every possible recipient whenever they want to send
+a single message to anyone) or a set of senders
+with perfect uptimes who send messages continually (so that the attacker can
+never learn how the network behaves in the absence of particular senders).
 % i would argue they require both? if you do great cover traffic but
 % aren't online much, you lose; and if you're online but not sending
 % much, you also lose.   -RD
@@ -80,43 +95,45 @@
 
 Danezis presents an algorithmically simpler {\it statistical
 disclosure attack} \cite{statistical-disclosure} that requires far
-less computational effort on the behalf of the attacker.  This attack
+less computation on the part of the attacker.  This attack
 is easier to describe and implement, but it assumes the same
 restrictive sender and network models as the original disclosure
 attack.
 
-In this paper, we extend Danezis's original statistical disclosure
-attack to work in less restricted circumstances.  Specifically, if a
-given mix-net sender (hereinafter called ``Alice'') maintains a
-consistant behavior pattern over time, we show how an attacker can
-learn Alice's regular recipients, even when:
-\begin{itemize}
-\item Alice chooses non-uniformly among her
-  communication partners, send multiple messages at once, and has
-  some recipients.
+In this paper, we extend Danezis's original statistical disclosure attack to
+work in more real-world circumstances.  Specifically, if a given mix-net
+sender (hereinafter called ``Alice'') maintains a fairly consistent behavior
+pattern over time, we show how an attacker can learn Alice's regular
+recipients, even in the presence of one or more of these deviations from the
+original disclosure attack's model:
+\begin{tightlist}
+\item Alice chooses non-uniformly among her communication partners, and sends
+  multiple messages in a single batch.
 \item The attacker lacks {\it a priori} knowledge of the network's
   average behavior when Alice is not sending messages.
-\item Mixes use a better batching algorithm, such as the timed
-  dynamic-pool algorithm \cite{trickle02} used by Mixmaster
-  \cite{mixmaster-spec}, or the generalized mix algorithm proposed by
-  \cite{pet2003-diaz}.
-\item Alice uses a path through a mix network instead of just a
-  single mix.
-\item Alice sends some dummy traffic delivered to no particular
-  recipient.
+\item Mixes use a different batching algorithm, such as the timed dynamic-pool
+  algorithm \cite{trickle02} used by Mixmaster \cite{mixmaster-spec}, or the
+  generalized mix algorithm proposed by \cite{pet2003-diaz}.  (With these
+  algorithms, mixes do not relay all of the messages they hold whenever they
+  receive a batch of $b$, but instead hold messages in a ``pool'' for a
+  randomized number of rounds, based on the number of messages in the pool.)
+\item Alice uses a mix network (of any topology) relaying her messages through
+  a succession of mixes, instead of using just a single mix.
+\item Alice sends some intra-network traffic padding delivered to no particular
+  recipient, in order to disguise when she is sending real messages.
 \item The attacker can only view a subset of the messages entering and
   leaving the network, so long as this subset includes some messages
   from Alice and some messages to Alice's recipients.
 \item The cover traffic generated by other senders changes
-  continuously over time.  (We do not address this case completely).
-\end{itemize}
-Each of these variations increases the amount of traffic the attacker
-must observe in order to learn Alice's regular recipients.
+  slowly over time.  (We do not address this case completely.)
+\end{tightlist}
+Each of these variations from the original statistical disclosure attack's
+model increases the amount of traffic the attacker must observe in order to
+learn Alice's regular recipients.
 
-Additionally, we show how an attacker can exploit additional
-knowledge about the anonymity system (or external to it) to speed up
-these attacks.  Such information includes:
-\begin{itemize}
+Additionally, we show how an attacker can exploit additional knowledge to
+speed up these attacks.  Such knowledge includes:
+\begin{tightlist}
 \item Statistical linkability between messages.  For example, a pair
   of messages written in the same language is likelier to have been
   written by a single sender than is a pair of messages written in
@@ -126,57 +143,62 @@
   certainly written by a single sender.
 \item {\it A priori} suspicion of certain messages having originated
   or not originated from Alice.  For example, messages written in a
-  language the sender doesn't speak are unlikely to have been written
-  by the sender.
-\end{itemize}
+  language Alice doesn't speak are unlikely to have been written
+  by Alice.
+\end{tightlist}
 
 The attacks in this paper fail to work when:
-\begin{itemize}
+\begin{tightlist}
 \item Alice's behavior is not consistent over time.  If Alice does not
   produce enough traffic with the same group of regular recipients,
   the attacker cannot learn Alice's behavior.
-\item The attacker cannot observe the network's steady-state (cover)
-  behavior.  If Alice
-  always sends the same number of messages, the attacker
-  may not be able to get a view of who receives messages in Alice's
-  absence. % Awkwardly phrased!
+\item The attacker cannot observe how the network behaves when Alice isn't
+  sending messages. If Alice always sends the same number of messages, in
+  every round, forever, the attacker may not be able to learn who receives
+  messages in Alice's absence.
   % Also, is it right to call this ``steady state'' behavior?  It's
   % really `The network's behavior when Alice isn't around'.
 \item The attacker cannot tell when the sender is originating
-  messages.  This may be the case if the sender is running her own mix
-  node and injecting messages into it directly.
-\end{itemize}
+  messages.
+  % This may be the case if the sender is running her own mix
+  %node and injecting messages into it directly.
+\end{tightlist}
 
-We begin in section \ref{previous-work} by presenting a brief
+We begin in section \ref{sec:previous-work} by presenting a brief
 background overview on mix-nets, traffic analysis, the disclosure
 attack, and the statistical disclosure attack.  In section
-\ref{extending}, we present our enhancements to the statistical
-disclosure attack.  We analyze the effectiveness of these enhancements
-in section \ref{analysis}, and present simulated experimental results
-in section \ref{simulation}.  We close in section \ref{conclusion}
+\ref{sec:extending}, we present our enhancements to the statistical
+disclosure attack.  We present simulated experimental results
+in section \ref{sec:simulation}.  We close in section \ref{sec:conclusion}
 with recommendations for resisting this class of attacks, implications
 for mix-net design, and a set of open questions for future work.
 
 %======================================================================
 \section{Previous work}
-\label{previous-work}
+\label{sec:previous-work}
 
 \subsection{Mix-nets}
-[Write something here to tell people what mix-nets are.]
+\label{subsec:mix-nets}
+[Write something here to tell people what mix-nets are; maybe take the
+first draft from the Minion paper.]
 
 \subsection{Traffic analysis}
-[Write an overview of the general history of traffic analysis attacks.]
+\label{subsec:traffic-analysis}
+[Write an overview of the general history of traffic analysis attacks.
+Remember to check out disad-freeroutes and langos02.]
 
 \subsection{The disclosure attack}
+\label{subsec:disclosure-attack}
In 2002, Agrawal, Kesdogan, and Penz presented the disclosure
 attack \cite{limits-open}, a traffic-analysis attack against a single
 sender on a single batch mix.
 
-The disclosure attack assumes that the target sender (``Alice'') sends
-messages to $m$ recipients; that Alice sends a single message to one
-of them chosen at random per batch of $b$ messages; and that the other
-$b-1$ messages in each batch are chosen at random from the set of $N$
-possible recipients.
+The disclosure attack assumes that the attacker is a global passive
+eavesdropper who is interested in learning the recipients of a single
+targeted sender (``Alice''); that Alice sends messages to $m$ recipients;
+that Alice sends a single message to one of them chosen at random per batch
+of $b$ messages; and that the other $b-1$ messages in each batch are chosen
+at random from the set of $N$ possible recipients.
 
 The attacker observes the messages leaving the mix in each batch and
 constructs sets $R_i$ of recipients receiving messages in batch $i$.
@@ -191,29 +213,35 @@
 \XXXX{The above may not be 100\% accurate; must re-read the paper.}
 
 \subsection{The statistical disclosure attack}
+\label{subsec:statistical-disclosure}
 In 2003, Danezis presented the statistical disclosure
 attack\cite{statistical-disclosure}, which makes the same operational
-assumptions
-as the original disclosure attacks, but is far easier to implement in
-terms of storage, speed, and algorithmic complexity.
+assumptions as the original disclosure attack, but is far easier to
+implement in terms of storage, speed, and algorithmic complexity.
 
 In the statistical disclosure attack, we model Alice's behavior as an
 unknown vector $\vec{v}$ whose elements correspond to the
 probability of Alice sending a message to each of the $N$ recipients
-in the system.  Of these elements, $m$ will be $1/m$, and $N-m$ will
-be $0$.  We model the behavior of the cover traffic as a known
-vector $\vec{u}$ each of whose $N$ elements is $1/N$.
+in the system.  The elements of $\vec{v}$ corresponding to Alice's $m$
+recipients will be $1/m$; the other $N-m$ elements of $\vec{v}$ will
+be $0$.  We model the behavior of the cover traffic sent by other users
+as a known vector $\vec{u}$ each of whose $N$ elements is $1/N$.
 
 The attacker derives from each output round $i$ an observation vector
 $\vec{o_i}$, each of whose elements corresponds to the probability of
-Alice's having sent a message to a particular recipient in that round.
-Taking the arithmetic mean $\bar{O}$ of a large set of $\vec{o_i}$
-gives (by the law of large numbers): 
+Alice's having sent a message to each particular recipient in that round.  (In
+a round $i$ where Alice has sent a message, each element of $\vec{o_i}$ will
+have value $1/b$ if it corresponds to a recipient who has received a message,
+and $0$ if it does not.)
+Taking the arithmetic mean $\bar{O}$ of a large set of the observation
+vectors gives (by the law of large numbers):
\[\bar{O} = \frac{1}{t}\sum_{i=1}^{t} \vec{o_i} \approx
   \frac{\vec{v} + (b-1)\vec{u}}{b} \]
 From this, the attacker estimates
\[\vec{v} \approx b\frac{\sum_{i=1}^t \vec{o_i}}{t} - (b-1)\vec{u}\]
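The estimator above can be sanity-checked with a short simulation.  This is a
sketch with illustrative parameters (100 recipients, 5 partners for Alice,
batches of 10, 3000 rounds), not values from the paper:

```python
import numpy as np

# Illustrative parameters (not from the paper): N recipients, Alice's m
# partners, batch size b, t observed batches.
rng = np.random.default_rng(0)
N, m, b, t = 100, 5, 10, 3000
alice = rng.choice(N, size=m, replace=False)    # Alice's regular recipients

O = np.zeros(N)
for _ in range(t):
    # One message from Alice, b-1 uniform background messages.
    batch = np.append(rng.choice(alice), rng.integers(0, N, size=b - 1))
    O += np.bincount(batch, minlength=N) / b    # observation vector o_i
O /= t                                          # arithmetic mean of the o_i

u = np.full(N, 1.0 / N)                         # known uniform background u
v_est = b * O - (b - 1) * u                     # estimate of Alice's vector v

recovered = np.sort(np.argsort(v_est)[-m:])     # m highest-scoring recipients
```

With these parameters the $m$ largest entries of the estimate reliably
coincide with Alice's recipient set.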
 
+% XXXX Maybe add a single example?
+
 \XXXX{Should we report George's findings on preconditions and
  required time to succeed?}
 
@@ -221,22 +249,28 @@
 
 %======================================================================
 \section{Extending the statistical disclosure attack}
-\label{extending}
+\label{sec:extending}
 
 \XXXX{All the math in this section is kinda fishy; we need a
     good stats person to phrase this right.  I may be weighting when
-    I shouldn't, modeling things strangely, and so on. ---Nick.}
+    I shouldn't, modeling things strangely, and so on. I am convinced,
+    empirically, that the attacks {\it work}---but I don't know whether
+    my explanation of them is correct. ---Nick.}
 
 \subsection{Broadening the attack}
+\label{subsec:broadening}
 In this subsection, we examine ways to extend Danezis's Statistical
-Disclosure Attack to systems more closely resembling real mix-nets.
+Disclosure Attack to systems more closely resembling real-world mix-nets.  In
+section \ref{sec:simulation}, we discuss the additional time and information
+required for each attack to succeed.
 
-\subsubsection{Complex sender behavior}
-First, we relax the requirements related to sender behavior.  In this
-model, we allow Alice to send to her recipients with non-uniform
+\subsubsection{Complex senders, unknown background traffic}
+\label{subsubsec:complex-senders}
+First, we relax the requirements related to sender behavior.
+We allow Alice to choose among her recipients with non-uniform
 probability, and to send multiple messages in a single batch.  We also
 remove the assumption that the attacker has full knowledge of the
-distribution $\vec{u}$ of cover traffic.
+distribution $\vec{u}$ of cover traffic sent by users other than Alice.
 
 We model Alice's behavior with a probability distribution $p_m$ such
 that in every round Alice sends $n$ messages with a probability
@@ -246,114 +280,130 @@
 
 To mount the attack, the attacker first obtains an
 estimate of $\vec{u}$ by observing a large number $t'$ of batches to
-which Alice has {\it not} contributed any messages.  For each such
-batch $i$, the attacker constructs a vector $\vec{u_i}$ containing the
+which Alice has {\it not} contributed any messages.\footnote{The attack can
+  still proceed if few such Alice-free batches exist, so long as Alice
+  contributes more to some batches than to others.  Specifically, the approach
+  described in section \ref{subsubsec:complex-mix} can exploit differences
+  between low-Alice and high-Alice batches to infer background behavior.}
+For each such
+batch $i$, the attacker constructs a vector $\vec{u_i}$ containing
 $1/b$ for recipients that have received a message in that batch, and
 $0$ for recipients that have not.  The attacker then estimates
 $\vec{u}$ as:
 \[\vec{u} \approx \bar{U} = \frac{1}{t'}\sum_{i=1}^{t'}\vec{u_i} \]
 
-The attacker then observes, for each round in which Alice {\it does}
+The attacker then observes, for each round $i$ in which Alice {\it does}
 send a message, the number of messages $m_i$ sent by Alice, and
 computes $\vec{o_i}$ as before.  Computing the arithmetic mean
 gives us
-\[\bar{O} = \frac{\vec{o_i}}{\sum t}
+\[\bar{O} = \frac{1}{t}\sum_{i=1}^{t}{\vec{o_i}}
             \approx
-              \frac{\bar{m}\vec{v} + (b-\bar{m})\vec{u}}{b} 
+              \frac{\bar{m}\vec{v} + (b-\bar{m})\bar{U}}{b}
               \mbox{\ where\ } \bar{m} = \frac{1}{t}\sum{m_i} \]
 
 From this, the attacker estimates Alice's behavior as
-\[\vec{v} \approx \frac{b}{\bar{m}} 
-            \left[ \frac{\vec{o_i}}{t} - (b-\bar{m})\bar{U} \right] \]
+\[\vec{v} \approx \frac{1}{\bar{m}}
+            \left[ b\bar{O} - (b-\bar{m})\bar{U} \right] \]
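The relaxed attack can likewise be sketched in simulation.  Here Alice's
distribution is non-uniform, she sends a variable number of messages per
batch, and the attacker estimates the background from Alice-free batches; all
parameter values are hypothetical:

```python
import numpy as np

# Hypothetical setup: non-uniform Alice, background distribution unknown to
# the attacker and estimated from Alice-free batches.
rng = np.random.default_rng(1)
N, b, t = 100, 20, 4000
v = np.zeros(N)
v[:4] = [0.4, 0.3, 0.2, 0.1]            # Alice's non-uniform recipient distribution
u = rng.dirichlet(np.ones(N))           # background distribution (unknown to attacker)

# Estimate U from t' batches to which Alice contributes no messages.
U = np.zeros(N)
for _ in range(t):
    U += np.bincount(rng.choice(N, size=b, p=u), minlength=N) / b
U /= t

# Observe t batches in which Alice sends m_i >= 1 messages.
O, ms = np.zeros(N), []
for _ in range(t):
    m_i = 1 + rng.poisson(1.0)          # Alice's message count varies per batch
    ms.append(m_i)
    batch = np.append(rng.choice(N, size=m_i, p=v),
                      rng.choice(N, size=b - m_i, p=u))
    O += np.bincount(batch, minlength=N) / b
O /= t
mbar = np.mean(ms)

v_est = (b * O - (b - mbar) * U) / mbar  # estimate of Alice's v
top = np.sort(np.argsort(v_est)[-4:])    # should recover recipients 0..3
```

The four highest-scoring recipients match Alice's actual partners even though
the attacker never knew the background distribution $\vec{u}$ directly.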
 
-\subsubsection{Complex mixing behavior}
-\label{complex-mix}
+\subsubsection{Attacking pool mixes}
+\label{subsubsec:complex-mix}
 Most mix-net designers have already abandoned fixed-batch mixes in
-favor of other mixing algorithms that better obscure the relation
+favor of other mixing algorithms that better hide the relation
 between senders and recipients.  Such improved algorithms include
 timed dynamic-pool mixes, generalized mixes, and randomized versions of
-each \cite{trickle02}\cite{pet2003-diaz}.
-
-We treat these batching algorithms as follows\footnote{this approach
-  is based on D\'iaz and Serjantov's work in \cite{pet2003-diaz}}: a
-mix relays a number of messages at the end of each round, depending
-on the number of messages it is currently storing.  All messages in
-the mix's pool at the end of a round have an equal probability of
-being included in that round's batch.  Thus, we can characterize the
-mix's batching algorithm as a probability distribution $P_{MIX}(b|s)$
---- the probability that the mix relays $b$ messages when it has $s$
-messages in the pool.  Clearly, $\sum_{b=0}^{s}P_{MIX}(b|s) = 1$, for
-all values of s.
-
-In the case of a timed dynamic-pool mix, this distribution is:
-\[ P_{TDP}(b|s) = 
-  \left\{ \begin{array}{l}
-    1 \mbox{\ if\ } s>\mbox{min-pool}, 
-      b = min(s-\mbox{min-pool}, \left\lfloor s*\mbox{rate} \right\rfloor) \\
-    0 \mbox{\ otherwise} \end{array} \right. \]
-For a binomial timed dynamic-pool mix,
-\[ P_{BTDP}(b|s) = 
-   \left\{ \begin{array}{l} 
-           \frac{b_0}{s}^b(1-\frac{b_0}{s})^{s-b} \mbox{\ if\ }
-              P_{TDP}(b_0|s) = 1 \\
-          0 \mbox{\ otherwise} \end{array} \right. \]
+each \cite{trickle02}\cite{pet2003-diaz}.  Rather than reordering and
+relaying all of their messages whenever they have received $b$, these
+algorithms store messages in a {\it pool} as they are received, and after a
+fixed period of time, relay a {\it fraction} of the messages in their pool,
+based on their current pool size.
 
-When attacking such a mix, the attacker no longer knows with certainty
+When attacking such a mix, the attacker no longer knows for certain
 which batches of recipients contain a message from Alice.  Instead,
 the attacker can only estimate, for each batch of exiting messages,
-the probability that the batch includes one of Alice's messages.  
+the probability that the batch includes one of Alice's messages.
 
-We denote by $P_R(r)$ the probability of a message leaving the mix $r$
-rounds after it has arrived.  In future discussion, we assume for the
-sake of simplicity that the mix has reached a steady state, so that
-$P_R(r) = {P_D}^r$ where $P_D$ is the probability of a message in the
-mix's pool being deferred in any given round.\footnote{The
-attacker can estimate $\bar{P_R}$ by
-sending occasion test messages through the mix, or by counting the
+Following D\'iaz and Serjantov's approach in \cite{pet2003-diaz}, we treat
+these mixing algorithms generically as follows: a mix relays a
+number of messages at the end of each round, depending on the number of
+messages it is currently storing.  All messages in the mix's pool at the end
+of a round have an equal probability of being included in that round's batch.
+Thus, we can characterize the mix's batching algorithm as a probability
+distribution $\PMIX(b|s)$---the probability that the mix relays $b$ messages
+when it has $s$ messages in the pool.  Clearly, $\sum_{b=0}^{s}\PMIX(b|s) =
+1$, for all values of $s$.
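As a concrete instance of $\PMIX$, the deterministic timed dynamic-pool rule
can be written as a degenerate (0/1-valued) distribution.  This sketch uses
illustrative min-pool and rate parameters, and assumes the mix relays nothing
when its pool is at or below the minimum, so that the distribution sums to 1
for every pool size:

```python
def p_mix_tdp(b, s, min_pool=5, rate=0.6):
    """P_MIX(b|s) for a deterministic timed dynamic-pool mix: when the pool
    holds more than min_pool messages, relay min(s - min_pool, floor(s * rate))
    of them; otherwise relay nothing.  min_pool and rate are illustrative."""
    if s <= min_pool:
        return 1.0 if b == 0 else 0.0
    b0 = min(s - min_pool, int(s * rate))
    return 1.0 if b == b0 else 0.0
```

Randomized variants (e.g. the binomial timed dynamic-pool mix) would replace
the point mass at $b_0$ with a binomial distribution over $b$.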
+
+We use $n_i$ to denote the number of messages entering the mix in round $i$.
+
+%In the case of a timed dynamic-pool mix, this distribution is:
+%\[ P_{TDP}(b|s) =
+%  \left\{ \begin{array}{l}
+%    1 \mbox{\ if\ } s>\mbox{min-pool},
+%      b = min(s-\mbox{min-pool}, \left\lfloor s*\mbox{rate} \right\rfloor) \\
+%    0 \mbox{\ otherwise} \end{array} \right. \]
+%For a binomial timed dynamic-pool mix,
+%\[ P_{BTDP}(b|s) =
+%   \left\{ \begin{array}{l}
+%           \frac{b_0}{s}^b(1-\frac{b_0}{s})^{s-b} \mbox{\ if\ }
+%              P_{TDP}(b_0|s) = 1 \\
+%          0 \mbox{\ otherwise} \end{array} \right. \]
+
+We denote by $P_R^i(r)$ the probability that a message arriving in round
+$i$ leaves the mix $r$ rounds later.
+%In future discussion, we assume for
+%the
+%sake of simplicity that the mix has reached a steady state, so that
+%$P_R^i(r) = P_R^j(r)$, for all $i$ and $j$.
+We assume that the attacker has a fair estimate of $P_R$.\footnote{The
+attacker can estimate $P_R$ by
+sending occasional test messages through the mix, or by counting the
 messages entering and leaving the mix and deducing the pool size.
-Dummy messages can impair this attack technique to an extent discussed
+Dummy messages can hamper these estimation techniques to an extent discussed
 in \cite{dummy-msgs}.}
 Now, when Alice sends a message in round
-$i$, the attacker observes rounds $i$ through some later round $i+k$
-(choose $k$ so that $P_R(i+k)$ is negligible).  The attacker then
-weights each of these
-rounds by the expected number of messages from Alice that will exit
-that round, includes them in an adjusted $\bar{O_w}$, and estimates
-$\vec{v}$ as before. \XXXX{Include an equation?}
+$i$, the attacker observes rounds $i$ through some later round $i+k$,
+choosing $k$ so that $P_R^i(k)$ is negligible.  The attacker then uses $P_R$ to
+compute a weighted mean $\bar{O_w}$ of the observations from all of these
+rounds, weighting by the expected number of messages from Alice that will exit
+that round:
+\[ \bar{O_w} = \frac{\sum_i \sum_{r=0}^{k} P_R^i(r)\, m_i\, \vec{o_{i+r}}}
+                    {\sum_i m_i}
+   \approx \frac{\bar{m}\vec{v} + (\bar{n}-\bar{m})\vec{u}}{\bar{n}} \]
 
 To estimate $\vec{u}$ in this case, the simplest approach is to
 average $\vec{u_i}$ from batches with a negligible probability of
 including messages from Alice.  Such batches, however, are not
 essential: If the attacker chooses a set of $\vec{u_i}$ such that each
-round contains (on average) a small fraction $p_a>0$ of messages from
-Alice, the attacker will obtain 
-\[\bar{U}' \approx p_a \vec{v} + (1-p_a) \vec{u} \]
+round contains (on average) a small number $\delta_a>0$ of messages from
+Alice, the attacker will obtain
+\[\bar{U'} \approx \frac{\delta_a}{\bar{n}} \vec{v} +
+                   \frac{\bar{n}-\delta_a}{\bar{n}} \vec{u} \]
 and can thus substitute
-\[\vec{u} \approx \frac{1}{1-p_a}\left[ \vec{v} - \bar{U}' \right] \]
-into the earlier equation for $\bar{O}$ and again solve for $\vec{v}$.
+\[\vec{u} \approx \frac{1}{\bar{n}-\delta_a}
+               \left[ \bar{n} \bar{U'} - \delta_a \vec{v} \right] \]
+into the earlier equation for $\bar{O_w}$ and again solve for $\vec{v}$.
 
-\subsubsection{Mix-nets}
+\subsubsection{Attacking mix-nets}
+\label{subsubsec:mix-nets}
 Another way senders behave differently from the model of the original
 disclosure attack is by directing their messages through a chosen path
 in a network of mixes.  While using a mix-net increases the effort an
 attacker must spend to observe all messages leaving the system, it
 has no additional effect on intersection attacks beyond changing the
 delaying characteristics $P_R$ of the anonymity system as introduced
-in section \ref{complex-mix}.
+in section \ref{subsubsec:complex-mix}.
 
 Assume for the sake of simplicity that all mixes share a single
 $P_R$, and that Alice chooses a path of length $\ell_0$.  The chance of
 the message being delayed by a further $d$ rounds is now
 \[  P_R'(\ell_0+d) = \binom{\ell_0+d-1}{d} (1-P_D)^{\ell_0} P_D^d \]
 If Alice chooses her path length probabilistically according to
-$P_L(\ell)$, we have 
+$P_L(\ell)$, we have
 \[ P_R'(r) =
   \sum_{\ell=1}^r P_L(\ell) \binom{r-1}{r-\ell} (1-P_D)^\ell P_D^{r-\ell} \]
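The mixture over path lengths can be computed directly.  The sketch below
uses hypothetical values for $P_D$ and $P_L$; the fixed-length case reduces
to the negative-binomial form given above:

```python
from math import comb

def p_exit(r, p_d, p_len):
    """P_R'(r): probability that a message leaves the network exactly r
    rounds after entry, when the path length ell is drawn from p_len (a dict
    mapping length -> probability) and each mix independently defers a pooled
    message with probability p_d each round."""
    return sum(p_len.get(ell, 0.0)
               * comb(r - 1, r - ell) * (1 - p_d) ** ell * p_d ** (r - ell)
               for ell in range(1, r + 1))

# Hypothetical values: fixed path length 3, deferral probability 0.3.
dist = [p_exit(r, 0.3, {3: 1.0}) for r in range(1, 200)]
```

Summing `dist` over a long enough horizon confirms the distribution is
properly normalized, which is what lets the attacker pick a cutoff $k$ beyond
which the residual probability is negligible.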
 
 \XXXX{ How can the attacker estimate $P_L$ for Alice?  What if he can't?}
 
 \subsubsection{Dummy traffic}
-\label{dummy-traffic}
+\label{subsubsec:dummy-traffic}
 One strategy for reducing the impact of traffic analysis is for Alice
 to periodically send messages into the network that arrive at no
 recipient, or to periodically send messages into the network that
@@ -362,7 +412,11 @@
 
 In the first case, the change is trivial: Alice's behavior vector
 $\vec{v}$ no longer adds to $1$, since there is now a probability that
-a message from Alice will not reach any recipient.
+a message from Alice will not reach any recipient.  Aside from this, the
+attack can proceed as before, so long as Alice sends more messages (including
+dummies) in some rounds than in others.
+
+\XXXX{Is the rest of this section even vaguely germane or realistic?}
 
 On the other hand, suppose that some of Alice's dummy traffic reaches
 recipients other than those with whom she ordinarily communicates\footnote{This
@@ -381,7 +435,7 @@
 \[ \vec{v'} = (1-P_c)\vec{v} + P_c\vec{v_c} \]
 In other words, although the attacker cannot deduce which recipients
 receive messages from Alice, he can instead learn a superset of
-Alice's recipients.
+Alice's recipients. 
 
 Clearly, if Alice chooses $\vec{v_c}$ such that she sends messages with
 equal probability to all recipients, she can completely hide the
@@ -393,9 +447,10 @@
 the system, and thus scales poorly.
 
 \XXXX{say more here.  Dummy policies other than N-squared may help a
-  little.}
+  little, by making it harder to learn $\vec{u}$.}
 
 \subsubsection{Partial observation}
+\label{subsubsec:partial}
 Until now, we have required that the attacker, as a global passive
 adversary, observe all the messages entering and leaving the system.
 (At least, all the messages sent by Alice, and all the messages
@@ -426,7 +481,8 @@
 recipients:  the attacker learns which of the observed recipients
 receive messages from Alice, and which do not.
 
-\subsubsection{Time-variant traffic}
+\subsubsection{Time-variant background traffic}
+\label{subsubsec:time-variant}
 \XXXX{This section is very handwavy.}
 
 If Alice's behavior changes completely and radically over time, a long
@@ -436,24 +492,25 @@
 
 On the other hand, if Alice's behavior $\vec{v}$ remains consistent
 while the behavior of the steady-state traffic $\vec{u}$ changes slowly, the
-attacker has some hope.  Rather than estimating a single $\bar{U}$
+attacker still has some hope.  Rather than estimating a single $\bar{U}$
 from observations to which Alice does not contribute, the attacker
 estimates a series of successive $\bar{u_i}$ values based on the
 average behavior of the network during a comparatively shorter
 duration of time.  Now the attacker observes $\vec{o_i}$ as before and
 computes the average of $\vec{o_i} - \vec{u_i}$, as before.  Now,
-\[ \vec{v} \propto \frac{1}{t}\sum_{i=1}^t \vec{o_i} - \bar{u_i} 
-\]   
+\[ \vec{v} \propto \frac{1}{t}\sum_{i=1}^t \left( \vec{o_i} - \bar{u_i} \right)
+\]
 So if an attacker can get good local estimates to $\vec{u}$, the
 intersection attack proceeds as before.
 
 \XXXX{(Okay, not quite -- there's a constant factor there too.)}
 
 \subsubsection{Attacking recipients}
+\label{subsubsec:recipients}
 As a final (and somewhat anticlimactic) extension to the original
 attack, we note that an attacker can mount attacks against recipients
 as well as senders with only slightly higher storage, and no increase
-in computational cost. 
+in computational cost.
 
 In this model, the attacker wishes to know which senders are sending
 anonymous messages to a given recipient Bob.  The analysis remains the
@@ -466,6 +523,7 @@
 message was sent before the first round in the window is negligible.
 
 \subsection{Strengthening the attack}
+\label{subsec:strenghtening}
 Our previous extensions to the original statistical disclosure attack
 have shown how to reveal sender-recipient
 links in a broader and increasingly complex range of circumstances.
@@ -477,20 +535,22 @@
 incorporating additional information the attacker may have.
 
 \subsubsection{Exploiting full message linkability}
+\label{subsubsec:full-linkability}
 The attacker's work can be greatly simplified if some messages leaving
 the system are {\it fully linkable}.  Two messages are said to be {\it
   fully linkable} if an observer can tell, with certainty, that they
 originate from the same sender. One example of a system with fully
 linkable messages would be a pseudonymity service in which each sender
-has one or more pseudonyms and all delivered messages can be linked
-to a pseudonym.
+has one or more pseudonyms and every delivered message is associated
+with a pseudonym.
 
-Clearly, the attacker in such a system can trivially link senders to
-recipients by linking senders to their pseudonyms.  One way to do this
+Clearly, an eavesdropper who could link senders to their pseudonyms
+could trivially use that information to link senders and recipients.
+One way to do this
 is by treating classes of fully linkable messages as virtual message
 destinations.  Instead of collecting observations $\vec{o_i}$ of
 recipients who receive messages in round $i$, the attacker now
-collects observations $\vec{o_i}'$ of linkable classes
+collects observations $\vec{o_i}$ of linkable classes
 (e.g. pseudonyms) that receive in round $i$.  Since two distinct
 senders don't produce messages in the same linkability class, the
 elements of Alice's $\vec{v}$ and the background $\vec{u}$ are now
@@ -501,7 +561,7 @@
   everything.}
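As a toy sketch (my own; the recipient-to-pseudonym mapping is hypothetical), treating linkability classes as virtual destinations amounts to folding each per-recipient observation vector into per-class counts before running the attack:

```python
import numpy as np

# Hypothetical mapping from recipients to pseudonyms (linkability classes).
N_RECIPIENTS = 12
pseudonym_of = np.array([r % 4 for r in range(N_RECIPIENTS)])  # 4 classes
N_CLASSES = int(pseudonym_of.max()) + 1

def to_class_counts(o_i):
    """Fold a per-recipient observation vector into per-class counts."""
    o_class = np.zeros(N_CLASSES)
    np.add.at(o_class, pseudonym_of, o_i)
    return o_class

# One round's deliveries: recipients 1, 5, and 9 each get a message.
o_i = np.zeros(N_RECIPIENTS)
o_i[[1, 5, 9]] = 1
print(to_class_counts(o_i))  # all three fold into class 1
```

The attack then proceeds on the class-count vectors exactly as before, with the sparser per-sender distributions noted above.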
 
 \subsubsection{Exploiting partial message linkability}
-% subsumes \subsubsection{Exploiting partitioning attacks}
+\label{subsubsec:partial-linkability}
 It's possible that the linkage between
 messages may not be full.  In other words, the attacker may be able
 to deduce that two messages are more likely than average to have the same
@@ -544,6 +604,7 @@
 \end{itemize}
 
 \subsubsection{Exploiting {\it a priori} suspicion}
+\label{subsubsec:suspicion}
 Finally, the attacker may have reason to believe that some messages
 are more likely to have been sent by the target user than others.  For
 example, if we believe that Alice speaks English but not Arabic, or
@@ -568,43 +629,52 @@
 \end{itemize}
 
 %======================================================================
-\section{Analysis}
-\label{analysis}
-
-\XXXX{We need a stats person to analyze how strong the signal must be,
-  and how long an attacker should expect to have to observe in each
-  case.}
+%\section{Analysis}
+%\label{analysis}
+%
+%\XXXX{We need a stats person to analyze how strong the signal must be,
+%  and how long an attacker should expect to have to observe in each
+%  case.}
 
 %======================================================================
 \section{Simulation results}
-\label{simulation}
+\label{sec:simulation}
 
-\XXXX{I'm trying to come up with these now.}
+\XXXX{I'm coming up with these now.}
 
-\subsection{Complex sender behavior}
+\subsection{The original statistical disclosure attack}
+\label{subsec:sim-statistical-disclosure}
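No simulation results are claimed here yet; as a placeholder sketch (mine, with all parameters hypothetical), the original attack's setting might be simulated along these lines, with background senders choosing recipients uniformly:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, b, rounds = 50, 3, 20, 2000     # all hypothetical parameters
alice_recipients = rng.choice(N, size=m, replace=False)

acc = np.zeros(N)
for _ in range(rounds):
    # b-1 background messages to uniformly chosen recipients...
    o = np.bincount(rng.integers(0, N, size=b - 1),
                    minlength=N).astype(float)
    # ...plus Alice's single message to one of her m recipients.
    o[rng.choice(alice_recipients)] += 1
    acc += o

# The uniform background contributes (b-1)/N per recipient on average;
# the remainder is proportional to Alice's sending distribution.
v_est = acc / rounds - (b - 1) / N
top = np.argsort(v_est)[-m:]
print(sorted(top.tolist()), sorted(alice_recipients.tolist()))
```

In this toy setting the top-$m$ components of the estimate recover Alice's recipient set; the simulations to come should quantify how many rounds this takes as the parameters vary.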
 
-\subsection{Attacking a timed dynamic-pool mix}
+\subsection{Complex sender behavior and unknown background traffic}
+\label{subsec:sim-complex-senders}
 
-\subsection{Attacking a mix-net}
+\subsection{Attacking pool mixes and mix-nets}
+\label{subsec:sim-complex-mixes}
 
-\subsection{Attacking a mix-net with dummy traffic}
+\subsection{The impact of dummy traffic}
+\label{subsec:sim-dummies}
 
-\subsection{The effects of partial observation}
+\subsection{The impact of partial observation}
+\label{subsec:sim-partial}
 
 \subsection{Attacking a nym service (full linkability)}
+\label{subsec:sim-full-linkability}
 
 \subsection{Attacking a fragmented community (partial linkability)}
+\label{subsec:sim-partial-linkability}
 
 \subsection{Attacking a suspicious user}
+\label{subsec:sim-suspicion}
 
 % Not simulated: time-variant traffic, attacks against multiple
 % senders and recipients, partitioning attack
 
 %======================================================================
 \section{Conclusions}
-\label{conclusion}
+\label{sec:conclusion}
 
 \subsection{Implications for mix-net design}
+\label{subsec:implications}
 \XXXX{I think I'll hold off drawing any conclusions until we have
   simulated/analytic results.  If we're lucky, the results will
   suggest some method that resists intersection attacks without
@@ -615,6 +685,7 @@
   out their $P_R$ from round to round; and ...?}
 
 \subsection{Future work}
+\label{subsec:future-work}
 \XXXX{One big question I want answered is: will any of the
   information-theoretic measurements turn out to be useful in
   analyzing this attack?  If not, why not?}
