
[freehaven-cvs] Finish editing the attacks section. Time to write a ...



Update of /home/freehaven/cvsroot/doc/e2e-traffic
In directory moria.mit.edu:/tmp/cvs-serv25928

Modified Files:
	e2e-traffic.tex 
Log Message:
Finish editing the attacks section. Time to write a skeleton results section

Index: e2e-traffic.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/e2e-traffic/e2e-traffic.tex,v
retrieving revision 1.15
retrieving revision 1.16
diff -u -d -r1.15 -r1.16
--- e2e-traffic.tex	21 Jan 2004 04:08:25 -0000	1.15
+++ e2e-traffic.tex	21 Jan 2004 05:02:11 -0000	1.16
@@ -156,7 +156,8 @@
 Additionally, we show how an attacker can exploit additional knowledge to
 speed up these attacks.  Such knowledge includes:
 \begin{tightlist}
-\item Linkability between messages. The attacker can take into account
+\item Linkability between messages. For example, the attacker can take into
+  account
   whether messages are written in the same language or signed by the
   same pseudonym.
 \item {\it A priori} suspicion of certain messages having originated
@@ -376,30 +377,33 @@
 remove the assumption that the attacker has full knowledge of the
 distribution $\V{u}$ of cover traffic sent by users other than Alice.
 
-We model Alice's behavior with a probability function $P_m$ such
+To model Alice's varying number of messages, we use  a probability function
+$P_m$ such
 that in every round Alice sends $n$ messages with a probability
-$P_m(n)$, and with a behavior vector $\V{v}$ of the recipients to
-which Alice sends.  Thus, Alice's expected contribution to each round
-is $\V{v} \sum_{n=0}^{\infty} n P_m(n)$.
+$P_m(n)$.  We still use a behavior vector $\V{v}$ of the recipients to
+which Alice sends, but we no longer require Alice's recipients to have a
+uniform $1/m$ probability.   Alice's expected contribution to each round
+is now $\V{v} \sum_{n=0}^{\infty} n P_m(n)$.
 
 To mount the attack, the attacker first obtains an
-estimate of $\V{u}$ by observing a large number $t'$ of batches to
+estimate of the background distribution $\V{u}$ by observing a large number
+$t'$ of batches to
 which Alice has {\it not} contributed any messages.\footnote{The attack can
   still proceed if few such Alice-free batches exist, so long as Alice
   contributes more to some batches than to others.  Specifically, the approach
   described in Section~\ref{subsubsec:complex-mix} can exploit differences
   between low-Alice and high-Alice batches to infer background behavior.}
 For each such
-batch $i$, the attacker constructs a vector $\V{u_i}$ containing
+batch $i$, the attacker constructs a vector $\V{u_i}$, whose elements are
 $1/b$ for recipients that have received a message in that batch, and
 $0$ for recipients that have not.  The attacker then estimates
-$\V{u}$ as:
+the background distribution $\V{u}$ as:
 \[\V{u} \approx \B{U} = \frac{1}{t'}\sum_{i=1}^{t'}\V{u_i} \]
 
 The attacker then observes, for each round $i$ in which Alice {\it does}
 send a message, the number of messages $m_i$ sent by Alice, and
-computes $\V{o_i}$ as before.  Computing the arithmetic mean
-gives us
+computes $\V{o_i}$ as before.  Computing the arithmetic mean of these
+$\V{o_i}$ gives us
 \[\B{O} = \frac{1}{t}\sum_{i=1}^{t}{\V{o_i}}
             \approx
               \frac{\B{m}\V{v} + (b-\B{m})\B{U}}{b}
@@ -409,17 +413,17 @@
 \[\V{v} \approx \frac{1}{\B{m}}
             \left[ b\B{O} - (b-\B{m})\B{U} \right] \]
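The estimator above can be sketched in a few lines (a hypothetical helper, assuming numpy; not part of the paper itself): average the Alice-free batch vectors to get $\B{U}$, average the observation vectors and Alice's message counts to get $\B{O}$ and $\B{m}$, then solve for $\V{v}$.

```python
import numpy as np

def estimate_alice_behavior(alice_free_batches, alice_batches, alice_counts, b):
    """Sketch of the generalized statistical disclosure estimator.

    alice_free_batches: list of u_i vectors (1/b for each recipient seen
        in an Alice-free batch, 0 otherwise).
    alice_batches: list of o_i observation vectors for rounds in which
        Alice sent at least one message.
    alice_counts: m_i, Alice's message count in each corresponding round.
    b: fixed batch size of the mix.
    """
    U = np.mean(alice_free_batches, axis=0)   # background estimate U-bar
    O = np.mean(alice_batches, axis=0)        # mean observation O-bar
    m_bar = np.mean(alice_counts)             # Alice's mean sending volume
    # v ~= (1/m-bar) [ b*O-bar - (b - m-bar)*U-bar ]
    return (b * O - (b - m_bar) * U) / m_bar
```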
 
-\subsubsection{Attacking pool mixes}
+\subsubsection{Attacking pool mixes and mix-nets}
 \label{subsubsec:complex-mix}
-Most mix-net designers have already abandoned fixed-batch mixes in
-favor of other mixing algorithms that better hide the relation
+Most mix-net designs have already abandoned fixed-batch mixes in
+favor of other algorithms that better hide the relation
 between senders and recipients.  Such improved algorithms include
 timed dynamic-pool mixes, generalized mixes, and randomized versions of
 each \cite{trickle02}\cite{pet2003-diaz}.  Rather than reordering and
-relaying all of their messages whenever they have received $b$, these
-algorithms store messages in a {\it pool} as they are received, and after a
-fixed period of time, relay a {\it fraction} of the messages in their pool,
-based on their current pool size.
+relaying all of their messages whenever they have received a fixed number
+$b$, these algorithms store received messages in a {\it pool}, and at fixed
+intervals relay a {\it fraction} of the pooled messages, based on the pool's
+current size.
 
 When attacking such a mix, the attacker no longer knows for certain
 which batches of recipients contain a message from Alice.  Instead,
@@ -436,8 +440,6 @@
 when it has $s$ messages in the pool.  Clearly, $\sum_{b=0}^{s}\PMIX(b|s) =
 1$, for all values of $s$.
 
-We use $n_i$ to denote the number of messages entering the mix in round $i$.
-
 %In the case of a timed dynamic-pool mix, this distribution is:
 %\[ P_{TDP}(b|s) =
 %  \left\{ \begin{array}{l}
@@ -451,29 +453,30 @@
 %              P_{TDP}(b_0|s) = 1 \\
 %          0 \mbox{\ otherwise} \end{array} \right. \]
 
-We denote by $P_R^i(r)$ the probability of a message that arrives in round
-$i$ leaving the mix in $r$ rounds later.
-%In future discussion, we assume for
-%the
-%sake of simplicity that the mix has reached a steady state, so that
-%$P_R^i(r) = P_R^j(r)$, for all $i$ and $j$.
+We denote by $P_R^i(r)$ the probability that a message arriving in round
+$i$ leaves the mix $r$ rounds later.
+%%In future discussion, we assume for
+%%the sake of simplicity that the mix has reached a steady state, so that
+%%$P_R^i(r) = P_R^j(r)$, for all $i$ and $j$.
 We assume that the attacker has a fair estimate of $P_R$.\footnote{The
 attacker can estimate $P_R$ by
-sending occasional test messages through the mix, or by counting the
+sending test messages through the mix, or by counting the
 messages entering and leaving the mix and deducing the pool size.
-Dummy messages can hamper these estimation techniques to an extent discussed
+Dummy messages can hamper these techniques to an extent discussed
 in \cite{dummy-msgs}.}
 Now, when Alice sends a message in round
 $i$, the attacker observes rounds $i$ through some later round $i+k$,
-choosing $k$ so that $P_R^i(k)$ is negligible.  The attacker then uses $P_R$ to
+choosing $k$ so that $\sum_{j=k+1}^{\infty} P_R^i(j)$ is negligible.
+The attacker then uses $P_R$ to
 compute a weighted mean $\B{O_w}$ of the observations from all of these
 rounds, weighting by the expected number of messages from Alice that will exit
 that round:
 \[ \B{O_w} = \sum_i \sum_{r=0}^k P_R^i(r) m_i \V{o_{i+r}}
    \approx \frac{\B{m}\V{v} + (\B{n}-\B{m})\V{u}}{\B{n}} \]
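The weighted mean above can be sketched as follows (a hypothetical helper, assuming numpy, and assuming a steady-state $P_R$ shared across rounds; normalizing by the total weight keeps the result on the scale of a single round's observation):

```python
import numpy as np

def weighted_observation(send_rounds, m, obs, P_R, k):
    """Pool-mix variant: weight each later round's observation o_{i+r}
    by P_R(r) * m_i, the expected number of Alice's round-i messages
    that exit in round i+r, then normalize by the total weight."""
    total = np.zeros(len(obs[0]), dtype=float)
    weight = 0.0
    for i in send_rounds:               # rounds in which Alice sent
        for r in range(k + 1):          # rounds her messages may exit
            w = P_R[r] * m[i]
            total += w * np.asarray(obs[i + r], dtype=float)
            weight += w
    return total / weight
```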
 
-To estimate $\V{u}$ in this case, the simplest approach is to
-average $\V{u_i}$ from batches with a negligible probability of
+To solve for Alice's behavior $\V{v}$, the attacker now needs a good
+estimate of the background $\V{u}$.  The attacker gets this by
+averaging observations $\V{u_i}$ from batches with a negligible probability of
 including messages from Alice.  Such batches, however, are not
 essential: If the attacker chooses a set of $\V{u_i}$ such that each
 round contains (on average) a small number $\delta_a>0$ of messages from
@@ -485,15 +488,13 @@
                \left[ \B{n} \B{U'} - \delta_a \V{v} \right] \]
 into the earlier equation for $\B{O_w}$ and again solve for $\V{v}$.
 
-\subsubsection{Attacking mix-nets}
-\label{subsubsec:mix-nets}
 Another way senders behave differently from the model of the original
 disclosure attack is by directing their messages through a chosen path
 in a network of mixes.  While using a mix-net increases the effort an
 attacker must spend to observe all messages leaving the system, it
 has no additional effect on intersection attacks beyond changing the
 delaying characteristics $P_R$ of the anonymity system as introduced
-in Section~\ref{subsubsec:complex-mix}.
+above.
 
 Assume for the sake of simplicity that all mixes share a single
 $P_R$, and that Alice chooses a path of length $\ell_0$.  The chance of
@@ -504,7 +505,7 @@
 \[ P_R'(r) =
   \sum_{\ell=1}^r P_L(\ell) \binom{r-1}{r-\ell} (1-P_D)^\ell P_D^{r-\ell} \]
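This sum can be evaluated directly (a hypothetical helper; it simply transcribes the formula above, with `P_L` as a mapping from path length to probability and `P_D` as the per-round delay probability the text assumes all mixes share):

```python
from math import comb

def path_delay_dist(P_L, P_D, r):
    """P_R'(r): probability that a message routed through a path of
    random length l ~ P_L takes exactly r rounds to exit, when each
    hop independently delays it one extra round with probability P_D.
    comb(r-1, r-l) counts the arrangements of the r-l delays among the
    first r-1 rounds (the final round must be a forwarding step)."""
    return sum(P_L.get(l, 0.0)
               * comb(r - 1, r - l)
               * (1 - P_D) ** l * P_D ** (r - l)
               for l in range(1, r + 1))
```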
 
-\XXXX{ How can the attacker estimate $P_L$ for Alice?  What if he can't?}
+% \XXXX{ How can the attacker estimate $P_L$ for Alice?  What if he can't?}
 
 \subsubsection{Dummy traffic}
 \label{subsubsec:dummy-traffic}
@@ -514,44 +515,46 @@
 arrive at recipients other than those with whom she wants to
 communicate.
 
-In the first case, the change is trivial: Alice's behavior vector
-$\V{v}$ no longer adds to $1$, since there is now a probability that
-a message from Alice will not reach any recipient.  Aside from this, the
-attack can proceed as before, so long as Alice sends more messages (including
-dummies) in some rounds than in others.
-
-\XXXX{Is the rest of this section even vaguely germane or realistic?}
-
-On the other hand, suppose that some of Alice's dummy traffic reaches
-recipients other than those with whom she ordinarily communicates\footnote{This
-  discussion ignores the question of how Alice is to generate the text
-  of these cover messages.  If some messages in the mix-net are
-  unencrypted, it will be difficult or impossible for Alice to compose
-  plausible unencrypted cover texts for most possible recipients.  On
-  the other hand, if all messages in the mix-net are encrypted, Alice
-  can easily give her dummy messages junk padding, encrypted so that
-  only the recipient of each dummy message can distinguish it from a
-  real message.}.  Suppose that every message Alice sends has a
-probability $P_c$ of being a cover message, and that recipients for
-cover messages are chosen from a distribution vector $\V{v_c}$.
-Now, instead of converging on Alice's true pattern of recipients
-$\V{v}$, the attacker instead learns
-\[ \V{v'} = (1-P_c)\V{v} + P_c\V{v_c} \]
-In other words, although the attacker cannot deduce which recipients
-receive messages from Alice, he can instead learn a superset of
-Alice's recipients.
-
-Clearly, if Alice chooses $\V{v_c}$ such that she sends messages with
-equal probability to all recipients, she can completely hide the
-actual recipients of her messages.  In practice, however, this
-requirement in unrealistic:  doing so will require that she send
-messages to {\it every} participant in the system --- potentially as
-many recipients as there are active email addresses.  Such a padding
-regime results in traffic quadratic in the number of participants in
-the system, and thus scales poorly.
+Although these methods can succeed in slowing or stopping the attacker (as
+discussed below in Section~\ref{sec:simulation}), the change in the attack
+itself is trivial: Alice's behavior
+vector $\V{v}$ no longer adds to $1$, since there is now a probability that a
+message from Alice will not reach any recipient.  Aside from this, the
+attacker can proceed as before, so long as Alice sends more messages
+(including dummies) in some rounds than in others.
 
-\XXXX{say more here.  Dummy policies other than N-squared may help a
-  little, by making it harder to learn $\V{u}$.}
+%%\XXXX{Is the rest of this section even vaguely germane or realistic?}
+%%
+%%On the other hand, suppose that some of Alice's dummy traffic reaches
+%% recipients other than those with whom she ordinarily communicates\footnote{This
+%%  discussion ignores the question of how Alice is to generate the text
+%%  of these cover messages, if messages typically leave the mix-net
+%%  unencrypted.}
+%% %% If some messages in the mix-net are
+%% %%  unencrypted, it will be difficult or impossible for Alice to compose
+%% %%  plausible unencrypted cover texts for most possible recipients.  On
+%% %%  the other hand, if all messages in the mix-net are encrypted, Alice
+%% %%  can easily give her dummy messages junk padding, encrypted so that
+%% %%  only the recipient of each dummy message can distinguish it from a
+%% %%  real message.}.  
+%% Suppose that every message Alice sends has a
+%% probability $P_c$ of being a cover message, and that recipients for
+%% cover messages are chosen from a distribution vector $\V{v_c}$.
+%% Now, instead of converging on Alice's true pattern of recipients
+%% $\V{v}$, the attacker instead learns
+%% \[ \V{v'} = (1-P_c)\V{v} + P_c\V{v_c} \]
+%% In other words, although the attacker cannot deduce which recipients
+%% receive messages from Alice, he can instead learn a superset of
+%% Alice's recipients.
+%%
+%% Clearly, if Alice chooses $\V{v_c}$ such that she sends messages with
+%% equal probability to all recipients, she can completely hide the
+%% actual recipients of her messages.  In practice, however, this
+%% requirement in unrealistic:  doing so will require that she send
+%% messages to {\it every} participant in the system---potentially as
+%% many recipients as there are active email addresses.  Such a padding
+%% regime results in traffic quadratic in the number of participants in
+%% the system, and thus scales poorly.
 
 \subsubsection{Partial observation}
 \label{subsubsec:partial}
@@ -569,8 +572,8 @@
 non-global attacker has different characteristics.  If an attacker
 eavesdrops on a fraction of the {\it mixes} in the system, the
 attacker receives a random sample\footnote{But possibly a biased
-  sample, depending on Alice's path selection algorithm} of the
-messages entering or leaving the system.  An attacker in this case
+  sample, depending on Alice's path selection algorithm.} of the
+messages entering or leaving the system.  Such an attacker
 will still converge on the same $\B{O}$ and thus the same
 estimation of Alice's behavior, but will require more rounds of
 observation to do so.
@@ -587,28 +590,24 @@
 
 \subsubsection{Time-variant background traffic}
 \label{subsubsec:time-variant}
-\XXXX{This section is very handwavy.}
-
-If Alice's behavior changes completely and radically over time, a long
-term intersection cannot proceed: the attacker cannot make enough
+If Alice's behavior changes completely and radically over time, a long-term
+intersection attack cannot proceed: the attacker cannot make enough
 observations of any version of Alice's behavior in order to converge
 on a $\B{v}$ for any of them.
 
 On the other hand, if Alice's behavior $\V{v}$ remains consistent
-while the behavior of the steady-state traffic $\V{u}$ changes slowly, the
+while the behavior of the background traffic $\V{u}$ changes slowly, the
 attacker still has some hope.  Rather than estimating a single $\B{U}$
 from observations to which Alice does not contribute, the attacker
-estimates a series of successive $\B{u_i}$ values based on the
+estimates a series of successive $\B{U_i}$ values based on the
 average behavior of the network during a comparatively shorter
 duration of time.  Now the attacker observes $\V{o_i}$ as before and
-computes the average of $\V{o_i} - \V{u_i}$, as before.  Now,
-\[ \V{v} \propto \frac{1}{t}\sum_{i=1}^t \V{o_i} - \B{u_i}
+computes the average of $\V{o_i} - \B{U_i}$.  Now,
+\[ \V{v} \propto \frac{1}{t}\sum_{i=1}^t \left( \V{o_i} - \B{U_i} \right)
 \]
 So if an attacker can get good local estimates of $\V{u}$, the
 intersection attack proceeds as before.
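The local estimates $\B{U_i}$ can be sketched as a sliding-window average (a hypothetical helper, assuming numpy; the window length trades responsiveness to drift in $\V{u}$ against estimator noise):

```python
import numpy as np

def rolling_background(u_obs, window):
    """Time-variant background: estimate a local U_i for each round by
    averaging the Alice-free observation vectors in a trailing window,
    rather than one global U over the whole trace."""
    u_obs = np.asarray(u_obs, dtype=float)
    return np.array([u_obs[max(0, i - window + 1): i + 1].mean(axis=0)
                     for i in range(len(u_obs))])
```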
 
-\XXXX{(Okay, not quite -- there's a constant factor there too.)}
-
 \subsubsection{Attacking recipients}
 \label{subsubsec:recipients}
 As a final (and somewhat anticlimactic) extension to the original
@@ -622,33 +621,37 @@
 probably receives messages with behavior in rounds from which Bob
 probably doesn't receive messages.  The only complication is that the
 attacker cannot tell in advance when Bob will receive a message.
-Therefore, the attacker maintains a window of observations at all times,
-such that if Bob later receives a message, the probability that the
-message was sent before the first round in the window is negligible.
+Therefore, the attacker must remember a window of recent observations
+at all times,
+such that if Bob later receives a message, the probability is negligible that
+the message was sent before the first round in the window.
 
 \subsection{Strengthening the attack}
 \label{subsec:strenghtening}
 Our previous extensions to the original statistical disclosure attack
-have shown how to reveal sender-recipient
-links in a broader and increasingly complex range of circumstances.
-In order to compensate for these circumstances, the attacker has been
+have shown how to reveal sender/recipient links in a broader range of
+circumstances.
+In order to compensate for these circumstances (as will be shown below in
+Section~\ref{sec:simulation}), the attacker has been
 forced to observe an increasingly large number of rounds of traffic.
 
-In this section, rather than broadening the attack at the expense of
-increased difficulty, we discuss ways to strengthen the attack by
-incorporating additional information the attacker may have.
+In this section, rather than broadening the attack at the expense of needing
+increased traffic, we discuss ways to reduce the amount of traffic required
+for the attack by incorporating additional information.
 
-\subsubsection{Exploiting full message linkability}
+\subsubsection{Exploiting message linkability}
 \label{subsubsec:full-linkability}
-The attacker's work can be greatly simplified if some messages leaving
-the system are {\it fully linkable}.  Two messages are said to be {\it
-  fully linkable} if an observer can tell, with certainty, that they
-originate from the same sender. One example of a system with fully
-linkable messages would be a pseudonymity service in which each sender
-has one or more pseudonyms and all delivered messages are associated
-to a pseudonym.
+The attacker's work can be greatly simplified if some messages leaving the
+system are {\it linkable}.  Two messages are said to be {\it linkable} if
+they are likelier to originate from the same sender than are two randomly
+chosen messages.  Messages are {\it fully linkable} if an observer can tell
+with certainty that they have the same sender, as in a
+pseudonymity service in which each sender has one or more pseudonyms and
+all delivered messages are associated with a pseudonym.
 
-Clearly, an eavesdropper attacker who could link senders to their pseudonyms
+% Should we call the case we handle 'parametric linkability' or something?
+
+Clearly, an eavesdropping attacker who can link senders to their pseudonyms
 could trivially use this information to link senders and recipients.
 One way to do this
 is by treating classes of fully linkable messages as virtual message
@@ -660,14 +663,10 @@
 elements of Alice's $\V{v}$ and the background $\V{u}$ are now
 disjoint, and thus easier for the attacker to separate.
 
-\XXXX{Need to analyze below the circumstances under which this works;
-  and to what extent it helps.  It's fine for Nyms, but not for
-  everything.}
-
 \subsubsection{Exploiting partial message linkability}
 \label{subsubsec:partial-linkability}
 It's possible that the linkage between
-messages may not be full.  In other words, the attacker may be able
+messages may not be full---the attacker may be able
 to deduce that two messages are more likely than average to have the same
 sender, without being certain that the senders are the same.  For
 example, two binary documents written in the same version of MS Word
@@ -676,19 +675,13 @@
 sophisticated attacker could check for the presence of certain
 keywords and try to link messages based on their textual content.
 
-\XXXX{Can we do anything with non-parametric linkages?  IOW, can we
-  proceed given an
-  arbitrary collection of $P($sender-of-msg$_i$=sender-of-msg$_j)$?}
-
-\XXXX{This section is really about distinguishability, perhaps.}
-
 To exploit these partial linkages, the attacker begins as above by
 choosing a set of $c$ `linkability classes' (such as languages,
-patterns of usage, etc.), and assigning to each observed output
+patterns of usage, and so on), and assigning to each observed output
 message a probability of belonging to each class.  The attacker then
-proceeds as before, but instead of collecting $N$-element observation
+proceeds as before, but instead of collecting observation
 vectors with elements corresponding to the recipients, the attacker now
-builds $cN$-element vectors whose elements correspond to number of
+collects observation vectors whose elements correspond to the number of
 messages received in each round by each
 $\left<\mbox{recipient},\mbox{class}\right>$ tuple.  (If a message
 might belong to multiple classes, the attacker sets the corresponding
@@ -698,22 +691,17 @@
 proceeds as before, but messages sent to different linkability
 classes no longer provide cover for one another.
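Building one such $\left<\mbox{recipient},\mbox{class}\right>$ observation vector might look like this (a hypothetical helper, assuming numpy; each output message carries a probability vector over the $c$ linkability classes, split across classes when its class is uncertain):

```python
import numpy as np

def class_observation(b, msgs, n_recipients, n_classes):
    """Build one cN-element observation vector for a round: the element
    for (recipient, class) accumulates each output message's probability
    of belonging to that class, scaled by 1/b as in the plain attack.
    `msgs` is a list of (recipient_index, class_probabilities) pairs."""
    o = np.zeros((n_recipients, n_classes))
    for recipient, class_probs in msgs:
        o[recipient] += np.asarray(class_probs, dtype=float) / b
    return o.ravel()   # flatten to a cN-element vector
```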
 
-\XXXX{ This attack seems not to exploit the fact that if Alice sends
-  to $<$Word6, Bob$>$, she is more likely to send to $<$Word6,
-  Carol$>$ than to $<$Word2K, Carol$>$.}
-
-Other opportunities for partitioning include:
-\begin{itemize}
-\item \XXXX{list some.}
-\end{itemize}
+%\XXXX{ This attack seems not to exploit the fact that if Alice sends
+%  to $<$Word6, Bob$>$, she is more likely to send to $<$Word6,
+%  Carol$>$ than to $<$Word2K, Carol$>$.}  -NM
 
 \subsubsection{Exploiting {\it a priori} suspicion}
 \label{subsubsec:suspicion}
 Finally, the attacker may have reason to believe that some messages
 are more likely to have been sent by the target user than others.  For
-example, if we believe that Alice speaks English but not Arabic, or
+example, if we believe that Alice speaks Urdu but not Arabic, or
 that Alice knows psychology but not astrophysics, then we will
-naturally suspect that an English-language message about psychology
+naturally suspect that an Urdu-language message about psychology
 is more likely to come from Alice than is an Arabic-language message
 about astrophysics.
 
@@ -727,24 +715,10 @@
 message, and $1/150$ to each element corresponding to an unlikely
 message.
 
-Other opportunities for gaining suspicion include:
-\begin{itemize}
-\item \XXXX{list some}
-\end{itemize}
-
-%======================================================================
-%\section{Analysis}
-%\label{analysis}
-%
-%\XXXX{We need a stats person to analyze how strong the signal must be,
-%  and how long an attacker should expect to have to observe in each
-%  case.}
-
 %======================================================================
 \section{Simulation results}
 \label{sec:simulation}
 
-\XXXX{I'm coming up with these now.}
 
 \subsection{The original statistical disclosure attack}
 \label{subsec:sim-statistical-disclosure}
