
[freehaven-cvs] Check in preliminary, iffy draft of some parts of e2...



Update of /home/freehaven/cvsroot/doc/e2e-traffic
In directory moria.mit.edu:/tmp/cvs-serv26088

Added Files:
	.cvsignore Makefile e2e-traffic.bib e2e-traffic.tex 
Log Message:
Check in preliminary, iffy draft of some parts of e2e-traffic paper

--- NEW FILE: .cvsignore ---
*.pdf
*.ps
*.log
*.dvi
*.blg
*.bbl
*.aux

--- NEW FILE: Makefile ---


all: e2e-traffic.pdf e2e-traffic.ps

e2e-traffic.dvi: e2e-traffic.tex
	latex e2e-traffic.tex

e2e-traffic.pdf: e2e-traffic.dvi
	dvipdf e2e-traffic.dvi e2e-traffic.pdf

e2e-traffic.ps: e2e-traffic.dvi
	dvips e2e-traffic.dvi -o e2e-traffic.ps

macview: e2e-traffic.pdf
	open e2e-traffic.pdf

clean:
	rm -f e2e-traffic.pdf e2e-traffic.ps e2e-traffic.aux e2e-traffic.log
	rm -f e2e-traffic.bbl e2e-traffic.blg e2e-traffic.dvi
	rm -f *~

words:
	wc -w e2e-traffic.tex
--- NEW FILE: e2e-traffic.bib ---



@Article{chaum-mix,
   author =      {David Chaum},
   title =       {Untraceable electronic mail, return addresses, and digital
                  pseudonyms},
   journal =     {Communications of the ACM},
   year =        {1981},
   volume =      {24},
   number =      {2},
   month =       {February},
   note =        {\url{http://www.eskimo.com/~weidai/mix-net.txt}},
}

@inproceedings{pet2003-diaz,
  title = {Generalising Mixes}, 
  author = {Claudia D\'iaz and Andrei Serjantov}, 
  booktitle = {Proceedings of the Privacy Enhancing Technologies workshop (PET 2003)},
  year = {2003}, 
  month = {March}, 
  editor = {Roger Dingledine}, 
  publisher = {Springer-Verlag, LNCS 2760}, 
  www_ps_gz_url = {http://www.esat.kuleuven.ac.be/~cdiaz/papers/DS03.ps.gz}, 
  www_important = {1}, 
  www_section = {Anonymous communication}, 
}

@inproceedings{trickle02,
  title = {From a Trickle to a Flood: Active Attacks on Several Mix Types}, 
  author = {Andrei Serjantov and Roger Dingledine and Paul Syverson}, 
  booktitle = {Proceedings of Information Hiding Workshop (IH 2002)}, 
  year = {2002}, 
  month = {October}, 
  editor = {Fabien Petitcolas}, 
  publisher = {Springer-Verlag, LNCS 2578}, 
  www_pdf_url = {http://freehaven.net/doc/batching-taxonomy/taxonomy.pdf}, 
  www_ps_url = {http://freehaven.net/doc/batching-taxonomy/taxonomy.ps}, 
  www_section = {Traffic analysis}, 
}

@inproceedings{limits-open,
  title = {Limits of Anonymity in Open Environments}, 
  author = {Dogan Kesdogan and Dakshi Agrawal and Stefan Penz}, 
  booktitle = {Proceedings of Information Hiding Workshop (IH 2002)}, 
  year = {2002}, 
  month = {October}, 
  editor = {Fabien Petitcolas}, 
  publisher = {Springer-Verlag, LNCS 2578}, 
  www_section = {Traffic analysis}, 
}

@misc{mixmaster-spec,
  title = {Mixmaster {P}rotocol --- {V}ersion 2}, 
  author = {Ulf M\"oller and Lance Cottrell}, 
  year = {2000}, 
  month = {January}, 
  howpublished = {Unfinished draft}, 
  www_section = {Anonymous communication}, 
  www_txt_url = {http://www.eskimo.com/~rowdenw/crypt/Mix/draft-moeller-mixmaster2-protocol-00.txt},
}

@inproceedings{statistical-disclosure,
  title = {Statistical Disclosure Attacks: Traffic Confirmation in Open Environments},
  author = {George Danezis},
  booktitle = {Proceedings of Security and Privacy in the Age of Uncertainty (SEC2003)},
  year = {2003},
--- NEW FILE: e2e-traffic.tex ---
% $Id: e2e-traffic.tex,v 1.1 2003/07/30 18:46:58 nickm Exp $
\documentclass{article}
\usepackage{times}
\usepackage{url}
\usepackage{amsmath}
\usepackage{fullpage}

\renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
\newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
\newcommand\XXXX[1]{{\small\bf [XXXX #1]}}

\begin{document}
\title{Practical Intersection Attacks: \\
       Extending statistical disclosure attacks to real-world mix-nets}
  
\author{Nick Mathewson \\ The Free Haven Project
  \\ \emailaddr{nickm@freehaven.net} }

\maketitle
\centerline{\LARGE\bf *DRAFT* --- not for publication}
%======================================================================
\begin{abstract}
We extend earlier work on end-to-end long-term traffic analysis.
Whereas earlier work addressed cases in which a global eavesdropper
was trying to learn sender-receiver connections at a single batch mix
whose senders had known behavior patterns, this paper describes how an
analogous attack can reveal sender-receiver connections to a
non-global eavesdropper, without prior knowledge of sender behavior
patterns, on a network of timed dynamic-pool mixes with dummy messages.
Each weakening of earlier assumptions increases the time over which
the attacker must observe the mix-net, but has little effect on the
attacker's storage or computational requirements.  Additionally, we
describe how such an attacker can exploit additional information about
message linkability in order to reduce the number of rounds required
to link senders to recipients.  We present analysis of the amount of
information needed to mount each of these attacks, and provide
experimental results from mounting these attacks on a simulated
mix-net.  Finally, we propose a few mechanisms for resisting these
attacks.
\end{abstract}

%======================================================================
\section{Introduction}
\label{intro}
Since the introduction of mix-nets \cite{chaum-mix} in 1981, a variety
of attacks against such anonymity systems has been proposed and
discussed.  While some of these attacks have been partially addressed
by advancements in mix-net designs\cite{???}, broad classes of
attacks remain effective against current mix-net designs.

One such class of attacks is the {\it long-term intersection attack},
wherein an eavesdropper observes a large volume of network traffic
and notices that, over time, certain recipients are likelier to
receive messages when given senders are transmitting messages.
Categorical defenses against this attack tend to require either an
impractically large amount of cover traffic, or a set of senders with
near-perfect uptimes.

An example of a long-term intersection attack is proposed by Kesdogan,
Agrawal, and Penz\cite{limits-open}.  Their {\it disclosure attack}
assumes a fairly strict model of sender behavior, and works against
only a single batch mix (one that waits until it receives $b$
messages, then reorders and retransmits them all).  Additionally, the
disclosure attack requires an attacker to mount a computationally
difficult and algorithmically complex attack in order to reveal the
connections between senders and recipients.

Danezis presents an algorithmically simpler {\it statistical
  disclosure attack}\cite{statistical-disclosure} that requires far
less computational effort on behalf of the attacker.  This attack
is far easier to describe and implement, but it assumes the same
restrictive sender and network models as the original disclosure
attack.

In this paper, we extend Danezis's original statistical disclosure
attack to work in less restricted circumstances.  Specifically, if a
given mix-net sender maintains a consistent behavior pattern over
time, we show how an attacker can learn that sender's regular
recipients, even when:
\begin{itemize}
\item The target sender chooses non-uniformly among her
  communication partners, sends multiple messages at once, and has
  some non-repeated recipients.
\item The attacker lacks {\it a priori} knowledge of the network's
  average behavior in the sender's absence.
\item Mixes in the system use a better batching algorithm, such as 
  the timed dynamic-pool algorithm\cite{trickle02} used by
  Mixmaster\cite{mixmaster-spec}, or the generalized mix algorithm
  proposed by \cite{pet2003-diaz}.
\item The sender uses a path through a mix network, instead of just a
  single mix.
\item The sender sends some dummy traffic, delivered to no particular
  recipient.
\item The attacker can only view a subset of the messages entering and
  leaving the network, but this subset includes some messages from
  the sender, and some messages to the sender's recipients.
\item The behavior of the cover traffic generated by other senders
  changes continuously over time.  (We do not address this case
  completely).
\end{itemize}
Each of these variations increases the amount of time the attacker
must spend observing traffic in order to learn the sender's regular
recipients.

Additionally, we show how an attacker can exploit additional
knowledge about the anonymity system (or external to it) to speed up
these attacks.  Such information includes:
\begin{itemize}
\item Statistical linkability between messages.  For example, a pair
  of messages written in the same language is likelier to have been
  written by a single sender than is a pair of messages written in two
  different languages.
\item Full linkability between messages.  For example, if messages
  are pseudonymous, all messages from the same pseudonym are almost
  certainly written by a single sender.
\item {\it A priori} suspicion of certain messages having originated
  or not originated from the target sender.  For example, messages
  written in a language the sender doesn't speak are unlikely to have
  been written by the sender.
\end{itemize}

The attacks in this paper fail to work when:
\begin{itemize}
\item The sender's behavior is not consistent over time.  If the
  sender does not maintain a group of regular recipients for the
  required duration, the attacker cannot learn the sender's behavior.
\item The attacker cannot observe the network's background behavior.  If
  the sender contributes messages to every round, the attacker
  may never be able to get a view of the network's exit behavior in the
  sender's absence.
\item The attacker cannot tell when the sender is originating
  messages.  This may be the case if the sender is running her own mix
  node, and injecting messages to it directly.
\end{itemize}

We begin in section \ref{previous-work} by presenting a brief
background overview on mix-nets, traffic analysis, the disclosure
attack, and the statistical disclosure attack.  In section
\ref{extending}, we then present our enhancements to the statistical
disclosure attack.  We analyze the effectiveness of these enhancements
in section \ref{analysis}, and present simulated experimental results
in section \ref{simulation}.  We close in section \ref{conclusion}
with recommendations for resisting this class of attacks, implications
for mix-net design, and a set of open questions for future work.

%======================================================================
\section{Previous work}
\label{previous-work}

\subsection{Mix-nets}
[Write something here to tell people what mix-nets are.]

\subsection{Traffic analysis}
[Write an overview of the general history of traffic analysis attacks.]

\subsection{The disclosure attack}
In 2002, Kesdogan, Agrawal, and Penz presented the disclosure
attack\cite{limits-open}, a traffic-analysis attack against a single
sender on a single batch mix.

The disclosure attack assumes that the target sender (``Alice'') sends
messages to $m$ recipients; that Alice sends a single message to one
of them chosen at random per batch of $b$ messages; and that the other
$b-1$ messages in each batch are chosen at random from the set of $N$
possible recipients.

The attacker observes the messages leaving the mix in each batch, and
constructs sets $R_i$ of recipients receiving messages in batch $i$.
The attacker then solves an NP-complete problem to identify $m$
mutually disjoint recipient sets $R_i$, so that each of Alice's
recipients is necessarily contained in exactly one of the sets.
Intersecting these sets with future recipient sets reveals Alice's
recipients.

\XXXX{Give the result formulas in the disclosure paper.}

\XXXX{The above may not be 100\% accurate; must re-read the paper.}

\subsection{The statistical disclosure attack}
In 2003, Danezis presented the statistical disclosure
attack\cite{statistical-disclosure}, which makes the same assumptions
as the original disclosure attack, but is far easier to implement in
terms of storage, speed, and algorithmic complexity.

In the statistical disclosure attack, we model Alice's behavior as an
unknown vector $\vec{v}$ with elements corresponding to the
probability of Alice sending a message to each of the $N$ recipients
in the system.  Of these elements, $m$ will be $1/m$, and $N-m$ will
be $0$.  We model the behavior of the cover traffic as a known
vector $\vec{u}$ each of whose $N$ elements is $1/N$.

The attacker derives from each output round an observation vector
$\vec{o_i}$, each of whose elements corresponds to the probability of
Alice's having sent a message to a particular recipient in that round.
Taking the arithmetic mean $\bar{O}$ of a large set of $\vec{o_i}$
gives (by the law of large numbers):
\[\bar{O} = \frac{1}{t}\sum_{i=1}^t \vec{o_i}
   \approx \frac{\vec{v} + (b-1)\vec{u}}{b} \]
From this, the attacker estimates
\[\vec{v} \approx b\,\bar{O} - (b-1)\vec{u}\]
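As a concrete illustration, the estimator above can be sketched as a short
simulation.  (This is only a sketch: the batch size, population size, and
number of rounds are illustrative assumptions, not values from the original
paper.)

```python
import random

def simulate_round(b, N, alice_recipients):
    """One batch-mix round: Alice sends one message to a uniformly chosen
    regular recipient; the other b-1 senders pick uniformly among all N."""
    batch = [random.choice(alice_recipients)]
    batch += [random.randrange(N) for _ in range(b - 1)]
    return batch

def estimate_v(rounds, b, N):
    """Statistical disclosure: average the observation vectors o_i
    (1/b per delivered message), then subtract the uniform background,
    i.e. v ~= b * O_bar - (b-1) * u with u = (1/N, ..., 1/N)."""
    t = len(rounds)
    o_bar = [0.0] * N
    for batch in rounds:
        for r in batch:
            o_bar[r] += 1.0 / (b * t)
    return [b * o - (b - 1) / N for o in o_bar]

random.seed(4)
b, N, m = 10, 100, 4
alice = list(range(m))                       # Alice's m regular recipients
rounds = [simulate_round(b, N, alice) for _ in range(5000)]
v_est = estimate_v(rounds, b, N)
top_m = sorted(range(N), key=lambda i: -v_est[i])[:m]
```

With enough rounds, the $m$ largest elements of the estimate single out
Alice's regular recipients.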

\XXXX{Should we report George's findings on preconditions and
 required time to succeed?}

\XXXX{mention that this doesn't take much space or time.}

%======================================================================
\section{Extending the statistical disclosure attack}
\label{extending}

\XXXX{All the math in this section is kinda fishy; we need a
    good stats person to phrase this right.  I may be weighting when
    I shouldn't, modeling things strangely, and so on. ---Nick.}

\subsection{Broadening the attack}
In this subsection, we examine ways to extend this attack to
models more closely resembling real mix-nets.

\subsubsection{Complex sender behavior}
First, we relax the requirements related to sender behavior.  In this
model, we allow Alice to send to her recipients with non-uniform
probability, and to send multiple messages in a single batch.  We also
remove the assumption that the attacker has full knowledge of the
distribution $\vec{u}$ of cover traffic.

We model Alice's behavior as a probability distribution $p_n(n)$ of
the number of messages she sends each round, and a non-uniform
behavior vector $\vec{v}$ of the recipients to which Alice sends.
Thus, Alice's expected contribution to each round is $E[n]\,\vec{v}$.

To mount the attack in this case, the attacker first obtains an
estimate of $\vec{u}$ by observing a large number $t'$ of batches to
which Alice has {\it not} contributed any messages.  For each such
batch $i$, the attacker constructs a vector $\vec{u_i}$ containing
$1/b$ for each recipient that has received a message in that batch, and
$0$ for each recipient that has not.  The attacker then estimates
$\vec{u}$ as:
\[\vec{u} \approx \bar{U} = \frac{1}{t'}\sum_{i=1}^{t'}\vec{u_i} \]

The attacker then observes, for each round $i$ in which Alice {\it does}
send a message, the number of messages $m_i$ sent by Alice, and
computes $\vec{o_i}$ as before.  Computing the arithmetic mean over $t$
such rounds gives
\[\bar{O} = \frac{1}{t}\sum_{i=1}^{t}\vec{o_i}
            \approx
              \frac{\bar{m}\vec{v} + (b-\bar{m})\vec{u}}{b}
              \mbox{\ where\ } \bar{m} = \frac{1}{t}\sum_{i=1}^{t} m_i \]

From this, the attacker estimates Alice's behavior as
\[\vec{v} \approx \frac{1}{\bar{m}}
            \left[ b\,\bar{O} - (b-\bar{m})\bar{U} \right] \]
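The two-phase estimator can be sketched as follows.  (Alice's true profile,
the background preference weights, and all the parameters below are made-up
values for illustration only.)

```python
import random

def estimate_v_complex(alice_rounds, background_rounds, b, N):
    """Extended statistical disclosure: estimate the background u from
    Alice-free batches, then solve b*O_bar ~= m_bar*v + (b - m_bar)*u.
    alice_rounds: list of (m_i, batch) pairs, m_i = Alice's message count."""
    U = [0.0] * N
    for batch in background_rounds:
        for r in batch:
            U[r] += 1.0 / (b * len(background_rounds))
    t = len(alice_rounds)
    m_bar = sum(m_i for m_i, _ in alice_rounds) / t
    O = [0.0] * N
    for _, batch in alice_rounds:
        for r in batch:
            O[r] += 1.0 / (b * t)
    return [(b * O[i] - (b - m_bar) * U[i]) / m_bar for i in range(N)]

random.seed(7)
b, N = 10, 50
v_true = [0.5, 0.3, 0.2] + [0.0] * (N - 3)   # Alice's (non-uniform) profile
w = [random.random() for _ in range(N)]      # unknown background preferences
bg = [random.choices(range(N), weights=w, k=b) for _ in range(3000)]
al = []
for _ in range(3000):
    m_i = random.choice([1, 2])              # Alice sends 1 or 2 messages
    batch = random.choices(range(N), weights=v_true, k=m_i)
    batch += random.choices(range(N), weights=w, k=b - m_i)
    al.append((m_i, batch))
v_est = estimate_v_complex(al, bg, b, N)
```

The three largest elements of the estimate recover Alice's recipients even
though neither $\vec{u}$ nor Alice's per-round message count is known in
advance.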

\subsubsection{Complex mix behavior}
\label{complex-mix}
Most mix-net designers have already abandoned fixed-batch mixes in
favor of other mixing algorithms that better obscure the relation
between senders and recipients.  Such improved algorithms include
timed dynamic-pool mixes, generalized mixes, and randomized versions of
each\cite{trickle02}\cite{pet2003-diaz}.

We treat these batching algorithms as follows\footnote{This approach
  is based on D\'iaz and Serjantov's work in \cite{pet2003-diaz}.}: a
mix relays a number of messages at the end of each round, depending
on the number of messages it is currently storing.  All messages in
the mix's pool at the end of a round have an equal probability of
being included in that round's batch.  Thus, we can characterize the
mix's batching algorithm as a probability distribution $P_{MIX}(b|s)$
--- the probability that the mix relays $b$ messages when it has $s$
messages in the pool.  Clearly, $\sum_{b=0}^{s}P_{MIX}(b|s) = 1$ for
all values of $s$.

  In the case of a timed dynamic-pool mix, this
distribution is:
\[ P_{TDP}(b|s) =
  \left\{ \begin{array}{ll}
    1 & \mbox{if\ } s>\mbox{min-pool} \mbox{\ and\ }
      b = \min(s-\mbox{min-pool}, \left\lfloor s\cdot\mbox{rate} \right\rfloor) \\
    0 & \mbox{otherwise} \end{array} \right. \]
For a binomial timed dynamic-pool mix, each pooled message leaves
independently with probability $b_0/s$, so
\[ P_{BTDP}(b|s) =
   \left\{ \begin{array}{ll}
           \binom{s}{b}\left(\frac{b_0}{s}\right)^b
             \left(1-\frac{b_0}{s}\right)^{s-b} & \mbox{if\ }
              P_{TDP}(b_0|s) = 1 \\
          0 & \mbox{otherwise} \end{array} \right. \]
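These two distributions can be written directly as functions.  (A sketch
only: the min-pool and rate values below are illustrative placeholders, not
Mixmaster's actual constants.)

```python
import math

def p_tdp(b, s, min_pool=20, rate=0.6):
    """Timed dynamic-pool rule: when the pool exceeds min-pool, exactly
    b0 = min(s - min_pool, floor(s * rate)) messages are relayed."""
    b0 = min(s - min_pool, math.floor(s * rate)) if s > min_pool else 0
    return 1.0 if b == b0 else 0.0

def p_btdp(b, s, min_pool=20, rate=0.6):
    """Binomial variant: each of the s pooled messages leaves independently
    with probability b0/s, so the batch size is Binomial(s, b0/s)."""
    b0 = min(s - min_pool, math.floor(s * rate)) if s > min_pool else 0
    p = b0 / s if s else 0.0
    return math.comb(s, b) * p**b * (1 - p)**(s - b)
```

For example, with a pool of $s=100$ the deterministic rule always fires
$b_0 = \min(80, 60) = 60$ messages, and the binomial variant fires
$60$ on average.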

When attacking such a mix, the attacker no longer knows with certainty
which batches of recipients contain a message from Alice.  Instead,
the attacker can only estimate, for each batch of exiting messages,
the probability that the batch includes one of Alice's messages.  

We denote by $P_R(r)$ the probability of a message leaving the mix $r$
rounds after it has arrived.  In future discussion, we assume for the
sake of simplicity that the mix has reached a steady state, so that
$P_R(r) = (1-P_D)P_D^{r}$, where $P_D$ is the probability of a message in the
mix's pool being deferred in any given round.
\footnote{The attacker can estimate $P_R$ by
sending occasional test messages through the mix, or by counting the
messages entering and leaving the mix and deducing the pool size.
Dummy messages can impair this attack technique to an extent discussed
in \cite{dummy-msgs}.}
Now, when Alice sends a message in round
$i$, the attacker observes rounds $i$ through some later round $i+k$,
choosing $k$ so that $P_R(k)$ is negligible.  The attacker then weights
each of these rounds by the expected number of messages from Alice that
will exit in that round, includes them in an adjusted $\bar{O_w}$, and
estimates $\vec{v}$ as before. \XXXX{Include an equation?}

To estimate $\vec{u}$ in this case, the simplest approach is to
average $\vec{u_i}$ from batches with a negligible probability of
including messages from Alice.  Such batches, however, are not
essential: if the attacker chooses a set of $\vec{u_i}$ such that each
round contains (on average) a small fraction $p_a>0$ of messages from
Alice, the attacker will obtain
\[\bar{U}' \approx p_a \vec{v} + (1-p_a) \vec{u} \]
and can thus substitute
\[\vec{u} \approx \frac{1}{1-p_a}\left[ \bar{U}' - p_a\vec{v} \right] \]
into the earlier equation for $\bar{O}$ and again solve for $\vec{v}$.

\subsubsection{Mix-nets}
Another way senders behave differently from the model of the
disclosure attack is by directing their messages through a chosen path
in a network of mixes.  While using a mix-net increases the effort an
attacker must spend to observe all messages leaving the system, it
has no additional effect on intersection attacks beyond changing the
delaying characteristics $P_R$ of the anonymity system as introduced
in section \ref{complex-mix}.

Assume for the sake of simplicity that all mixes share a single
$P_R$, and that Alice chooses a path of length $l_0$.  The chance of
the message being delayed by a further $d$ rounds is now
\[  P_R'(l_0+d) = \binom{l_0+d-1}{d} (1-P_D)^{l_0} P_D^d \]
If Alice chooses her path length probabilistically according to
$P_L(l)$, we have 
\[ P_R'(r) = \sum_{l=1}^r P_L(l) \binom{r-1}{r-l} (1-P_D)^l P_D^{r-l} \]
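This delay distribution is easy to evaluate numerically.  (The path-length
distribution and deferral probability below are arbitrary example values.)

```python
from math import comb

def p_delay(r, p_d, p_l):
    """P_R'(r): probability that a message leaves the mix-net r rounds
    after entry, given per-round deferral probability p_d and a
    path-length distribution p_l (p_l[l] = Pr[path has length l])."""
    total = 0.0
    for l in range(1, min(r, len(p_l) - 1) + 1):
        # negative binomial: l hops forwarded, r - l rounds of deferral
        total += p_l[l] * comb(r - 1, r - l) * (1 - p_d)**l * p_d**(r - l)
    return total

# fixed path length 3, deferral probability 1/2
p_l = [0.0, 0.0, 0.0, 1.0]
mass = sum(p_delay(r, 0.5, p_l) for r in range(1, 200))
```

Summing over enough rounds, the distribution accounts for essentially all
of the probability mass, confirming the normalization.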

\XXXX{ How can the attacker estimate $P_L$ for Alice?  What if he can't?}

\subsubsection{Dummy traffic}
\label{dummy-traffic}
One strategy for reducing the impact of traffic analysis is for Alice
to periodically send messages into the network that arrive at no
recipient; or to periodically send messages into the network that
arrive at recipients other than those she wants to communicate with.

In the first case, the change is trivial: Alice's behavior vector
$\vec{v}$ no longer sums to $1$, since there is now a probability that
a message from Alice will not reach any recipient.

On the other hand, suppose that some of Alice's dummy traffic reaches
recipients other than those she ordinarily communicates with\footnote{This
  discussion ignores the question of how Alice is to generate the text
  of these cover messages.  If some messages in the mix-net are
  unencrypted, it will be difficult or impossible for Alice to compose
  plausible unencrypted cover texts for most possible recipients.  On
  the other hand, if all messages in the mix-net are encrypted, Alice
  can easily give her dummy messages junk padding, encrypted so that
  only the recipient of each dummy message can distinguish it from a
  real message.}.  Suppose that every message Alice sends has a
probability $P_c$ of being a cover message, and that recipients for
cover messages are chosen from a distribution vector $\vec{v_c}$.
Now, instead of converging on Alice's true pattern of recipients
$\vec{v}$, the attacker instead learns
\[ \vec{v'} = (1-P_c)\vec{v} + P_c\vec{v_c}. \]
In other words, although the attacker can no longer deduce exactly which
recipients receive messages from Alice, he instead learns a superset of
Alice's recipients.

Clearly, if Alice chooses $\vec{v_c}$ so as to send messages with
equal probability to all recipients, she can completely hide the
actual recipients of her messages.  In practice, however, this
requirement is unrealistic: doing so would require that she send
messages to {\it every} participant in the system --- potentially as
many recipients as there are active email addresses.  Such a padding
regime results in traffic quadratic in the number of participants in
the system, and thus scales prohibitively poorly.

\XXXX{say more here.  Dummy policies other than N-squared may help a
  little.}

\subsubsection{Partial observation}
Until now, we have required that the attacker, as a global passive
adversary, observe all the messages entering and leaving the system.
(At least, all the messages sent by Alice, and all the messages
reaching Alice's recipients.)  This requirement is not so difficult
as it might seem: in order to be a ``global'' adversary against
Alice, an attacker need only eavesdrop upon Alice, and upon the mixes
that deliver messages to recipients. (Typically, not all mixes do
so.)  In this section, we examine the consequences for a non-global
attacker.

Depending on which points of the network the attacker can observe, a
non-global attacker has different characteristics.  If an attacker
eavesdrops on a fraction of the {\it mixes} in the system, the
attacker receives a random sample\footnote{But possibly a biased
  sample, depending on Alice's path selection algorithm.} of the
messages entering or leaving the system.  An attacker in this case
will still converge on the same $\bar{O}$ and thus the same
estimation of Alice's behavior, but will require more rounds of
observation to do so.

Alternatively, an attacker who eavesdrops on a fraction of the {\it
  users} in the system receives {\it all} of the messages sent to or
from those users, but none of the messages sent between other users.
So long as one of these users is Alice, the attack can still proceed: to
such an attacker, the network behaves as if the messages sent by Alice to
unobserved recipients were dummy messages.  Now the attack converges
as before, but with only information concerning the observed
recipients: the attacker learns which of the observed recipients
receive messages from Alice, and which do not.

\subsubsection{Time-variant traffic}
\XXXX{This section is very handwavy.}

If Alice's behavior changes completely and radically over time, a
long-term intersection attack cannot proceed: the attacker cannot make
enough observations of any version of Alice's behavior in order to
converge on a $\vec{v}$ for any of them.

On the other hand, if Alice's behavior $\vec{v}$ remains consistent
while the behavior of the cover traffic $\vec{u}$ changes slowly, the
attacker has some hope.  Rather than estimating a single $\bar{U}$
from observations to which Alice does not contribute, the attacker
estimates a series of successive $\bar{u_i}$ values based on the
average behavior of the network during a comparatively shorter
duration of time.  The attacker then observes $\vec{o_i}$ as before and
computes the average of $\vec{o_i} - \bar{u_i}$, so that
\[ \frac{1}{t}\sum_{i=1}^t \left( \vec{o_i} - \bar{u_i} \right)
    \propto \vec{v}
\]
So if the attacker can get good local estimates of $\vec{u}$, the
intersection attack proceeds as before.

\XXXX{(Okay, not quite -- there's a constant factor there too.)}

\subsubsection{Attacking recipients}
As a final (and somewhat anticlimactic) extension to the original
attack, we note that an attacker can mount attacks against recipients
as well as senders with only slightly higher storage, and no increase
in computational cost. 

In this model, the attacker wishes to know which senders are sending
anonymous messages to a given recipient Bob.  The analysis remains the
same: the attacker compares sender behavior in rounds from which Bob
probably receives messages with behavior in rounds from which Bob
probably doesn't receive messages.  The only complication is that the
attacker cannot tell in advance when Bob will receive a message.
Therefore, the attacker must accumulate a window of observations at
all times, in case Bob later receives a message that may have come from
one of the rounds in that window.

\subsection{Strengthening the attack}
Our previous extensions to the original statistical disclosure attack
have broadened the original method to reveal sender-recipient
links in an increasingly broad and complex range of circumstances.
To compensate for these circumstances, the attacker has been
forced to observe an increasingly large number of rounds of traffic.

In this section, rather than broadening the attack at the expense of
increased difficulty, we discuss ways to strengthen the attack by
incorporating additional information the attacker may have.

\subsubsection{Exploiting full message linkability}
The attacker's work can be greatly simplified if some messages leaving
the system are {\it fully linkable}.  Two messages are said to be {\it
  fully linkable} if an observer can tell, with certainty, that they
originate from the same sender.  One example of a system with fully
linkable messages would be a pseudonymity service in which each sender
has one or more pseudonyms, and all delivered messages can be linked
to a pseudonym.

Clearly, the attacker in such a system can trivially link senders to
recipients by linking senders to their pseudonyms.  One way to do this
is by treating classes of fully linkable messages as virtual message
destinations.  Instead of collecting observations $\vec{o_i}$ of
recipients who receive messages in round $i$, the attacker now
collects observations $\vec{o_i}'$ of linkable classes
(e.g. pseudonyms) that receive messages in round $i$.  Since two
distinct senders don't produce messages in the same linkability class,
the elements of Alice's $\vec{v}$ and the background $\vec{u}$ are now
disjoint, and thus easier for the attacker to separate.

\XXXX{Need to analyze below the circumstances under which this works;
  and to what extent it helps.  It's fine for Nyms, but not for
  everything.}

\subsubsection{Exploiting partial message linkability}
% subsumes \subsubsection{Exploiting partitioning attacks}
In the linkability case above, it's possible that the linkage between
messages may not be full.  In other words, the attacker may be able
to deduce that two messages are likelier than average to have the same
sender, without being certain that the senders are the same.  For
example, two binary documents written in the same version of MS Word
are likelier to be written by the same sender than two messages
selected at random.  These linkages may be more abstract: a
sophisticated attacker could check for the presence of certain
keywords and try to link messages based on their textual content.

\XXXX{Can we do anything with non-parametric linkages?  IOW, can we
  proceed given an
  arbitrary collection of $P($sender-of-msg$_i$=sender-of-msg$_j$)?}

\XXXX{This section is really about distinguishability, perhaps.}

To exploit these partial linkages, the attacker begins as above by
choosing a set of `linkability classes' (such as languages, patterns
of usage, etc.), and assigning to each observed output message a
probability of belonging to each class.  The attack then collects
observations, noting the number of messages received in each round by
each $\left<\mbox{recipient},\mbox{class}\right>$ tuple.  (If a
message might belong to multiple classes, the attacker sets the
corresponding element of each possible class to the probability of the
message's belonging to that class.)  The statistical disclosure attack
proceeds as before, but messages sent to different linkability
classes no longer provide cover for one another.
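One possible bookkeeping scheme for these per-class observations is
sketched below.  (The class-probability assignments would come from
whatever classifier the attacker has available; the values here are
illustrative.)

```python
from collections import defaultdict

def class_observations(messages, n_classes):
    """Tally expected message counts per (recipient, class) tuple.
    messages: list of (recipient, class_probs), where class_probs[c] is
    the probability that the message belongs to linkability class c.
    Messages in different classes no longer provide cover for each other."""
    counts = defaultdict(float)
    for recipient, probs in messages:
        for c in range(n_classes):
            if probs[c] > 0:
                counts[(recipient, c)] += probs[c]
    return dict(counts)

obs = class_observations(
    [(0, [1.0, 0.0]),      # certainly class 0
     (0, [0.5, 0.5]),      # ambiguous: mass split across both classes
     (1, [0.0, 1.0])],
    n_classes=2)
```

Each $\left<\mbox{recipient},\mbox{class}\right>$ tuple then plays the role
that a plain recipient played in the earlier attack.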

\XXXX{ This attack seems not to exploit the fact that if Alice sends
  to $<$Word6, Bob$>$, she is likelier to send to $<$Word6,
  Carol$>$ than to $<$Word2K, Carol$>$.}

Other opportunities for partitioning include:
\begin{itemize}
\item \XXXX{list some.}
\end{itemize}

\subsubsection{Exploiting {\it a priori} suspicion}
Finally, the attacker may have reason to believe that some messages
are likelier to have been sent by the target user than others.  For
example, if we believe that Alice speaks English but not Arabic, or
that Alice knows psychology but not astrophysics, then we will
naturally suspect that an English-language message about psychology
is more likely to come from Alice than is an Arabic-language message
about astrophysics.

To exploit this knowledge, an attacker can (as suggested in the
original Statistical Disclosure paper\cite{statistical-disclosure}),
modify the estimated probabilities in $\vec{o_i}$ of Alice having sent
each delivered message.  For example, if 100 messages are sent in a
round, and the attacker judges that 50 of them are twice as likely to
have been sent by Alice as the other 50, the attacker assigns
$2/150$ to each element of $\vec{o_i}$ corresponding to a likely
message, and $1/150$ to each element corresponding to an unlikely
message.
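The weighting in the example above amounts to normalizing the per-message
relative likelihoods over the round, as in this minimal sketch:

```python
def suspicion_weighted_o(weights):
    """Turn per-message relative likelihoods of Alice-authorship into a
    round's observation vector o_i by normalizing them to sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# 50 messages judged twice as likely to be Alice's as the other 50
o_i = suspicion_weighted_o([2.0] * 50 + [1.0] * 50)
```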

Other opportunities for gaining suspicion include:
\begin{itemize}
\item \XXXX{list some}
\end{itemize}

%======================================================================
\section{Analysis}
\label{analysis}

\XXXX{We need a stats person to analyze how strong the signal must be,
  and how long an attacker should expect to have to observe in each
  case.}

%======================================================================
\section{Simulation results}
\label{simulation}

\XXXX{I'm trying to come up with these now.}

\subsection{Complex sender behavior}

\subsection{Attacking a timed dynamic-pool mix}

\subsection{Attacking a mix-net}

\subsection{Attacking a mix-net with dummy traffic}

\subsection{The effects of partial observation}

\subsection{Attacking a nym service (full linkability)}

\subsection{Attacking a fragmented community (partial linkability)}

\subsection{Attacking a suspicious user}

% Not simulated: time-variant traffic, attacks against multiple
% senders and recipients, partitioning attack

%======================================================================
\section{Conclusions}
\label{conclusion}

\subsection{Implications for mix-net design}
\XXXX{I think I'll hold off drawing any conclusions until we have
  simulated/analytic results.  If we're lucky, the results will
  suggest some method that resists intersection attacks without
  sending every single round or $N^2$ padding, and we can tell people
  to do that.  If not, we'll need to suggest that users and networks
  be designed to maximize the number of observations an attacker
  needs; that users need to send dummy messages often enough to smooth
  out their $P_R$ from round to round; and ...?}

\subsection{Future work}
\XXXX{One big question I want answered is: will any of the
  information-theoretic measurements turn out to be useful in
  analyzing this attack?  If not, why not?}

%======================================================================
\bibliographystyle{plain} \bibliography{e2e-traffic}

\end{document}
