
[freehaven-cvs] add in vitaly's model section



Update of /home/freehaven/cvsroot/doc/sync-batching
In directory moria.mit.edu:/home2/arma/work/freehaven/doc/sync-batching

Modified Files:
	sync-batching.pdf sync-batching.tex 
Added Files:
	badmix.eps badmix.pdf goodmix.eps goodmix.pdf model.tex 
Log Message:
add in vitaly's model section


--- NEW FILE: badmix.eps ---
%!PS-Adobe-2.0 EPSF-2.0
%%Title: /export/u1/homes/shmat/work/mixnets/badmix.eps
%%Creator: XV Version 3.10a  Rev: 12/29/94  -  by John Bradley
%%BoundingBox: 192 235 504 497
%%Pages: 1
%%DocumentFonts:
%%EndComments
%%EndProlog

%%Page: 1 1

% remember original state
/origstate save def

% build a temporary dictionary
20 dict begin

% define string to hold a scanline's worth of data
/pix 936 string def
[...7136 lines suppressed...]
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff


showpage

% stop using temporary dictionary
end

% restore original state
origstate restore

%%Trailer

--- NEW FILE: badmix.pdf ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: goodmix.eps ---
%!PS-Adobe-2.0 EPSF-2.0
%%Title: /export/u1/homes/shmat/work/mixnets/goodmix.eps
%%Creator: XV Version 3.10a  Rev: 12/29/94  -  by John Bradley
%%BoundingBox: 192 235 504 497
%%Pages: 1
%%DocumentFonts:
%%EndComments
%%EndProlog

%%Page: 1 1

% remember original state
/origstate save def

% build a temporary dictionary
20 dict begin

% define string to hold a scanline's worth of data
/pix 936 string def
[...7136 lines suppressed...]
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff


showpage

% stop using temporary dictionary
end

% restore original state
origstate restore

%%Trailer

--- NEW FILE: goodmix.pdf ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: model.tex ---

\newcommand{\ie}{\emph{i.e.}}
\newcommand{\eg}{\emph{e.g.}}
\newcommand{\etc}{\emph{etc.}\xspace}

\newcommand{\PP}{\mathsf{p}}
\newcommand{\EE}{\mathcal{E}}

\section{Modeling Methodology}

The basic model underlying our comparative study of mix network topologies
is \emph{mixing as probabilistic permutation}.  At the cost of making a
few simplifying, yet realistic, assumptions about the distribution of
message traffic in the network, we obtain a tractable Markov chain model,
and use fully automated probabilistic model checking techniques to
compute probability distributions for different network topologies and
configurations.  We use the \emph{entropy} of each topology's respective
distribution as our comparison metric, in the spirit of~\cite{DS02}.


\subsection{Mixing as permutation}

Consider a single batch of messages entering the mix network in the same
batch period.  We can view each of the messages $m_1,\ldots,m_N$ as
occupying a certain position in a (virtual) input array of length $N$,
where $N$ is the batch size.  Suppose the adversary targets a particular
message $m$ in position $i$.  Without loss of generality, assume that
$i=1$ (we can always re-number the input array so that the targeted
message is in the first slot).

Having passed through the network, all $N$ messages re-appear and may
again be observed by the adversary.  Of course, if some of the network
nodes have been compromised, the adversary also has access to their
observations.  Let $m'_1,\ldots,m'_N$ be the (virtual) output array.
Due to the mixing performed by the network, it may or may not be the case
that $m'_i=m_i$, \ie, the messages have been probabilistically permuted
by the network.  We will refer to the following discrete probability
distribution as the \emph{mixing distribution} of the network:
\begin{equation}
\label{maindist}
\begin{array}{l}
\PP_1\quad\ldots\quad\PP_N \\
\mbox{where}\ \PP_i=\mathit{Prob}(m'_i=m)
\end{array}
\end{equation}
Informally, each $\PP_i$ is the probability that the targeted
message $m$ re-appears in the $i$th position of the output buffer.

In our basic model, we assume that the network does not lose messages
(this restriction is not critical and may be relaxed if necessary).
Therefore, $\sum_{1 \leq i \leq N}\PP_i = 1$, and the $\PP_i$ form a proper
discrete probability distribution.  Following~\cite{DS02}, we calculate
the \emph{entropy} of this distribution as
\[
\EE = -\sum_{1 \leq i \leq N} \PP_i \log_2(\PP_i)
\]
Very informally, entropy is a measure of the ``randomness'' of a
distribution.  Other things being equal, network topologies whose mixing
distributions have higher entropy are considered preferable.
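As a quick sanity check (this snippet is our own illustration, not part
of the model itself), the entropy and the corresponding effective
anonymity set can be computed directly:

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution,
    skipping zero-probability positions."""
    assert abs(sum(dist) - 1.0) < 1e-9
    return -sum(p * log2(p) for p in dist if p > 0)

# A perfect mix of a 128-message batch yields the uniform distribution:
# entropy 7 bits, i.e. an effective anonymity set of 2^7 = 128 messages.
print(entropy([1.0 / 128] * 128))  # 7.0
```

Here we read off the effective anonymity set as $2^{\EE}$, which matches
the figures reported for the individual topologies below.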


\subsection{Overview of the model}

We use standard probabilistic verification techniques and model the mix
network as a discrete-time Markov chain.  Formally, a \emph{Markov chain}
consists of a finite set of states $S$, an initial state $s_0$, a
transition relation $T:S\times S\rightarrow[0,1]$ such that
$\forall s\in S\ \sum_{s'\in S} T(s,s')=1$, and a labeling function.

In our model, the states of the Markov chain will represent the position
of the targeted message $m$ in the (virtual) buffer of $N$ messages as
$m$ moves through the network.  The initial state $s_0$ corresponds
to the message being in the first slot of the input array
prior to entering the mix network.
Every probabilistic state transition $s \rightarrow s'$ is associated with
$m$ passing through a single mix within the network.  Intuitively, $s$
can be interpreted as $m$'s position before passing through the mix,
and $s'$ as its position afterwards.

For the purposes of computing distribution (\ref{maindist}), we are
interested in deadlock states, \ie, those corresponding to the situation
in which $m$ has passed through all mixes in its path and exited the
mix network with no further transitions possible.  Suppose a special
predicate $\mathit{done}$ is true in such states.  Then $\PP_i$ is simply
$\mathit{Prob}[\mathcal{U}(s=i\ \wedge \mathit{done})]$ evaluated
in the initial state $s_0$.  (Informally, formula $\mathcal{U}\varphi$
holds if $\varphi$ eventually becomes true.)

We use a probabilistic model checker called PRISM~\cite{Pri} to compute
these probabilities automatically.  To simplify exposition, we omit
the details of the underlying model checking algorithms.  A detailed
explanation of how probabilistic model checking is used to analyze
randomized routing protocols can be found in~\cite{S04}.
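For intuition about what is being computed (a toy sketch of our own,
not PRISM's actual algorithm), the reachability probability for each
$\mathit{done}$-state of a finite chain can be obtained by iterating the
transition matrix until all probability mass settles in the absorbing
states:

```python
# Toy sketch (ours, not PRISM's algorithm): for a finite Markov chain whose
# "done" states are absorbing, the probability of eventually ending in each
# absorbing state is the limit of repeated transition-matrix application.
def absorbing_distribution(T, s0, iters=1000):
    """T[s][t] is the transition probability s -> t; returns the long-run
    state distribution started from state s0."""
    n = len(T)
    dist = [0.0] * n
    dist[s0] = 1.0
    for _ in range(iters):
        dist = [sum(dist[s] * T[s][t] for s in range(n)) for t in range(n)]
    return dist

# Three states: from state 0, move to absorbing states 1 or 2
# with probabilities 1/3 and 2/3 respectively.
T = [[0.0, 1/3, 2/3],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
d = absorbing_distribution(T, 0)
print(d)  # [0.0, 0.3333333333333333, 0.6666666666666666]
```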


\subsection{Single-mix model}
\label{singlemix}

Consider a single mix receiving a batch of $K$ messages, including the
targeted message $m$.  Assume that the mix has not been compromised and
that it collects all $K$ messages before distributing them to their
respective destinations.  In this case, the mixing performed by the
mix can be interpreted as permutation in a virtual buffer of size $K$.
In particular, the targeted message $m$ appears in any of the $K$
output positions with equal probability after passing through the mix.
Therefore, each mix can be modeled by the following simple Markov chain
(recall that state $s$ represents the current position of message $m$,
and let $t$ be the sequential number of the current hop):

\includegraphics[scale=0.3]{goodmix}

If the mix has been compromised, we assume that it performs no mixing
at all, and thus does not change the position of any message it processes:

\includegraphics[scale=0.3]{badmix}


\subsection{Network model}

We consider several mix network topologies, and compare them under various
assumptions about the density of hostile mixes in the network.  Instead of
assuming a fixed number of hostile mixes, in each scenario we will assume
a fixed \emph{probability} that a randomly selected mix is hostile.

For each topology, the behavior of a single node is modeled as described
in section~\ref{singlemix}.  The main difference between topologies
is how the targeted message moves through the network, resulting in
different mixing distributions~(\ref{maindist}).



\subsubsection{Assumptions.}

We assume that the adversary observes the edge of the network and thus
knows the first mix chosen by the targeted message.  This means that the
randomness of mix selection is ignored for the first hop.  Formally,
we make probability $\PP_i$ conditional on selection of a particular
first mix.  Instead of computing
$\mathit{Prob}[\mathcal{U}(s=i\ \wedge \mathit{done})]$,
we compute
$\mathit{Prob}[\mathcal{U}(s=i\ \wedge \mathit{done}\ |\
        \mbox{mix $x$ was selected as entry point}
        )]$.

We simplify the model by assuming that the message load on each internal
link within the network is exactly equal to the statistically expected
load for the given network topology.  This assumption holds to a very
good approximation when the number of messages in a single batch is
significantly higher than the number of network nodes, and the randomness
involved in message routing is obtained from unbiased coin tosses in the
same way for all messages.  For example, suppose there are four mixes in
the first layer of the network, and the batch size is 128.  We will analyze
the average-case behavior of the network, \ie, we will assume that each
of the mixes receives exactly 32 messages, even though it is possible
(albeit highly improbable) that in some batch all 128 senders will
randomly choose the same entry mix.

% An alternative to this is to assume instead that the adversary cannot
% observe communication between nodes that he does not control.  Under
% either assumption, 

Under the equal loading assumption, we treat the size of the input/output
buffer for each mix (see section~\ref{singlemix}) as a constant which is
determined only by batch size and network topology, and is independent
of the actual random distribution of a given batch through the network.


\subsubsection{Cascade.}

Consider a 2x2 cascade network as in fig.~\ref{fig:casc-2x2}.  Assume
there are 128 messages in a batch, and that any node has $\frac{1}{4}$
chance of having been compromised.  Let $m$ be the targeted message,
and suppose the adversary observed that the sender of $m$ chose $A$
as the first mix.  Under the equal loading assumption, there are 63
other messages in addition to $m$ that chose $A$ as the first mix.
Without loss of generality, we may assume that $m$ occupies the first
position in $A$'s input buffer of length 64.

With probability $\frac{1}{4}$, mix $A$ is hostile.  In this case,
$m$ remains in the first position after passing through the mix.
With probability $\frac{3}{4}$, mix $A$ is honest.  In this case, $m$
appears in any of the 64 output positions of this mix with probability
$\frac{1}{64}$ (note that $m$ cannot appear in the output buffer of
mix $B$).  The resulting probability distribution for $m$'s position
after passing through $A$ is
\begin{equation}
\label{eq:cascfirst}
\begin{array}{c@{\hspace{1em}}c@{\hspace{1em}}c}
\underbrace{\frac{67}{256}}_{\frac{1}{4} \cdot 1 + 
                             \frac{3}{4}\cdot\frac{1}{64}}
&
\underbrace{\frac{3}{256}}_{\frac{3}{4}\cdot\frac{1}{64}}\
\ldots 
& 
0\ 0\ \ldots\ 0
\\
\mbox{position}\ 1 &
\mbox{positions}\ 2..64 &
\mbox{positions}\ 65..128
\end{array}
\end{equation}

The next mix, $C$, is pre-determined by the network topology, and the
distribution it produces on its messages is the same as~(\ref{eq:cascfirst}).
Combining the two distributions, we find that $m$ appears in the cascade's
output buffer with the following probabilities:
\begin{equation}
\label{eq:cascsecond}
\begin{array}{c@{\hspace{1em}}c@{\hspace{1em}}c}
\underbrace{\frac{5056}{65536}}_{
            \frac{67}{256}\cdot\frac{67}{256} + 
            \frac{3}{256}\cdot(63\cdot\frac{3}{256})}
&
\underbrace{\frac{960}{65536}}_{
            \frac{67}{256}\cdot\frac{3}{256} + 
            \frac{3}{256}\cdot(\frac{67}{256}+62\cdot\frac{3}{256})}\
\ldots
& 
0\ 0\ \ldots\ 0
\\
\mbox{position}\ 1 &
\mbox{positions}\ 2..64 &
\mbox{positions}\ 65..128
\end{array}
\end{equation}

The entropy of this distribution is approximately $5.9082$.  The effective
anonymity set provided by a 2x2 cascade with 25\% density of hostile
nodes and 128 messages per batch is thus $2^{5.9082} \approx 60$ messages.
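The arithmetic above can be reproduced mechanically.  The following
sketch (our own, using exact rational arithmetic) recomputes
distributions (\ref{eq:cascfirst}) and (\ref{eq:cascsecond}) and the
reported entropy:

```python
from fractions import Fraction
from math import log2

q = Fraction(1, 4)      # probability that a mix is hostile
N, K = 128, 64          # batch size and per-mix buffer size

def cascade_hop(dist):
    """One hop through mix A (or C): if the mix is honest it permutes its
    K buffer slots uniformly, if hostile it leaves positions unchanged."""
    mass = sum(dist[:K])        # all of m's probability mass is in this mix
    return [q * p + (1 - q) * mass / K for p in dist[:K]] + dist[K:]

dist = [Fraction(1)] + [Fraction(0)] * (N - 1)    # m enters in position 1
dist = cascade_hop(dist)                          # after mix A
assert dist[0] == Fraction(67, 256) and dist[1] == Fraction(3, 256)
dist = cascade_hop(dist)                          # after mix C
assert dist[0] == Fraction(5056, 65536) and dist[1] == Fraction(960, 65536)

H = -sum(float(p) * log2(float(p)) for p in dist if p)
print(round(H, 4), round(2 ** H))   # 5.9082 60
```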


\subsubsection{Stratified array.}

The procedure for calculating the mixing distribution for a stratified
array is essentially the same as for a cascade, but with an additional
probabilistic choice: after the message passes through a mix, the next
mix is selected at random from among all mixes in the next layer.

Consider a 2x2 stratified array as in fig.~\ref{fig:sa-2x2}.  Again,
assume there are 128 messages in a batch, $\frac{1}{4}$ chance that
a node is hostile, and that $A$ was selected (in a manner visible
to the adversary) as the first mix.  The mixing performed by any
single mix is exactly the same as in the cascade case, thus mixing
distribution~(\ref{eq:cascfirst}) after the first hop is the same in a
stratified array as in a cascade.  After the first hop, however, mix $C$
is selected only with probability $\frac{1}{2}$, while $D$ may also be
selected with probability $\frac{1}{2}$ (by contrast, in a cascade $C$
is selected with probability $1$).  Distribution~(\ref{eq:cascsecond})
has to be adjusted accordingly:
\[
\begin{array}{c@{\hspace{1em}}c@{\hspace{1em}}c}
\underbrace{\frac{2528}{65536}}_{
            \frac{1}{2}\cdot\frac{5056}{65536}}
&
\underbrace{\frac{480}{65536}}_{
            \frac{1}{2}\cdot\frac{960}{65536}}\
\ldots
& 
\underbrace{\frac{1}{128}}_{
            \frac{1}{2}\cdot\frac{1}{64}}\
\ldots
\\
\mbox{position}\ 1 &
\mbox{positions}\ 2..64 &
\mbox{positions}\ 65..128
\end{array}
\]

The entropy of this distribution is approximately $6.9541$.  The effective
anonymity set provided by a 2x2 stratified array with 25\% density of
hostile nodes and 128 messages per batch is about 124 messages.
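As a check (again our own sketch, not part of the analysis), the
displayed distribution indeed sums to 1 and has the stated entropy:

```python
from fractions import Fraction
from math import log2

# The 2x2 stratified-array distribution displayed above: position 1,
# positions 2..64, and positions 65..128.
dist = ([Fraction(2528, 65536)] + [Fraction(480, 65536)] * 63
        + [Fraction(1, 128)] * 64)
assert sum(dist) == 1       # a proper probability distribution

H = -sum(float(p) * log2(float(p)) for p in dist)
print(round(H, 4), round(2 ** H))   # 6.9541 124
```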


\subsubsection{Free-route network.}

The probability distribution is computed in exactly the same way for a
free-route network as for a stratified array, except that the entire set
of mixes is treated as a single layer.  Consider a 4x2 free-route network
as in fig.~\ref{fig:free-4x2}.  With 128 messages per batch, the buffer
of each mix holds 32 messages.  If $A$ is the first mix selected (and is hostile
with probability $\frac{1}{4}$), the probability distribution after the
message passes through $A$ is
\[
\begin{array}{c@{\hspace{1em}}c@{\hspace{1em}}c}
\underbrace{\frac{35}{128}}_{\frac{1}{4} \cdot 1 + 
                             \frac{3}{4}\cdot\frac{1}{32}}
&
\underbrace{\frac{3}{128}}_{\frac{3}{4}\cdot\frac{1}{32}}\
\ldots 
& 
0\ 0\ \ldots\ 0
\\
\mbox{position}\ 1 &
\mbox{positions}\ 2..32 &
\mbox{positions}\ 33..128
\end{array}
\]

The next mix is selected from among all four mixes with equal probability,
producing the following probability distribution:
\[
\begin{array}{c@{\hspace{1em}}c@{\hspace{1em}}c}
\underbrace{\frac{376}{16384}}_{
            \frac{1}{4}\cdot(
            \frac{35}{128}\cdot\frac{35}{128} + 
            \frac{3}{128}\cdot(31\cdot\frac{3}{128}))}
&
\underbrace{\frac{120}{16384}}_{
            \frac{1}{4}\cdot(
            \frac{35}{128}\cdot\frac{3}{128} + 
            \frac{3}{128}\cdot(\frac{35}{128}+30\cdot\frac{3}{128}))}\
\ldots
& 
\underbrace{\frac{1}{128}}_{
            \frac{1}{4}\cdot\frac{1}{32}}\
\ldots
\\
\mbox{position}\ 1 &
\mbox{positions}\ 2..32 &
\mbox{positions}\ 33..128
\end{array}
\]

The entropy of this distribution is approximately $6.9855$, and the
effective anonymity set is about 127 messages, very close to that
provided by a perfect mix network.
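The same check (our own sketch) applies to the free-route distribution
displayed above:

```python
from fractions import Fraction
from math import log2

# The 4x2 free-route distribution displayed above: position 1,
# positions 2..32, and positions 33..128.
dist = ([Fraction(376, 16384)] + [Fraction(120, 16384)] * 31
        + [Fraction(1, 128)] * 96)
assert sum(dist) == 1       # a proper probability distribution

H = -sum(float(p) * log2(float(p)) for p in dist)
print(round(H, 4), round(2 ** H))   # 6.9855 127
```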


Index: sync-batching.pdf
===================================================================
RCS file: /home/freehaven/cvsroot/doc/sync-batching/sync-batching.pdf,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -d -r1.3 -r1.4
Binary files /tmp/cvszjQUea and /tmp/cvsGtfoUg differ

Index: sync-batching.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/sync-batching/sync-batching.tex,v
retrieving revision 1.24
retrieving revision 1.25
diff -u -d -r1.24 -r1.25
--- sync-batching.tex	23 Jan 2004 08:02:39 -0000	1.24
+++ sync-batching.tex	23 Jan 2004 18:55:37 -0000	1.25
@@ -41,6 +41,7 @@
 %======================================================================
 \begin{abstract}
 
+XXXneeds an intro sentence.
 In a synchronous batching design, each batch of messages enters the
 mix network together, and the messages proceed in lock step through
 the network. We show that a synchronous batching strategy can be used
@@ -364,9 +365,6 @@
 We use the information theory approach from \cite{Diaz02,Serj02} to
 measure expected entropy for each scenario.
 
-\subsection{Scenario 1: network of cascades}
-walk us through calculating entropy for a 2x2 cascade network.
-
 \begin{figure}
 \begin{minipage}[t]{3.5cm}
 \mbox{\epsfig{angle=0,figure=casc-2x2,width=3.5cm}}
@@ -388,13 +386,8 @@
 \hfill
 \end{figure}
 
-\subsection{Scenario 2: stratified network}
-walk us through calculating entropy for a 2x2 stratified (4 nodes)
+\input{model}
 
-\subsection{Scenario 3: free-route}
-walk us through calculating entropy for a 4x2 free-route (4 nodes)
-point out that we could use 16x4 for a fair comparison, or we could
-use $16 \times \ell$ to get more or less anonymity.
 
 
 \subsection{further Assumptions}
@@ -452,6 +445,8 @@
 
 \subsection{Choosing the same node twice in a row}
 
+XXXneed to rewrite this subsection.
+
 Conventional wisdom (see e.g.~\cite{minion-design}) suggests that
 in a free-route network, Alice should never choose the same node twice
 in a row when picking her path. The goal is to reduce the chance that
@@ -614,6 +609,11 @@
 
 \subsection{Robustness of Message Delivery}
 
+XXXPut the graph of 1x16 vs 16x16 entropy, as motivation for robustness
+analysis.
+
+XXXpoint out that we could use 16x4 for a fair comparison, or we could
+use $16 \times \ell$ to get more or less anonymity.
 
 % somewhere, cite the 1990 paper about cascades that can skip nodes
 
@@ -846,7 +846,7 @@
 
 
 %\section*{Acknowledgements}
-David Hopwood for the initial impetus and ideas; Camilla Fox, Richard
+David Hopwood for the initial impetus and ideas; Camilla Fox, Rachel
 Greenstadt, Chris Laas, and Itamar Shtull-Trauring for probability
 discussions;
 
