[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
[freehaven-cvs] write the Average Entropy vs Actual Entropy subsec (...

To: freehaven-cvs@freehaven.net
Subject: [freehaven-cvs] write the Average Entropy vs Actual Entropy subsec (...
From: arma@seul.org (Roger Dingledine)
Date: Fri, 23 Jan 2004 03:02:41 -0500 (EST)
Delivered-to: archiver@seul.org
Delivered-to: freehaven-cvs-outgoing@seul.org
Delivered-to: freehaven-cvs@seul.org
Delivery-date: Fri, 23 Jan 2004 03:03:11 -0500
Reply-to: freehaven-dev@freehaven.net
Sender: owner-freehaven-cvs@freehaven.net
Update of /home/freehaven/cvsroot/doc/sync-batching
In directory moria.mit.edu:/home2/arma/work/freehaven/doc/sync-batching

Modified Files:
	sync-batching.tex sync-batching.bib 
Log Message:
write the Average Entropy vs Actual Entropy subsec (woo!)

also clean up the abstract, sec1, and much of the 'other analysis' section


Index: sync-batching.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/sync-batching/sync-batching.tex,v
retrieving revision 1.23
retrieving revision 1.24
diff -u -d -r1.23 -r1.24
--- sync-batching.tex	23 Jan 2004 00:54:38 -0000	1.23
+++ sync-batching.tex	23 Jan 2004 08:02:39 -0000	1.24
@@ -24,11 +24,10 @@
 %\title{The Disadvantages of Cascade Mix Networks and How to Overcome Them}
 %\title{The Advantages of Free-Route Mix Networks
 %\title{Synchronous Batching:\\Cascade Networks vs Free-Route Networks
-%\title{Synchronous Batching:\\From Cascade Networks to Free Routes
+\title{Synchronous Batching:\\From Cascade Networks to Free Routes
 %\title{Mixing in Step:\\Synchronous Batching from Cascade Networks to
 %  Free Routes
-\title{Groove Mixing:\\Synchronous Batching from Cascade Networks to
-  Free Routes
+%\title{Groove Mixing:\\Synchronous Batching from Cascade Networks to Free Routes
 \thanks{Portions of this paper were inspired by discussions with David
 Hopwood. We would consider him an author, but have been unable to contact
 him since beginning the paper. We'll keep trying.}}
@@ -45,10 +44,11 @@
 In a synchronous batching design, each batch of messages enters the
 mix network together, and the messages proceed in lock step through
 the network. We show that a synchronous batching strategy can be used
-in various topologies including a free-route network (a topology where
-senders choose paths freely), and that such topologies can provide better
-anonymity and better message reliability in the face of node failure
-than a cascade network (where senders choose from a set of fixed paths).
+in various topologies, including a free-route network, in which senders
+choose paths freely, and a cascade network, in which senders choose from
+a set of fixed paths. We show that free-route topologies can provide
+better anonymity as well as better message reliability in the event of
+node failure.
 
 \end{abstract}
 %======================================================================
@@ -61,20 +61,26 @@
 message happens to stand out from the others \cite{disad-free-routes},
 and an active adversary can manipulate the network to separate one
 message from the others via blending attacks \cite{trickle02}.
-Berthold et al argue in \cite{disad-free-routes} that partitioning
+Berthold et al.~argue \cite{disad-free-routes} that partitioning
 opportunities arise because the networks use a \emph{free-route}
 topology---one where the sender can choose the mixes that make up her
 message's path. They suggest instead a \emph{cascade network} topology,
 where all senders choose from a few fixed paths through the mix network.
 
-In this paper we argue that the cascade design resolves the attacks not
-because of its network topology, but because of a property of its batching
-strategy. We find that this \emph{synchronous batching} approach prevents
-the attacks in \cite{disad-free-routes}, even when free routes are used.
-We explore three topologies with synchronous batching---cascade,
-stratified (a restricted route), and free-route---and find that the
-free-route network provides the highest expected anonymity as well as
-the best robustness to node failure.
+%In this paper we argue that the cascade design resolves the attacks not
+%because of its network topology, but because of a property of its batching
+%strategy. 
+%We explore this \emph{synchronous batching} approach in Section 2.
+%We find that this \emph{synchronous batching} approach prevents
+
+In this paper we argue that the cascade design resolves these attacks
+because it uses a \emph{synchronous batching} strategy, not because it
+uses a particular network topology. We show that synchronous batching
+prevents these attacks even when free routes are used. Further, we
+explore three topologies with synchronous batching---cascade, stratified
+(a restricted route), and free-route---and find that the free-route
+network provides the highest expected anonymity as well as the best
+robustness to node failure.
 
 %and provides comparable latency to a cascade topology,
 
@@ -149,7 +155,7 @@
 Define the {\em width} $w$ of a mix-net using synchronous batching to
 be the number of nodes that simultaneously process messages in each hop
 period. (If this is not constant, we can still talk about the maximum,
-minimum and mean width.)
+minimum and mean width.) Thus when $w=1$ we have a cascade.
 
 The latency is between $\ell t_\mathrm{hop}$ and $t_\mathrm{batch} +
 \ell t_\mathrm{hop}$, depending on when the message is submitted.
@@ -322,8 +328,8 @@
 
 %adversaries: weak adversary beats cascades but not free-routes.
 
-Clearly much work has been done to address blending attacks, and the
-current solutions appear applicable to synchronous free-route networks
+Clearly much work has been done to address blending attacks, and many
+solutions appear applicable to synchronous free-route networks
 as well as cascade networks.
 
 \section{Scenarios}
@@ -445,27 +451,24 @@
 \label{sec:other}
 
 \subsection{Choosing the same node twice in a row}
-%{Choose path hops with or without replacement?}
-%%'replacement' implies over the whole path, not just adjacent.
 
-Conventional wisdom (see e.g.~\cite{disad-free-routes}) suggests that
+Conventional wisdom (see e.g.~\cite{minion-design}) suggests that
 in a free-route network, Alice should never choose the same node twice
 in a row when picking her path. The goal is to reduce the chance that
-the path contains only bad nodes. But we find that as the size of the
-network grows, this increased complexity in path selection has little
+the path contains only bad nodes. But we find that for a sufficiently
+large network, this increased complexity in path selection has little
 impact on Alice's entropy.
 
 Intuitively, when the adversary density is low, entropy will be high
 in either case; whereas when most nodes are owned by the adversary,
 the difference between picking between $B$ and $B-1$ bad nodes is slight.
 
-More formally, if there are $G$ good nodes and $B$ bad nodes, then the
-chance of selecting a bad node as your next link is $\frac{B-1}{G+B}$ if
-the current node is bad and $\frac{B}{G+B}$ if your current node is good.
-The difference between the two is only $\frac{1}{G+B}$. In particular,
-the difference does not depend on what fraction of the nodes are bad. For
-reasonably large values of $G+B$ (total number of nodes), the shift
-in probability distribution is negligible.
+More formally, for $G$ good nodes and $B$ bad nodes, the
+chance of selecting a bad node next is $\frac{B-1}{G+B}$ if
+the current node is bad and $\frac{B}{G+B}$ otherwise.
+The difference is only $\frac{1}{G+B}$: it does not depend on what
+fraction of the nodes are bad. As the network size grows,
+the shift in probability distribution becomes negligible.
 
 % Mention that we have that entropy different for each of \ell hops.
 % But it's just \ell times the above negligible difference, right? -RD
@@ -475,65 +478,68 @@
 % reduce entropy in scenarios with low adversary density, because he
 % knows she doesn't exit from her entry node. Hm. -RD
 
-\subsection{Reputations and node preference}
-
-most deployed systems let users choose their preferred entry or
-exit hops, which will mess with our distribution. but this is
-ok because after just a few hops of mixing, the distribution
-of the incoming batch doesn't matter as much anymore.
-
-reputation systems a la mix-acc and casc-rep make people prefer one
-node over another, \emph{throughout} the mix network, not just at entries or
-exits. but we have to tolerate this sort of thing. one approach would be
-to put like-reputationed nodes in the same column of the SA or in the same
-row of the cascade \cite{casc-rep}, so distribution (at each hop or in each
-cascade, respectively) will be more uniform.
-
-\subsection{Not enough input messages}
+\subsection{Average Entropy vs Actual Entropy}
 
-the graphs and analysis we've done above is for expected entropy. think
-of it like the behavior for very large batches. but in reality the batch
-size may be quite small, and we need to consider the variance we might
-get in entropy depending on how the path choices actually turn out.
+The graphs and analysis above are for average entropy---the network's
+expected behavior for very large batches. But in reality the batch size
+may be quite small, and each sender chooses paths independently from the
+others. We must consider the variance we might get in entropy depending
+on how the path choices actually turn out.
 
-section 5.1 of \cite{danezis:pet2003} talks about some approaches to
-arguing that the actual behavior is very likely to approximate the
-expected behavior. in particular, he argues based on the following
-assumptions that foo. these assumptions are close enough to our situation
-because bar. ideally we'll do some numbers here or otherwise argue that
-128 is a big enough number.
+For $m$ messages to $u$ buckets, we find the chance that any bucket
+will have less than $p$ messages based on Maxwell-Boltzmann statistics
+and inclusion-exclusion:
 
-point out that the variance is likely to be higher for free-route because
-the links have a smaller expected number of messages, to variations have
-a bigger impact. i guess stratified is in the middle and cascades are
-pretty good?
+\begin{eqnarray*}
+{u \choose 1} \sum_{i=0}^p (\frac1{u})^i (1-\frac1{u})^{m-i} {m \choose i}
+- {u \choose 2} \sum_{i=0}^p \sum_{j=0}^p
+  (\frac1{u})^i (\frac1{u})^j (1-\frac{2}{u})^{m-i-j} {m \choose i,j} \\
++ {u \choose 3} \sum_{i=0}^p \sum_{j=0}^p \sum_{k=0}^p
+  (\frac1{u})^i (\frac1{u})^j (\frac1{u})^k (1-\frac{3}{u})^{m-i-j-k} {m \choose i,j,k}
+- \dots
+\end{eqnarray*}
 
-george also talks about padding to guarantee one message on each link,
-to prevent the most trivial of traffic analysis attacks. is it worthwhile
-to prevent so trivial an attack? is the slightly-less-trivial attack
-significantly harder?
-talk about the possible uses of some minimum level of padding on each
-link.
+For $m=128$ messages and $u=4$ nodes (i.e.~cascade or stratified network),
+the chance of any node getting less than 16 messages (compared to
+the 32 we expect each to get) is $6*10^{-4}$---meaning with very high
+probability the average entropy approximates the behavior we will see
+in reality. On the other hand, for $u=16$ nodes (free-route), 48\% of
+the time some node will get less than half the expected number. It is
+not until a batch size of 480 that this metric reaches 1\%.
 
-clearly more research remains.
+This result is plausible: each link on a free-route network has a smaller
+expected number of messages, so variations have a bigger impact. Whether
+it is acceptable depends on a number of factors. First, how large do we
+expect batches to be in reality? If 500 messages or more, then we're fine.
+Second, how bad is it when a link varies by half the expected volume? If
+we change our metric to require at least 2 messages on each link for
+$m=128$, then we find that only 1\% of the cases fall outside this value.
 
-\subsection{Flooding attacks to degrade anonymity}
+Danezis also considers this issue of variance from average entropy for his
+mix-net design based on sparse expander graphs~\cite{danezis:pet2003}. He
+argues that having at least one message on each link is sufficient for
+basic protection, and he uses a similar approach to show that his design
+achieves this distribution with high probability. He further raises the
+idea of padding unused links to guarantee one message on each link, with
+the aim of preventing trivial traffic analysis attacks. Is it worthwhile
+to prevent this simple attack? Are all other attacks significantly
+harder? Clearly more research remains.
 
-in section \ref{subsec:blending} we talk about techniques to prevent
-or detect message dropping or message substituting. but what about the
-adversary adding messages to the batch at the beginning?
+\subsection{Flooding attacks to degrade anonymity or service}
 
-argue that if honest senders submit k of the n messages, then alice will
-still be assured she can expect entropy based on k messages. that is,
-the entropy with a baseline network plus hostile messages is the same as
-the entropy of the baseline network by itself. this is different from
-the pooling batching strategy \cite{trickle02}, because there messages
-from the adversary will influence the behavior (and thus entropy) of
-alice's message.
+In section \ref{subsec:blending} we talk about techniques to detect or
+prevent misbehavior by the mixes (dropping or substituting messages). But
+what if the adversary simply submits more messages to the batch?
 
-\subsection{Flooding attacks to degrade service}
+It turns out that as long as $k$ of the $n$ input messages come from
+honest senders, Alice will still be assured that she can expect entropy
+based on a batch of $k$ messages. That is, the entropy of a baseline
+network (all-honest senders) plus hostile messages is at least the entropy
+of the baseline network by itself. This is different from the pooling
+batching strategy \cite{trickle02}, where messages from the adversary
+will influence the behavior (and thus entropy) of Alice's message.
 
-on the other hand, there's also the issue that directed floods can fill up
+On the other hand, there's also the issue that directed floods can fill up
 nodes. we might use techniques where mixes can prove that
 they passed the message on from before \cite{phillipe-manuscript};
 thus we can be certain that floods start at the beginning of the batch.
@@ -609,7 +615,7 @@
 \subsection{Robustness of Message Delivery}
 
 
-
+% somewhere, cite the 1990 paper about cascades that can skip nodes
 
 It might seem from Fig.~\ref{fig:?} that the best anonymity is
 achieved with a 16x16 free-route network. There is almost no falloff
@@ -826,7 +832,23 @@
 Comparison to Mixminion: No need for a replay cache. But less robust to
 transient failures.
 
+Reputations and node preference
+Most deployed systems let users choose their preferred entry or exit
+hops, e.g.~based on trust. which will mess with our distribution. but
+this is ok because after just a few hops of mixing, the distribution of
+the incoming batch doesn't matter as much anymore.
+Reputation systems a la mix-acc and casc-rep make people prefer one node
+over another, \emph{throughout} the mix network, not just at entries or
+exits. but we have to tolerate this sort of thing. one approach would
+be to put like-reputationed nodes in the same column of the SA or in
+the same row of the cascade \cite{casc-rep}, so distribution (at each
+hop or in each cascade, respectively) will be more uniform.
+
+
 %\section*{Acknowledgements}
+David Hopwood for the initial impetus and ideas; Camilla Fox, Richard
+Greenstadt, Chris Laas, and Itamar Shtull-Trauring for probability
+discussions;
 
 %======================================================================
 

Index: sync-batching.bib
===================================================================
RCS file: /home/freehaven/cvsroot/doc/sync-batching/sync-batching.bib,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -d -r1.3 -r1.4
--- sync-batching.bib	22 Jan 2004 09:24:00 -0000	1.3
+++ sync-batching.bib	23 Jan 2004 08:02:39 -0000	1.4
@@ -413,3 +413,14 @@
   www_pdf_url = {http://www.cl.cam.ac.uk/users/gd216/p125_danezis.pdf},
 }
 
+@inproceedings{danezis:pet2003,
+  title = {Mix-networks with Restricted Routes},
+  author = {George Danezis},
+  booktitle = {Proceedings of Privacy Enhancing Technologies workshop (PET 2003)},
+  year = {2003},
+  month = {March},
+  editor = {Roger Dingledine},
+  publisher = {Springer-Verlag, LNCS 2760},
+  www_pdf_url = {http://www.cl.cam.ac.uk/~gd216/ExpMix.pdf},
+}
+

***********************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe freehaven-cvs       in the body. http://freehaven.net/
Prev by Author: [freehaven-cvs] a pdf for fig5a
Next by Author: [freehaven-cvs] commit the script i used to calc actual entropy vs a...
Previous by thread: [freehaven-cvs] a pdf for fig5a
Next by thread: [freehaven-cvs] commit the script i used to calc actual entropy vs a...
Index(es):
- Author
- Thread