
[freehaven-cvs] rearrange a lot and add notes for next writings



Update of /home/freehaven/cvsroot/doc/sync-batching
In directory moria.mit.edu:/home2/arma/work/freehaven/doc/sync-batching

Modified Files:
	sync-batching.tex 
Log Message:
rearrange a lot and add notes for next writings


Index: sync-batching.tex
===================================================================
RCS file: /home/freehaven/cvsroot/doc/sync-batching/sync-batching.tex,v
retrieving revision 1.20
retrieving revision 1.21
diff -u -d -r1.20 -r1.21
--- sync-batching.tex	22 Jan 2004 09:09:32 -0000	1.20
+++ sync-batching.tex	22 Jan 2004 12:57:35 -0000	1.21
@@ -145,59 +145,27 @@
 received, so that it
 cannot be delayed by an attacker.
 
-%The set of messages sent to a node in a given hop period is called a
-%mini-batch, to distinguish it from the set of messages being sent
-%through the network as a whole. % [[need better name?]]
-
 Define the {\em width} $w$ of a mix network using synchronous batching to
 be the number of nodes that simultaneously process messages in each hop
 period. (If this is not constant, we can still talk about the maximum,
 minimum and mean width.)
 
-An important constraint on the network structure is the maximum
-bandwidth (traffic in unit time) through any given node.
-Assume that sending a message over a single hop consumes a fixed
-amount of network traffic; we can then use that as the unit for
-traffic.
-
 The latency is between $\ell t_\mathrm{hop}$ and $t_\mathrm{batch} +
 \ell t_\mathrm{hop}$, depending on when the message is submitted.
-Typically we would have $t_\mathrm{hop} < t_\mathrm{batch}/\ell$, so the
-latency is at most $2t_\mathrm{batch}$, independent of the path length.
 
+We have two choices for $t_\mathrm{hop}$: do we make it
+$t_\mathrm{batch}/\ell$, or do we make it $t_\mathrm{batch}$? Which
+we want depends on other requirements; stay tuned.
 
-%[Or for free-route topology, we could let $t_\mathrm{hop}$ be
-%$t_\mathrm{batch}$, and now
-%it takes $\ell$ batch times to go through, but it mixes with the other
-%batches.]
-
-Let $T_\mathrm{batch}$ be the expected throughput in a single batch period,
-i.e.~the number of messages that go through the network in a batch.
-We can give nodes the maximum opportunity to make use of the available
-bandwidth by setting $t_\mathrm{hop} \simeq t_\mathrm{batch}/\ell$.
-
-If the available nodes were used optimally (this is a big if),
-the bandwidth required through each node in this simplified case
-will be $\frac{T_{batch}}{wt_{hop}} = \frac{nT_{batch}}{wt_{batch}}$.
+[Typically we would have $t_\mathrm{hop} < t_\mathrm{batch}/\ell$, so the
+latency is at most $2t_\mathrm{batch}$, independent of the path length.
+Or for free-route topology, we could let $t_\mathrm{hop}$ be
+$t_\mathrm{batch}$, and now it takes $\ell$ batch times to go through,
+but it mixes with the other batches.]
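For concreteness, a tiny numeric check of the two options (a sketch with made-up illustrative numbers: $t_\mathrm{batch}=3$ hours, $\ell=4$):

```python
def latency_bounds(t_batch, t_hop, ell):
    # A message waits up to t_batch for its batch to start, then spends
    # ell hop periods in transit.
    return (ell * t_hop, t_batch + ell * t_hop)

t_batch, ell = 3.0, 4
# Option 1: t_hop = t_batch / ell, so latency is at most 2 * t_batch.
assert latency_bounds(t_batch, t_batch / ell, ell) == (3.0, 6.0)
# Option 2 (free route): t_hop = t_batch, so transit takes ell batch
# periods, but each hop can mix with messages from other batches.
assert latency_bounds(t_batch, t_batch, ell) == (12.0, 15.0)
```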
 
 \section{Related work}
 \label{sec:related}
 
-Which system is advantageous depends on the adversary model and the
-desired security properties. Against an adversary who can observe only
-a relatively small percentage of nodes, links, or mixnet entry and
-exit points a free route has trememdous advantage in protecting
-average expected anonymity over a cascade. This is because the
-adversary will not be able to see as many messages; he is simply
-spread to thin. In a cascade the adversary can focus on the head and tail
-and see all the messages entering or leaving the network. On
-the other hand,
-
-A completely
-free route network will have the advantage in a network where the adversary
-can only choose a small number of points because
-messages can enter and exit the network anywhere. 
-
 \subsection{Sync-batching timing model and protocol}
 
 Dingledine et al.~present in \cite{mix-acc} a mix network that uses
@@ -287,6 +255,7 @@
 environment. We discuss them next.
 
 \subsection{Blending attacks}
+\label{subsec:blending}
 
 Active attacks where the adversary targets a message by manipulating
 the other messages in the system are a widespread problem in mix-based
@@ -303,9 +272,9 @@
 mixing at that node \cite{web-mix}. Using this approach for cascades is
 clearly efficient, because Alice need obtain tickets only for her chosen
 cascade~\cite{disad-free-routes}, but her anonymity set is still limited
-to that one cascade. We suggest that other topologies may be able to
+to that one cascade. We conjecture that other topologies may be able to
 give equivalent anonymity while only requesting tickets from a fraction
-of the mixes, but we leave that proof to future work. A bigger problem
+of the mixes, but we leave that analysis to future work. A bigger problem
 with the ticket scheme, however, is the feasibility of requiring all users
 to register with the mixes: it is hard to imagine that attackers can be
 excluded from being registered in an open network~\cite{sybil}. Other
@@ -355,7 +324,6 @@
 %We further discuss the impact of blending
 %attacks on entropy in Section~\ref{subsec:blending}.
 
-\subsection{Danezis's restricted routes paper}
 
 
 \section{Scenarios}
@@ -364,6 +332,18 @@
 explain threat model: adversary can see all incoming and outgoing
 messages at the network edges
 and has a chance for each node of compromising it
+Which system is advantageous depends on the adversary model and the
+desired security properties. Against an adversary who can observe only a
+relatively small percentage of nodes, links, or mixnet entry and exit
+points, a free route has a tremendous advantage in protecting average
+expected anonymity over a cascade. This is because the adversary will
+not be able to see as many messages; he is simply spread too thin. In
+a cascade the adversary can focus on the head and tail and see all the
+messages entering or leaving the network. A completely free-route
+network, on the other hand, will have the advantage in a network where
+the adversary can only choose a small number of points, because
+messages can enter and exit the network anywhere.
+
 
 he can't control which node he gets, because we use a
 communal-randomness-based topology randomization a la casc-rep.
@@ -461,7 +441,7 @@
 
 \clearpage
 
-\section{Other considerations}
+\section{Other considerations in the analysis}
 \label{sec:other}
 
 \subsection{Choosing the same node twice in a row}
@@ -487,9 +467,6 @@
 reasonably large values of $G+B$ (total number of nodes), the shift
 in probability distribution is negligible.
 
-
-\subsection{Robustness of Message Delivery}
-
 % Mention that we have that entropy different for each of \ell hops.
 % But it's just \ell times the above negligible difference, right? -RD
 
@@ -498,11 +475,152 @@
 % reduce entropy in scenarios with low adversary density, because he
 % knows she doesn't exit from her entry node. Hm. -RD
 
+\subsection{Reputations and node preference}
+
+Most deployed systems let users choose their preferred entry or
+exit hops, which will mess with our distribution. But this is
+OK because after just a few hops of mixing, the distribution
+of the incoming batch doesn't matter as much anymore.
+
+Reputation systems a la mix-acc and casc-rep make people prefer one
+node over another, \emph{throughout} the mix network, not just at entries
+or exits. But we have to tolerate this sort of thing. One approach would
+be to put like-reputationed nodes in the same column of the SA or in the
+same row of the cascade \cite{casc-rep}, so the distribution (at each hop
+or in each cascade, respectively) will be more uniform.
+
+\subsection{Not enough input messages}
+
+The graphs and analysis above are for expected entropy; think
+of it as the behavior for very large batches. But in reality the batch
+size may be quite small, and we need to consider the variance we might
+get in entropy depending on how the path choices actually turn out.
+
+Section 5.1 of \cite{danezis:pet2003} discusses some approaches to
+arguing that the actual behavior is very likely to approximate the
+expected behavior. In particular, he argues based on the following
+assumptions that foo. These assumptions are close enough to our situation
+because bar. Ideally we'll do some numbers here or otherwise argue that
+128 is a big enough number.
+
+Point out that the variance is likely to be higher for free-route because
+the links have a smaller expected number of messages, so variations have
+a bigger impact. Presumably stratified is in the middle and cascades are
+pretty good.
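One rough way to quantify this intuition is a binomial toy model of our own (not from \cite{danezis:pet2003}): each message picks each hop uniformly, so a given link's load is Binomial with $p = 1/w^2$, and sparser links have proportionally larger swings.

```python
import math

def per_link_stats(n_msgs, width):
    # A message crosses a given (sender, receiver) link on a given hop
    # with probability p = 1/width^2; the link load is Binomial(n, p).
    p = 1.0 / width ** 2
    mean = n_msgs * p
    rel_std = math.sqrt(n_msgs * p * (1 - p)) / mean
    return mean, rel_std

strat_mean, strat_rel = per_link_stats(128, 4)    # 4x4 stratified
free_mean, free_rel = per_link_stats(128, 16)     # 16-wide free route
assert (strat_mean, free_mean) == (8.0, 0.5)
assert free_rel > strat_rel   # sparser links, bigger relative variation
```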
+
+George also talks about padding to guarantee one message on each link,
+to prevent the most trivial of traffic analysis attacks. Is it worthwhile
+to prevent so trivial an attack? Is the slightly-less-trivial attack
+significantly harder?
+Talk about the possible uses of some minimum level of padding on each
+link.
+
+Clearly more research remains.
+
+\subsection{Flooding attacks to degrade anonymity}
+
+In Section~\ref{subsec:blending} we talk about techniques to prevent
+or detect message dropping or message substitution. But what about the
+adversary adding messages to the batch at the beginning?
+
+Argue that if honest senders submit $k$ of the $n$ messages, then Alice
+will still be assured she can expect entropy based on $k$ messages. That
+is, the entropy of a baseline network plus hostile messages is the same
+as the entropy of the baseline network by itself. This is different from
+the pooling batching strategy \cite{trickle02}, because there messages
+from the adversary will influence the behavior (and thus entropy) of
+Alice's message.
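The claim above can be sanity-checked with a toy entropy computation (a sketch under our own simplified model: the adversary recognizes his own messages at the exit and conditions them out, so his posterior over Alice's message is unchanged):

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

k, n = 8, 128   # k honest messages out of n total (made-up numbers)

# Baseline: only the k honest messages, uniformly mixed.
honest_only = entropy([1.0 / k] * k)

# Flooded batch: the adversary's n - k messages get posterior
# probability zero, leaving the k honest candidates untouched.
with_flood = entropy([1.0 / k] * k + [0.0] * (n - k))

assert honest_only == with_flood == math.log2(k)
```

This is exactly the contrast with a pool mix, where adversary messages change the pool dynamics and hence Alice's distribution.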
+
+\subsection{Flooding attacks to degrade service}
+
+On the other hand, there's also the issue that directed floods can fill
+up nodes. We might use techniques where mixes can prove that
+they passed on the messages they received \cite{phillipe-manuscript};
+thus we can be certain that floods start at the beginning of the batch.
+
+Point out that this flooding issue is a problem for sync-batching in any
+topology. Argue that the fraction of adversary messages is proportional
+to the max size of the flooding attack.
+
+But it's a hard problem.
+
+
+\section{Other metrics for comparison}
+\label{sec:metrics}
+
+\subsection{Capacity and throughput}
+
+...
+
+An important constraint on the network structure is the maximum
+bandwidth (traffic in unit time) through any given node.
+Assume that sending a message over a single hop consumes a fixed
+amount of network traffic; we can then use that as the unit for
+traffic.
+
+Let $T_\mathrm{batch}$ be the expected throughput in a single batch period,
+i.e.~the number of messages that go through the network in a batch.
+We can give nodes the maximum opportunity to make use of the available
+bandwidth by setting $t_\mathrm{hop} \simeq t_\mathrm{batch}/\ell$.
+
+If the available nodes were used optimally (this is a big if),
+the bandwidth required through each node in this simplified case
+will be $\frac{T_\mathrm{batch}}{w t_\mathrm{hop}} = \frac{\ell T_\mathrm{batch}}{w t_\mathrm{batch}}$.
+
+[Should note somewhere that stratified is like classic systolic arrays
+in that a layer can process a new batch as soon as it has sent the
+last with no additional overhead. That's not true of free routes.]
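A small sanity check of that formula (a sketch; the 128-message batch and the unit message cost are our illustrative assumptions):

```python
# Per-node bandwidth when t_hop = t_batch / ell, in units where sending
# one message over one hop costs one unit of traffic.
def node_bandwidth(T_batch, w, ell, t_batch):
    t_hop = t_batch / ell
    return T_batch / (w * t_hop)   # equals ell * T_batch / (w * t_batch)

T, t = 128, 1.0
# 4x4 stratified: w = 4, ell = 4.
assert node_bandwidth(T, w=4, ell=4, t_batch=t) == 4 * T / (4 * t)
# 16-node free route with ell = 4: four times the width, a quarter the rate.
assert node_bandwidth(T, w=16, ell=4, t_batch=t) == node_bandwidth(T, 4, 4, t) / 4
```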
+
+\subsection{Mixing with previous and future batches}
+% not really a metric, but fits nicely after 'capacity' discussion. -RD
+
+The free-route topology can add a new batch at every hop,
+increasing the anonymity everybody gets. Maybe.
+
+[As in S-G mixes, the sender could choose to delay at certain nodes,
+which improves anonymity. No reason why we couldn't do that too.]
+
+Possible extension: for the purpose of discussion, let's use
+$t_\mathrm{batch} = 3$ hours and $t_\mathrm{hop} = (1~\mathrm{hour})/\ell$.
+Then the latency is between
+1 and 4 hours, and the anonymity set is the set of messages sent in a
+3-hour period.
+
+We can improve this by having a "delay until next batch" option in
+the message subheader; that would delay the message by $t_\mathrm{batch}$ and
+re-introduce it at the same point in the path. If each message is either
+delayed once or not delayed, that gives us a latency of 1 to 4 hours for
+non-delayed messages, 4 to 7 hours for delayed messages, and a 6-hour
+anonymity set (unless the attacker knows that someone never sends or
+receives delayed messages, in which case the anonymity set for those
+users is still 3 hours).
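The latency arithmetic above, as a minimal check (assuming, as in the text, $t_\mathrm{batch} = 3$ hours and total transit time $\ell t_\mathrm{hop} = 1$ hour):

```python
T_BATCH = 3.0      # hours per batch period
TRANSIT = 1.0      # ell * t_hop: total time spent inside the mixnet

def latency_range(delayed):
    # Up to one batch period of waiting before entry, plus transit,
    # plus one extra batch period if the "delay" option is used.
    lo = TRANSIT + (T_BATCH if delayed else 0.0)
    return (lo, lo + T_BATCH)

assert latency_range(False) == (1.0, 4.0)
assert latency_range(True) == (4.0, 7.0)
# Together the two windows span messages from a 6-hour anonymity set.
```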
+
+Can anyone attack the "delay until next batch" option? I don't think
+the fact that those messages can be distinguished by the node is any
+more harmful than Mixminion's "swap" operation, etc. [It seems a bit
+more harmful: in Mixminion, all messages have a swap point somewhere in
+the path. In this scheme, messages that ask to be delayed are inherently
+more interesting. -RD]
+
+Possible extension: Note that synchronous batching does not necessarily
+mean a "one latency fits all" approach. People who want a larger anonymity
+set can agree to send their messages only at intervals of $k*t_\mathrm{batch}$,
+so those batches will have more messages (but at the expense of the
+other batches).
+
+\subsection{Robustness of Message Delivery}
+
 [Would a graph or three help illustrate these robustness points? -PS]
+[You won't get robustness graphs at this point. But you will get to
+start with a graph with 16x16 and 1x16 showing great anonymity, to
+motivate your discussion. Perhaps you want to add a few robustness
+tables, to show off the numbers? Make sure you're comparing similar
+situations between topologies, so the comparisons mean something. -RD]
 
 It might seem from Fig.~\ref{fig:?} that the best anonymity is
 achieved with a 16x16 free-route network. There is almost no falloff
-in entropy until nearly ninety percent of the nodes are compromised.
+in entropy until each node has a ninety percent chance of being
+compromised.
 But this ignores robustness of message delivery. (Robustness of
 anonymity is discussed below.) With only a single node failure, for
 randomly chosen routes through this mixnet, nearly two thirds of
@@ -520,8 +638,8 @@
 block all of the messages.
 
 In a 4x4 stratified topology, a single failed node also stops a
-quarter of the messages.  Given a balanced distribution across all
-layers of the array, a second failed node will always affect more
+quarter of the messages.  Given a balanced message distribution across all
+layers of the network, a second failed node will always affect more
 messages. But, there is only a ten percent chance that two node
 failures chosen at random will block half the messages. And, with a
 quarter of the nodes gone there is only a $5.5 \times 10^{-4}$
@@ -529,10 +647,14 @@
 messages.
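The placement-dependence for the stratified case can be made concrete (a sketch assuming routes are chosen uniformly and independently per layer, which may differ from the exact model behind the probabilities quoted above):

```python
def blocked_fraction(width, failed_per_layer):
    # Fraction of uniformly routed messages that hit at least one failed
    # node, given how many nodes failed in each layer.
    survive = 1.0
    for f in failed_per_layer:
        survive *= (width - f) / width
    return 1 - survive

# 4x4 stratified network: placement of a second failure sets the damage,
# but either placement blocks more than a single failure does.
assert blocked_fraction(4, [1, 0, 0, 0]) == 1 / 4     # one failed node
assert blocked_fraction(4, [1, 1, 0, 0]) == 7 / 16    # different layers
assert blocked_fraction(4, [2, 0, 0, 0]) == 1 / 2     # same layer
```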
 
 Of the scenarios we have considered, a 16x4 free route is the most
-robust.  For randomly chosen routes, a single failed node can be
+robust to message loss.  For randomly chosen routes, a single failed node can be
 expected to block delivery of only 6.7 percent of the messages.  Four
 failed nodes can be expected to block delivery on only a third of the
-messages. At least as significant is that this is the expected
+messages.
+% Comparing apples to oranges here? You say 1/3 here, and 10^-4 above.
+% The small chance sounds more impressive, but it's of a different
+% thing? -RD
+At least as significant is that this is the expected
 fraction of messages blocked regardless of which four nodes fail.
 
 \subsection{Robustness of Anonymity}
@@ -541,10 +663,11 @@
 as these can take on such a variety of forms. In the simplest case
 though, we can consider the effect on anonymity of simple node
 crash, since this is the most straightforward way to actively shrink
-anonymity. Also, as discussed in Section~\ref{?}, there are techniques
+anonymity. Also, as discussed in Section~\ref{subsec:blending}, there
+are techniques
 to detect and/or punish more selective attacks.
 
-The threat model we consider in this section is a varient of the one
+The threat model we consider in this section is a variant of the one
 we have been using all along. Senders of messages input to the mixnet
 and receivers of messages output from the mixnet can be observed. But
 hostile nodes may now also crash, failing to deliver any of their
@@ -562,11 +685,11 @@
 visible to the adversary.  A combination of active attacks and
 observations (including some internal observations) should prove the
 most devastating to anonymity. However, we leave examination of this
-this for future work.
+for future work.
 
 Anonymity of cascades is unaffected by this threat model.  Since each
-cascade batch is independent of another, any node that crashes will wipe
-out all the messages in the anonymity set.  Anonymity robustness of
+cascade batch is independent of the others, any node that crashes will wipe
+out all the messages in that anonymity set.  Anonymity robustness of
 stratified and free-route topologies is more complex.
 
 For a stratified network, if any entry node fails, the number of
@@ -574,7 +697,7 @@
 If two entry nodes fail, the entropy drops by 1. If 3 entry nodes
 fail, entropy drops by 2. If all fail, entropy remains the same as if
 none fail.  If a second layer node fails, assuming a balanced
-layer-two distribution, anonymity of all messages are unaffected since
+layer-two distribution, anonymity of all messages is unaffected since
 there is no change to the probability that an exiting message was any
 incoming message. Note this is so even if the distribution of messages
 across entry nodes is highly skewed. If the layer-two distribution is
@@ -594,7 +717,7 @@
 may be gained. Add to this that, as the ratio of input messages to
 width of a layer shrinks, it becomes more likely that nodes at a given
 layer will only receive messages from a subset of nodes at the
-previous layer or, in any case, that the layer distribution will
+previous layer or, in any case, that the layer distribution will be
 unbalanced between nodes to a significant degree. On the other hand,
 because nodes are recycled for use at multiple layers, it is much
 harder to plan an attack.  Crashing an entry node to reduce the
@@ -626,127 +749,12 @@
 topologies investigated appears to be the free route, and worst is the
 cascade.
 
-Change terminology: 
-
-[Should note somewhere that stratified is like classic systolic arrays
-in that a layer can process a new batch as soon as it has sent the
-last with no additional overhead. That's not true of free routes.]
-
-
-
------------------------------\\
-Below was previously in robustness subsec.
+\subsection{Robustness in async vs sync}
 
-Two issues: one is robustness with respect to a single message
-going through. How likely is it to arrive on time?
 But there's a deeper issue: in asynchronous-batching designs, a
 late message still arrives. In this system, a late message is lost.
 This is not very convenient for the user.
 
-\subsection{Mixing with previous and future batches}
-The free-route topology can add a new batch at every hop,
-increasing the anonymity everybody gets. maybe.
-
-\subsection{S-G mixes}
-
-[the sender chooses to delay at certain nodes, which improves anonymity.
-no reason why we couldn't too, really.]
-
-Possible extension: for the purpose of discussion, let's use
-$t_\mathrm{batch}$ = 3 hours and $t_\mathrm{hop}$ = 1 hour / n. Then
-the latency is between
-1 and 4 hours, and the anonymity set is the set of messages sent in a
-3 hour period.
-
-We can improve this by having a "delay until next batch" option in
-the message subheader; that would delay the message by $t_\mathrm{batch}$ and
-re-introduce it at the same point in the path. If each message is either
-delayed once or not delayed, that gives us a latency of 1 to 4 hours for
-non-delayed messages, 4 to 7 hours for delayed messages, and a 6-hour
-anonymity set (unless the attacker knows that someone never sends or
-receives delayed messages, in which case the anonymity set for those
-users is still 3 hours).
-
-Can anyone attack the "delay until next batch" option? I don't think
-the fact that those messages can be distinguished by the node is any
-more harmful than Mixminion's "swap" operation, etc. [It seems a bit
-more harmful: in Mixminion, all messages have a swap point somewhere in
-the path. In this scheme, messages that ask to be delayed are inherently
-more interesting. -RD]
-
-A generalisation of this is a "source-timed" network: use the timing model
-from \cite{mix-acc} (with or without receipts), where $t_\mathrm{hop}$
-corresponds
-to one time unit, but in each subheader specify "drop if not received in
-period t1" and "send in period t2".
-
-Even more general would be "drop if not received between
-$t1_\mathrm{start}$ and $t1_\mathrm{end}$" and "send between
-$t2_\mathrm{start}$ and $t2_\mathrm{end}$, chosen uniformly at
-random".  This can support almost all batching designs.
-
-
-\subsection{Not enough input messages}
-does number of messages influence things? that is, does too few messages
-screw things up more for one topology than for another?
-
-More specifically, show that the influence from the percent of corrupt
-nodes is the critical factor on entropy, and that the influence from the
-random path choices of the other senders doesn't matter much on entropy.
-
-For large batch sizes like n=128, I think we all believe it. Paul and I
-just want to find a principled way of arguing it, so the paper doesn't
-leave the issue open.
-
-And for small batch sizes like n=2, I think it's clear that the other
-path influences Alice's entropy a whole lot.
-
-So where is the break-even point -- the point where the influence
-from corrupt nodes matches the influence from path choices? And is our
-intuition correct that as you move away from the break-even point the
-influence from the other paths goes quickly to 0?
-
-The reason why it matters is because there's a significant camp in the
-anonymity community that believes cascades are the way to go. This paper
-disputes that idea, so they're going to be suspicious of it. And the way
-we're comparing SA's to cascades neglects the issue of entropy loss by
-bad luck in the paths chosen by the other senders. In effect, they might
-argue we're comparing apples to oranges and have no result at all.
-
-I just reread George's paper, particularly section 5.1. I think he has
-the beginnings of a solution there.
-http://freehaven.net/anonbib/\#danezis:pet2003
-
-\subsection{Blending and flooding attacks}
-active attacks can mess up the calculations? or is the entropy with a
-baseline network plus hostile messages the same as the entropy of the
-baseline network by itself? that would be neat.
-in any case, there's also the issue that directed floods can fill up
-nodes. this is very bad.
-
-talk about the possible uses of some minimum level of padding on each
-link.
-
-\subsection{Reputations and node preference}
-
-most deployed systems let users choose their preferred entry or
-exit hops, which will mess with our distribution. but this is
-ok because after just a few hops of mixing, the distribution
-of the incoming batch doesn't matter as much anymore.
-
-reputation systems a la mix-acc and casc-rep make people prefer one
-node over another, \emph{throughout} the mix network, not just at entries or
-exits. but we have to tolerate this sort of thing. one approach would be
-to put like-reputationed nodes in the same column of the SA or in the same
-row of the cascade \cite{casc-rep}, so distribution (at each hop or in each
-cascade, respectively) will be more uniform.
-
-\subsection{Comparison to Mixminion}
-
-No need for a replay cache.
-
-But less robust to transient failures.
-
 \section{Extensions and Future Work}
 \label{sec:conclusion}
 
@@ -763,11 +771,8 @@
 routes, will generally be able to achieve stronger and more easily
 analysed security properties than less constrained approaches.
 
-Possible extension: Note that synchronous batching does not necessarily
-mean a "one latency fits all" approach. People who want a larger anonymity
-set can agree to send their messages only at intervals of $k*t_\mathrm{batch}$,
-so those batches will have more messages (but at the expense of the
-other batches).
+Comparison to Mixminion: No need for a replay cache. But less robust to
+transient failures.
 
 %\section*{Acknowledgements}
 
