[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Batching Strategy and Network Structure



On Mon, 6 May 2002, Roger Dingledine wrote:

> [George: you should read this and give your opinion, even though you're
> busy. :) ]

Ok, I have read and the high level bit of my comments is that it is too 
new to be included as it is in the paper itself. I think some of the 
issues are very important and need to be discussed on the list.

> >\subsection{Batching Strategy and Network Structure}
> >\label{subsec:batching}
> >
> >% This needs more work. I hadn't realised how close we were to the
> >% deadline, but I'll commit it anyway for now. -DH
> >
> >Ideal security for a MIX-net is achieved when each message leaving the
> >network, could correspond to any message entering the network during a
> >period approximately equal to the maximum network latency, and vice-versa.
> 
> An alternate ideal is when each message leaving the network could
> correspond to any message that ever entered the network. How do you
> decide which ideal is better?
> 

I think that anonymity is not about messages themselves but about their 
metadata. Therefore I would say that perfect anonymity is achieved when a 
message received by a principal is equally likely to have been sent by any 
other principal in the universe (maybe we should restrict that to: that 
uses mixminion).

> This is something that needs more research; but I'm not sure we've got
> the space in the mixminion paper, or that the mixminion paper is the
> right place for an in-depth analysis. So I'm tempted to try to scale
> this subsec back to a "discussion and current thoughts" rather than
> "this is our plan" or even "this is our recommendation".
> 

Agree.

> [Also, this topic overlaps in part with another paper I'm writing with
> Andrei Serjantov and Paul Syverson, on working out a taxonomy of various
> batching and pooling strategies and how well each resists trickle/flooding
> attacks. But from the looks of it it doesn't overlap that far.]
> 

I will be interested to see a version of this when it becomes "public".

> >This ideal is not achieved by protocols like Mixmaster that use random
> >delays [[explain why]].
> >If such networks also use free routes, they are subject to attacks
> >described in \cite{disad-free-routes}. [[summarize them]]
> 
> They're also subject to flooding/trickle attacks, where an attacker
> targeting a specific message going into a mix can manipulate the batch
> of messages entering that mix so the only unknown message in the batch
> is the target message.
> 

Some of the attacks in the Paper form Dresden on free routes relies on the 
fact that the maximum route length is small (which we can set to be long 
like 16 hops + 16 hops = 4kb), the actual route length is known by 
intermediate nodes as well as their position in the chain. We have 
engineered mixminion not to leak such information to intermediary nodes 
exactly for these reasons so I need to have a careful look at the paper 
again before to see what attacks are still applicable.

> In fact, I'll paste a couple more paragraphs I wrote yesterday about
> this:
> [large chunk of cool stuff that I will not comment on]

> >Messages entering the network in each batch period are queued until
> >the beginning of the next period. They are then sent through the MIX-net
> >synchronously, at a rate of one hop per {\em hop period}. All paths are
> >a fixed length $n$ hops, so that if no messages are dropped, the messages
> >introduced in a given batch will progress through their routes in
> >lock-step, and will all be transmitted to their final destinations $n$
> >hop periods later.
> 
> This is a very tempting idea. It resolves a lot of the
> diverse-user-behavior problems. On the other hand, it puts a lower
> bound on the performance we're going to get from the system. So we need
> to run it by the remailer folks: Len? What do you think about having
> all messages always take n days to traverse the mixnet? It reduces the
> number of actual users we'll get (possibly by a whole lot), and it may
> kill deployment. Even though we talk about having the best known designs
> against traffic analysis, we have to moderate that with choosing a design
> that people will actually use.
> 
> Hm.
> 

Hm. I quite like mixing strategies that are adaptive to the volume of 
traffic so that if there are enough messages to build good anonymity sets 
you do not need to wait 10 days. Another issue is that it is not only the 
volume that counts but also the destination: a mix should take into 
account how many destinations are the messages going to before it sends 
the batch (in a free route case).

> >The latency is between $nt_{hop}$ and $t_{batch} + nt_{hop}$, depending
> >on when the message was submitted.
> >
> >The set of messages sent to a node in a given hop period is called a
> >mini-batch, to distinguish it from the set of messages being sent
> >through the network as a whole. [[need better name?]]
> >
> >[[Messages are tied to a given time period so that they cannot be delayed.
> >Explain why this prevents the attacks in [disad-free-routes], even
> >for free-route networks.
> 
> I'd be curious to see the explanation for this one. The attacks in
> disad-free-routes are really nasty when there are fixed-length paths
> and the adversary owns most of the nodes.
> 
> > Also explain why we need to use a hybrid
> >free-route/cascade approach (otherwise the anonymity set is limited by
> >the bandwidth that can be handled by a single cascade).]]
> 
> There are lots of tradeoffs in both directions between free-route
> and cascade. Another big one is that link padding at an O(n) scale
> (e.g. cascade) is plausible, but at an O(n^2) scale (eg free-route)
> is nuts.
> 

One can impose a network structure without it being a cascade or a free 
routing network. For example each remailer communicates only to 3 others.
It is easy using graph theory to analyze such a network and find how many 
hops a message needs to be in order not to be correlated any more to any 
input.

> >Define the {\em width}, $w$ of a MIX network using synchronous batching to
> >be the number of nodes that simultaneously process messages in each
> >hop period. (If this is not constant, we can still talk about the
> >maximum, minimum and mean width.)
> >
> >An important constraint on the network structure is the maximum
> >bandwidth (traffic in unit time) through any given node.
> >Assume that sending a message over a single hop consumes a fixed
> >amount of network traffic; we can then use that as the unit for
> >traffic.
> >
> >Let $T_{batch}$ be the expected throughput in a single batch period,
> >i.e. the number of messages that go through the network in a batch.
> >
> >We can give nodes the maximum opportunity to make use of the available
> >bandwidth by setting $t_{hop} \simeq t_{batch}/n$.
> >
> >%If the available nodes were used optimally (this is a big if),
> >%the bandwidth required through each node in this simplified case
> >%will be $\frac{T_{batch}}{wt_{hop}} = \frac{nT_{batch}}{wt_{batch}}$.
> >
> >For any particular network structure, bandwidth constraints, and
> >assumed traffic levels, it is straightforward to calculate the
> >minimum width of the network, and therefore the maximum sizes of
> >mini-batches.
> >
> >For example in the limiting case $w = 1$, the network consists of a
> >single MIX cascade (this is the situation considered in Section ??
> >of \cite{babel}).
> >
> >Let $N$ be the number of nodes in the network. An interesting case
> >is $w = n = N$; in that case we could use a set of $N$ cascades
> >arranged as a Latin square.\footnote{[[define Latin square]]}
> >This ensures that all nodes make full use of the available bandwidth;
> >however it gives users no choice in the set of nodes used.
> >
> >In practice, there are other considerations beside maximizing the
> >size of mini-batches:
> >
> >\begin{enumerate}
> >\item reliability
> 
> I think this is going to be the killer. If we want to impose a structure
> on the messages and MIX-net topology it's got to be based on reliability
> so we can arrange appropriate redundancy and node grouping. It's going
> to be really hard. It would be a nice thing to investigate once we've
> got Mixminion going -- so then we can start trying out new things like
> reputation systems and alternate network topologies.
> 
> >\item giving users a choice of node sets?
> >\item anything else?
> >\end{enumerate}
> >
> >Further analysis of the trade-offs will be needed before choosing a
> >final network structure. Note that a planned structure, where each
> >users' software follows the plan consistently when constructing
> >routes, will generally be able to achieve stronger and more easily
> >analysed security properties than less constrained approaches.
> 
> Agreed.
> 


I cannot stress enough how much I agree with the last statement! I also 
agree with the reverse: I prefer to do things in a simple way that is well 
understood rather than in a complicated way that we do not understand but 
that seems more secure. This has been the case for not relying on 
"dummies" to provide any property.

I am sure we will have lots of fun in the next few week talking about:
- network structure (reliability vs. bandwidth organization, cascade vs. 
free vs. constrained network ...).
- Mixing strategies.
- Analysis of the above.

> --Roger
> 

Yours,

George