
Re: Batching Strategy and Network Structure



On Mon, 6 May 2002, Roger Dingledine wrote:

> David Hopwood just committed a very controversial subsec to the paper. I'm
> going to bring the thread over to the -dev list so we can try to hash
> it out farther.

I'll respond quickly now, and perhaps in more depth this evening after
work.

> An alternate ideal is when each message leaving the network could
> correspond to any message that ever entered the network. How do you
> decide which ideal is better?
>
> This is something that needs more research; but I'm not sure we've got
> the space in the mixminion paper, or that the mixminion paper is the
> right place for an in-depth analysis. So I'm tempted to try to scale
> this subsec back to a "discussion and current thoughts" rather than
> "this is our plan" or even "this is our recommendation".

I agree this definitely needs more discussion. I would hold off on
including it in the Mixminion paper for now, since we haven't discussed
it much.

Also, "ideal" for anonymity may be infeasible if it makes the system too
inefficient for its users. If people have to wait more than a few hours
for their messages to clear the system, they will most likely not use the
remailer network.

> [Also, this topic overlaps in part with another paper I'm writing with
> Andrei Serjantov and Paul Syverson, on working out a taxonomy of various
> batching and pooling strategies and how well each resists trickle/flooding
> attacks. But from the looks of it, it doesn't overlap that far.]

Well, that seems to be the better place for this.

> >This ideal is not achieved by protocols like Mixmaster that use random
> >delays [[explain why]].
> >If such networks also use free routes, they are subject to attacks
> >described in \cite{disad-free-routes}. [[summarize them]]
>
> They're also subject to flooding/trickle attacks, where an attacker
> targeting a specific message going into a mix can manipulate the batch
> of messages entering that mix so the only unknown message in the batch
> is the target message.

Well, Mixmaster doesn't simply use random delays. The entire mixnet isn't
synchronized, but for each message entering a node, one message will exit.
Within Mixmaster there is a pool with a minimum size of POOLSIZE. Every
SENDPOOLTIME interval, Mixmaster checks whether the pool holds more than
POOLSIZE messages. If it does, n minus POOLSIZE messages (where n is the
current pool size) are chosen at random to be sent.

The exception to that occurs if the node is being flooded.

Mixmaster will send n minus POOLSIZE messages, unless n minus POOLSIZE is
greater than RATE percent of the pool. If so, RATE percent of the pool is
sent instead, and the calculation is repeated after the next
SENDPOOLTIME interval.

Does this make sense?
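If it helps, here's how I'd sketch that pool logic in Python. The
parameter names mirror the config values above, but the defaults here are
illustrative, not Mixmaster's actual defaults:

```python
import random

# Illustrative values; Mixmaster's real defaults may differ.
POOLSIZE = 10   # minimum number of messages always retained in the pool
RATE = 65       # max percent of the pool that may be sent per interval


def flush_pool(pool):
    """Decide which messages to send at each SENDPOOLTIME interval.

    Returns the list of messages to send; the rest stay in the pool.
    """
    n = len(pool)
    if n <= POOLSIZE:
        return []                  # pool not above the minimum: send nothing
    to_send = n - POOLSIZE         # normal case: send the surplus
    cap = n * RATE // 100          # flood damping: at most RATE% of the pool
    to_send = min(to_send, cap)
    chosen = random.sample(pool, to_send)
    for msg in chosen:
        pool.remove(msg)
    return chosen
```

So under a flood, the node sheds at most RATE percent of its pool per
interval, and an attacker can't fully flush the honest messages out in
one round.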

> In fact, I'll paste a couple more paragraphs I wrote yesterday about
> this:
>
> |The tools that we give honest MIXes to protect against these blending
> |attacks are precisely the tools that dishonest MIXes use to launch
> |them. \emph{Uncertainty} comes from not letting observers know when
> |a given message might leave the MIX; \emph{inexactness} comes from
> |not letting observers know how many messages are currently inside the
> |MIX. This flexibility allows a MIX to prevent the adversary from learning
> |the details of message flow, but at the same time it allows a compromised
> |MIX to manipulate message flow, for example by delaying a message until
> |it's ready to launch a trickle attack on the downstream hop.
> |
> |Many basic MIX verification schemes, such as that proposed in
> |\cite{chaum81untraceable} where senders observe the batches exiting
> |a MIX to confirm it processed their messages, require schemes where
> |the messages come out with the next MIX flush. More complex
> |robustness schemes \cite{flash-mix} enforce timely message processing
> |if the adversary does not own a threshold of participating MIXes. The
> |receipt and witness scheme of \cite{mix-acc} works because a MIX loses
> |reputation if it hasn't obtained a receipt from the next hop within
> |the allowed timeframe. All of these systems are designed to ensure that
> |messages will leave each MIX predictably.
> |
> |In some topologies, such as free-route MIX-nets, a widespread delaying
> |attack is difficult to perform because the adversary must control all
> |MIXes that are sending messages to the target MIX. In this case we may
> |gain more by allowing MIXes to protect against blending attacks than we
> |lose by permitting compromised nodes to manipulate messages. At the other
> |extreme, in a cascade topology, a single compromised MIX has absolute
> |control over the downstream message batch. In this case verification of
> |correct processing is critical, to prevent wholesale (unobservable!)
> |anonymity compromise. Methods to detect an attack afterwards, e.g. by
> |checking for successful delivery of test messages at the end of the
> |cascade \cite{casc-rep}, can help detect misbehaving MIXes --- but they
> |still allow the adversary to uncover a few messages before he's noticed.
> |As we design new protocols and batching approaches, we must consider
> |this tradeoff between protection from blending attacks and protection
> |from verifiability.
>
> The work on evaluating resistance to the $n-1$ attack came in part out of
> the work done for pet2002 on an information-theoretic model for anonymity:
> Andrei and George had a paper there showing that pool mixes provide
> 'more' anonymity than threshold mixes (under some plausible assumptions).
>
> So I think there's a tension between hiding message flow (with pools)
> and ensuring predictable message flow. I haven't decided yet where
> I stand.
>
> >To both maximize anonymity sets and prevent the above attacks, we
> >propose to use a strategy called {\em synchronous batching}.
> >The network has a fixed {\em batch period}, $t_{batch}$, which is closely
> >related to the maximum desired latency; a typical value could be 24 hours.
> >Messages entering the network in each batch period are queued until
> >the beginning of the next period. They are then sent through the MIX-net
> >synchronously, at a rate of one hop per {\em hop period}. All paths are
> >a fixed length $n$ hops, so that if no messages are dropped, the messages
> >introduced in a given batch will progress through their routes in
> >lock-step, and will all be transmitted to their final destinations $n$
> >hop periods later.

This is somewhat similar to a batching proposal that Bram had mentioned to
me, I believe.
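For concreteness, here's a minimal sketch of the schedule being proposed,
as I read it. The parameter values below are my own illustrative
assumptions, not numbers from the paper:

```python
# Synchronous batching schedule, as described in the quoted proposal.
# All values are illustrative assumptions.
BATCH_PERIOD = 24 * 3600      # t_batch: 24 hours, in seconds
HOP_PERIOD = 3600             # time allotted to each hop, in seconds
PATH_LENGTH = 4               # n: fixed path length in hops


def delivery_time(arrival):
    """Given when a message enters the network (seconds since epoch),
    return when it reaches its final destination.

    Messages queue until the start of the next batch period, then move
    one hop per hop period, in lock-step with the rest of their batch.
    """
    batch_start = ((arrival // BATCH_PERIOD) + 1) * BATCH_PERIOD
    return batch_start + PATH_LENGTH * HOP_PERIOD
```

Note that every message in a batch exits at the same time, which is the
whole point: the anonymity set is the full batch.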

> This is a very tempting idea. It resolves a lot of the
> diverse-user-behavior problems. On the other hand, it puts a lower
> bound on the performance we're going to get from the system. So we need
> to run it by the remailer folks: Len? What do you think about having
> all messages always take n days to traverse the mixnet? It reduces the
> number of actual users we'll get (possibly by a whole lot), and it may
> kill deployment. Even though we talk about having the best known designs
> against traffic analysis, we have to moderate that with choosing a design
> that people will actually use.

If messages can't reliably make it through the remailer network in 1 to 4
hours, I think we will not have any users who aren't on this mailing list.

Certainly, if a message takes n days to traverse the mixnet, we will have
a pitiful anonymity set.


--Len.