[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: header-swap isn't perfectly indistinguishable (was: problem in 3.2 "Replies")

To: mixminion-dev@freehaven.NET
Subject: Re: header-swap isn't perfectly indistinguishable (was: problem in 3.2 "Replies")
From: David Hopwood <david.hopwood@zetnet.co.uk>
Date: Sun, 12 May 2002 21:47:21 +0000
Delivered-To: archiver@seul.org
Delivered-To: mixminion-dev-outgoing@seul.org
Delivered-To: mixminion-dev@seul.org
Delivery-Date: Sun, 12 May 2002 20:34:48 -0400
References: <Pine.LNX.4.44.0205070824100.14302-100000@blackbird.lcs.mit.edu> <E175CCa-0003Jr-00@imp> <3CD85B28.6D65BC57@zetnet.co.uk> <E175OHl-0003aa-00@imp>
Reply-To: mixminion-dev@freehaven.net
Sender: owner-mixminion-dev@freehaven.net

-----BEGIN PGP SIGNED MESSAGE-----

Zooko wrote:
> David Hopwood wrote:
> > I'm not convinced there is really any problem with forcing all chains to be
> > the same length; the extra hops for forward messages still contribute to
> > anonymity, so they are not wasted.
> [...]
> > Consider any synchronous design (e.g. a pure cascade, or a batch synchronous
> > design [...]
> > Note also that increasing the path length in this case does not have much
> > effect on latency, which is primarily determined by the batch period (and how
> > many times a message is delayed to the next batch). It does have an effect on
> > reliability for forward-only messages, but the reliability is no worse than
> > for replies.
> 
> [...]
> From your comments about latency it appears that you are envisioning a
> batching strategy that enforces a very high added constant latency to each
> message.

Don't pay too much attention to the specific times I gave (perhaps I'm just
more paranoid, and/or thinking of applications where that much latency would
be acceptable).

What matters is the trade-off between latency and anonymity - I'm claiming
that synchronous networks can offer an improved trade-off, because the
latency distribution can be made closer to uniform, and is independent of
the number of nodes on the path. For asynchronous networks with random
deferment, the latency follows approximately a normal distribution, which
is not really what we want.

[Following the Babel paper, I'll use the verb "defer" for when a message
is deliberately deferred to a later batch, and "delay" only when a message
is removed and then reintroduced later by an attacker.]

> I hope that is not necessary, as I wish to use Mixminion for
> applications with latency on the order of minutes, not of days.

If the number of messages going through the network in a few minutes is large
enough to provide a reasonable anonymity set, you can set t_batch to a few
minutes.

For the source-timed variant, it's possible to use the same network with
different latencies for different applications, although that obviously splits
the anonymity sets to some extent.

Let's compare an asynchronous network (using random deferment), with a simple
synchronous batching network (not using deferment, to begin with). Assume that
paths are free routes chosen by the same method.

The latency distribution in the asynchronous network is approximately a normal
distribution (it is the sum of several uniform-random deferment times). In the
synchronous network it is uniform on [\ell*t_hop, t_batch + \ell*t_hop], where
\ell is the number of nodes on a path, and \ell*t_hop < t_batch.

Let this distribution be L(t) = Pr[latency = t]. I think that when users
judge how slow a network is, they will consider the worst case (or the 90th
percentile of L, say) to be a more significant measure than the average case.
Anyway, suppose that parameters are chosen so that users perceive the two
networks to be equally fast or slow.

Then:

 - the asynchronous network will be vulnerable to the attacks in Section 4
   of [disad-free-routes]; the synchronous network will not.

 - for the synchronous network, users know in advance exactly when their
   message will be sent from the last hop, which is a significant practical
   advantage. For the asynchronous network, users never know exactly how
   long their messages will take to get through the network.

 - consider a global passive adversary, Eve, who watches the network inputs
   and outputs. Assume that messages enter the network at a roughly constant
   rate. Eve sees a message M enter at time t_enter. It could leave the
   network at any later time up to the maximum latency - but all latencies
   are not equally likely. If Eve has no other information, then if she sees
   k messages that *could* correspond to M leave the network at times t_1..t_k,
   she will weight the probabilities that they actually correspond to M
   proportionally to L(t_i - t_enter). Intuitively, the "flatter" the
   distribution L is for a given maximum latency, the less information Eve
   gains - which favours the synchronous network.

   (We could make this more precise by considering how much Shannon
   information Eve receives, or some similar metric.)

 - in both cases the bandwidth usage is (network throughput * \ell), and
   the reliability is p^\ell.

   (OTOH, the synchronous network needs the path length \ell to be twice
   as long for forward-only messages, since it has to be the same as for
   anonymised replies. Note that this does not affect latency, only
   bandwidth and reliability. Also, the security benefit of the longer
   path is not lost.)

 - for the asynchronous network, an extra mechanism like "swap", "strip",
   or "two payloads" is needed to provide indistinguishability between
   forward messages and replies. For the synchronous network this comes
   for free: only first-half and last-half messages can be distinguished,
   but the security analysis assumes that anyway, and it is not a problem.

   (SPRP encryption of payloads should be used in either case. For the
   synchronous network, it is not relied on as much, because a tagging
   attack on a forward message will not work on the first half of the
   path, which is long enough to provide anonymity. However, it should be
   used anyway if we want the security benefit of the longer path.)

The above is for a simple synchronous network without deferment. More
generally, we can use a source-timed network, where each hop header specifies
the time that a message must be received from the previous hop, and the time
it must be sent to the next hop.

(Actually it is which messages appear in which mini-batches that matters,
not the precise timing, but it is simpler to label the mini-batches with
timestamps.)

Source-timed deferment allows different users to make different latency/
anonymity trade-offs. In fact *any* latency distribution can be chosen,
not just normal, binomial or uniform distributions. To show that a network
using source-timed deferment is a strict improvement over one with random
deferment, note that it can approximate the latter by having the client
choose deferment times independently for each hop (as the nodes would do
for random deferment), and by setting other parameters the same.
In that case there are only two differences:

1. The network with random deferment is subject to the [disad-free-routes]
   attacks, and the source-timed network isn't. (This is because an attacker
   cannot delay messages in a source-timed network, it can only permanently
   remove them.)

2. A source-timed network cannot use threshold batching strategies (where
   the timing of batch flushes and/or the number of messages that are
   retained in a pool depends on the number of messages received). However,
   that's not really a problem, because it can use the following strategy
   ("interval batching with minimum output set") instead:

     If the number of messages in a batch, k, is less than a bound k_min,
     then the node adds k_min - k dummy messages to the output. These dummies
     follow the remainder of a path for this batch, chosen in the same way
     as a normal message path, except that they are discarded at the last
     node.

   This prevents simple blocking attacks, and cases where too few messages
   are sent without any active attack, but it does not prevent blending
   attacks. However, threshold batching doesn't prevent blending attacks
   either, so nothing has been lost.

   (There is still a blocking attack if the attacker owns all nodes on the
   path after the one that has its input blocked - but as long as at least
   two nodes on the path are honest, this attack fails.)

A source-timed network with deferment is not quite as easy to analyse
as one where batches are completely independent:

 - it's more difficult to make sure that changes to the network meta-
   information appear to take effect instantly. For example, suppose that
   this information consists simply of a global set of reliable nodes that
   are used to construct paths at random. Also suppose that a node N is
   removed from this set starting from batch T. Any messages still going
   through the network that reach N in batches >= T, must have been deferred
   from a previous batch, and so their anonymity set is reduced.

   This could be solved by saying that changes in the network meta-
   information must be signalled long enough in advance that they can take
   effect for all messages in the same period (i.e. one maximum latency
   time in advance). I don't think this would be a significant limitation
   in practice.

 - depending on the chosen latency distribution, the probability of a
   message being deferred at one node of a path might not be independent
   of the probability that it is deferred at other nodes. For instance,
   suppose the strategy is to first choose from a uniform distribution
   how many times the message will be deferred, and then decide at random
   how those deferments are assigned to nodes on the path. In this case
   if a node sees a deferred message, it knows that it has a higher
   probability of being deferred at other nodes (and conversely, if it is
   not deferred at this node, then it has a lower probability of being
   deferred at other nodes).

   I don't think this leads to any significant attack, though.

So, my suggestion is that we should use:

 - source-timed synchronous batching with deferment (and discuss
   further what the latency distribution should be),

 - interval batching with minimum output set, and

 - the simplest option for doing replies, as described in
   <http://archives.seul.org/mixminion/dev/Apr-2002/msg00079.html> and
   <http://archives.seul.org/mixminion/dev/Apr-2002/msg00084.html>
   (what we called "distinguishable" before, although that's not an
   accurate name in this context).

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPN7iuTkCAxeYt5gVAQHlKQgAh67HyU9G2fmnr4vxR1HnJBh2G2Dp2dx7
sBfQTn9vOamPLf5HMNNNm4gAymSN/obkYqlUy3nS9ZBBiKrLsrXx+ztlpax6Rf3w
sh9rqIAc0JV9z66f4lY8HmVSmqGIbmWKvSF323BoqWWgghBcDcMa6DFMLMhqkABT
lzhC5Y4YKUGP0eTHZ7skDYyNyEEcYtVTGi+vbIlgv6fVmTCrsOr69YQPFTkmWpBF
nZb/SR83++8qupCW+y2wQLgojSPzgwlwoaBFJWF/65eeMSZ/TGZ2wfvmjEsx+Sny
utFc0Bmx/4KQnPUUXMPKxuh6D8T2vyj3Mue1VtNZenGh/Uy4cEbzeA==
=NqPh
-----END PGP SIGNATURE-----

Follow-Ups:
- Source timing and replies
  - From: David Hopwood <david.hopwood@zetnet.co.uk>

References:
- re: problem in 3.2 "Replies"
  - From: George Danezis <gd@theory.lcs.mit.edu>
- header-swap isn't perfectly indistinguishable (was: problem in 3.2 "Replies")
  - From: Zooko <zooko@zooko.com>
- Re: header-swap isn't perfectly indistinguishable (was: problem in 3.2 "Replies")
  - From: David Hopwood <david.hopwood@zetnet.co.uk>
- header-swap isn't perfectly indistinguishable (was: problem in 3.2 "Replies")
  - From: Zooko <zooko@zooko.com>

Prev by Date: Re: Minion-design.pdf is submitted
Next by Date: Source timing and replies
Prev by thread: header-swap isn't perfectly indistinguishable (was: problem in 3.2 "Replies")
Next by thread: Source timing and replies
Index(es):
- Date
- Thread