[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
efficiency and tag-prevention (was: Forward and reply messages)
Welcome back George!
George wrote:
>
> I agree with both Zooko and David that the "swap" techniques are no silver
> bullet. Indeed under attack the anonymity set is reduced (by two? not
> sure) and therefore, in theory we would need double sized paths.
>
> Note however that this penalty only happens when the network is under
> attack using a VERY low bandwidth, VERY high probability of not succeeding,
> VERY difficult to coordinate attack. In fact I have to confess that in
> real terms the tagging attack is soo difficult to perform, that while I
> agree we should offer options to protect against it, it is in no
> real threat model a realistic threat. I think that requiring all messages
> to be double the size or all the messages to be distinguishable as forward
> or reply is a bit of an over reaction, since it imposes a penalty all the
> time.
Hm, so are you proposing that in a "two-legged" technique such as the
two-headers-swap technique, the tagging attack against it to be so expensive
and unlikely that our effective safety is close to the length of our complete
path instead of close to the length of one leg? Then we could use paths of
the same length, or possible only *slightly* longer, while considering
ourselves about as safe.
That seems like a reasonable position to take. Is that what you are thinking?
Now I'd like to think more precisely about the penalties imposed by the
alternative techniques.
I'm going to scribble some rough numbers about efficiency of the various
techniques. For now, I'm assuming that a two-legged-path technique like two-
headers-swap will have twice as many hops per message as a one-legged
technique like two-payloads or plain-old-distinguishable.
In the following, I'll use `K' as the number of hops in a one-legged technique
(e.g. I think that a typical value for K is 4), and `2K' as the number of hops
in a two-legged technique.
The relative efficiency depends on the size of the actual message. I'm going
to look at the first four cases one at a time then do the general table. You
can skip down to the end, but please to read the paragraph about retrying in
the second case below!
case 1: If the size of the actual message is <= 1/2 of the payload capacity
`C' (e.g. a typical value for payload capacity 30 KB, so this case is if the
size of the actual message is <= 15 KB):
Plain-old-distinguishable uses CK underlying bandwidth. It has latency
proportional to K and reliability proportional to P^K for some probability P.
Two-payloads uses CK underlying bandwidth. It has latency proportional to
K and reliability proportional to P^K. (It is identical in all respects to
plain-old for this case.)
Two-legged uses 2CK underlying bandwidth. It has latency proportional to 2K
and reliability proportional to P^2K.
case 2: If the size of the actual message is > 1/2 C, and <= C:
Plain-old-distinguishable uses CK underlying bandwidth. It has latency
proportional to K and reliability proportional to P^K.
Two-payloads uses 2CK underlying bandwidth (because it must fragment the
message into two payloads and send each separately). It has latency
proportional to K (they can be sent in parallel) and reliability proportional
to P^2K (each one might fail -- each one has reliability P^K).
Two-legged uses 2CK underlying bandwidth. It has latency proportional to 2K
and reliability proportional to P^2K.
Note that there is an important difference between the two-payloads and the
two-legged in this case, but this difference isn't reflected in these numbers.
The difference has to do with retrying: if the message fails in the two-legged
technique, then the message has to be retried across the whole path, incurring
the same reliability risk and incurring the same bandwidth usage. In the two-
payloads case, if one of the packets fails and the other succeeds, then only
the lost one need be retried, thus incurring less risk of failure, less
underlying bandwidth usage, and less (!!!???) risk of anonymity-piercing.
That last point is where I get confused and scared -- what is the effect on
anonymity of retrying, fragmenting and so forth?
case 3: If the size of the actual message is > C and <= 3/2 C:
Plain-old-distinguishable uses 2CK underlying bandwidth. It has latency
proportional to K and reliability proportional to P^2K.
Two-payloads uses 3CK underlying bandwidth (3 fragments). It has latency
proportional to K and reliability proportional to P^3K.
Two-legged uses 4CK underlying bandwidth (2 fragments, each one travels
2K hops). It has latency proportional to 2K and reliability proportional
to P^4K.
case 4: If the size of the actual message is > 3/2 C and <= 2C:
Plain-old-distinguishable uses 2CK underlying bandwidth. It has latency
proportional to K and reliability proportional to P^2K.
Two-payloads uses 4CK underlying bandwidth (4 fragments). It has latency
proportional to K and reliability proportional to P^4K.
Two-legged uses 4CK underlying bandwidth (2 fragments, each one travels
2K hops). It has latency proportional to 2K and reliability proportional to
P^4K.
Okay here's the general table for a message that requires N payloads, that is
messagesize > (N-1)C and <= NC. The two-payloads technique has two measures
-- for when the last part of the message can fit into one half-payload and for
when it can't, that is: (messagesize-1%C)<(C/2) or not.
plain-old
bandwidth NCK
reliability ~P^NK
latency ~K
two-payloads
bandwidth (2N-1)CK or 2NCK
reliability ~P^(2N-1)K or ~P^2NK
latency ~K
two-legged
bandwidth 2NCK
reliability ~P^2NK
latency ~2K
I'm surprised to see that two-payloads is actually more efficient than two-
legged in half of the cases.
Note that of the three measures here one of them (reliability) is non-linear
in N.
Here are some numbers plugged in for reliability. (The script that outputs
this table is at `http://zooko.com/calcreliability.py'.) The capacity C is
fixed at 30 KB for these numbers, and the path length K is fixed at 4 (meaning
8 for two-legged technique).
The result of this table is the reliability (chance of success) for plain-old,
two-payloads, and two-legged techniques respectively, for given P and S.
S = size of message in KB
| P = probability of hop success ->
| 0.5 0.6 0.7 0.8 0.9
v --- --- --- --- ---
15 0.1/0.1/0.0 0.1/0.1/0.0 0.2/0.2/0.1 0.4/0.4/0.2 0.7/0.7/0.4
30 0.1/0.0/0.0 0.1/0.0/0.0 0.2/0.1/0.1 0.4/0.2/0.2 0.7/0.4/0.4
45 0.0/0.0/0.0 0.0/0.0/0.0 0.1/0.0/0.0 0.2/0.1/0.0 0.4/0.3/0.2
60 0.0/0.0/0.0 0.0/0.0/0.0 0.1/0.0/0.0 0.2/0.0/0.0 0.4/0.2/0.2
75 0.0/0.0/0.0 0.0/0.0/0.0 0.0/0.0/0.0 0.1/0.0/0.0 0.3/0.1/0.1
90 0.0/0.0/0.0 0.0/0.0/0.0 0.0/0.0/0.0 0.1/0.0/0.0 0.3/0.1/0.1
105 0.0/0.0/0.0 0.0/0.0/0.0 0.0/0.0/0.0 0.0/0.0/0.0 0.2/0.1/0.0
Now, if, as we discussed at the beginning of this message, we decide that a
two-legged technique with a number of hops < 2K can be considered safe enough
compared to a two-payloads technique with a number of hops K, then the two-
legged technique is no longer as expensive.
Even in that case, the importance of shorter paths on latency and reliability
is worth keeping in mind. What it actually works out to quantitatively
depends on our retrying and fragmentation policy, which raises the important
question of how retrying and fragmentation interact with anonymity.
Regards,
Zooko
Zooko.Com -- Security and Distributed Systems Engineering