[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

efficiency and tag-prevention (was: Forward and reply messages)




Welcome back George!

 George wrote:
>
> I agree with both Zooko and David that the "swap" techniques are no silver 
> bullet. Indeed under attack the anonymity set is reduced (by two? not 
> sure) and therefore, in theory we would need double sized paths. 
> 
> Note however that this penalty only happens when the network is under 
> attack using a VERY low bandwidth, VERY high probability of not succeeding, 
> VERY difficult to coordinate attack. In fact I have to confess that in 
> real terms the tagging attack is soo difficult to perform, that while I 
> agree we should offer options to protect against it, it is in no 
> real threat model a realistic threat. I think that requiring all messages 
> to be double the size or all the messages to be distinguishable as forward 
> or reply is a bit of an over reaction, since it imposes a penalty all the 
> time.

Hm, so are you proposing that in a "two-legged" technique such as the 
two-headers-swap technique, the tagging attack against it to be so expensive 
and unlikely that our effective safety is close to the length of our complete 
path instead of close to the length of one leg?  Then we could use paths of 
the same length, or possible only *slightly* longer, while considering 
ourselves about as safe.

That seems like a reasonable position to take.  Is that what you are thinking?


Now I'd like to think more precisely about the penalties imposed by the 
alternative techniques.

I'm going to scribble some rough numbers about efficiency of the various 
techniques.  For now, I'm assuming that a two-legged-path technique like two-
headers-swap will have twice as many hops per message as a one-legged 
technique like two-payloads or plain-old-distinguishable.

In the following, I'll use `K' as the number of hops in a one-legged technique 
(e.g. I think that a typical value for K is 4), and `2K' as the number of hops 
in a two-legged technique.

The relative efficiency depends on the size of the actual message.  I'm going 
to look at the first four cases one at a time then do the general table.  You 
can skip down to the end, but please to read the paragraph about retrying in 
the second case below!


case 1:  If the size of the actual message is <= 1/2 of the payload capacity 
`C' (e.g. a typical value for payload capacity 30 KB, so this case is if the 
size of the actual message is <= 15 KB):

Plain-old-distinguishable uses CK underlying bandwidth.  It has latency 
proportional to K and reliability proportional to P^K for some probability P.

Two-payloads uses CK underlying bandwidth.  It has latency proportional to 
K and reliability proportional to P^K.  (It is identical in all respects to 
plain-old for this case.)

Two-legged uses 2CK underlying bandwidth.  It has latency proportional to 2K 
and reliability proportional to P^2K.

case 2:  If the size of the actual message is > 1/2 C, and <= C:

Plain-old-distinguishable uses CK underlying bandwidth.  It has latency 
proportional to K and reliability proportional to P^K.

Two-payloads uses 2CK underlying bandwidth (because it must fragment the 
message into two payloads and send each separately).  It has latency 
proportional to K (they can be sent in parallel) and reliability proportional 
to P^2K (each one might fail -- each one has reliability P^K).

Two-legged uses 2CK underlying bandwidth.  It has latency proportional to 2K 
and reliability proportional to P^2K.

Note that there is an important difference between the two-payloads and the 
two-legged in this case, but this difference isn't reflected in these numbers.  
The difference has to do with retrying: if the message fails in the two-legged 
technique, then the message has to be retried across the whole path, incurring 
the same reliability risk and incurring the same bandwidth usage.  In the two-
payloads case, if one of the packets fails and the other succeeds, then only 
the lost one need be retried, thus incurring less risk of failure, less 
underlying bandwidth usage, and less (!!!???) risk of anonymity-piercing.

That last point is where I get confused and scared -- what is the effect on 
anonymity of retrying, fragmenting and so forth?

case 3:  If the size of the actual message is > C and <= 3/2 C:

Plain-old-distinguishable uses 2CK underlying bandwidth.  It has latency 
proportional to K and reliability proportional to P^2K.

Two-payloads uses 3CK underlying bandwidth (3 fragments).  It has latency 
proportional to K and reliability proportional to P^3K.

Two-legged uses 4CK underlying bandwidth (2 fragments, each one travels 
2K hops).  It has latency proportional to 2K and reliability proportional 
to P^4K.

case 4:  If the size of the actual message is > 3/2 C and <= 2C:

Plain-old-distinguishable uses 2CK underlying bandwidth.  It has latency 
proportional to K and reliability proportional to P^2K.

Two-payloads uses 4CK underlying bandwidth (4 fragments).  It has latency 
proportional to K and reliability proportional to P^4K.

Two-legged uses 4CK underlying bandwidth (2 fragments, each one travels 
2K hops).  It has latency proportional to 2K and reliability proportional to 
P^4K.


Okay here's the general table for a message that requires N payloads, that is 
messagesize > (N-1)C and <= NC.  The two-payloads technique has two measures 
-- for when the last part of the message can fit into one half-payload and for 
when it can't, that is: (messagesize-1%C)<(C/2) or not.

plain-old
 bandwidth           NCK
 reliability       ~P^NK
 latency              ~K

two-payloads
 bandwidth      (2N-1)CK  or    2NCK
 reliability  ~P^(2N-1)K  or  ~P^2NK
 latency              ~K

two-legged
 bandwidth          2NCK
 reliability      ~P^2NK
 latency             ~2K

I'm surprised to see that two-payloads is actually more efficient than two-
legged in half of the cases.

Note that of the three measures here one of them (reliability) is non-linear 
in N.

Here are some numbers plugged in for reliability.  (The script that outputs 
this table is at `http://zooko.com/calcreliability.py'.)  The capacity C is 
fixed at 30 KB for these numbers, and the path length K is fixed at 4 (meaning 
8 for two-legged technique).

The result of this table is the reliability (chance of success) for plain-old, 
two-payloads, and two-legged techniques respectively, for given P and S.

   S = size of message in KB
   |       P = probability of hop success ->
   |       0.5            0.6            0.7            0.8            0.9           
   v       ---            ---            ---            ---            ---           
  15   0.1/0.1/0.0    0.1/0.1/0.0    0.2/0.2/0.1    0.4/0.4/0.2    0.7/0.7/0.4   
  30   0.1/0.0/0.0    0.1/0.0/0.0    0.2/0.1/0.1    0.4/0.2/0.2    0.7/0.4/0.4   
  45   0.0/0.0/0.0    0.0/0.0/0.0    0.1/0.0/0.0    0.2/0.1/0.0    0.4/0.3/0.2   
  60   0.0/0.0/0.0    0.0/0.0/0.0    0.1/0.0/0.0    0.2/0.0/0.0    0.4/0.2/0.2   
  75   0.0/0.0/0.0    0.0/0.0/0.0    0.0/0.0/0.0    0.1/0.0/0.0    0.3/0.1/0.1   
  90   0.0/0.0/0.0    0.0/0.0/0.0    0.0/0.0/0.0    0.1/0.0/0.0    0.3/0.1/0.1   
 105   0.0/0.0/0.0    0.0/0.0/0.0    0.0/0.0/0.0    0.0/0.0/0.0    0.2/0.1/0.0   

Now, if, as we discussed at the beginning of this message, we decide that a 
two-legged technique with a number of hops < 2K can be considered safe enough 
compared to a two-payloads technique with a number of hops K, then the two-
legged technique is no longer as expensive.

Even in that case, the importance of shorter paths on latency and reliability 
is worth keeping in mind.  What it actually works out to quantitatively 
depends on our retrying and fragmentation policy, which raises the important 
question of how retrying and fragmentation interact with anonymity.


Regards,

Zooko

Zooko.Com -- Security and Distributed Systems Engineering