
Re: End to end mixminion rationalized, with compression and K-of-N



On Fri, 2002-10-11 at 12:11, Zooko wrote:
 [...] 
> I didn't realize that MixMinion fragments would mostly be sent along the same 
> path(s). I think you are right that this makes packet-level failures better 
> correlated.  Hm.  But what you want is for *message* level failures to be 
> well-correlated.  If you are going to lose 100 chunks out of 1000 chunks 
> during one time period, you very much want to lose all 100 chunks from a 
> single 100-chunk message, instead of losing one chunk each out of 100 
> different messages!  
> 
> You write below that if the failures are path-level then only K, N, and the 
> number of paths matter.  Does this mean that failures are well correlated 
> within messages?  I don't really understand.

Ah. The document I wrote alludes to this: it's based on an idea I
got from Justin Chapweske.  As a toy example, suppose K=2, N=4, and you
have 5 chunks.  You then pick 4 paths.  Along path 1, you send packet 1
from each chunk; along path 2, you send packet 2 from each chunk; and so
on.  

In other words, each path gets one packet from each chunk.

This way, if any two paths succeed, we receive two packets from each
chunk, which is enough to reconstruct all the chunks, and hence the
whole message.  The actual number of chunks never enters the
calculation.

The situation becomes a little more complicated when the number of paths
is not equal to N, but the basic intuition remains the same.
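
In case a concrete sketch helps, here is the toy example in Python.
(Everything below is made up for illustration; the names and data
structures don't come from the spec or the implementation.)

  K, N, NUM_CHUNKS = 2, 4, 5

  # packets[c][i] is the i-th FEC packet of chunk c; any K of a
  # chunk's N packets are enough to rebuild that chunk.
  packets = [[("chunk%d" % c, i) for i in range(N)]
             for c in range(NUM_CHUNKS)]

  # Path i carries packet i of every chunk.
  paths = [[packets[c][i] for c in range(NUM_CHUNKS)] for i in range(N)]

  # If any K paths succeed, every chunk has K surviving packets,
  # no matter how many chunks there are.
  surviving = [0, 3]                        # say paths 0 and 3 arrive
  received = [pkt for i in surviving for pkt in paths[i]]
  for c in range(NUM_CHUNKS):
      got = [pkt for pkt in received if pkt[0] == "chunk%d" % c]
      assert len(got) >= K                  # enough to reconstruct chunk c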

(Note that I'm assuming that paths fail independently: this requires
that no unreliable server is common to any two paths.  In fact, this is
not the case: the exit node is common to all paths!  Therefore, the
success rate of the exit node is an upper bound for the success rate of
a message.)

 [...]
> > Math time: If failures *are* packet-level, assuming a modest
> > 90% success rate and an EXF of 1.5 or so, a 20-chunk (8960KB) message
> > will arrive with 99.4% probability.  Success probability only drops
> > below 99% when message size
> 
> What's EXF?

Expansion factor: N/K.
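
For what it's worth, here is roughly how that 99.4% figure falls out
under the packet-level-failure assumption.  The per-chunk parameters
below are my reconstruction, not canonical: 16 payloads of 28KB per
chunk (so N=24 at an EXF of 1.5), which over 20 chunks does come to
8960KB, but treat the exact constants as illustrative.

  from math import comb

  def chunk_success(p, K, N):
      """P(at least K of N packets arrive), each arriving with prob. p."""
      return sum(comb(N, i) * p**i * (1 - p)**(N - i)
                 for i in range(K, N + 1))

  def message_success(p, K, N, num_chunks):
      """Every chunk must be reconstructible for the message to arrive."""
      return chunk_success(p, K, N) ** num_chunks

  print(message_success(0.9, 16, 24, 20))   # roughly 0.994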

> An interesting thing you might want to look at is to hold the failure rate 
> right around K/M (where K is required packets and M is total packets), and 
> then see how twiddling numchunks changes reliability-of-total-message.

[OT: I read "numchunks" as "nunchucks", and was puzzled about how our
system had been infiltrated by ninja.]

>                                                                         This 
> is what happened to Mojo Nation: reliability of individual packets turned out 
> to be much more variable than we had thought, and on average was closer to K/M 
> than to 0.9.  The chunking "compounded" this problem, effectively killing most 
> of the files most of the time, instead of killing a few of the files all of 
> the time.

Interesting.  Did you get any analysis on the underlying causes of
packet loss?  (e.g., Network failure vs server downtime vs bugs vs ...?)
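
To make the "compounding" you describe concrete, reusing
message_success() from the sketch above: if the per-packet success
rate sits right at K/N, each chunk gets through only a little over
half the time, so the odds for the whole message fall off
exponentially in the number of chunks.

  # (reuses message_success() from the earlier sketch)
  p = 16.0 / 24                     # per-packet success pinned at K/N
  for num_chunks in (1, 5, 20, 100):
      print(num_chunks, message_success(p, 16, 24, num_chunks))

With 20 chunks the message essentially never arrives intact, which
sounds a lot like what you saw.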

BTW, here are some possible sources of packet loss/path failure in our
current design, assuming (ha!) a bug-free, fault-tolerant
implementation.

Single-packet loss:
 A- A packet is delayed in mix pools for too long: some server later in 
    sequence has already rotated its key.
 B- A hostile server acknowledges receipt of a packet, and then drops or
    corrupts it.
 C- A hardware failure corrupts a packet while it's stored on disk.

Path failure:
 D- A server goes offline, and does not come up in time to 
    receive/transmit packets.  (Note that we retry offline servers, so
    that short-term transient failures are recoverable.)
 E- A hostile server deliberately goes offline in order to DOS the
    mixnet.
 F- A server's network connection fails, and does not come up in time to
    receive/transmit packets.  

I think that, in practice, D and F will be the most common causes of
failure.

-- 
Nick