
[freehaven-dev] Graduated Mirroring




Roger --

Here is a variation on your scheme, to think about:

-- Assume that each server knows what documents he is storing
   pieces of (since I think this is probably the case anyway).

-- Allow each server to store a number of fragments according
   to how important he thinks the document is.

   (Thus, we get rid of all the economics stuff.)

Details:

-- keep the mixnet for communicating with the servers

-- anyone can sign up as a server. no reputation measures needed

-- If I have an important document D that I do not want to see
   suppressed, then I mail D to all of the servers (perhaps with
   a short cover letter explaining why I think it is important
   that this document not be suppressed).  This goes through the
   mixnet to protect the identity of the servers and the identity
   of the document provider.

-- Each server has a manager (a person).  This person decides
   how much storage to allocate to document D, according to the
   manager's own preferences and political leanings etc.

   (I realize that this *manual* process is not present in your
   design, but I think it allows the participants (the managers)
   to donate to causes they think appropriate, and not to donate
   to others, which rather appeals to me.)

-- Suppose that the unit of granularity is a "chunk". Perhaps a
   chunk is 16K bytes (not too big, not too small).  Let F be
   a field whose elements are the possible chunk values (e.g.
   F = GF(2^(16K*8)), since a chunk is 16K*8 bits).

-- The information dispersal algorithm treats the document D as
   a series of chunks, and then views the document as a polynomial
   D(x) with coefficients in F (one coefficient per chunk).
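
   To make this concrete, here is a minimal Python sketch.  The
   parameters are toy stand-ins of my own choosing, not the ones
   above: 15-byte chunks over GF(p) with the Mersenne prime
   p = 2^127 - 1, rather than 16K-byte chunks over GF(2^(16K*8)).

      # Toy stand-ins for the parameters above: 15-byte chunks over
      # GF(P), P = 2**127 - 1 (a Mersenne prime), so every chunk
      # value fits below P.  The text proposes 16K-byte chunks.
      P = 2**127 - 1
      CHUNK = 15

      def to_coeffs(document: bytes) -> list[int]:
          """Split the document into chunks; each chunk becomes one
          coefficient of D(x).  (Zero-padding discards the length;
          a real encoding would record the original length.)"""
          padded = document + b"\x00" * (-len(document) % CHUNK)
          return [int.from_bytes(padded[i:i + CHUNK], "big")
                  for i in range(0, len(padded), CHUNK)]

      def eval_poly(coeffs: list[int], x: int) -> int:
          """Evaluate D(x) over GF(P) by Horner's rule."""
          acc = 0
          for c in reversed(coeffs):
              acc = (acc * x + c) % P
          return acc

      coeffs = to_coeffs(b"some document worth preserving")
      print(eval_poly(coeffs, 12345))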

-- If the manager wants to allocate t+1 chunks to the storage of D,
   he picks a random value x from F, and stores 
            x, D(x), D(x+1), D(x+2), ..., D(x+t)
   (Here all we need are t+1 different field elements computable from
    starting point x; they don't need to be "successive" in any 
    sense.  You could also use t+1 random values of x, but then
    you have to store them.  I suppose they could be short
    (16 bytes each) without risk of collision between different 
    managers.)

   So the manager decides how much "support" to give a given document.
   The support he gives is likely to be independent of the support
   others give the document, since different managers' sets of
   evaluation points are likely to be disjoint.
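
   A minimal sketch of the manager's side, in the same toy field as
   the sketch above; the layout (x plus the t+1 consecutive
   evaluations) follows the description in this item.

      import secrets

      P = 2**127 - 1   # same toy prime field as the sketch above

      def eval_poly(coeffs, x):
          """Evaluate D(x) over GF(P) by Horner's rule."""
          acc = 0
          for c in reversed(coeffs):
              acc = (acc * x + c) % P
          return acc

      def make_shares(coeffs, t):
          """Allocate t+1 chunks to D: pick a random starting point x,
          then store x together with D(x), D(x+1), ..., D(x+t)."""
          x = secrets.randbelow(P)
          return x, [eval_poly(coeffs, (x + k) % P) for k in range(t + 1)]

      coeffs = [3, 1, 4, 1, 5]   # stand-in for a real document's chunks
      x0, values = make_shares(coeffs, t=7)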

-- When someone wants to retrieve the document, they broadcast a
   request to all the servers for their pieces.  The reconstruction
   phase is straightforward, assuming that all of the responders
   are honest.  (It is just polynomial interpolation to recover the
   coefficients of D(x); these coefficients *are* the document D.
   A small sketch of this step follows this item.)

   If an adversary tries to mess up the reconstruction by supplying
   garbage, the reconstructor can still succeed using some clever
   decoding algorithms due to Professor Madhu Sudan (MIT EECS).  See
       http://theory.lcs.mit.edu/~madhu/papers.html
   for pointers to decoding Reed-Solomon codes beyond the error-
   correction bound, etc.  These are essentially interpolation algorithms
   that are capable of throwing out "bad data".

   There may be other approaches to handling garbage from an adversary;
   for example some of the "verifiable secret sharing" ideas or related
   work...
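
   Here is a sketch of the honest-case interpolation in the same toy
   field: given at least as many (point, value) pairs as the document
   has chunks, Lagrange interpolation recovers the coefficients, i.e.
   the document.  (No attempt is made here to handle garbage shares;
   that is where the decoding algorithms above would come in.)

      P = 2**127 - 1   # same toy prime field as the earlier sketches

      def interpolate(points):
          """Recover the coefficients of the unique polynomial of
          degree < len(points) through the (x, y) pairs, mod P."""
          n = len(points)
          coeffs = [0] * n
          for i, (xi, yi) in enumerate(points):
              # Build the Lagrange basis numerator prod_{j!=i} (X - xj)
              # as a coefficient list (low degree first), and its
              # denominator prod_{j!=i} (xi - xj).
              num, denom = [1], 1
              for j, (xj, _) in enumerate(points):
                  if j == i:
                      continue
                  nxt = [0] * (len(num) + 1)
                  for k, c in enumerate(num):
                      nxt[k] = (nxt[k] - xj * c) % P
                      nxt[k + 1] = (nxt[k + 1] + c) % P
                  num = nxt
                  denom = denom * (xi - xj) % P
              scale = yi * pow(denom, -1, P) % P
              for k, c in enumerate(num):
                  coeffs[k] = (coeffs[k] + scale * c) % P
          return coeffs

      def eval_poly(coeffs, x):
          acc = 0
          for c in reversed(coeffs):
              acc = (acc * x + c) % P
          return acc

      # Round trip: evaluate a known polynomial, then recover it.
      secret = [3, 1, 4, 1, 5]
      pts = [(x, eval_poly(secret, x)) for x in (2, 7, 11, 13, 42)]
      assert interpolate(pts) == secret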

-- Servers can at various times modify their level of support for a given
   document D.  Throwing out some information is easy: just reduce t
   above and keep fewer values of D(x) around.  To raise t, the server can
   reconstruct the document again (by asking for it) and then compute
   even more values of D(x).  The server may also reconstruct D and then
   decide to modify t based on the level of support it observes for D.
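
   A small sketch of the adjustment, again with the toy field:
   lowering t just truncates the stored values; raising t assumes the
   server has first reconstructed the coefficients (e.g. via the
   interpolation sketch above) and then evaluates at more points.

      P = 2**127 - 1   # same toy prime field as the earlier sketches

      def eval_poly(coeffs, x):
          acc = 0
          for c in reversed(coeffs):
              acc = (acc * x + c) % P
          return acc

      def lower_support(values, new_t):
          """Reduce t: keep only D(x), ..., D(x+new_t)."""
          return values[:new_t + 1]

      def raise_support(coeffs, x0, new_t):
          """After reconstructing D's coefficients, recompute the run
          D(x0), D(x0+1), ..., D(x0+new_t) at the old starting point."""
          return [eval_poly(coeffs, (x0 + k) % P) for k in range(new_t + 1)]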

-- When a document no longer has enough support to be reconstructed,
   it essentially disappears.  Servers still holding information about
   that document may then decide to throw away whatever they have
   about it.  This replaces your notion of an expiration date, and
   somehow seems more natural...

Comments appreciated.  This approach has the advantage of simplicity;
there are no reputations to measure, no "shepherds" for fragments, etc.
The only complexity is having to judge what level of support to give to
a document, but this "feels natural" to me, rather than having my
machine support terrorists without my knowledge or consent, say...  I
guess this approach is more like "graduated mirroring" in the sense
that you can be a "mirror" for a document to almost any degree.  It
also has mixnet protection for the identity of the servers...

        Cheers,
        Ron