[freehaven-dev] Graduated Mirroring
Roger --
Here is a variation on your scheme, to think about:
-- Assume that each server knows what documents he is storing
pieces of (since I think this is probably the case anyway).
-- Allow each server to store a number of fragments according
to how important he thinks the document is.
(Thus, we get rid of all the economics stuff.)
Details:
-- keep the mixnet for communicating with the servers
-- anyone can sign up as a server. no reputation measures needed
-- If I have an important document D that I do not want to see
suppressed, then I mail D to all of the servers (perhaps with
a short cover letter explaining why I think it is important
that this document not be suppressed). This goes through the
mixnet to protect the identity of the servers and the identity
of the document provider.
-- Each server has a manager (a person). This person decides
how much storage to allocate to document D, according to the
manager's own preferences and political leanings etc.
(I realize that this *manual* process is not present in your
design, but I think it allows the participants (the managers)
to donate to causes they think appropriate, and not to donate
to others, which rather appeals to me.)
-- Suppose that the unit of granularity is a "chunk". Perhaps a
chunk is 16K bytes (not too big, not too small). Let F be
a field where chunk values are elements of the field (e.g.
F = GF(2^(16K*8)) ).
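As a toy illustration of the chunk-to-field-element mapping: the sketch below uses 2-byte chunks interpreted as integers modulo a small prime, standing in for the enormous GF(2^(16K*8)); all names are mine, not part of the proposal.

```python
# Toy stand-in for the field F: integers mod a prime slightly above 2^16,
# so every 2-byte chunk is a valid field element. A real deployment would
# use 16K-byte chunks in GF(2^(16K*8)).

P = 65537

def document_to_chunks(data: bytes, chunk_size: int = 2) -> list[int]:
    """Split a document into fixed-size chunks, each read as a field element."""
    padded = data + b"\x00" * (-len(data) % chunk_size)
    return [int.from_bytes(padded[i:i + chunk_size], "big")
            for i in range(0, len(padded), chunk_size)]

chunks = document_to_chunks(b"free haven")
print(chunks)  # five elements, each < 2^16 and hence < P
```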
-- The information dispersal algorithm treats the document D as
a series of chunks, and then views the document as a polynomial
D(x) with coefficients in F.
-- If the manager wants to allocate t+1 chunks to the storage of D,
he picks a random value x from F, and stores
x, D(x), D(x+1), D(x+2), ..., D(x+t)
(Here all we need are t different field elements computable from
starting point x; they don't need to be "successive" in any
sense. You could also use t+1 random values of x, but then
you have to store them all. I suppose they could be short
(16 bytes each) without risk of collision between different
managers.)
So the manager decides how much "support" to give a given document.
The support he gives is likely to be independent of the support
others give the document, as the x values are likely to be
disjoint.
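The storage step above can be sketched as follows, again over a small prime field rather than GF(2^(16K*8)); the function names are illustrative, not from the proposal.

```python
# Sketch of a manager allocating t+1 chunks to document D: evaluate the
# document polynomial at t+1 consecutive points starting from a random x.
import random

P = 65537  # toy field: integers mod a small prime

def eval_poly(coeffs: list[int], x: int) -> int:
    """Evaluate D(x) by Horner's rule; the coefficients are the chunks of D."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def make_shares(chunks: list[int], t: int) -> tuple[int, list[int]]:
    """Return (x, [D(x), D(x+1), ..., D(x+t)]) for a random starting point x."""
    x = random.randrange(1, P)
    return x, [eval_poly(chunks, (x + i) % P) for i in range(t + 1)]

doc = [26226, 25957, 8296]          # three chunks -> degree-2 polynomial
x0, shares = make_shares(doc, t=2)  # this manager stores 3 evaluations
```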
-- When someone wants to retrieve the document, they broadcast a
request to all the servers for their pieces. The reconstruction
phase is straightforward, assuming that all of the responders
are honest. (It is just interpolation, as usual, to get the
coefficients of D(x); these coefficients *are* the document D.)
If an adversary tries to mess up the reconstruction by supplying
garbage, the reconstructor can still succeed using some clever
decoding algorithms due to Professor Madhu Sudan (MIT EECS). See
http://theory.lcs.mit.edu/~madhu/papers.html
for pointers to decoding Reed-Solomon codes beyond the error-
correction bound, etc. These are essentially interpolation algorithms
that are capable of throwing out "bad data".
There may be other approaches to handling garbage from an adversary;
for example some of the "verifiable secret sharing" ideas or related
work...
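The honest-responder case is plain Lagrange interpolation, which the toy sketch below spells out over the small prime field; handling adversarial garbage would need Berlekamp-Welch or Sudan-style list decoding, which is omitted here.

```python
# Sketch of reconstruction: given enough (point, value) pairs collected
# from servers, interpolate the unique low-degree polynomial through them.
# Its coefficients *are* the document chunks.

P = 65537  # toy field: integers mod a small prime

def lagrange_coefficients(points: list[tuple[int, int]]) -> list[int]:
    """Interpolate the unique polynomial of degree < len(points) through
    the given (x, y) pairs, returning its coefficient list mod P."""
    n = len(points)
    coeffs = [0] * n
    for i, (xi, yi) in enumerate(points):
        basis = [1]  # coefficient list of the i-th basis numerator
        denom = 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):  # multiply basis by (X - xj)
                new[k] = (new[k] - xj * b) % P
                new[k + 1] = (new[k + 1] + b) % P
            basis = new
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse (P is prime)
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * b) % P
    return coeffs

# A 3-chunk document is a degree-2 polynomial: any 3 evaluations suffice.
doc = [26226, 25957, 8296]
def D(x): return (doc[0] + doc[1] * x + doc[2] * x * x) % P
pts = [(5, D(5)), (9, D(9)), (12, D(12))]
print(lagrange_coefficients(pts))  # -> [26226, 25957, 8296]
```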
-- Servers can at various times modify their level of support for a given
document D. Throwing out some information is easy: just reduce t
above and keep fewer values of D(x) around. To raise t, the server can
reconstruct the document again (by asking for it), and then compute
even more values of D(x). The server may also reconstruct D and then decide
to modify t based on the level of observed support for D.
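Lowering and raising support can be sketched in the same toy field: dropping evaluations needs nothing, while adding them requires the chunks of D (obtained by reconstructing the document from the network, which is assumed here).

```python
# Sketch of adjusting support. Lowering t just discards stored values;
# raising t assumes the server has re-derived the chunks of D and can
# evaluate at fresh consecutive points.

P = 65537
doc = [26226, 25957, 8296]  # chunks, assumed reconstructed from the network
def D(x): return (doc[0] + doc[1] * x + doc[2] * x * x) % P

x0 = 7
shares = [D(x0 + i) for i in range(4)]   # currently t = 3 (4 evaluations)
shares = shares[:2]                      # lower support: keep only t = 1
# Raise support again by computing evaluations at the next points in line.
shares += [D(x0 + i) for i in range(2, 5)]  # back up to t = 4
```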
-- When a document no longer has enough support to be reconstructed, it essentially
disappears. Servers still holding information about that document may then
decide to throw away whatever they have about the document. This replaces your
notion of an expiration date, and somehow seems more natural...
Comments appreciated. This approach has some advantages in terms of simplicity;
there are no reputations to measure, no "shepherds" for fragments, etc. The
only complexity is having to judge what level of support to give to a document,
but this "feels natural" to me, rather than having my machine support terrorists
without my knowledge or consent, say... I guess this approach is more like
"graduated mirroring" in the sense that you can be a "mirror" for a document
to almost any degree. It also has mixnet protection for the identity of the
servers...
Cheers,
Ron