[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[freehaven-dev] Algorithms in Freehaven
Hey Paul, Roger
This note outlines what I see as the current and possible algorithm and
protocol "problems" in the free haven project. Please note that this
is a partial list. I've cc'd it to the freehaven list in the hopes that
people will yell at me if I forgot anything.
* Comparison, specification, and implementation of information dispersal
algorithms.
- Information dispersal is a solved problem, in the sense that
it's possible in polynomial time. At the same time, we might
like to know the following :
- What is the most computationally efficient information
dispersal algorithm?
- What are the space requirements for a share, compared
to the original file? How do these change depending on
the number of shares needed to reconstruct?
- What happens if we receive fewer shares than the
threshold? is partial reconstruction possible?
- What is the exact patent status of all these
algorithms?
- Currently we have mentioned 3 algorithms :
1) M.O. Rabin's IDA algorithm
- the "classic"
- requires O(n^3) to reconstruct
- used in the Intermemory project
for a similar application to ours
2) Madhu Sudan's List-Decode
- copes well with partial reconstruction
- produces a "list" of possible
reconstructions
3) Michael Mitzenmacher's Tornado Codes
- O(n) reconstruction time.
Adam Smith pointed out that we should be able to adapt any
erasure-correcting code for information dispersal.
- After we pick an algorithm, we need to find or build an
implementation. It would be preferable to find one, and
fortunately these seem to be implemented in other places :
- the INDIA project implemented Rabin's IDA
http://www.eecs.harvard.edu/~india
- a friend of mine implemented List-Decode last term
- Mitzenmacher probably has a library we could use
for Tornado codes.
- Another question : can we give some servers "bigger" shares
than others, based on efficiency or trust requirements?
* Specification of cryptosystems.
- In our current model, there are two places we use crypto
1) to format our messages properly for our anonymous
channel,
and 2) to allow each servnet node to sign its own messages,
and possibly encrypt messages before handing them to
the anonymous channel.
- We do not control 1). At least not now, and we are not likely
to add control anytime soon.
- We _do_ control 2), so we can pick the crypto which makes our
lives easier.
- What makes our lives easier? Well, it should satisfy these
requirements :
* the primitives involved should be secure
- for example, we might specify that
we want the AES winner (RC6 ?) for symmetric
crypto, and RSA with large enough key size
as our public-key primitive.
- or we might specify we want 3DES and elliptic
curves.
- this also includes specifying the correct
padding schemes and so on.
* short key size, since we're storing lots of keys
- this is in tension with previous requirement
* computationally efficient
- This may include a "just in time"
requirement : we want a cryptosystem and
signature system which allows us to do lots of
preprocessing and so allow for when we want to
send bursts of messages.
* already implemented, such as part of OpenSSL or
other widely available crypto library
* unpatented, or patented with free license available
- We may need to consider other requirements.
For instance, for our public-key encryption algorithm,
does it matter if you can tell from the ciphertext which public
key was used to encrypt? does that hurt anonymity anywhere?
Do we need to provide for a way to "upgrade" cryptosystems?
Other "nice things to have" may come up as time goes on.
* Efficient Broadcasts and pinning down requirements for a freehaven
broadcast model.
Currently, the ability to broadcast a message to all nodes in
the servnet is an integral part of the freehaven protocol.
This seems unlikely to change. So how, exactly, do we do this?
I sent a previous message to the freehaven-dev list in which
I outlined what I thought some requirements and a model for
broadcast *might* look like. You'll want to look at it, but
to summarise :
Every node knows about every other node, and can reach them
by sending a message through the anonymous channel. So you
can think of the network as this fully connected graph.
The nodes are servnet nodes, and the edges are anonymous
channels.
The thing is that this is actually misleading :
it is possible that every incoming message to a node, no
matter who it's from, is travelling the same "edge" - the
same route in an anonymous channel. Sometimes this edge
goes down -- but then the node can still send messages out.
How should we deal with this?
At this point we are implementing the protocol in which
if a single node X wants to broadcast, X sends a single
message to every other node in the servnet.
So what needs to be done here is becoming familiar with
some literature and applying it to this problem. I've started,
thanks to pointers provided by Adam and the distributed systems
ppl here...but there's possibly more work to be done. It is
unlikely that this will be a priority until after we have
working code, though!
* Codifying the principle of "acceptable losses".
Roger has stated that he wants a protocol which can tolerate
the losses caused by an adversary. That is, even if an adversary can
find and destroy lots of shares, the chances will be really good
that the file being attacked will stay in the freehaven.
This is a neat principle, because it allows for greater efficiency.
No need to do expensive verification protocols or byzantine agreement or
anything like that.
Can we create a definition of "success" for the freehaven which captures
this principle? can we create a model of adversaries we can tolerate?
Can we take the intuitive notions we've been using on this list
and in the meetings to argue and write them down?
This requires analysing the trust model. We're still arguing about
the model, but maybe we can start figuring out the approach.
* Explicitly enumerating, specifying protocols and trying to verify them
somehow.
Write down everything we do, and then go through and write
down an explicit protocol for everything. Then figure out
some means of checking whether the protocol does everything
we think it should do.
The verification part of this might involve learning about
authentication logics. That's a ways away, though. Right now
we just need to implement and then write them down. 8-)
* Definitions of anonymity for freehaven, definitions of "success",
of "failure", of "adversary", etc. etc. etc.
That's what I can think of right now. I will note, however, that our
first priority is getting something out the door which works. So the
more theoretical analysis/"proof of security" may or may not have a
priority right now. It's still something which might be worth pursuing
later.
Thanks much, seeya soon,
-David Molnar