
[freehaven-dev] Algorithms in Freehaven




Hey Paul, Roger,

This note outlines what I see as the current and possible algorithm and
protocol "problems" in the free haven project. Please note that this
is a partial list. I've cc'd it to the freehaven list in the hopes that
people will yell at me if I forgot anything. 

* Comparison, specification, and implementation of information dispersal
  algorithms.
	- Information dispersal is a solved problem, in the sense that
	it's possible in polynomial time. At the same time, we might
	like to know the following :

		- What is the most computationally efficient information
		dispersal algorithm?

		- What are the space requirements for a share, compared
		to the original file? How do these change depending on
		the number of shares needed to reconstruct?

		- What happens if we receive fewer shares than the
		threshold? Is partial reconstruction possible? 

		- What is the exact patent status of all these
		algorithms? 

	- Currently we have mentioned 3 algorithms :
			1) M.O. Rabin's IDA algorithm
				- the "classic"
				- requires O(n^3) time to reconstruct
				- used in the Intermemory project
				for a similar application to ours
			2) Madhu Sudan's List-Decode
				- copes well with partial reconstruction
				- produces a "list" of possible
				reconstructions
			3) Michael Mitzenmacher's Tornado Codes
				- O(n) reconstruction time. 
		Adam Smith pointed out that we should be able to adapt any
		erasure-correcting code for information dispersal.

	- After we pick an algorithm, we need to find or build an
	implementation. It would be preferable to find one, and 
	fortunately these seem to be implemented in other places :
		- the INDIA project implemented Rabin's IDA
			http://www.eecs.harvard.edu/~india
		- a friend of mine implemented List-Decode last term
		- Mitzenmacher probably has a library we could use
		for Tornado codes.

	- Another question : can we give some servers "bigger" shares
	than others, based on efficiency or trust requirements?
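
	To make the erasure-code view concrete, here is a toy sketch of
	k-of-n dispersal, using a systematic Reed-Solomon-style code over
	GF(257). This is illustrative only (names made up, padding metadata
	hand-waved, field arithmetic unoptimized), not Rabin's actual IDA,
	but it shows the key space property: each share holds roughly
	len(data)/k symbols.

```python
P = 257  # prime > 255, so every byte fits in the field GF(P)

def _interp(points, t):
    """Evaluate at t the unique polynomial through `points`, over GF(P)."""
    total = 0
    for xm, ym in points:
        num = den = 1
        for xl, _ in points:
            if xl != xm:
                num = num * (t - xl) % P
                den = den * (xm - xl) % P
        # pow(den, P-2, P) is the modular inverse, since P is prime
        total = (total + ym * num * pow(den, P - 2, P)) % P
    return total

def split(data, n, k):
    """n shares of ~len(data)/k symbols each; any k reconstruct `data`."""
    pad = (-len(data)) % k
    block = list(data) + [0] * pad   # real code must record the pad length
    shares = {x: [] for x in range(1, n + 1)}
    for i in range(0, len(block), k):
        # The k data bytes are the polynomial's values at x = 1..k.
        pts = [(x + 1, b) for x, b in enumerate(block[i:i + k])]
        for x in shares:
            shares[x].append(_interp(pts, x))
    return shares

def combine(shares, k):
    """Rebuild the (padded) data from any k of the shares."""
    xs = sorted(shares)[:k]
    out = []
    for i in range(len(shares[xs[0]])):
        pts = [(x, shares[x][i]) for x in xs]
        out.extend(_interp(pts, t) for t in range(1, k + 1))
    return bytes(out)
```

	Because the code is systematic, shares 1..k are just the data
	itself; the remaining shares are redundancy. Note the share symbols
	live in GF(257), so a real encoding on disk needs one extra trick
	to handle the value 256.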

* Specification of cryptosystems.
	
	- In our current model, there are two places we use crypto :
		1) to format our messages properly for our anonymous
		channel, and
		2) to allow each servnet node to sign its own messages,
		and possibly encrypt messages before handing them to
		the anonymous channel.

	- We do not control 1). At least not now, and we are not likely
	to add control anytime soon. 

	- We _do_ control 2), so we can pick the crypto which makes our
	lives easier.

	- What makes our lives easier? Well, it should satisfy these
	requirements :

		* the primitives involved should be secure 
			- for example, we might specify that
			we want the AES winner (RC6?) for symmetric
			crypto, and RSA with a large enough key size
			as our public-key primitive. 
			- or we might specify we want 3DES and elliptic
			curves. 
			- this also includes specifying the correct
			padding schemes and so on. 

		* short key size, since we're storing lots of keys
			- this is in tension with the previous requirement

		* computationally efficient
			- This may include a "just in time" 
			requirement : we want a cryptosystem and
			signature scheme which allow us to do lots of
			preprocessing ahead of time, so that we can
			send bursts of messages quickly. 

		* already implemented, e.g. as part of OpenSSL or
		another widely available crypto library
	
		* unpatented, or patented with free license available

	- We may need to consider other requirements.
	For instance, for our public-key encryption algorithm,
	does it matter if you can tell from the ciphertext which public
	key was used to encrypt? Does that hurt anonymity anywhere?
	Do we need to provide for a way to "upgrade" cryptosystems?
	Other "nice things to have" may come up as time goes on. 
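
	As a purely illustrative example of the preprocessing property:
	a Lamport-style one-time signature does all its expensive work at
	key generation time, so signing during a burst is just table
	lookups. This is NOT a proposal for our actual signature scheme
	(one key pair per message is a real cost) -- just a sketch of the
	property, using SHA-256 for concreteness.

```python
import hashlib, os

def H(b):
    return hashlib.sha256(b).digest()

def keygen():
    # Expensive part, done ahead of time: 256 bit positions x 2 choices,
    # each secret is 32 random bytes; the public key is their hashes.
    sk = [[os.urandom(32), os.urandom(32)] for _ in range(256)]
    pk = [[H(a), H(b)] for a, b in sk]
    return sk, pk

def _bits(msg):
    digest = H(msg)
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]

def sign(sk, msg):
    # Fast part: for each bit of the message hash, reveal one of the
    # two precomputed secrets -- table lookups only, good for bursts.
    return [sk[i][b] for i, b in enumerate(_bits(msg))]

def verify(pk, msg, sig):
    return all(H(s) == pk[i][b]
               for i, (b, s) in enumerate(zip(_bits(msg), sig)))
```

	The tension with the "short key size" requirement is obvious here:
	the preprocessing buys speed at the cost of large, single-use keys.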

* Efficient Broadcasts and pinning down requirements for a freehaven
broadcast model.
	
	Currently, the ability to broadcast a message to all nodes in 
	the servnet is an integral part of the freehaven protocol. 
	This seems unlikely to change. So how, exactly, do we do this?

	I sent a previous message to the freehaven-dev list in which
	I outlined what I thought some requirements and a model for
	broadcast *might* look like. You'll want to look at it, but 
	to summarise :

	Every node knows about every other node, and can reach them
	by sending a message through the anonymous channel. So you 
	can think of the network as this fully connected graph. 
	The nodes are servnet nodes, and the edges are anonymous
	channels. 

	The catch is that this picture is actually misleading :

	it is possible that every incoming message to a node, no
	matter who it's from, is travelling the same "edge" - the
	same route in an anonymous channel. Sometimes this edge
	goes down -- but then the node can still send messages out. 
	How should we deal with this?	

	At this point we are implementing the protocol in which
	if a single node X wants to broadcast, X sends a single
	message to every other node in the servnet.
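
	In code, that naive protocol is just a loop over the node list.
	The `channels` map below is hypothetical, standing in for whatever
	send-through-anonymous-channel call we end up with; the interesting
	part is what to do with the nodes whose route happens to be down.

```python
def broadcast(sender, nodes, channels, msg):
    """Naive broadcast: `sender` sends one copy of `msg` to every other
    node.  `channels` maps node id -> send function returning False when
    that node's route through the anonymous channel is down (the "edge
    down" problem above).  Returns the list of nodes that need a retry."""
    failed = []
    for node in nodes:
        if node == sender:
            continue
        if not channels[node](msg):   # one message per node: O(n) sends
            failed.append(node)       # retry later, maybe via a new route
    return failed
```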

	So what needs to be done here is becoming familiar with
	some of the literature and applying it to this problem. I've
	started, thanks to pointers provided by Adam and the distributed
	systems people here...but there's probably more work to be done.
	It is unlikely that this will be a priority until after we have
	working code, though!


* Codifying the principle of "acceptable losses". 

	Roger has stated that he wants a protocol which can tolerate
	the losses caused by an adversary. That is, even if an adversary
	can find and destroy lots of shares, the chances will be really
	good that the file being attacked will stay in the freehaven. 

	This is a neat principle, because it allows for greater
	efficiency. No need to do expensive verification protocols or
	byzantine agreement or anything like that.  

	Can we create a definition of "success" for the freehaven which
	captures this principle? Can we create a model of adversaries we
	can tolerate? Can we take the intuitive notions we've been using
	to argue on this list and in meetings, and write them down?

	This requires analysing the trust model. We're still arguing
	about the model, but maybe we can start figuring out the
	approach. 
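
	One first stab at making this quantitative, assuming k-of-n
	dispersal and a (simplistic) adversary who destroys each share
	independently with probability p: the file survives iff at least
	k shares remain, so "success" could mean this binomial sum stays
	above some agreed threshold.

```python
from math import comb

def survival_prob(n, k, p):
    """P(file survives) when each of n shares is destroyed independently
    with probability p and any k surviving shares suffice to rebuild.
    Sum over i = k..n of C(n,i) * (1-p)^i * p^(n-i)."""
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i)
               for i in range(k, n + 1))
```

	For example, with 20 shares, a reconstruction threshold of 5, and
	an adversary who finds and destroys each share with probability
	1/2, survival_prob(20, 5, 0.5) is still above 99%. A real model
	needs smarter adversaries than independent coin flips, of course.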

* Explicitly enumerating and specifying protocols, and trying to verify
them somehow.
	Write down everything we do, and then go through and write
	down an explicit protocol for everything. Then figure out
	some means of checking whether the protocol does everything
	we think it should do. 
	The verification part of this might involve learning about
	authentication logics. That's a ways away, though. Right now
	we just need to implement and then write them down. 8-)

* Definitions of anonymity for freehaven, definitions of "success",
	of "failure", of "adversary", etc. etc. etc. 


That's what I can think of right now. I will note, however, that our
first priority is getting something out the door which works. So the
more theoretical analysis/"proof of security" work may not be a
priority right now. It's still something worth pursuing later. 

Thanks much, seeya soon, 
-David Molnar