[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[freehaven-dev] status

I apologize for not sending this out earlier. It's been a busy several
days so far.

We're going to have another meeting on Sunday, same time same place
(10-280 at 2pm). There might be some new faces at this one, depending on
if Adam and/or Nick decide to drop by.

Time for some wonderful ascii art.

     /     \   /
    /       \ /
comm         X
  | \       / \
  |  \     /   \
  |   buddy-----trust---nodeDB
  \               /      /

It's late, but I'm going to try to share as many of these ideas as I
can for now. We'll solidify them in the meeting, hopefully.

Share DB includes, for each share:
  (Shareindex, DATA, ExpirationDate, PK) signed by PK
  Buddy info: PK_location, PK_oldloc, # failed queries, timestamp of last query
           OR "my buddy is dead"
           OR "I am a secondary copy"
              # of primaries known for this share, # of secondaries known
              Boolean "don't move" (query in progress)

In this case, "primary" means "I am responsible for keeping this share
alive, it's my fault if it goes away" and secondary means "I happen to have
this, but I can delete it if I want to."

Buddy behavior:
  * Locate share that should be checked
  * Mark share don't move
  * Do request for share (via PK or PK+index)

  When share comes in,
  * Is share a primary? If so, increment # primaries else inc # secondaries.

  Once a query times out,
  * If no primary found:
    * Inc # failed queries in sharedb
    * If over threshold, either
      * broadcast loss
      * negotiate for a proxy to broadcast the loss
    * Mark share buddyless
  * Update # primaries, # secondaries in shareDB, as well as timestamp

This algorithm needs to be extended for the idea of multiple buddies.
We need to convince ourselves that doing the buddy system with three
buddies will keep us 'safe enough'. Will some nodes tend to refuse
trades of shares that come with 'not enough buddies'? Is this good?

Some configuration info to keep, on a per-node basis:
  * % of time to choose trade by share (the alternative is to choose by
    node and then pick an appropriate share based on that. We think this
    will produce different behaviors...)
  * % of time to choose proxy for lost buddy announcement
  * Threshold for # failed buddy queries
  * Frequency/threshold for initiating a buddy query
  * Frequency/threshold for initiating a trade
  * ....... (trust things)

In the share DB, we will also want to keep:
  * A directory of share data, so we can do easy lookups without frequently
    examining the blobs of data that are in the db.
  * An ordered list of 'trash shares'. These are secondaries or expired
    shares that can be deleted whenever we want. Our policy is to not
    delete things until we actually need to use the space for something
  Operations on the share DB will maintain these two items. We might want
  a garbage collection module that will build them from scratch.

The node DB keeps a list of nodes that we currently know about. It also
keeps trust information -- whatever the trust module thinks is important
or relevant. The trust module consists of two sets of functions: a set of
"learning functions" which take in information and have no return value, and
a set of "decision booleans", which take in specific questions and decide
'yes' or 'no' based on stuff it finds in the node db.

The comm module is a way of abstracting how the data gets from one node
to the next. It talks to the node DB and figures out enough information
to route messages well.

Note that I've left out a module -- the one that answers queries. The
behavior on this guy is to receive requests, check if we have anything
with that PK, and send out an answer if we do. We'll flag that answer as
primary or secondary depending on which it is. That way if somebody does
a request and gets only secondary responses, he will (hopefully) have time
to worry and deal before the secondaries disappear.

How to implement the databases? The first thought was sql, because everybody
uses sql. On the other hand, getting a mysql server up is a tricky thing to
do. It will raise the minimum clue requirement for running a server by a
huge amount.
I'm tempted to just use a gdbm back-end and that will be that. It should
be fine for this sort of thing.
We talked earlier about a multi-process or multi-threaded approach. My
preference is to make it single-process single-threaded with perhaps an
event system. I figure this can be implemented with a sorted (by when the
event should fire) queue, and then select on the comm socket with a timeout
of the time until the first event in the queue. This should make it a very
simple program (on the order of an evening's coding for a first draft). I've
been meaning to sit down and do it, but haven't found the time this week.
Hopefully this coming week it will happen. :)

I spoke with Adam Smith (grad student at LCS) about doing some more rigorous
and formal definitions of our threat model, and trying to determine what
information is leaked by our protocols. We're not at the point where we can
do this yet, but I think it would be really cool later on. :)

The shareDB is where the "buddy forwarders" are going to go also. Is
there an easy way to integrate them into the current proposed algorithm?

Anyway, I'd like to convince ourselves of the three-buddy idea on Sunday,
and I've asked Brian to have a preliminary set of decision booleans for
the trust system, as well as a list of what information it might be
interested in keeping around in the nodedb.

The better of an idea I have of what we want in the 'configuration info'
section, the cleaner the first draft of the code will be.