[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[freehaven-dev] RFC: Design for an anonymous network



Design Thoughts on the Yet-Unnamed Anonymous Network
----------------------------------------------------

Gerald and I talked a bit on Saturday afternoon about the design of the
anonymous communication network.  Two aspects which we especially focused on
was speed -- the creation of "virtual circuits" and the use of symmetric
crypto
-- and firewall/NAT support.  Largely coinciding with speed was the type and
nature of communications that we would be using.  We want to minimize the
number of "expensive" control messages, for they will have disproportionally
large percent of network crypto overhead as compared to "data", in comparison
to "data" messages, used for the actual transmission of files.

Consideration #1:  we do not want static reply-blocks to lay around in the
system.  First, they are brittle -- if they are no longer valid, other users
don't have a way to send messages to Alice.  Second, they don't provide
forward
anonymity -- you can computationally crack a reply block n years from now, and
resolve Alice's identity. 

This motivates the use of "meeting place" peers, where peers hang out for
others to find them.

Consideration #2:  reply-blocks don't work in a NAT/firewall world.  If we
want
this protocol to be useable by a variety of sources (e.g., cross-platform,
being NATs and firewalls) and fast (i.e., network layer), reply blocks just
don't work.  They require the ability to connect to a node with an incoming
connection.  We need a way so that firewalled peers only use outgoing
connections.


Note #1:  Peers are "registered" in the node db as being able to accept
incoming connections or not.


Simple network model
--------------------

      A --> 1 --> 2 --> M <-- 4 <-- 3 <-- B

Consider that Alice wants to send a message to Bob.  Alice and Bob are
potentially firewalled peers.  Nodes 1, 2, M (Meeting Place), 4, 3 *must* be
able to accept incoming connections.

Bob is already connected to M (wlog, he uses the same protocol as we will show
for Alice.)

Alice wishes to establish a two-way ephemeral connection with M.  She knows
the
PK_net of peers 1,2,M (their public key used for network communication), which
is also mapped to IP address in some database-like fasion.

We thereafter refer to PK_net as merely PK.  The notation K refers to a
newly-generated symmetric key (DES or equiv).  K_source refers to a
source-routed key, that is, one used for routing in direction A --> M.  K_dest
refers to a dest-routed key, used in direction M --> A.

A generates:   K_A
               K_source_1 K_dest_1   (K_s_1, K_d_1)
               K_source_2 K_dest_2   (K_s_2, K_d_2)
               K_M

A wishes to transmit some MSG to M.
A builds the packet as such:


ROUTE_CREATE_PACKET

Routing Information Header

[K_s_1, K_d_1 {K_s_2, K_d_2 (K_M, K_d_2, K_d_1, K_A)_PK_M}_PK_2]_PK_1

Message 

      [{(MSG)_K_M}_K_s_2]_K_s_1


Packet:   {routing info} o {message}


This requires Alice to do a number of things.  First, she needs to generate a
bunch of symmetric keys.  That's basically free -- it's like generating some
numbers at random, given that key is just some random index into pseudorandom
function family.

Second, she needs to perform a few public-key operations.  However, she's
actually only encryption some relatively small blocks (i.e., order of a few
hundred bits).  We could try to further speed up this process with fast
public-key encryption parameters (e.g., for RSA, e=3, 2^16-1, etc.)

What this gives us is that this offloads much of the cost to the sender.  Our
motivation is that we want intermediate nodes to perform as little as possible
-- in order to prevent easy attacks or general network latency.  

In this model, when a node {1, 2} receives a packet, he only needs to
perform a
small public-key decryption for the routing header, pad the header with some
randomness to make sure it stays the same size as it goes through network (to
prevent easy traffic analysis correlation based on size), store {K_s_i,
K_d_i},
and then decrypt (symmetric) the message block with the uncovered key.  This
MSG decryption at each hop desirable also to deter traffic analysis.

This design also provides end-to-end privacy of MSG (A --> M).  We could add
authenticy by using basic PK signature on MSG (which only M needs see), some
degree of preventation from reply by relative time-stamping the MSG as well.


Virtual Circuit - TCP connection already established
---------------

Internal nodes will keep these TCP connections open for a while (we do not
discuss now how/when they decide to close them.)  This forms the creation of a
virtual circuit -- all data can now be transmitted over it using only
symmetric
key primitives.  That is, they remember a correlation between 2 TCP
connections, and they {K_s_i, K_d_i} used over them.


PREEXISTING_ROUTE_PACKET

From Alice to Meeting place:
      [{(MSG)_K_M}_K_s_2]_K_s_1

From Meeting place to Alice:
      [{(MSG)_K_A}_K_d_1]_K_d_2


Again, we can always throw end-to-end authenticity by signatures and
time-stamps on the message transmitted.

Without loss of generality, Bob can establish a connection to his meeting
place
in a similar manner.  


Connecting Alice and Bob
------------------------

There are two different type of public-key schemes in the network.  (We
probably want names for these):

1.  PK_network (PK_net) is used for anonymous network communication.  We need
some database method (described later) to map PK_net to IP address for
routing.   

2. PK_filesystem (PK_fs) used for end-to-end encryption in the (Free Haven)
file system backend.  This is the "pseudonymous" handle of peers, from which
pseudonymous file sharing and reputation systems will be built.

It is *critical* that PK_net and PK_fs are never correlated.

The ROUTE_CREATE_PACKET that Alice sends to Bob includes a connection create
request, naming Bob via PK_fs_B.

The Meeting place then "joins" its virtual circuits from (A --> M) and (B -->
M).  M sends A in ROUTE_CREATE_RESPONSE (K_d_4, K_d_3, K_B).  M sends B in
ROUTE_CREATE_REQUEST (K_2_d, K_1_d, K_A).  [Obviously, I'm making up all these
names as I go along.]

From this point on, the meeting place needs not perform *anything* different
than any other intermediate node (that is, a single symmetric decryption for
each message being routed through it.)

A sends a message to B:

      [{([{(MSG)_K_B}_K_d_3)_K_d_4)_K_M}_K_u_2]_K_u_1

At this point, it is obvious that the path from A<-->B is quite long.  We
probably want to discuss path length parameterization.

A few things are further noticable:

1.    Problem: M knows the length of hops to A and B, 2 in our above
example.  
      Solution:  We can set some system parameters for path-length (say,
l=4). 
If A actually doesn't want to route through 4 intermediate nodes for speed
(duh!), they she can include some "dummy" packets for M, i.e., M gets {K_M,
K_d_2, K_d_1, K_d_dummy2, k_d_dummy1, K_A}.  M then encrypts the response
message with all these.  A actually gets packets still encrypted under
(K_d_dummy, K_d_dummy1, K_A).  It just needs to do a few more symmetric
decryptions, fairly inexpensive.  To all other nodes, this 2-hop path (with 2
virtual dummy nodes) is indistinguishable from a 4-hop path.

2.    Problem;  M knows symmetric keys down to both A and B.  
      Solution:  When A sends its first message to B (say,
ENDPOINT_ROUTE_CREATE), it can include n random numbers (equal in size to
the n
symmetric keys), such that

      new_key = (old_key + random) mod k

i.e., it performs arithmetic transformation in some prime field k to prevent M
from being able to decrypt future message between A and B.  Obviously, this
requires a bit of more public_key operations: 
      
      ENDPOINT_ROUTE_CREATE_MSG = {r1..rn}_PK_fs_B

Obviously, this requires further thought and analysis, but I think it's a
fairly good starting point.  It emphasizes fast symmetric operations,
especially optimizing for pre-existing communication paths, and pushes off
more
expensive operations to "senders" of messages -- some small amount of CPU
resource protection for intermediate nodes.  Secondly, it is designed to allow
endpoint peers to be behind NATs and firewalls, something not considered
(to my
knowledge) by any other pre-existing mix-net type design.


Finding Meeting Place Peers
---------------------------

Til now, we just abstracted that Alice "knows" which Meeting place server on
which to find Bob.  In early prototype, we can obviously just have it as a
database lookup.  But, what about in a scalable system.

We were considering this operation in terms of the Chord peer-to-peer
key-value
lookup service.  That is, Alice queries the Chord system:

      Query:      PK_fs_B
      Response:   M = {PK_net_M, IPaddr_M}

We should be able to assume that Chord provides quickly-propogating updates
through the system.  For example, if Bob's meeting place proxy goes down, he
selects a new one and issues a Chord update() command.  (This decision
probably
also wants to be motivated by a reputation system.)

Alternatively, we might want Bob to utilize some n meeting place servers at
any
one time.  Remember, these are just longer-lived TCP connections. 

      Query:      PK_fs_B
      Response:   {M_1 .. M_n}

For which Alice will just use one M_i at random to initially try and contact
Bob, repeating this process over the known M_i until one actually finds Bob. 
As these meeting places go down, Bob wants to issue Chord updates on this set.


That's my thoughts for now.  Comments, critiques, suggestions requested.

--mike 

-----
"Not all those who wander are lost."                  mfreed@mit.edu