[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[freehaven-dev] RFC: Comm Future Works



Please send any comments / additions.

--mike


P.S. Brian, I left the Meatball idea blank, because I didn't fully know
what you were thinking...thoughts/explanations?  I left a comment there
for you...thanks!



\section{Future Work}

A ``standard'' design for an anonymous communications channel is very 
much an open question.  In section \ref{sec:anon}, we specified the 
requirements for an ideal anonymous communications channel, and
considered how current works fulfill these requirements. In general,
there are various considerations when designing an anonymous
communications channel: 

\begin{itemize}
\item Low latency
\item Delivery robustness 
\item Resistance to traffic analysis and similar attacks
\item Types of anonymity provided: 
\begin{itemize}
\item Anonymity vs. pseudonymity
\item Full vs. partial anonymity
\item Computational vs. information-theoretic anonymity
\item Perfect forward anonymity 
\end{itemize}
\end{itemize}

Some of these goals are conflicting in nature.  Systems which provide
low latency and strong delivery robustness are generally more open to
traffic analysis and other such attacks, as messages are routed
quickly and sometimes repetitively through the channel.  Still, while
Free Haven stresses anonymity over availability, we would prefer a
design which provides latency in the realm of seconds or minutes, as
opposed to hours or days.  If a high latency was to endure, Free Haven
usage would be constrained to publishers and readers which specifically
require our strong notions of anonymity.  Similarly, lossyness within
the channel degrades system performance.  Free Haven can be designed to
handle communications lossyness for file reconstruction by a robust
information dispersal algorithm. Still, this condition has possible
effects on the trust network and trading/buddy protocols, when the loss
is not due to server failure, but rather to an unreliable communications
channel.   In this section, we present some alternative designs for an
anonymous communications channel.


\subsection{Garlic Routing}

The concept behind Garlic Routing is similar to a mix-net.  A sender
encodes routing information in a series of layered encryptions, forming
an ``onion'' of encrypted information.  Each node along the route
decrypts the outer layer of the onion, exposing the next layer and
determining the location of the next hop.  Eventually, the entire onion
is peeled the the message reaches its destination.  A garlic packet looks
similar to an onion packet, until it is unwrapped.  A node then finds
several garlic bulbs to transmit, instead of the normal single onion.
Each bulb is a viable path-to-destination from that intermediate node,
therefore providing several routes.  Earlier intermediate nodes would
have no knowledge of the path or existance of these newly exposed
routes.  

Garlic routing provides a few benefits.  Delivery reliability and
robustness is therefore increased through path redundancy.
Reply-block encoding can be implemented efficiently in terms of size, as 
reply blocks will only grow linearly with the total number of nodes in
onion and garlic routes. The encryption of header information can be
performed using a hybrid scheme:  all garlic bulbs within a layer are
encrypted with the same symmetric key, which is then encrypted
with each of garlic node's public key.  Therefore, the size of a garlic
packet containing $n$ bulbs is only $L + k*n$, where $L$ is the size of
the normal onion layer and $k$ is the symmetric key length.  A similar
hybrid encryption scheme is used by PGP.

Based on the concept of a Chaumian mix-net, a garlic-routing mix-net
provides computational anonymity based on the strength of encryption
used on the garlic.  Reply-blocks for the channel provide a route to an
explicit destination, thus disclosure of the information will
specify a receiver.  This differs from our definition of partial
anonymity schemes, where exposure of information will only yield a
$k$-anonymous set of possibilities.


\subsection{Iterative Exposure}

The concept of Iterative Exposure is similar to that of a mix-net:  each
path node only knows the last hop from where it received the packet and
can only expose the next hop to where it should send the packet.
However, we do not use an layered encryption scheme.   Header
information is a list of entries, each one encrypted to a specific node,
containing a simple message such as ``Node A:  send to node B.''
As with Garlic Routing, we can help provide delivery robustness through
redundancy with a simple change:  ``Node A:  send to nodes B, C, D.''
Entries in the routing list are randomly ordered.  A node iterates
through the list, and attempts to decrypt each entry and check if the
entry corresponds to itself.  Once determining the proper next hop
information, the node can reorder the routing list to protect against
message coding attacks.  To adequately protect against this type of
attack, some further change to the message itself would be required at
each hop. 

A destination can either provide a reply-block for a specific source, in
which the source node has an encrypted element in the list, or a generic
public reply-block.  For the latter, one element needs to be publically
readable so that a source knows where to ``pick up'' the path:  a source
can either send a message to this node as the first hop, or the source
can provide its only anonymous path to this node by adding reply list
entries.  Admittedly, this node will lose carrier-anonymity to
adversaries.  However, such a generic reply block can protect against
pseudonymity marking attacks, and it allows both the sender and receiver
to specify paths accordingly to their own anonymity requirements.  


\subsection{Alien Conspiracy Net}
% We need a new name for this....

The Alien Conspiracy Net seeks to make computational and traffic
analysis attacks more difficult by specifying a naming scheme that does
not yield an explicit destination.  The advantage of this scheme is
predicated on the benefits of partial anonymity:  an adversary can only
determine a $k$-anonymous set of destination possibilities.

\subsubsection{Joining an existing net} 

A node generates a series of $n$ identical tokens, each of $b$
randomly-chosen bits.  This sequence of tokens is the node's address,
and is kept private.  To join an existing network, a node connects to a
few existing public nodes and offers to trade one token with each of its
neighbors.  The node removes the token traded away from its address and
replaces it with the neighbor's token.  The node also adds the token
traded away and the one received as its approximation of the neighbor's
address.  

\subsubsection{Node Behavior}

Nodes will occasionally receive messages, most likely in the form of
encrypted data.  Whether or not the node can read the data, it should
always forward it properly to neighboring nodes to protect against
traffic analysis of channel edges. 

At the top of the message will be a series of tokens.  A node compares
these tokens to the approximation of the address of each neighbor,
calculating the number of tokens in common with each.  Of those which 
have the greatest commonality, the node picks some subset of these nodes 
at random, and sends them the message.  

\subsubsection{Partial-Anonymous Naming of Nodes}

In order for the protocol to work, tokens need to continue to spread
throughout the network.  Thus, each node should offer a trade to each of
its neighbors within some user-set delay.  The token chosen to trade is
selected at random from the node's address, so it may not be one of the
node's own tokens. 

Eventually, the above protocol would lead to a completely homogenous
set of tokens.  This would make message delivery problematic.  Thus
each node has a second user-set delay between mutations.  When that
delay has passed, the node replaces portions of its address with its own
tokens.  

A node publicizes its address by exposing some $\log n$ of its address
tokens.  A sender address a message to this $\log n$ address. Therefore,
a node's public address yields partial anonymity:  many such
combinations result from the node's actual address, and many addresses
can yield this set of $\log n$ address tokens.  As destinations forward
their own messages as well, an adversary cannot determine the actual
destination only given the message's destination address.  Obviously,
messages are given a TTL (time-to-live). 

\subsubsection{Network Topology}

This protocol relies on token gradients across the network.  The message
travels from regions of low potential to regions of high potential.
For this to happen, the network probably requires a properly-connected
graph.  If nodes connect to large numbers of other public nodes, or edges
between nodes are randomly distributed, the resulting network may not
have clear token gradients.  We have not yet determined what type of
network topology is actually required for adequate message delivery, nor
considered protocols to yield a desirable topology via distributed
communications and control.


\subsection{Meatball Routing}

% Brian: 
%
% I know you mentioned this several times, but I never got all the
% specifics which you were thinking of.  Could you perhaps explain 
% this stuff?


\subsection{Zones}

The Crowds system organizes users into collections of nodes called
crowds, achieving partial-anonymity for non-local adversaries and
sender-anonymity within each crowd.  Crowds, as proposed by AT\&T
Research, focuses on anonymous Web browsing, in which users communicate
with specific end servers.  Built onto the Crowds model, the Zones
system seeks to achieve anonymous communications between two
users instead.

Communications from a sender are routed through the sender's zone as
specified by the Crowds protocol.  When the message is dispatched from a
crowd node, however, it is not sent directly to an end server.  Instead,
the message is dispatched to another zone.  Once within the destination
zone, the message is either multi-casted to all nodes or forwarded
around via some random walk.  Each node attempts to decrypt each message
received to determine if it is the proper receiver.  If random-walking
is used, the proper receiver still wants to forward the message to
protect against zone traffic analysis.  This being so, a message's TTL
has to be sufficiently high such that a random-walk will
probabilistically find the proper receiver.  This type of distribution
strategy provides $k$-anonymous receiver-anonyminity, where $k$ is the
size of the receiver's zone.

\subsection{Small Worlds Model}

Social networks display two characteristics which initially may appear
to be contradictory.  First, social connections display clustering,
whereby friends are likely to share the same group of friends.  Second,
they exibit what has been termed by Stanley Milgram as the ``small
worlds effect'' \cite{milgram}.  Namely, any two people can establish
contact by going through only a short chain of intermediate
acquaintances.  Milgram proposed that all people in the world are
separated by six intermediataries on average;  this effect is better
known as ``Six Degrees of Separation'' or the ``Kevin Bacon Game.''

\subsubsection{Network Construction and Transmission}

Mathematicians have begun studying sparse networks to prove the small
worlds effect \cite{smallworlds}.  Using these models for network
topology, we can construct an anonymous communications channel built on
this network routing principle.  Crowds or zones mimic small worlds of
friends, without the necessary long-range connections that provide the
small worlds effect.  Therefore, we can achieve this effect by
constructing zones whereby people also specify a few connections to
users in other zones.

A node sends a message to all of its friends, who in turn propogate the
message to the rest of their friends.  While friends share many
connections due the clustering effect -- an explicitly formed by the
creation of zones -- the long-range connections allow the propogation of
messages through the network.  This model places a large load on the
system, especially given the high degree of connectivity in most
networks.  We have also considered an alternative ``directed''
propogation of messages accordingly to some greedy heuristic, such as a
Hamming distance or the protocol we describe in the Alien Conspiracy
Net. 

\subsubsection{Network Anonymity}

The anonymity of this communications channel is based on the presumed
innocence of the Crowds system.  An adversary within the network cannot
determine whether a message received originated from its last hop, or was
merely forwarded on by that node.  As the number of nodes that may be
involved in the message's path increases, the innocence of the last hop
becomes probabilistically greater \cite{crowds}.  The system does not
protect the anonymity of senders from a global observer. 

$K$-anonymous receiver-anonymity can be achieved if receivers forward
the message along.  For simple broadcast networks, $k$ is equal to the
number of nodes in the entire network;  for directed multi-cast
networks, $k$ is the number of nodes that received the message.
With large values of $k$ and a high propogation of message
transmissions, adversaries have a difficult time in performing traffic
analysis.  


Messages should be encrypted to the receiver such that only
the receiver can tell if the message is meant for her, such that an
adversary cannot simply compare some naming scheme to link messages to
receivers.  A small worlds model likely requires a low-latency
communications channel, such as that built directly on TCP/IP, due to
the high bandwidth requires.  Delivery robustness is assured also by the
redundancy of paths to any destination.  To protect against message
coding and message volume attacks, messages can be point-to-point
encrypted and padded to the same or some random size.  Reordering
messages within a given node helps protect against timing attacks
similar to mix-nets.  Like other schemes, we still witness the trade-off 
between a low latency channel and a resistance to traffic analysis.

Gnutella uses a similar six-degrees of separation model for network
communication.  Nodes broadcast a received messsage to all their
friends, and messages include an explicit TTL to stop infinite looping.
Messages are not encrypted or padded.


---------------------
 Michael J. Freedman

Email: mfreed@mit.edu
Web:  griffen.mit.edu
Phone:   617.225.9381