
[freehaven-cvs] initial checkin for the nato-rta paper



Update of /home/freehaven/cvsroot/doc/rta04
In directory moria.mit.edu:/home2/arma/work/freehaven/doc/rta04

Added Files:
	nato-rta04.bib nato-rta04.tex 
Log Message:
initial checkin for the nato-rta paper


--- NEW FILE: nato-rta04.bib ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: nato-rta04.tex ---
\documentclass{article}
%\usepackage{latex8}
\usepackage{times}
\usepackage{url}
\usepackage{graphics}
\usepackage{amsmath}

%\pagestyle{empty}


\hyphenation{a-non-y-mize a-non-y-miz-er}

\renewcommand\url{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}
\newcommand\emailaddr{\begingroup \def\UrlLeft{<}\def\UrlRight{>}\urlstyle{tt}\Url}

\newcommand{\workingnote}[1]{}        % The version that hides the note.
%\newcommand{\workingnote}[1]{(**#1)}   % The version that makes the note visible.


% If an URL ends up with '%'s in it, that's because the line *in the .bib/.tex
% file* is too long, so break it there (it doesn't matter if the next line is
% indented with spaces). -DH

%\newif\ifpdf
%\ifx\pdfoutput\undefined
%   \pdffalse
%\else
%   \pdfoutput=1
%   \pdftrue
%\fi

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
  \setlength{\itemsep}{0mm}
    \setlength{\parsep}{0mm}
    %  \setlength{\labelsep}{0mm}
    %  \setlength{\labelwidth}{0mm}
    %  \setlength{\topsep}{0mm}
    }}{\end{list}}

\begin{document}

%% Use dvipdfm instead. --DH
%\ifpdf
%  \pdfcompresslevel=9
%  \pdfpagewidth=\the\paperwidth
%  \pdfpageheight=\the\paperheight
%\fi

\title{Protecting Against Traffic Analysis on Unclassified
  Networks\thanks{This work supported by DARPA and ONR.}}
% Putting the 'Private' back in 'Virtual Private Network'

\author{Roger Dingledine \\ The Free Haven Project \\ arma@freehaven.net \and
Nick Mathewson \\ The Free Haven Project \\ nickm@freehaven.net \and
Catherine Meadows \\ Naval Research Laboratory \\ meadows@itd.nrl.navy.mil \and
Paul Syverson \\ Naval Research Laboratory \\ syverson@itd.nrl.navy.mil}

\maketitle
\thispagestyle{empty}

\begin{abstract}
  While the need for data and message confidentiality is well known,
  the need to protect networks against traffic analysis is less widely
  recognized.  Unclassified networks are subject to traffic analysis.
  Tor is a circuit-based low-latency anonymous communication service
  that resists traffic analysis. This second-generation Onion Routing
  system adds to the first-generation design with perfect forward
  secrecy, congestion control, directory servers, integrity checking,
  variable exit policies, and a practical design for rendezvous
  points. Tor works on the real-world Internet, requires no special
  privileges or kernel modifications, requires little synchronization
  or coordination between nodes, and provides a reasonable tradeoff
  between anonymity, usability, and efficiency.
\end{abstract}

%\begin{center}
%\textbf{Keywords:} anonymity, peer-to-peer, remailer, nymserver, reply block
%\end{center}


\section{Introduction}

It is well known that encryption hides the content of communication
but does nothing to hide who is communicating. Indeed, Whit Diffie, an
inventor of public-key cryptography, has noted that the backbone of
signals intelligence is not cryptanalysis but traffic
analysis. The military has many reasons to communicate over open
networks but must sometimes hide the fact that it is doing so. For
example, it may be much more expedient and convenient to gather
intelligence from open Internet sources. Another reason for using open
networks is rapid formation of dynamic coalitions without an existing
shared private infrastructure between members. A third reason is that
hiding communication with vendors may help conceal procurement
patterns. Finally, it is sometimes not the communicants that are
sensitive but their location.  A server whose physical or logical
location is known may be vulnerable to physical attack and denial of
service.

Onion Routing is an overlay network concept for making anonymous
connections resistant to eavesdropping and traffic analysis.  It
permits low-latency TCP-based communication such as web traffic,
secure shell remote login, and instant messaging. The current design
and implementation, Tor, makes a number of improvements on the
original. These include perfect forward secrecy, the ability to interface
with applications via SOCKS without modifying either those applications
or Onion Routing, multiplexing of application connections over
Onion Routing circuits, congestion control, fault tolerance for node
failure, integrity checking, and rendezvous points that protect the
responder of a connection as well as the initiator.

Onion Routing may be used anywhere traffic analysis is a concern.
Because Onion Routing is an overlay network, it can exist on top of
public networks such as the Internet without any modification to the
underlying routing structure or protocols.  The confidentiality and
integrity of communications are automatically protected by the Onion
Routing protocol. In addition, the endpoints themselves are hidden. An
intelligence analyst surfing a web site through Onion Routing is
hidden both from that web site and from the Onion Routing network
itself.  On the other hand, Onion Routing separates anonymity of the
communication from that of the data stream. So, a procurement officer
can place orders with a vendor and completely authenticate himself to
the vendor while still hiding the communication from any
observers---including compromised Onion Routing network components.
Onion Routing can also be used to give location-hidden servers
better protection against distributed denial of service, with less
redundancy than standard approaches require.  In this paper we provide a brief
overview of the Tor design. A more detailed description is given in
\cite{tor-design}, from which much of the present paper was extracted.
As we describe the system design, we will note how Onion Routing can
be used to protect military communications in the above described
settings.

\subsection{Related Work}
Onion Routing did not arise in a vacuum. In this summary we cannot
describe all of the related work that came before. We give here only a
broad description of prior work; references and comparisons can be
found in \cite{tor-design}.  Modern anonymity systems date to Chaum's
{\bf Mix-Net} design \cite{chaum-mix}. Chaum proposed hiding the
correspondence between sender and recipient by wrapping messages in
layers of public-key cryptography, and relaying them through a path
composed of ``mixes.''  Each mix in turn decrypts, delays, and
re-orders messages, before relaying them toward their destinations.

Subsequent relay-based anonymity designs have diverged in two main
directions.  Some have tried to maximize anonymity at the cost of
introducing comparatively large and variable latencies.  Because of
this decision, these \emph{high-latency} networks resist strong global
adversaries, but introduce too much lag for interactive tasks like web
browsing, internet chat, or SSH connections.

Tor belongs to the second category: \emph{low-latency} designs that
try to anonymize interactive network traffic. These systems handle a
variety of bidirectional protocols.  They also provide more convenient
mail delivery than the high-latency anonymous email networks, because
the remote mail server provides explicit and timely delivery
confirmation.  But because these designs typically involve many
packets that must be delivered quickly, it is difficult for them to
prevent an attacker who can eavesdrop both ends of the communication
from correlating the timing and volume of traffic entering the
anonymity network with traffic leaving it.  These protocols are also
vulnerable against active attacks in which an adversary introduces
timing patterns into traffic entering the network and looks for
correlated patterns among exiting traffic.  Although some work has
been done to frustrate these attacks, most designs protect primarily
against traffic analysis rather than traffic confirmation (cf.\ 
Section~\ref{subsec:threat-model}).
 

The simplest low-latency designs are single-hop proxies such as the
Anonymizer \cite{anonymizer}, wherein a single trusted server
strips the data's origin before relaying it.  More complex are
distributed-trust, circuit-based anonymizing systems.  In these
designs, a user establishes one or more medium-term bidirectional
end-to-end circuits, and tunnels data in fixed-size cells.
Establishing circuits is computationally expensive and typically
requires public-key cryptography, whereas relaying cells is
comparatively inexpensive and typically requires only symmetric
encryption.  Because a circuit crosses several servers, and each
server only knows the adjacent servers in the circuit, no single
server can link a user to her communication partners.  There have been
many such systems, making a variety of design choices; we again refer
the reader to \cite{tor-design} for more information.

\section{Design goals and assumptions}
\label{sec:assumptions}

\noindent{\large\bf Goals}\\
Like other low-latency anonymity designs, Tor seeks to frustrate
attackers from linking communication partners, or from linking
multiple communications to or from a single user.  Within this
main goal, however, several considerations have directed
Tor's evolution.

\textbf{Diversity:} If all onion routers are operated by the defense
department or ministry of a single nation and all users of the network
are DoD users, then traffic patterns of individuals, enclaves, and
commands may be protected. However, any traffic emerging from the
Onion Routing network to the Internet will be recognized as coming
from the DoD.  Therefore, it is necessary that the Onion Routing
network carry traffic of a broader class of users. Similarly, having
onion routers run by diverse entities, including nonmilitary entities
and entities from diverse countries, will help broaden the
class of users who will trust that system insiders will not monitor
their traffic. This will provide both a greater diversity and greater
volume of cover traffic. Unlike confidentiality, one cannot have
anonymity by oneself, no matter how strong the technology.  This need
for diversity affects the way other goals must be pursued.

\textbf{Deployability:} The design must be deployed and used in the
real world.  Thus it must not be expensive to run (for example, by
requiring more bandwidth than onion router operators are willing to
provide); must not place a heavy liability burden on operators (for
example, by allowing attackers to implicate onion routers in illegal
activities); and must not be difficult or expensive to implement (for
example, by requiring kernel patches, or separate proxies for every
protocol).  We also cannot require non-anonymous parties (such as
websites) to run our software.  (Our rendezvous point design does not
meet this goal for non-anonymous users talking to hidden servers,
however; see Section~\ref{sec:rendezvous}.)

\textbf{Usability:} A hard-to-use system has fewer users---and because
anonymity systems hide users among users, a system with fewer users
provides less anonymity.  Usability is thus not only a convenience: it
is a security requirement \cite{econymics,back01}. Tor should
therefore not require modifying applications; should not introduce
prohibitive delays; and should require users to make as few
configuration decisions as possible.  Finally, Tor should be easily
implemented on all common platforms; we cannot require users to change
their operating system in order to be anonymous.  (The current Tor
implementation runs on Windows and assorted Unix clones including
Linux, FreeBSD, and MacOS X.)

\textbf{Flexibility:} The protocol must be flexible and
well-specified, so Tor can serve as a test-bed for future research.
Many of the open problems in low-latency anonymity networks, such as
generating dummy traffic or preventing Sybil attacks \cite{sybil}, may
be solvable independently from the issues solved by Tor. Hopefully
future systems will not need to reinvent Tor's design.  (But note that
while a flexible design benefits researchers, there is a danger that
differing choices of extensions will make users distinguishable.
Experiments should be run on a separate network.)

\textbf{Simple design:} The protocol's design and security parameters
must be well-understood. Additional features impose implementation and
complexity costs; adding unproven techniques to the design threatens
deployability, readability, and ease of security analysis. Tor aims to
deploy a simple and stable system that integrates the best accepted
approaches to protecting anonymity.\\

\noindent{\large\bf Non-goals}\label{subsec:non-goals}\\
In favoring simple, deployable designs, we have explicitly deferred
several possible goals, either because they are solved elsewhere, or because
they are not yet solved.

\textbf{Not peer-to-peer:} Tarzan and MorphMix aim to scale to completely
decentralized peer-to-peer environments with thousands of short-lived
servers, many of which may be controlled by an adversary.  This approach
is appealing, but still has many open problems
\cite{tarzan:ccs02,morphmix:fc04}.

\textbf{Not secure against end-to-end attacks:} We do not claim that
Tor provides a definitive solution to end-to-end attacks, such as
correlating the timing of connections opening or correlating when
users are on the system with when certain traffic is observed (also
known as intersection attacks). Some approaches, such as running an
onion router, may help; see \cite{tor-design} for more discussion.

\textbf{No protocol normalization:} Tor does not provide
\emph{protocol normalization} like Privoxy \cite{privoxy} or the
Anonymizer \cite{anonymizer}. In other words, Tor anonymizes the
channel, but not the data or applications that pass over it.  This
means that Tor in itself will not hide, for example, a web surfer from
being identified by the data or application protocol information
observed at a visited web site.  If anonymization from the responder
is desired for complex and variable protocols like HTTP, Tor must be
layered with a filtering proxy such as Privoxy to hide differences
between clients, and expunge protocol features that leak identity.
Note that by this separation Tor can also provide services that are
anonymous to the network yet authenticated to the responder, like SSH.
So, for example, road warriors can make authenticated connections to
their home systems without revealing this to anyone including the
local network access point.  Similarly, Tor does not currently
integrate tunneling for non-stream-based protocols like UDP; this too
must be provided by an external service.

\textbf{Not steganographic:} Tor does not try to conceal the fact that
a user is connected to the network from someone in a position to
observe that connection.

\subsection{Threat Model}
\label{subsec:threat-model}

A global passive adversary is the most commonly assumed threat when
analyzing theoretical anonymity designs. But like all practical
low-latency systems, Tor does not protect against such a strong
adversary. Instead, we assume an adversary who can observe some
fraction of network traffic; who can generate, modify, delete, or
delay traffic; who can operate onion routers of its own; and who can
compromise some fraction of the onion routers.

In low-latency anonymity systems that use layered encryption, the
adversary's typical goal is to observe both the initiator and the
responder. By observing both ends, passive attackers can confirm a
suspicion that Alice is talking to Bob if the timing and volume
patterns of the traffic on the connection are distinct enough; active
attackers can induce timing signatures on the traffic to force
distinct patterns. Rather than focusing on these \emph{traffic
  confirmation} attacks, we aim to prevent \emph{traffic analysis}
attacks, where the adversary uses traffic patterns to learn which
points in the network he should attack.

Our adversary might try to link an initiator Alice with her
communication partners, or try to build a profile of Alice's behavior.
He might mount passive attacks by observing the network edges and
correlating traffic entering and leaving the network---by
relationships in packet timing, volume, or externally visible
user-selected options. The adversary can also mount active attacks by
compromising routers or keys; by replaying traffic; by selectively
denying service to trustworthy routers to move users to compromised
routers, or denying service to users to see if traffic elsewhere in
the network stops; or by introducing patterns into traffic that can
later be detected. The adversary might subvert the directory servers
to give users differing views of network state. Additionally, he can
try to decrease the network's reliability by attacking nodes or by
performing antisocial activities from reliable servers and trying to
get them taken down; making the network unreliable flushes users to
other less anonymous systems, where they may be easier to attack.  
The Tor design provides protections against these threats;
discussion of how well the design defends
against each of these attacks is presented in \cite{tor-design}.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Overview of the Tor Design}
\label{sec:design}

The Tor network is an overlay network; each onion router (OR) 
runs as a normal
user-level process without any special privileges.
Each onion router maintains a long-term TLS \cite{TLS}
connection to every other onion router.
%(We discuss alternatives to this clique-topology assumption in
%Section~\ref{sec:maintaining-anonymity}.)
% A subset of the ORs also act as
%directory servers, tracking which routers are in the network;
%see Section~\ref{subsec:dirservers} for directory server details.
Each user
runs local software called an onion proxy (OP) to fetch directories,
establish circuits across the network,
and handle connections from user applications.  These onion proxies accept
TCP streams and multiplex them across the circuits. The onion
router on the other side 
of the circuit connects to the destinations of
the TCP streams and relays data.

Each onion router uses three public keys: a long-term identity key, a
short-term onion key, and a short-term link key.  The identity
key is used to sign TLS certificates, to sign the OR's \emph{router
descriptor} (a summary of its keys, address, bandwidth, exit policy,
and so on), and (by directory servers) to sign directories. Changing
the identity key of a router is considered equivalent to creating a
new router. The onion key is used to decrypt requests
from users to set up a circuit and negotiate ephemeral keys. Finally,
link keys are used by the TLS protocol when communicating between
onion routers. Each short-term key is rotated periodically and
independently, to limit the impact of key compromise.
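
As a rough illustration (in Python; not part of the Tor implementation
or wire format), the sketch below models a router descriptor and a
key-rotation check. The field names and the rotation threshold are
placeholders of our own; the real descriptor format is defined by the
Tor specification.

\begin{verbatim}
from dataclasses import dataclass

@dataclass
class RouterDescriptor:
    # Illustrative summary of an onion router's published state; the
    # actual descriptor format is defined by the Tor specification.
    nickname: str
    address: str          # IP address and OR port
    bandwidth: int        # advertised bandwidth, in bytes per second
    exit_policy: list     # for example ["reject *:25", "accept *:80"]
    identity_key: bytes   # long-term key; changing it makes a "new" router
    onion_key: bytes      # short-term key for circuit-setup handshakes
    # Link (TLS) keys are negotiated per connection and are not published.

def should_rotate(key_age_seconds: float, max_age_seconds: float) -> bool:
    # Short-term keys are rotated periodically and independently to
    # limit the impact of key compromise; the threshold here is arbitrary.
    return key_age_seconds >= max_age_seconds
\end{verbatim}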

Section~\ref{subsec:cells} presents the fixed-size
\emph{cells} that are the unit of communication in Tor. We describe
in Section~\ref{subsec:circuits} how circuits are
built, extended, truncated, and destroyed. Section~\ref{subsec:tcp}
describes how TCP streams are routed through the network.  We address
integrity checking in Section~\ref{subsec:integrity-checking},
and resource limiting in Section~\ref{subsec:rate-limit}.
Finally,
Section~\ref{subsec:congestion} talks about congestion control and
fairness issues.

\subsection{Cells}
\label{subsec:cells}

Onion routers communicate with one another, and with users' OPs, via
TLS connections with ephemeral keys.  Using TLS conceals the data on
the connection with perfect forward secrecy, and prevents an attacker
from modifying data on the wire or impersonating an OR.

Traffic passes along these connections in fixed-size cells.  Each cell
is 256 bytes (but see Section~\ref{sec:conclusion} for a discussion of
allowing large cells and small cells on the same network), and
consists of a header and a payload. The header includes a circuit
identifier (circID) that specifies which circuit the cell refers to
(many circuits can be multiplexed over the single TLS connection), and
a command to describe what to do with the cell's payload.  (Circuit
identifiers are connection-specific: each single circuit has a different
circID on each OP/OR or OR/OR connection it traverses.)
Based on their command, cells are either \emph{control} cells, which are
always interpreted by the node that receives them, or \emph{relay} cells,
which carry end-to-end stream data.   The control cell commands are:
\emph{padding} (currently used for keepalive, but also usable for link
padding); \emph{create} or \emph{created} (used to set up a new circuit);
and \emph{destroy} (to tear down a circuit).
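
To make the cell layout concrete, the sketch below (Python, illustrative
only) packs and unpacks a fixed-size cell consisting of a circuit
identifier, a command byte, and a padded payload. The field widths
assumed here are placeholders; the exact widths are given by the Tor
byte-level specification.

\begin{verbatim}
import struct

CELL_LEN = 256        # fixed cell size used in this paper
HEADER_FMT = ">HB"    # assumed widths: 2-byte circID, 1-byte command
HEADER_LEN = struct.calcsize(HEADER_FMT)

CMD_PADDING, CMD_CREATE, CMD_CREATED, CMD_DESTROY, CMD_RELAY = range(5)

def pack_cell(circ_id: int, command: int, payload: bytes) -> bytes:
    # Pad every payload so all cells on the wire have the same length.
    if len(payload) > CELL_LEN - HEADER_LEN:
        raise ValueError("payload too large for a single cell")
    body = payload.ljust(CELL_LEN - HEADER_LEN, b"\x00")
    return struct.pack(HEADER_FMT, circ_id, command) + body

def unpack_cell(cell: bytes):
    circ_id, command = struct.unpack(HEADER_FMT, cell[:HEADER_LEN])
    return circ_id, command, cell[HEADER_LEN:]
\end{verbatim}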

Relay cells have an additional header (the relay header) after the
cell header, containing a stream identifier (many streams can
be multiplexed over a circuit); an end-to-end checksum for integrity
checking; the length of the relay payload; and a relay command.  
The entire contents of the relay header and the relay cell payload 
are encrypted or decrypted together as the relay cell moves along the
circuit, using the 128-bit AES cipher in counter mode to generate a
cipher stream.
The
relay commands are: \emph{relay
data} (for data flowing down the stream), \emph{relay begin} (to open a
stream), \emph{relay end} (to close a stream cleanly), \emph{relay
teardown} (to close a broken stream), \emph{relay connected}
(to notify the OP that a relay begin has succeeded), \emph{relay
extend} and \emph{relay extended} (to extend the circuit by a hop,
and to acknowledge), \emph{relay truncate} and \emph{relay truncated}
(to tear down only part of the circuit, and to acknowledge), \emph{relay
sendme} (used for congestion control), and \emph{relay drop} (used to
implement long-range dummies).
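
The layered encryption of relay cells can be illustrated with
counter-mode AES, which is its own inverse: Alice applies one layer per
hop, and each onion router strips exactly one layer as the cell moves
away from her. The Python sketch below is conceptual only; the real
protocol derives separate keys for each direction and maintains a
running counter per circuit rather than the per-cell counter block used
here.

\begin{verbatim}
import os
# Requires the third-party 'cryptography' package.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def ctr_xor(key: bytes, counter_block: bytes, data: bytes) -> bytes:
    # AES in counter mode XORs data with a keystream, so the same
    # operation both adds and removes a layer of encryption.
    enc = Cipher(algorithms.AES(key), modes.CTR(counter_block)).encryptor()
    return enc.update(data)

def alice_wrap(relay_payload: bytes, hop_keys, counter_block: bytes) -> bytes:
    # Alice adds one layer per hop (outermost layer for the first hop),
    # so each OR along the circuit removes exactly one layer.
    for key in reversed(hop_keys):
        relay_payload = ctr_xor(key, counter_block, relay_payload)
    return relay_payload

def or_unwrap(relay_payload: bytes, key: bytes, counter_block: bytes) -> bytes:
    # Each OR strips its own layer before forwarding the cell onward.
    return ctr_xor(key, counter_block, relay_payload)

# Toy usage: three hops, 128-bit AES keys, one illustrative counter block.
keys = [os.urandom(16) for _ in range(3)]
ctr = os.urandom(16)
cell = alice_wrap(b"relay data ...", keys, ctr)
for k in keys:                    # the hops process the cell in path order
    cell = or_unwrap(cell, k, ctr)
assert cell == b"relay data ..."
\end{verbatim}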


\subsection{Circuits and streams}
\label{subsec:circuits}

Onion Routing originally built one circuit for each
TCP stream.  Because building a circuit can take several tenths of a
second (due to public-key cryptography and network latency),
this design imposed high costs on applications like web browsing that
open many TCP streams.

In Tor, each circuit can be shared by many TCP streams.  To avoid
delays, users construct circuits preemptively.  To limit linkability
among their streams, users' OPs build a new circuit
periodically if the previous one has been used,
and expire old used circuits that no longer have any open streams.
OPs consider making a new circuit once a minute: thus
even heavy users spend negligible time
building circuits, but a limited number of requests can be linked
to each other through a given exit node. Also, because circuits are built
in the background, OPs can recover from failed circuit creation
without delaying streams and thereby harming user experience.\\

\noindent{\large\bf Constructing a circuit}
\label{subsubsec:constructing-a-circuit}\\
%\subsubsection{Constructing a circuit}
A user's OP constructs circuits incrementally, negotiating a
symmetric key with each OR on the circuit, one hop at a time. To begin
creating a new circuit, the OP (call her Alice) sends a
\emph{create} cell to the first node in her chosen path (call him Bob).  
(She chooses a new
circID $C_{AB}$ not currently used on the connection from her to Bob.)
The \emph{create} cell's
payload contains the first half of the Diffie-Hellman handshake
($g^x$), encrypted to Bob's onion key. Bob
responds with a \emph{created} cell containing the second half of the
DH handshake, along with a hash of the negotiated key $K=g^{xy}$.

Once the circuit has been established, Alice and Bob can send one
another relay cells encrypted with the negotiated
key.\footnote{Actually, the negotiated key is used to derive two
  symmetric keys: one for each direction.}  More detail is given in
\cite{tor-design}.\\
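
The \emph{create}/\emph{created} exchange can be sketched with textbook
Diffie-Hellman. The group parameters and hash below are toy values
chosen for brevity, and the sketch omits how $g^x$ is encrypted to
Bob's onion key and how per-direction keys are derived from $K$; those
details appear in \cite{tor-design}.

\begin{verbatim}
import hashlib, secrets

# Toy Diffie-Hellman parameters (a Mersenne prime), illustration only;
# the real handshake uses the group given in the Tor specification.
p = 2**127 - 1
g = 5

def alice_create():
    x = secrets.randbelow(p - 2) + 1
    return x, pow(g, x, p)            # g^x travels in the create cell

def bob_created(gx: int):
    y = secrets.randbelow(p - 2) + 1
    K = pow(gx, y, p)                 # shared key K = g^(xy)
    digest = hashlib.sha256(K.to_bytes(16, "big")).digest()
    return pow(g, y, p), digest       # created cell: g^y plus a hash of K

def alice_finish(x: int, gy: int, digest: bytes) -> bytes:
    K = pow(gy, x, p)
    assert hashlib.sha256(K.to_bytes(16, "big")).digest() == digest
    return K.to_bytes(16, "big")      # later split into per-direction keys

x, gx = alice_create()
gy, digest = bob_created(gx)
key_material = alice_finish(x, gy, digest)
\end{verbatim}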


\noindent{\large\bf Relay cells}\\
%\subsubsection{Relay cells}
%
Once Alice has established the circuit (so she shares keys with each
OR on the circuit), she can send relay cells.  Recall that every relay
cell has a streamID that indicates to which
stream the cell belongs.  This streamID allows a relay cell to be
addressed to any OR on the circuit.  Upon receiving a relay
cell, an OR looks up the corresponding circuit, and decrypts the relay
header and payload with the session key for that circuit.
If the cell is headed downstream (away from Alice) the OR then checks
whether the decrypted streamID is recognized---either because it
corresponds to an open stream at this OR for the given circuit, or because
it is the control streamID (zero).  If the OR recognizes the
streamID, it accepts the relay cell and processes it as described
below.  Otherwise, 
the OR looks up the circID and OR for the
next step in the circuit, replaces the circID as appropriate, and
sends the decrypted relay cell to the next OR.  (If the OR at the end
of the circuit receives an unrecognized relay cell, an error has
occurred, and the cell is discarded.)
\\ \\
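
The decision an OR makes for each downstream relay cell, once a layer
has been removed, can be written in a few lines: process the cell
locally if the decrypted streamID is recognized, discard it if this OR
is the end of the circuit, and otherwise rewrite the circID and forward
the cell. The structures and names below are illustrative, not taken
from the Tor sources.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class Circuit:
    open_streams: set = field(default_factory=set)
    next_hop: Optional[Tuple[str, int]] = None  # (next OR, circID); None at exit

def route_relay_cell(circuit: Circuit, stream_id: int, decrypted_cell: bytes):
    # 'stream_id' is the streamID recovered from the decrypted relay header.
    if stream_id == 0 or stream_id in circuit.open_streams:
        return ("process", decrypted_cell)        # handled at this OR
    if circuit.next_hop is None:
        return ("drop", None)                     # unrecognized at the exit
    next_or, next_circ_id = circuit.next_hop
    return ("forward", (next_or, next_circ_id, decrypted_cell))

# Example: a middle OR forwards an unrecognized cell toward the exit.
middle = Circuit(next_hop=("exit-node", 42))
assert route_relay_cell(middle, 7, b"...")[0] == "forward"
\end{verbatim}
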
\noindent{\large\bf Opening and closing streams}\\
\label{subsec:tcp}
When Alice's application wants a TCP connection to a given
address and port, it asks the OP (via SOCKS) to make the
connection. The OP chooses the newest open circuit (or creates one if
none is available), and chooses a suitable OR on that circuit to be the
exit node (usually the last node, but maybe others due to exit policy
conflicts; see Section~\ref{subsec:exitpolicies}.) The OP then opens
the stream by sending a \emph{relay begin} cell to the exit node,
using a streamID of zero (so the OR will recognize it), containing as
its relay payload a new randomly generated streamID, the destination
address, and the destination port.  Once the
exit node completes the connection to the remote host, it responds
with a \emph{relay connected} cell.  Upon receipt, the OP sends a
SOCKS reply to notify the application of its success. The OP
now accepts data from the application's TCP stream, packaging it into
\emph{relay data} cells and sending those cells along the circuit to
the chosen OR.
\\
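
As described above, the \emph{relay begin} cell is addressed with
streamID zero and carries a freshly chosen streamID plus the
destination address and port in its relay payload. The encoding in the
sketch below is purely illustrative and does not follow the byte-level
format in the Tor specification.

\begin{verbatim}
import secrets, struct

def make_relay_begin_payload(dest_addr: str, dest_port: int):
    # The new streamID rides inside the payload; the cell itself uses
    # streamID zero so that the exit node will recognize it.
    new_stream_id = secrets.randbits(16) or 1
    payload = (struct.pack(">H", new_stream_id)
               + dest_addr.encode("ascii") + b":"
               + str(dest_port).encode("ascii"))
    return new_stream_id, payload

stream_id, payload = make_relay_begin_payload("example.com", 80)
\end{verbatim}
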
\noindent{\large\bf Integrity checking on circuits}
\label{subsec:integrity-checking}
\\
Because the old Onion Routing design used a stream cipher, traffic was
vulnerable to a malleability attack: though the attacker could not
decrypt cells, any changes to encrypted data
would create corresponding changes to the data leaving the network.
(Even an external adversary could do this, despite link encryption, by
inverting bits on the wire.)

This weakness allowed an adversary to change a padding cell to a destroy
cell; change the destination address in a \emph{relay begin} cell to the
adversary's webserver; or change an FTP command from
{\tt dir} to {\tt rm~*}. Any OR or external adversary
along the circuit could introduce such corruption in a stream, if it
knew or could guess the encrypted content.
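
The underlying malleability is easy to demonstrate: with a plain stream
cipher, XORing the ciphertext with the difference between the guessed
and the desired plaintext rewrites the decrypted output, without the
attacker ever learning the keystream. The toy keystream below stands in
for any stream cipher.

\begin{verbatim}
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

keystream = os.urandom(16)              # stands in for any stream cipher
plaintext = b"dir".ljust(16)            # what the attacker guesses is sent
ciphertext = xor(plaintext, keystream)

desired = b"rm *".ljust(16)             # what the attacker wants delivered
tampered = xor(ciphertext, xor(plaintext, desired))

assert xor(tampered, keystream) == desired   # decrypts to the attacker's text
\end{verbatim}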

Tor prevents external adversaries from mounting this attack by
using TLS on its links, which provides integrity checking.
Addressing the insider malleability attack, however, is
more complex. Detail is given in \cite{tor-design}.
\\ \\
\noindent{\large\bf Rate limiting and fairness}
\label{subsec:rate-limit}
\\
Volunteers are generally more willing to run services that can limit
their own bandwidth usage. To accommodate them, Tor servers use a
token bucket approach \cite{tannenbaum96} to 
enforce a long-term average rate of incoming bytes, while still
permitting short-term bursts above the allowed bandwidth. Current bucket
sizes are set to ten seconds' worth of traffic.
\\ \\
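
A token bucket of the kind described above fits in a few lines: tokens
accrue at the configured long-term rate, the bucket holds at most a
fixed burst (ten seconds' worth of traffic in the text), and reading
bytes spends tokens. The class below is an illustration, not the Tor
implementation.

\begin{verbatim}
import time

class TokenBucket:
    # Enforce a long-term average byte rate while allowing short bursts.

    def __init__(self, rate_bytes_per_sec: float, burst_seconds: float = 10):
        self.rate = rate_bytes_per_sec
        self.capacity = rate_bytes_per_sec * burst_seconds
        self.tokens = self.capacity
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def try_consume(self, nbytes: int) -> bool:
        # True if reading nbytes stays within budget; otherwise the
        # caller should postpone reading from the connection.
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket(rate_bytes_per_sec=50000)   # 50 KB/s long-term average
assert bucket.try_consume(4096)                  # well within the burst budget
\end{verbatim}
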
\noindent{\large\bf Congestion control}
\label{subsec:congestion}
\\
Even with bandwidth rate limiting, we still need to worry about
congestion, either accidental or intentional. If enough users choose
the same OR-to-OR connection for their circuits, that connection can
become saturated. For example, an attacker could send a large file
through the Tor network to a webserver he runs, and then refuse to
read any of the bytes at the webserver end of the circuit. Without
some congestion control mechanism, these bottlenecks can propagate
back through the entire network. We do not need to reimplement full TCP
windows (with sequence numbers, the ability to drop cells when we are
full and retransmit later, and so on), because TCP already guarantees
in-order delivery of each cell. Tor provides both circuit and stream
level throttling. See \cite{tor-design} for more details.
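
The circuit- and stream-level throttling mentioned above is based on
windows of outstanding cells that are replenished by \emph{relay
sendme} acknowledgments. The window sizes in the sketch below are
placeholders of our own; the actual parameters are given in
\cite{tor-design}.

\begin{verbatim}
class SendmeWindow:
    # Illustrative sendme-style window; the sizes here are placeholders.

    def __init__(self, size: int = 1000, increment: int = 100):
        self.remaining = size
        self.increment = increment

    def can_send(self) -> bool:
        return self.remaining > 0

    def on_cell_sent(self):
        self.remaining -= 1                  # stop packaging data at zero

    def on_sendme_received(self):
        self.remaining += self.increment     # the far end has drained cells

# A circuit keeps one window for itself and one per stream, so a stalled
# stream (say, a webserver that never reads) cannot starve the circuit.
circuit_window = SendmeWindow()
stream_window = SendmeWindow(size=500, increment=50)
\end{verbatim}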

\section{Other design decisions}

\subsection{Resource management and denial-of-service}
\label{subsec:dos}

Providing Tor as a public service creates many opportunities for
denial-of-service attacks against the network.  While
flow control and rate limiting (discussed in
Section~\ref{subsec:congestion}) prevent users from consuming more
bandwidth than routers are willing to provide, opportunities remain for
users to
consume more network resources than their fair share, or to render the
network unusable for others. We discuss some of these in \cite{tor-design}.

\subsection{Exit policies and abuse}
\label{subsec:exitpolicies}

% originally, we planned to put the "users only know the hostname,
% not the IP, but exit policies are by IP" problem here too. Not
% worth putting in the submission, but worth thinking about putting
% in sometime somehow. -RD

Exit abuse is a serious barrier to wide-scale Tor deployment. Anonymity
presents would-be vandals and abusers with an opportunity to hide
the origins of their activities. Attackers can harm the Tor network by
implicating exit servers for their abuse. Also, applications that commonly
use IP-based authentication (such as institutional mail or webservers)
can be fooled by the fact that anonymous connections appear to originate
at the exit OR.

We stress that Tor does not enable any new class of abuse. Spammers
and other attackers already have access to thousands of misconfigured
systems worldwide, and the Tor network is far from the easiest way
to launch antisocial or illegal attacks.
%Indeed, because of its limited
%anonymity, Tor is probably not a good way to commit crimes.
But because the
onion routers can easily be mistaken for the originators of the abuse,
and the volunteers who run them may not want to deal with the hassle of
repeatedly explaining anonymity networks, we must block or limit
the abuse that travels through the Tor network.

To mitigate abuse issues, in Tor, each onion router's \emph{exit policy}
describes to which external addresses and ports the router will
connect. This is described further in \cite{tor-design}.

Finally, we note that exit abuse must not be dismissed as a peripheral
issue: when a system's public image suffers, it can reduce the number
and diversity of that system's users, and thereby reduce the anonymity
of the system itself.  Like usability, public perception is a
security parameter.  Sadly, preventing abuse of open exit nodes is an
unsolved problem, and will probably remain an arms race for the
foreseeable future.  The abuse problems faced by Princeton's CoDeeN
project \cite{darkside} give us a glimpse of likely issues.

\subsection{Directory Servers}
\label{subsec:dirservers}

First-generation Onion Routing designs \cite{freedom2-arch,or-jsac98} used
in-band network status updates: each router flooded a signed statement
to its neighbors, which propagated it onward. But anonymizing networks
have different security goals than typical link-state routing protocols.
For example, delays (accidental or intentional)
that can cause different parts of the network to have different views
of link-state and topology are not only inconvenient: they give
attackers an opportunity to exploit differences in client knowledge.
We also worry about attacks to deceive a
client about the router membership list, topology, or current network
state. Such \emph{partitioning attacks} on client knowledge help an
adversary to efficiently deploy resources
against a target \cite{minion-design}.


Tor uses a small group of redundant, well-known onion routers to
track changes in network topology and node state, including keys and
exit policies.  Each such \emph{directory server} acts as an HTTP
server, so participants can fetch current network state and router
lists, and so other ORs can upload
state information.  Onion routers periodically publish signed
statements of their state to each directory server. The directory servers
combine this state information with their own views of network liveness,
and generate a signed description (a \emph{directory}) of the entire
network state. Client software is
pre-loaded with a list of the directory servers and their keys,
to bootstrap each client's view of the network.
More details are provided in \cite{tor-design}.

Using directory servers is simpler and more flexible than flooding.
Flooding is expensive, and complicates the analysis when we
start experimenting with non-clique network topologies. Signed
directories can be cached by other
onion routers.
Thus directory servers are not a performance
bottleneck when we have many users, and do not aid traffic analysis by
forcing clients to periodically announce their existence to any
central point.

\section{Rendezvous points and hidden services}
\label{sec:rendezvous}

Rendezvous points are a building block for \emph{location-hidden
services} (also known as \emph{responder anonymity}) in the Tor
network.  Location-hidden services allow Bob to offer a TCP
service, such as a webserver, without revealing its IP address.
This type of anonymity protects against distributed DoS attacks:
attackers are forced to attack the onion routing network as a whole
rather than just Bob's IP address.

Our design for location-hidden servers has the following goals.
\textbf{Access-controlled:} Bob needs a way to filter incoming requests,
so an attacker cannot flood Bob simply by making many connections to him.
\textbf{Robust:} Bob should be able to maintain a long-term pseudonymous
identity even in the presence of router failure. Bob's service must
not be tied to a single OR, and Bob must be able to tie his service
to new ORs. \textbf{Smear-resistant:}
A social attacker who offers an illegal or disreputable location-hidden
service should not be able to ``frame'' a rendezvous router by 
making observers believe the router created that service.
%slander-resistant? defamation-resistant?
\textbf{Application-transparent:} Although we require users
to run special software to access location-hidden servers, we must not
require them to modify their applications.

We provide location-hiding for Bob by allowing him to advertise
several onion routers (his \emph{introduction points}) as contact
points. He may do this on any robust efficient
key-value lookup system with authenticated updates, such as a
distributed hash table (DHT) like CFS \cite{cfs:sosp01}.\footnote{
Rather than rely on an external infrastructure, the Onion Routing network
can run the DHT itself.  At first, we can run a simple lookup
system on the
directory servers.} Alice, the client, chooses an OR as her
\emph{rendezvous point}. She connects to one of Bob's introduction
points, informs him of her rendezvous point, and then waits for him
to connect to the rendezvous point. This extra level of indirection
helps Bob's introduction points avoid problems associated with serving
unpopular files directly (for example, if Bob serves
material that the introduction point's community finds objectionable,
or if Bob's service tends to get attacked by network vandals).
The extra level of indirection also allows Bob to respond to some requests
and ignore others.
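
The exchange can be summarized as a short sequence of steps: Bob
advertises his introduction points, Alice builds a circuit to a
rendezvous point of her choosing and names it to one introduction
point, and Bob, if he chooses to answer, builds his own circuit to that
rendezvous point. The sketch below simply records this sequence; every
name in it is a placeholder.

\begin{verbatim}
def rendezvous_flow(bob_intro_points, alice_pick_rendezvous, bob_accepts):
    # Return the high-level message sequence for one hidden-service contact.
    steps = [("bob", "advertise introduction points", bob_intro_points)]
    rp = alice_pick_rendezvous()
    steps.append(("alice", "build circuit to rendezvous point", rp))
    steps.append(("alice", "name the rendezvous point to an intro point", rp))
    if bob_accepts(rp):
        steps.append(("bob", "build circuit to rendezvous point", rp))
        steps.append(("both", "relay application data through", rp))
    else:
        steps.append(("bob", "ignore the request", rp))
    return steps

trace = rendezvous_flow(["or-A", "or-B"], lambda: "or-R", lambda rp: True)
\end{verbatim}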


\subsection{Integration with user applications}

Bob configures his onion proxy to know the local IP address and port of his
service, a strategy for authorizing clients, and a public key. Bob
publishes the public key, an expiration time (``not valid after''), and
the current introduction points for his service into the DHT, indexed
by the hash of the public key.  Bob's webserver is unmodified,
and does not even know that it is hidden behind the Tor network.

Alice's applications also work unchanged---her client interface
remains a SOCKS proxy. We encode all of the necessary information
into the fully qualified domain name Alice uses when establishing her
connection. Location-hidden services use a virtual top-level domain
called {\tt .onion}: thus hostnames take the form {\tt x.y.onion} where
{\tt x} is the authorization cookie, and {\tt y} encodes the hash of
the public key. Alice's onion proxy
examines addresses; if they are destined for a hidden server, it decodes
the key and starts the rendezvous as described above.
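
The onion proxy's address handling reduces to recognizing the virtual
{\tt .onion} suffix and splitting off the authorization cookie and the
key hash. The parser below is an illustrative sketch of the {\tt
x.y.onion} form described above, not the actual implementation.

\begin{verbatim}
def parse_onion_address(hostname: str):
    # Split an x.y.onion name into (authorization cookie, key hash);
    # return None for ordinary hostnames, which become normal exit traffic.
    labels = hostname.lower().split(".")
    if len(labels) != 3 or labels[2] != "onion":
        return None
    cookie, key_hash = labels[0], labels[1]
    return cookie, key_hash

assert parse_onion_address("cookie123.abcdef0123456789.onion") == \
       ("cookie123", "abcdef0123456789")
assert parse_onion_address("www.example.com") is None
\end{verbatim}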



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Future Directions}
\label{sec:conclusion}

Tor brings together many innovations into a unified deployable system. The
next immediate steps include:

\emph{Scalability:} Tor's emphasis on deployability and design simplicity
has led us to adopt a clique topology, semi-centralized 
directories, and a full-network-visibility model for client
knowledge. These properties will not scale past a few hundred servers.
The Tor design paper \cite{tor-design} describes some promising
approaches, but more deployment experience will be helpful in learning
the relative importance of these bottlenecks.

\emph{Bandwidth classes:} This paper assumes that all ORs have
good bandwidth and latency. We should instead adopt the MorphMix model,
where nodes advertise their bandwidth level (DSL, T1, T3), and
Alice avoids bottlenecks by choosing nodes that match or
exceed her bandwidth. In this way DSL users can usefully join the Tor
network.

\emph{Incentives:} Volunteers who run nodes are rewarded with publicity
and possibly better anonymity \cite{econymics}. More nodes means increased
scalability, and more users can mean more anonymity. We need to continue
examining the incentive structures for participating in Tor.

\emph{Cover traffic:} Currently Tor omits cover traffic---its costs
in performance and bandwidth are clear but its security benefits are
not well understood. We must pursue more research on link-level cover
traffic and long-range cover traffic to determine whether some simple padding
method offers provable protection against our chosen adversary.

%%\emph{Offer two relay cell sizes:} Traffic on the Internet tends to be
%%large for bulk transfers and small for interactive traffic. One cell
%%size cannot be optimal for both types of traffic.
% This should go in the spec and todo, but not the paper yet. -RD

\emph{Caching at exit nodes:} Perhaps each exit node should run a
caching web proxy, to improve anonymity for cached pages (Alice's request never
leaves the Tor network), to improve speed, and to reduce bandwidth cost.
On the other hand, forward security is weakened because caches
constitute a record of retrieved files.  We must find the right
balance between usability and security.

\emph{Better directory distribution:}
Clients currently download a description of
the entire network every 15 minutes. As the state grows larger
and clients more numerous, we may need a solution in which
clients receive incremental updates to directory state.
More generally, we must find more
scalable yet practical ways to distribute up-to-date snapshots of
network status without introducing new attacks.

\emph{Implement location-hidden services:} The design in
Section~\ref{sec:rendezvous} has not yet been implemented.  While doing
so we are likely to encounter additional issues that must be resolved,
both in terms of usability and anonymity.

\emph{Further specification review:} Although we have a public
byte-level specification for the Tor protocols, it needs
extensive external review.  We hope that as Tor
is more widely deployed, more people will examine its
specification.

\emph{Multisystem interoperability:} We are currently working with the
designer of MorphMix to unify the specification and implementation of
the common elements of our two systems. So far, this seems
to be relatively straightforward.  Interoperability will allow testing
and direct comparison of the two designs for trust and scalability.

\emph{Wider-scale deployment:} The original goal of Tor was to
gain experience in deploying an anonymizing overlay network, and
learn from having actual users.  We are now at a point in design
and development where we can start deploying a wider network.  Once
we have many actual users, we will doubtlessly be better
able to evaluate some of our design decisions, including our
robustness/latency tradeoffs, our performance tradeoffs (including
cell size), our abuse-prevention mechanisms, and
our overall usability.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% commented out for anonymous submission
%\section*{Acknowledgments}
% Peter Palfrader, Geoff Goodell, Adam Shostack, Joseph Sokol-Margolis,
%   John Bashinski, Zack Brown:
%   for editing and comments.
% Matej Pfajfar, Andrei Serjantov, Marc Rennhard: for design discussions.
% Bram Cohen for congestion control discussions.
% Adam Back for suggesting telescoping circuits.
% Cathy Meadows for formal analysis of the extend protocol.
% This work supported by ONR and DARPA.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bibliographystyle{plain}
\bibliography{nato-rta04}

\end{document}

% Style guide:
%     U.S. spelling
%     avoid contractions (it's, can't, etc.)
%     prefer ``for example'' or ``such as'' to e.g.
%     prefer ``that is'' to i.e.
%     'mix', 'mixes' (as noun)
%     'mix-net'
%     'mix', 'mixing' (as verb)
%     'middleman'  [Not with a hyphen; the hyphen has been optional
%         since Middle English.]
%     'nymserver'
%     'Cypherpunk', 'Cypherpunks', 'Cypherpunk remailer'
%     'Onion Routing design', 'onion router' [note capitalization]
%     'SOCKS'
%     Try not to use \cite as a noun.  
%     'Authorizating' sounds great, but it isn't a word.
%     'First, second, third', not 'Firstly, secondly, thirdly'.
%     'circuit', not 'channel'
%     Typography: no space on either side of an em dash---ever.
%     Hyphens are for multi-part words; en dashs imply movement or
%        opposition (The Alice--Bob connection); and em dashes are
%        for punctuation---like that.
%     A relay cell; a control cell; a \emph{create} cell; a
%     \emph{relay truncated} cell.  Never ``a \emph{relay truncated}.''
%
%     'Substitute ``Damn'' every time you're inclined to write ``very;'' your
%     editor will delete it and the writing will be just as it should be.'
%     -- Mark Twain
