[or-cvs] Various changes. Some more references. Section on enclaves ...



Update of /home/or/cvsroot/tor/doc/design-paper
In directory moria.mit.edu:/tmp/cvs-serv27508/tor/doc/design-paper

Modified Files:
	challenges.tex tor-design.bib 
Log Message:
Various changes. Some more references. Section on enclaves and path length.


Index: challenges.tex
===================================================================
RCS file: /home/or/cvsroot/tor/doc/design-paper/challenges.tex,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -d -r1.30 -r1.31
--- challenges.tex	1 Feb 2005 11:39:54 -0000	1.30
+++ challenges.tex	1 Feb 2005 22:48:10 -0000	1.31
@@ -103,7 +103,7 @@
 help in addressing these issues. Section~\ref{sec:what-is-tor} gives an
 overview of the Tor
 design and our goals. Sections~\ref{sec:crossroads-policy}
-and~\ref{sec:crossroads-technical} go on to describe the practical challenges,
+and~\ref{sec:crossroads-design} go on to describe the practical challenges,
 both policy and technical respectively, that stand in the way of moving
 from a practical useful network to a practical useful anonymous network.
 
@@ -155,7 +155,7 @@
 additional application-level scrubbing proxies, such as
 Privoxy~\cite{privoxy} for HTTP.  Furthermore, Tor does not permit arbitrary
 IP packets; it only anonymizes TCP and DNS, and only supports connections via
-SOCKS (see Section \ref{subsec:tcp-vs-ip}).
+SOCKS (see Section~\ref{subsec:tcp-vs-ip}).
 
 Tor differs from other deployed systems for traffic analysis resistance
 in its security and flexibility.  Mix networks such as
@@ -207,7 +207,7 @@
 open proxies around the Internet~\cite{open-proxies}, can provide good
 performance and some security against a weaker attacker. Dresden's Java
 Anon Proxy~\cite{web-mix} provides similar functionality to Tor but only
-handles web browsing rather than arbitrary TCP. Also, JAP's network
+handles web browsing rather than arbitrary TCP\@. Also, JAP's network
 topology uses cascades (fixed routes through the network); since without
 end-to-end padding it is just as vulnerable as Tor to end-to-end timing
 attacks, its dispersal properties are therefore worse than Tor's.
@@ -244,9 +244,12 @@
 communication partners.  Defeating this attack would seem to require
 introducing a prohibitive degree of traffic padding between the user and the
 network, or introducing an unacceptable degree of latency (but see
-Section \ref{subsec:mid-latency}).  Thus, Tor only
-attempts to defend against external observers who cannot observe both sides of a
-user's connection.
+Section \ref{subsec:mid-latency}). 
+And, it is not clear that padding works at all if we assume a
+minimally active adversary that merely modifies the timing of packets
+to or from the user. Thus, Tor only attempts to defend against
+external observers who cannot observe both sides of a user's
+connection.
 
 Against internal attackers, who sign up Tor servers, the situation is more
 complicated.  In the simplest case, if an adversary has compromised $c$ of
@@ -279,14 +282,29 @@
 % not? -nm
 % Sure. In fact, better off, since they seem to scale more easily. -rd
 
-in practice tor's threat model is based entirely on the goal of dispersal
-and diversity. george and steven describe an attack \cite{attack-tor-oak05} that
-lets them determine the nodes used in a circuit; yet they can't identify
-alice or bob through this attack. so it's really just the endpoints that
-remain secure. and the enclave model seems particularly threatened by
-this, since this attack lets us identify endpoints when they're servers.
-see \ref{subsec:helper-nodes} for discussion of some ways to address this
-issue.
+In practice Tor's threat model is based entirely on the goal of
+dispersal and diversity. Murdoch and Danezis describe an attack
+\cite{attack-tor-oak05} that lets an attacker determine the nodes used
+in a circuit; yet s/he cannot identify the initiator or responder,
+e.g., client or web server, through this attack. So the endpoints
+remain secure, which is the goal. On the other hand we can imagine an
+adversary that could attack or set up observation of all connections
+to an arbitrary Tor node in only a few minutes.  If such an adversary
+were to exist, s/he could use this probing to remotely identify a node
+for further attack.  Also, the enclave model seems particularly
+threatened by this attack, since it identifies endpoints when they're
+also nodes in the Tor network: see Section~\ref{subsec:helper-nodes}
+for discussion of some ways to address this issue.
+
+[*****Suppose an adversary with active access to the responder traffic
+wants to keep a circuit alive long enough to attack an identified
+node. Could s/he do this without the overt cooperation of the client
+proxy? More immediately, someone could identify nodes in this way and
+if in their jurisdiction, immediately get a subpoena (if they even
+need one) and tell the node operator(s) that she must retain all the
+active circuit data she now has at that moment.  That \emph{can} be
+done in real time.********** We should say something about this
+here or later in the paper -pfs]
 
 see \ref{subsec:routing-zones} for discussion of larger
 adversaries and our dispersal goals.
@@ -308,7 +326,7 @@
 attacks because they came from the same IP space. These engineers wanted
 to use Tor to hide their tracks. First, from a technical standpoint,
 Tor does not support the variety of IP packets one would like to use in
-such attacks (see Section \ref{subsec:ip-vs-tcp}). But aside from this,
+such attacks (see Section~\ref{subsec:tcp-vs-ip}). But aside from this,
 we also decided that it would probably be poor precedent to encourage
 such use---even legal use that improves national security---and managed
 to dissuade them.
@@ -383,8 +401,9 @@
 Another factor impacting the network's security is its reputability:
 the perception of its social value based on its current user base. If I'm
 the only user who has ever downloaded the software, it might be socially
-accepted, but I'm not getting much anonymity. Add a thousand Communists,
-and I'm anonymous, but everyone thinks I'm a Commie. Add a thousand
+accepted, but I'm not getting much anonymity. Add a thousand animal rights
+activists, and I'm anonymous, but everyone thinks I'm a bambi lover (or
+NRA member if you prefer a contrasting example). Add a thousand
 random citizens (cancer survivors, privacy enthusiasts, and so on)
 and now I'm harder to profile.
 
@@ -400,8 +419,9 @@
 While people therefore have an incentive for the network to be used for
 ``more reputable'' activities than their own, there are still tradeoffs
 involved when it comes to anonymity. To follow the above example, a
-network used entirely by cancer survivors might welcome some Communists
-onto the network, though of course they'd prefer a wider variety of users.
+network used entirely by cancer survivors might welcome some animal rights
+activists onto the network, though of course they'd prefer a wider
+variety of users.
 
 Reputability becomes even more tricky in the case of privacy networks,
 since the good uses of the network (such as publishing by journalists in
@@ -466,12 +486,13 @@
 their servers it would seem that they should be allowed to.  But, a
 possible major problem with the blocking of Tor is that it's not just
 the decision of the individual server administrator who is deciding if
-he wants to post to wikipedia from his Tor node address or allow
-people to read wikipedia anonymously through his Tor node. If e.g.,
+he wants to post to Wikipedia from his Tor node address or allow
+people to read Wikipedia anonymously through his Tor node. (Wikipedia
+has blocked all posting from all Tor nodes based on IP address.) If e.g.,
 s/he comes through a campus or corporate NAT, then the decision must
 be to have the entire population behind it able to have a Tor exit
-node or write access to wikipedia. This is a loss for both of us (Tor
-and wikipedia). We don't want to compete for (or divvy up) the NAT
+node or to have write access to Wikipedia. This is a loss for both of us (Tor
+and Wikipedia). We don't want to compete for (or divvy up) the NAT
 protected entities of the world.
 
 (A related problem is that many IP blacklists are not terribly fine-grained.
@@ -480,9 +501,11 @@
 though this information is readily available.  One IP blacklist even bans
 every class C network that contains a Tor server, and recommends banning SMTP
 from these networks even though Tor does not allow SMTP at all.)
+[****Since this is stupid and we oppose it, shouldn't we name names here -pfs]
+
 
 Problems of abuse occur mainly with services such as IRC networks and
-Wikipedia, which rely on IP-blocking to ban abusive users.  While at first
+Wikipedia, which rely on IP blocking to ban abusive users.  While at first
 blush this practice might seem to depend on the anachronistic assumption that
 each IP is an identifier for a single user, it is actually more reasonable in
 practice: it assumes that non-proxy IPs are a costly resource, and that an
@@ -501,7 +524,7 @@
 identities need to impose a significant switching cost in resources or human
 time.
 
-Once approach, similar to that taken by Freedom, would be to bootstrap some
+One approach, similar to that taken by Freedom, would be to bootstrap some
 non-anonymous costly identification mechanism to allow access to a
 blind-signature pseudonym protocol.  This would effectively create costly
 pseudonyms, which services could require in order to allow anonymous access.
@@ -514,16 +537,22 @@
   We could use IP addresses, but that's the problem, isn't it?
 \item Managing single sign-on services is not considered a well-solved
   problem in practice.  If Microsoft can't get universal acceptance for
-  passport, why do we think that a Tor-specific solution would do any good?
+  Passport, why do we think that a Tor-specific solution would do any good?
 \item Even if we came up with a perfect authentication system for our needs,
   there's no guarantee that any service would actually start using it.  It
   would require a nonzero effort for them to support it, and it might just
   be less hassle for them to block tor anyway.
 \end{tightlist}
 
-Squishy IP based ``authentication'' and ``authorization'' is a reality
-we must contend with. We should say something more about the analogy
-with SSNs.
+The use of squishy IP-based ``authentication'' and ``authorization''
+has not broken down even to the level that SSNs used for these
+purposes have in commercial and public record contexts. Externalities
+and misplaced incentives cause a continued focus on fighting identity
+theft by protecting SSNs rather than developing better authentication
+and incentive schemes \cite{price-privacy}. Similarly we can expect a
+continued use of identification by IP number as long as there is no
+workable alternative.
+
 
 
 
@@ -557,6 +586,7 @@
 \label{sec:crossroads-design}
 
 \subsection{Transporting the stream vs transporting the packets}
+\label{subsec:stream-vs-packet}
 \label{subsec:tcp-vs-ip}
 
 We periodically run into ex ZKS employees who tell us that the process of
@@ -603,7 +633,7 @@
 which nodes will allow which packets to exit.
 \item \emph{The Tor-internal name spaces would need to be redesigned.} We
 support hidden service {\tt{.onion}} addresses, and other special addresses
-like {\tt{.exit}} (see Section \ref{subsec:}), by intercepting the addresses
+like {\tt{.exit}} (see Section~\ref{subsec:}), by intercepting the addresses
 when they are passed to the Tor client.
 \end{enumerate}
 
@@ -653,7 +683,8 @@
 Section~\ref{subsec:tcp-vs-ip}). In other words, there would
 probably be no direct attempt to synchronize on batches of data
 entering the Tor network at the same time. Rather, it is the link
-level batching that will add noise to the traffic patterns exiting the
+level batching that will add noise to the traffic patterns entering
+and passing through the
 network.  Similarly, if end-to-end traffic confirmation is the
 concern, there is little point in mixing. It might also be feasible to
 pad chunks to uniform size as is done now for cells; if this is link
@@ -667,19 +698,31 @@
 
 The distinction between traffic confirmation and traffic analysis is
 not as practically cut and dried as we might wish. In \cite{hintz-pet02} it was
-shown that if latencies to and/or data volumes of various popular
+shown that if data volumes of various popular
 responder destinations are catalogued, it may not be necessary to
 observe both ends of a stream to confirm a source-destination link.
-These are likely to entail high variability and massive storage since
+This should be fairly effective without simultaneously observing both
+ends of the connection. However, it is still essentially confirming
+suspected communicants where the responder suspects are ``stored'' rather
+than observed at the same time as the client.
+Similarly latencies of going through various routes can be
+catalogued~\cite{back01} to connect endpoints.
+This is likely to entail high variability and massive storage since
 % XXX hintz-pet02 just looked at data volumes of the sites. this
 % doesn't require much variability or storage. I think it works
 % quite well actually. Also, \cite{kesdogan:pet2002} takes the
 % attack another level further, to narrow down where you could be
 % based on an intersection attack on subpages in a website. -RD
+%
+% I was trying to be terse and simultaneously referring to both the
+% Hintz stuff and the Back et al. stuff from Info Hiding 01. I've
+% separated the two and added the references. -PFS
 routes through the network to each site will be random even if they
-have relatively unique latency or volume characteristics. So these do
-not seem an immediate practical threat. Further along similar lines, in
-\cite{attack-tor-oak05}, it was shown that an outside attacker can
+have relatively unique latency characteristics. So these do
+not seem an immediate practical threat. Further along similar lines,
+the same paper suggested a ``clogging attack''. A version of this
+was demonstrated to be practical in
+\cite{attack-tor-oak05}. There it was shown that an outside attacker can
 trace a stream through the Tor network while a stream is still active
 simply by observing the latency of his own traffic sent through
 various Tor nodes. These attacks are especially significant since they
@@ -704,7 +747,9 @@
 record of destinations and/or data visited by Tor users.  While
 limited to network insiders, given the need for wide distribution
 they could serve as useful data to an attacker deciding which locations
-to target for confirmation.
+to target for confirmation. A way to counter this distribution
+threat might be to only cache at certain semitrusted helper nodes.
+
 
 [nick will work on this]
 
@@ -728,13 +773,58 @@
 
 [nick will work on this section, unless arma gets there first]
 
-\subsection{Anonymity benefits for running a server}
+\subsection{Running a Tor server, path length, and helper nodes}
 
-Does running a server help you or harm you? George's Oakland attack.
+It has been thought for some time that the best anonymity protection
+comes from running your own onion router~\cite{or-pet00,tor-design}.
+(In fact, in Onion Routing's first design, this was the only option
+possible~\cite{or-ih96}.) The first design also had a fixed path
+length of five nodes. Middle Onion Routing involved much analysis
+(mostly unpublished) of route selection algorithms and path length
+algorithms to combine efficiency with unpredictability in routes.
+Since, unlike Crowds, nodes in a route cannot all know the ultimate
+destination of an application connection, it was generally not
+considered significant if a node could determine via latency that it
+was second in the route. But if one followed Tor's three node default
+path length, an enclave-to-enclave communication (in which two of the
+ORs were at each enclave) would be completely compromised by the
+middle node. Thus for enclave-to-enclave communication, four is the fewest
+number of nodes that preserves the $\frac{c^2}{n^2}$ degree of protection
+in any setting.
 
-Plausible deniability -- without even running your traffic through Tor!
-But nobody knows about Tor, and the legal situation is fuzzy, so this
-isn't very true really.
+The Murdoch-Danezis attack, however, shows that simply adding to the
+path length may not protect usage of an enclave protecting OR\@.  A
+hostile web server can determine all of the nodes in a three node Tor
+path. The attack only identifies that a node is on the route, not
+where. For example, if all of the nodes on the route were enclave
+nodes, the attack would not identify which of the two not directly
+visible to the attacker was the source.  Thus, there remains an
+element of plausible deniability that is preserved for enclave nodes.
+However, Tor has always sought to be stronger than plausible
+deniability. Our assumption is that users of the network are concerned
+about being identified by an adversary, not with being proven guilty
+beyond any reasonable doubt. Still it is something, and may be desired
+in some settings.
+
+It is reasonable to think that this attack can be easily extended to
+longer paths should those be used; nonetheless there may be some
+advantage to random path length. If the number of nodes is unknown,
+then the adversary would need to send streams to all the nodes in the
+network and analyze the resulting latency from them to be reasonably
+certain that it has not missed the first node in the circuit. Also,
+the attack does not identify the order of nodes in a route, so the
+longer the route, the greater the uncertainty about which node might
+be first. It may be possible to extend the attack to learn the route
+node order, but it is not clear that this is practically feasible.
+
+Another way to reduce the threats to both enclaves and simple Tor
+clients is to have helper nodes. Helper nodes were introduced
+in~\cite{wright03} as a suggested means of protecting the identity
+of the initiator of a communication in various anonymity protocols.
+The idea is to use a single trusted node as the first one you go to;
+that way an attacker cannot ever attack the first nodes you connect
+to and do some form of intersection attack. This will not affect the
+Danezis-Murdoch attack at all.
 
 We have to pick the path length so adversary can't distinguish client from
 server (how many hops is good?).
@@ -746,6 +836,7 @@
 [arma will write this section]
 
 \subsection{Helper nodes}
+\label{subsec:helper-nodes}
 
 When does fixing your entry or exit node help you?
 Helper nodes in the literature don't deal with churn, and

Index: tor-design.bib
===================================================================
RCS file: /home/or/cvsroot/tor/doc/design-paper/tor-design.bib,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -d -r1.8 -r1.9
--- tor-design.bib	1 Feb 2005 10:31:14 -0000	1.8
+++ tor-design.bib	1 Feb 2005 22:48:10 -0000	1.9
@@ -263,6 +263,19 @@
   year = 2002,
 }
 
+
+@InCollection{price-privacy,
+  author =	 {Paul Syverson and Adam Shostack},
+  editor =	 {L. Jean Camp and Stephen Lewis},
+  title = 	 {What Price Privacy? (and why identity theft is about neither identity nor theft)},
+  booktitle =	 {Economics of Information Security},
+  chapter = 	 10,
+  publisher = 	 {Kluwer},
+  year = 	 2004,
+  pages =	 {129--142}
+}
+
+
 @InProceedings{trickle02,
   author =       {Andrei Serjantov and Roger Dingledine and Paul Syverson},
   title =        {From a Trickle to a Flood: Active Attacks on Several