[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: A cleaner MorphMix / A P2P Tor
- To: or-dev@freehaven.net
- Subject: Re: A cleaner MorphMix / A P2P Tor
- From: Marc Rennhard <rennhard@tik.ee.ethz.ch>
- Date: Mon, 26 Jul 2004 10:50:16 +0200
- Delivered-to: archiver@seul.org
- Delivered-to: or-dev-outgoing@seul.org
- Delivered-to: or-dev@seul.org
- Delivery-date: Mon, 26 Jul 2004 04:50:55 -0400
- In-reply-to: <4104AEB8.5010003@tik.ee.ethz.ch>
- References: <4104AEB8.5010003@tik.ee.ethz.ch>
- Reply-to: or-dev@freehaven.net
- Sender: owner-or-dev@freehaven.net
- User-agent: Mozilla Thunderbird 0.7.2 (Windows/20040707)
Morphmix assumptions:
(Marc, correct me if these are wrong or misleading)
(1) We're willing to make multiple new TLS connections at each hop when
building the circuit -- we do that for witnesses.
(2) In order to have much security against collusion, we aim to have
a database of thousands of nodes.
(3) Alice will tolerate having the adversary link her to some of her Bobs,
as long as it doesn't happen "too often". (I'd like to do away with this
assumption; that's part of what this mail is about.)
(4) Alice has plenty of extra time (relative to her actual traffic)
to build circuits just to test the waters.
(5) Given a node, we are capable of picking a random node out of the
nodes we know that probably isn't colluding with that node. That's why
we pick the witness for extending.
Correct.
My problems with Morphmix:
I worry that Morphmix's collusion detection system (aka reputation system)
isn't up to the task of detecting a smart adversary. Specifically,
an adversary attacking Alice should act normally until Alice connects
to one of his nodes; then he can show her only colluding nodes and win
that circuit. Morphmix has one last defense against that -- "but maybe
that was just a test circuit" -- but I'd like to do better.
True. The adversary wins for a couple of circuits until he'll be
detected by Alice. From the adversary's point of view, it's like he has
a few credits and can spend them over a time period (# of circuits). He
can spend them "as he wishes", evenly distributed over time or "more
aggressively" during a shorter time.
Also, Morphmix doesn't do much to consider incentives against freeloading,
or selective service by adversarial nodes.
The only incentive is that relaying gives better protection because it
somewhat confuses a first intermediate malicious node. But that's rather
weak.
So here's a different design, with the beginnings of some analysis.
Each node keeps a list of other nodes it knows about. It periodically
asks nodes for other node lists, so it can keep roughly up to date. When
it wants to open a circuit, it picks a random set of nodes (if you like,
it can require that the nodes each have a different IP prefix or AS). It
asks each node to establish a TLS connection right now (if necessary)
to the next node.
Call a node a relay if it's willing to relay traffic for others, and a
non-relay if it's freeloading.
Assume there are n nodes in the system, that all nodes are relays, that
each node wants to hold three circuits open, and that circuits are length
5. So that's 15 circuits for each node. The expected number of links
used will remain O(n) even as the total number of links grows as n^2. At
small n, everybody just keeps track of everybody and it's like the current
Tor except without the directory servers (and slower circuit-building).
Now the way to attack the system by sheer numbers is to get a bunch
of nodes in different IP prefixes, and tell everybody about them. If
necessary, there are some additional defenses that could be adapted to
our needs. For example, http://dbpubs.stanford.edu:8090/pub/2003-51
describes a neat self-correction mechanism to prevent somebody from
advertising his nodes much more than average.
Now we don't have to deal with the morphmix collusion detection algorithms
(which probably don't work often enough to protect Alice enough from a
smart adversary anyway), and we can scale a lot better than Tor. Plus
we have the added benefit of deniability for circuit initiators (before,
if the circuit came from an OP, it was clear he was the originator).
In principal, I agree, but there are two things that come to my mind
right now: (1) path setup may often fail because one of the five nodes
may have left. Not crucial because the initiator just tries again. (2)
Local optimisation as in MorphMix is not possible, because the initiator
does not know the state of the nodes along the tunnel. This is not easy
to overcome (unless we have a lookup mechanism that keeps track of
current node states, which is hard with a large number of nodes).
Now assume not all the nodes are relays -- if only sqrt(n) of them are
(that is, there are sqrt(n) non-relays for every 1 relay), then every
non-relay is connected to 3-ish nodes, and every relay is connected to
O(sqrt(n)) nodes -- the sqrt(n) other relays, plus its portion of the
non-relays. Still quite manageable (and based on the Econymics paper,
relays *want* some non-relays, so they can aggregate more traffic). But
if there are too many non-relays, the remaining relays will have to talk
to an increasing number of non-relays each, which would get ugly.
Yes. same problem with MorphMix here. If 95% are free riders, the system
won't work well. Unless the 5% are really good bandwidth nodes, but
essentially it would render the P2P system to a Tor-like system.
How do the users know who's a relay? They build circuits through
everybody they hear about: some of them tend to work, and others
tend not to. Assuming *most* nodes either pretty-consistently relay or
pretty-consistently don't relay, then Alice can quickly find nodes which
will work, and she will be capable of building circuits that work.
Agreed. Are malicious nodes likely to advertise IP addresses of nodes
that are non-relays (or non-nodes)? Like inserting bad mp3 files into
file-sharing systems? Could that significantly disrupt the service? What
if 90% of the nodes I know of are not usable to be used as realys?
Hardly any path-setup will be successful. Of course I cannot test the
nodes for availability before I use them for obvious reasons. Will such
a system consist of 99% path setup traffic? MorphMix has advantages
here, because every nodes just handles its local environment and
non-nodes can easily be excluded.
More on the reputation system below; but first let's consider our
adversary. If he runs only non-relays, then he's going to have a heck
of a time getting Alice to use him for her first and last hop. So we're
providing an *incentive* for the adversary to run relays, else he won't
be able to attack Alice with it. Great.
Indded. For an adversary in anonymity-systems, there are probably only
two reasonable ways to play: offer great service to be in as many paths
and collect as much information as possible, or drop any traffic to
disrupt the service (if this is possible withour beeing detected).
Therefore the fraction of total peers the adversary owns is not relevant
for this attack: only his fraction of relays matters here. Specifically,
we've changed his chance of success from c^2/n^2 where c/n is the fraction
of Tor servers he owns, to c'^2/n'^2 where c'/n' is the fraction he owns
of the nodes that Alice thinks are relays.
(Overall fraction is still relevant in terms of how much influence he
has on Alice's list of known peers. Perhaps Alice should preferentially
believe the lists-of-peers from relays. Or perhaps precisely because
non-relays are not juicy targets, she should believe non-relays too --
so an adversary with a limited number of nodes cannot easily be both.)
Now consider a selective relay: a node that chooses whether to relay
based on some decision function. The simplest selective relay is a peer
that's not in the system all the time. So Alice's reputation system (to
keep it simple) should always make decisions based on recent behavior:
generally speaking, a node that was a relay just a moment ago will still
be a relay now.
A more complex selective relay might try to convince only Alice that it's
a relay. The first benefit to this is that it doesn't have to handle
non-Alice traffic, but the real benefit is that people making circuits
through it are more likely to be Alice (the attack here is that the
adversary only has to own the last hop Alice chooses -- he learns about
Bob and guesses that Alice was the initiator). Fortunately, if Alice does
her performance tests via other nodes also, building circuits as usual,
the node can't target Alice without performing for all users. And once
it's performing for everybody, it's in a position to attack everybody;
thus if the system has enough nodes, it's not appreciably easier for the
adversary to target specifically Alice than to target everybody. (There's
still the predecessor issue, where a given node making a request is more
likely to be the initiator...but so it goes.)
Another issue here is, as you mentioned above, that nodes ask other
nodes about nodes, so Alice - having learned that this malicious node
targeting her is a good one - would tell others about this cool node.
The problem for the malicious node is that he can hardly control which
nodes know about the malicious one. This is an attack I analysed in my
thesis which in theory is possible, but very hard to carry out in practice.
Another fine attack is for the selective relay to allow some incoming
circuits, but claim that certain other nodes are down. Certainly if
extending the circuit fails, Alice should give up on that circuit rather
than try to extend elsewhere (which would allow that node to dictate
her next node). But the more subtle issue is that the adversary can
influence the set of nodes that Alice thinks allow relaying. Assuming a
small fraction of adversary-owned nodes, we want a reputation system where
the adversary can't confuse Alice about too many honest relays, and can't
confuse Alice about too many dishonest non-relays. That's a hard problem
in the general case, but we may be able to get a workable simplified
answer. Part of the solution probably involves Alice doing some of her
queries directly-but-plausibly-for-others, because she knows who failed
in that case. Another part might be a witness mechanism a la mix-acc or
morphmix, if we can assume sometimes-relay-but-sometimes-suspiciously-not
is rare.
Maybe this is too easy but assuming a node n_1 claims the next hop n_2
is not reachable, Alice simply tries to contact n_2 directly (e.g.
picking n_2 as the first hop in a dummy circuit of length 1 that will be
torn down right away). If n_2 is indeed down, n_1 "wins" and gets a
better reputation from Alice's point of view and n_2 "loses" or vice
versa. Alice does not give away much information since the circuit won't
be used anyway although, if n_2 was indeed reachable and n_1 cheated,
Alice maybe shouldn't use n_2 in her circuits for a while. Is this too
simple?
As usual, sadly, the reputation system is risky: the adversary can
manipulate Alice's behavior by feeding her information, and moreover
he can *recognize* Alice based on the behavior he induced. We must
limit the number of cases where he can make his target Alice different
from the other peers. However, even if we have the Morphmix adversary
(targetting Alice but not allowed to be very subtle about it), we're
still doing better, because at each step we choose from all the nodes we
know about, rather than the set offered by step i-1. (You should argue
with me about this.)
An adversary who can observe Alice can save some effort by accepting
relayed connections only while Alice is making circuits. An active
adversary can also blow away outgoing circuits that don't go to the right
node, thus forcing Alice onto circuits he owns if she actually wants to
communicate. That's a problem for Tor and Morphmix also, though.
A very hard problem. One crucial property of any such systems must be
that no set of nodes can get much more than it's fair share to be picked
as first hop. Maybe having any kond of reputation scheme is a bad idea
if we assume acitive attackers? The reputation scheme will always help
such an attacker: he simply disrupt the service offered by honest nodes
that are willing to relay to esclude them as relays. But again, a
10%-of-the-Internet active attacker can on average exclude 10% of the
honest relays (OK, probably its the 1 - 0.9^2 case again, so its more
like 20%) which is not terrible. Which brings us back to the discussion
of a reasonable threat model...
And finally, it's not just a matter of whether the node accepts
connections. It's also a matter of providing reasonable-quality
service for those connections. Time for the requisite p2p totally
unanalyzable simple idea with complex emergent properties: we could give
preferential treatment to relays. You give priority to cells from nodes
that you believe are relays, or in the more complex variant, to relays
that have given you fast service recently (it would seem you need to
consider performance rather than just relay-ness, because if you incent
everybody to be a relay but nobody to perform well, then you're simply
making it hard to find a working relay). Just as Bittorrent's tit-for-tat
strategy provides better service for people who are providing you better
service, we can reward relays by passing their cells before others. Now,
there are a whole host of reputation-based anonymity attacks here (if
somebody's getting quick routing through node x, it means node x thinks
they're a relay, which means node x recently used them for something;
I can offer you fast service if I want you to go to me for fast service;
if I can make the other relays look crummy then I'll look even better;
etc). But on the other hand, incenting nodes to be relays is critical,
both so the network doesn't collapse under its own weight, and because
our security parameter above is based on the fraction of relays that
the adversary owns.
I'm pretty sure that in a practical P2P system, the only way to get more
than 10% or so non-free riders is to include a mechanism that gives
better service to the relays and not so much better anonymity. Maybe I'm
pessimistic, but I don't believe that 50% of all nodes will relay
traffic to offer "privacy for the world". Relaying means giving away
parts of your bandwidth (CPU is probably neglectable) and this it what
counts. Offering better service to relays should compensate for giving
away bandwidth in first place. Why not using better anonymity as an
incetive? Beacause unlike Tor, a node in a P2P system usually gets
relatively little traffic to handle, the traffic is much less "dense"
than in Tor. A single node has only little traffic to blend in the own
traffic, which - from a traffic analysis point of view - is porbably
neglectible. Unless we employ other measures (e.g. cover traffic... not
that I believe in its practical value) does simply not buy you much
anonymity in a P2P system. Or am I wrong?
--Roger
Cheers,
Marc
--
Dr. Marc Rennhard Swiss Federal Institute of Technology
Office: ETZ G61.1 Computer Engineering and Networks Lab (TIK)
Fon: +41 1 632-7005 ETH Zentrum / Gloriastrasse 35 / CH-8092 Zurich
Fax: +41 1 632-1035 PGP-KeyID: 84AEB193, PGP encrypted mail welcome