
(FWD) Re: [p2p-hackers] P2P Onion Routing?



----- Forwarded message from Roger Dingledine <arma@MIT.EDU> -----

From: Roger Dingledine <arma@MIT.EDU>
Date: Mon, 1 Apr 2002 17:38:51 -0500
To: p2p-hackers@zgp.org
Subject: Re: [p2p-hackers] P2P Onion Routing?

On Thu, Mar 28, 2002 at 09:29:53AM -0600, Brandon Wiley wrote:
> I think people need to get over this what is and is not P2P thing. ZKS is
> P2P when every user runs a ZKS node. In which case the problem becomes
> that ZKS is not scalable in P2P mode because, I think, it assumes that
> every user knows about all of the mix nodes.

Well, Freedom is trickier because it's aiming to be low-latency and
connection-oriented. In order to get reasonable performance, you need
to be thinking about latencies between nodes. You must choose routes
that achieve good latency while not giving up 'too much' anonymity
(the shortest route gives the best performance, but always using it
is probably a bad idea for anonymity).
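As a toy illustration of that trade-off, here's a sketch in Python --
not anything from Freedom itself; the route_latency map and the cutoff
k are invented for illustration:

    import random

    def pick_route(route_latency, k=5):
        # route_latency is a hypothetical map from a route (a tuple of
        # node ids) to its estimated end-to-end latency. Always taking
        # the minimum is predictable; picking uniformly among the k
        # fastest trades a little latency for a larger set of
        # plausible routes.
        best_k = sorted(route_latency, key=route_latency.get)[:k]
        return random.choice(best_k)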

To provide good latency, Freedom keeps routing/latency tables and
propagates them like other internet routing protocols. Typical internet
routing protocols aren't designed to handle a non-trivial churn rate
over thousands of nodes.

With high-latency systems like traditional mix-nets, it seems that we can
simplify the design by not paying attention to performance between nodes.

> This brings up one of the hard problems in mixnets which Roger mentioned,
> how do you dish out subsets of the network to various users?

This question has been on my plate for a while. I'll get us started here
and see where it goes.

We have a reputation server (in its trivial form, it just keeps track
of participating nodes; it might also maintain state about performance
and reliability if we like).
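To make that concrete, here's a minimal sketch in Python of such a
trivial reputation server -- all the names (NodeRecord, advertise,
report) are invented for illustration, not from any real system:

    import time
    from dataclasses import dataclass

    @dataclass
    class NodeRecord:
        address: str
        advertised_at: float  # when the node was first announced
        successes: int = 0    # optional reliability state
        failures: int = 0

    class ReputationServer:
        # Trivial form: just tracks participating nodes, plus
        # optional performance/reliability counters.
        def __init__(self):
            self.nodes = {}

        def advertise(self, node_id, address):
            self.nodes.setdefault(node_id,
                                  NodeRecord(address, time.time()))

        def report(self, node_id, ok):
            rec = self.nodes[node_id]
            if ok:
                rec.successes += 1
            else:
                rec.failures += 1

        def full_listing(self):
            # Serve the entire set; see below for why subsets leak.
            return dict(self.nodes)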

Alice should pull down info on the entire set of nodes. Otherwise an
observer could watch which subset each user downloads and match Alice's
messages to her pretty well.

This single server is a trust bottleneck -- for instance, it might give
out different answers to different people and then observe which nodes a
message traverses. So we must make several redundant reputation servers;
more about that below.

These attacks are actually subtler than that -- imagine that at midnight
Alice asks for a set of nodes, and then at 12:05 Bob asks for a set
of nodes. If a new node has joined after midnight, and somebody uses
that node, then we know it's not Alice. Similar issues arise with
synchronization between reputation servers: "you used N_16; the only
reputation server that knew about N_16 then was RS_4; so you must be
Alice because she's the only one who asked RS_4 for a node list then."
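To make the partitioning attack concrete, here's a hypothetical Python
sketch (the users, node, and timestamps are all made up): given when
each user last fetched the node list and when each node was first
advertised, observing a route that uses some node rules out every user
who fetched before that node existed.

    def possible_senders(fetch_times, node_first_seen, observed_node):
        # A user who fetched the list before the node was advertised
        # can't have chosen it, so seeing the node in use shrinks the
        # anonymity set to everyone who fetched afterwards.
        t = node_first_seen[observed_node]
        return {user for user, fetched in fetch_times.items()
                if fetched >= t}

    fetch_times = {"alice": 0, "bob": 5}   # minutes after midnight
    node_first_seen = {"N_16": 2}          # N_16 joined at 12:02
    print(possible_senders(fetch_times, node_first_seen, "N_16"))
    # {'bob'} -- so it wasn't Alice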

So it would seem that we need to keep the set of node information
strongly synchronized across reputation servers and with all users.

There are a number of simplifications we can make to get away with loose
synchronization instead.

* First of all, don't use new information. Have a time threshold (say,
an hour after a new node is advertised) by which everybody should
know about it, so people can start using data an hour after it's
posted with reasonable confidence that most other people will know about
it too. It's all about anonymity sets and getting a lot of users with
indistinguishable profiles. Users who don't follow the guidelines will
lose out on anonymity. (A sketch of this threshold rule follows the
list.)

* PIR (private information retrieval) from the reputation servers will
make it possible to pull down subsets of the nodes without revealing
which nodes we've learned. We still have the above issue of dangerous
nodes (nodes which some reputation servers know about and others don't),
and PIR will also be computationally impractical for the (near) future.
(A toy two-server PIR sketch also follows the list.)

* If we assume some sort of bootstrap, people can query for node lists,
subsets, or updates ("all the changes since noon") via the mix-net itself.
Users must take care not to give away identifying information or patterns
(eg by using newly advertised nodes immediately). This is still messy from
a traffic analysis perspective though -- say the adversary is watching all
the reputation servers and learning the subsets of nodes that are being
downloaded. A message going into a given node N_i is likely to be heading
towards some node that was learned at the same time as N_i. If there
are lots of participating nodes and the subsets are relatively small,
that correlation can shrink the set of plausible senders dramatically.
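
Here is the threshold rule from the first bullet as a minimal Python
sketch; the one-hour delay is the example value from above, and the
directory-as-dict shape is an assumption:

    import time

    ADVERTISE_DELAY = 3600  # the example one-hour threshold

    def usable_nodes(directory, now=None):
        # directory: hypothetical map of node id -> time it was first
        # advertised. Only nodes everyone has had time to learn about
        # get used, so new nodes don't single out early adopters.
        now = time.time() if now is None else now
        return [n for n, advertised in directory.items()
                if now - advertised >= ADVERTISE_DELAY]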

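For the second bullet, here's a toy version of the classic two-server
XOR scheme for PIR (this assumes two non-colluding servers holding
identical copies of the database; the record format is made up). Each
server sees only a uniformly random set of indices, so neither learns
which record the client wanted:

    import secrets

    def server_answer(db, indices):
        # Each server XORs together the records it was asked for.
        out = bytearray(len(db[0]))
        for i in indices:
            for k, byte in enumerate(db[i]):
                out[k] ^= byte
        return bytes(out)

    def client_queries(n, want):
        # A random subset S goes to one server; S with `want` toggled
        # (symmetric difference) goes to the other.
        s = {j for j in range(n) if secrets.randbits(1)}
        return s, s ^ {want}

    db = [("node-%d" % k).ljust(16).encode() for k in range(8)]
    qa, qb = client_queries(len(db), 3)
    rec = bytes(x ^ y for x, y in
                zip(server_answer(db, qa), server_answer(db, qb)))
    assert rec == db[3]  # everything except record 3 cancels out

The cost is what makes it impractical: every query touches every record
on both servers.
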
And now a word about keeping the reputation servers verifiable and loosely
synchronized -- after all, a rogue reputation server could intentionally
keep some information for itself or give out false reputation reports to
rig the chance that a user will pick a given node. If we name mixes by
their public keys (so they can self-certify), mixes can provide periodic
certificates about their state. Because each mix signs and timestamps
each certificate, a reputation server can at most fail to provide the
newest certificate. The reputation servers can work together to ensure
correct and complete data (perhaps by successively signing certificate
bundles, so users can be sure that a given mix certificate has been
seen by a threshold of reputation servers, and that they're getting
the whole set of certificates in that bundle) and correct behavior
(perhaps by doing random queries through the mix-net). Because the set
of reputation servers is smaller and more static, it's easier to detect
and punt misbehaving servers.
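
As a rough illustration of self-certifying mix state, here's a Python
sketch using Ed25519 from the `cryptography` package purely as a
stand-in signature scheme (an anachronism here, and all the framing
choices are invented):

    import hashlib
    import time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey, Ed25519PublicKey)
    from cryptography.hazmat.primitives.serialization import (
        Encoding, PublicFormat)

    class Mix:
        def __init__(self):
            self._key = Ed25519PrivateKey.generate()
            self.pub = self._key.public_key().public_bytes(
                Encoding.Raw, PublicFormat.Raw)
            # Named by a hash of its public key, so its certificates
            # self-certify: no outside authority is needed.
            self.name = hashlib.sha256(self.pub).hexdigest()[:16]

        def certificate(self, state):
            # Sign and timestamp the current state.
            body = int(time.time()).to_bytes(8, "big") + state
            return body, self._key.sign(body)

    def check(pub, body, sig, max_age=3600):
        # Raises on a forged signature; a rogue reputation server can
        # at most withhold the newest certificate, which the staleness
        # check below turns into a detectable failure.
        Ed25519PublicKey.from_public_bytes(pub).verify(sig, body)
        if time.time() - int.from_bytes(body[:8], "big") > max_age:
            raise ValueError("stale certificate -- updates withheld?")
        return body[8:]

A client that insists on a fresh timestamp forces a misbehaving server
to either hand over the real newest certificate or get caught serving
stale data.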

There's plenty more that needs to be worked out, but this should serve
as a foundation to build on. What do you think?

--Roger

_______________________________________________
p2p-hackers mailing list
p2p-hackers@zgp.org
http://zgp.org/mailman/listinfo/p2p-hackers

----- End forwarded message -----