
(FWD) Re: architectural proposal & technical problems



[Forwarding because Johannes isn't subscribed to the list -RD]

----- Forwarded message from owner-or-dev@xxxxxxxxxxxxx -----

Date: Tue, 24 Apr 2007 15:11:55 +0200
From: Johannes Renner <renner@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
To: Mike Perry <mikeperry@xxxxxxxxxx>
Cc: or-dev@xxxxxxxxxxxxx,
	Andriy Panchenko <panchenko@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
	Lexi Pimenidis <lexi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: architectural proposal & technical problems

Mike Perry wrote:
> I doubt that bandwidths between routers
> will change very much from minute to minute, and even if they do,
> currently directory descriptors don't refresh fast enough for it to
> matter.

I think that the available bandwidths between routers may actually
change from minute to minute, since they depend on how many streams
a link carries and how much data. I'll investigate further,
but you are right, router descriptors don't refresh fast enough and
we actually didn't plan to use them (see below).

> So I'm a bit confused. Will the routers be publishing bandwidth
> information to the directory via opt flags, or will they be publishing
> it to modified clients via the addon?

No, they should not publish any bw-information to the directory, but
just an optional flag, so that clients can see that this router is
running such an addon and thus can request information directly from
it by connecting to an advertised port, e.g. "opt BandwidthInformer 9053".
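
A minimal sketch of how a client could spot such a flag in a descriptor
(Python; the function name and the descriptor excerpt are made up for
illustration, and "opt BandwidthInformer" is only the line proposed above,
not an existing Tor feature):

```python
def find_informer_port(descriptor_text):
    """Return the advertised BandwidthInformer port, or None if absent."""
    for line in descriptor_text.splitlines():
        parts = line.split()
        # Proposed line format: "opt BandwidthInformer <port>"
        if len(parts) == 3 and parts[:2] == ["opt", "BandwidthInformer"]:
            if parts[2].isdigit() and 0 < int(parts[2]) < 65536:
                return int(parts[2])
    return None


# Hypothetical descriptor excerpt, for illustration only.
example = "router xyz 10.0.0.1 9001 0 0\nopt BandwidthInformer 9053\n"
print(find_informer_port(example))
```

Routers without the line simply yield None, so such a client would skip
them.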

(btw: does anybody know how to set "opt"-flags technically in a convenient way?)

The QoS-addon, running together with an OP, could request a BW-status
document from the BandwidthInformer-addons of supporting routers via
this port. This document could consist of a list of all nodes the router
maintains TLS-connections to, together with their max and avg
throughput-values (from which the available throughput follows). A single
entry could look like this, where this specific entry would describe the
link from the node the information comes from to xyz (node, max, avg):

xyz, 10759.1210487, 4395.78653144

Here max is the maximum throughput the link has seen over a single
interval in some recent period. Of course the link could possibly carry
more, but at least this speed has already been seen. avg is
the average measured value for the last interval, so the difference
between the two is the available (currently unused) throughput
(all values in this example are bytes/sec). Because this information is
averaged over some time interval, it is not so fresh that it could be
dangerous for anonymity, but maybe 'new enough' to be used by clients as
a routing metric.
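
To make the entry format concrete, here is a small sketch (Python) that
parses one such line and derives the available throughput as max minus
avg. The names LinkEntry and parse_entry are made up for this example,
and the "node, max, avg" layout is just the one suggested above:

```python
from typing import NamedTuple


class LinkEntry(NamedTuple):
    node: str
    max_bps: float  # highest per-interval throughput seen recently, bytes/sec
    avg_bps: float  # average throughput over the last interval, bytes/sec

    @property
    def available_bps(self):
        # Estimated unused throughput; clamped so it is never negative.
        return max(self.max_bps - self.avg_bps, 0.0)


def parse_entry(line):
    """Parse one "node, max, avg" line of the proposed BW-status document."""
    node, max_s, avg_s = (field.strip() for field in line.split(","))
    return LinkEntry(node, float(max_s), float(avg_s))


entry = parse_entry("xyz, 10759.1210487, 4395.78653144")
print(entry.node, round(entry.available_bps, 2))
```

For the example entry this gives roughly 6363 bytes/sec of unused
throughput on the link to xyz.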

> How do you intend to do the balancing for bandwidth and latency if
> every client were to be using your scheme?

Latencies would be measured locally from within the clients using a
probabilistic selection of nodes, just like selecting nodes for
regular circuits (maybe restricted to the fast percentile of the
routers). This information will not be made available to others,
so every client will end up with its own pool of nodes/links that it
currently considers fast. That will do the load-balancing,
because if all of the clients were using the same nodes, we would
measure bad performance and simply choose other nodes (probabilistically).

And it will be the same with bandwidth: every interested client
queries different (probabilistically chosen) routers at different
moments. Because the best links a router maintains at t+1 can be
different from those at t, every client in the end explores its
own random corner of the network.
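
A rough sketch of such a probabilistic choice based on local
measurements (Python). Weighting each node by 1/RTT is only an
assumption made for this example, as are the function name and the
measured values:

```python
import random


def pick_node(latencies, rng=random):
    """Pick a node at random, favouring nodes with low measured RTT.

    latencies maps node name -> locally measured RTT in seconds.
    Weighting by 1/RTT is an assumption of this sketch.
    """
    nodes = list(latencies)
    weights = [1.0 / latencies[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


# Made-up measurements, for illustration only.
measured = {"a": 0.1, "b": 0.2, "c": 0.8}
rng = random.Random(1)
counts = {n: 0 for n in measured}
for _ in range(10000):
    counts[pick_node(measured, rng)] += 1
print(counts)
```

Slow nodes are still chosen now and then, so a node whose measured
latency rises automatically loses traffic from this client without ever
being excluded completely.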

> So, is this whole design a prototype, or is it meant to last?

In the beginning it will be a prototype, because we surely do not know
yet what the best practice is. It would be good to work on a prototype
that is designed in such a way that it could become a real-world
implementation later on.

> why not just get a fast
> link and measure bandwidths AND latencies of two-hop paths
> client-side, and store them yourself to run tests? Speedracer can
> measure 2 hop bandwidths pretty well, and while it is a hack, you can
> do latency measurements via socks and localhost as you mentioned.

The idea is to put as little additional load on the network as possible;
therefore we do not plan to transfer extra files in order to measure the
bandwidth (why not take that info from ORs since it is already available
there?). On the other hand, we think that latency measurements will not
put too much load on the network - these are single ping cells.

But we do not understand exactly what you mean by two-hop paths here.
Why exactly 2? Do you mean to use the EXTEND command to build a two-hop
circuit and use this for measuring? This should also be possible for
one-hop circuits, right?

Well, what we actually would like to have is the possibility to construct
only one three-hop circuit and measure partial circuits regularly (let's say
you have a circuit through 0-1-2-3 [where 0 is us], we are interested in
measuring 0-1, 0-1-2 and 0-1-2-3).
I also like the PathlenCoinWeight proposal. This could also be useful for
measuring latencies, but the best would be just to be able to exit an existing
3-hop-circuit at any chosen hop in between. So we could measure RTTs of partial
circuits and easily compute values for the single links used.
Can any of the developers help us with this?
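
The computation of single-link values from partial-circuit RTTs could be
sketched like this (Python; the function name and the RTT values are
made up, and real measurements would need averaging over several probes,
since a single noisy sample can even yield a negative difference):

```python
def per_link_rtts(partial_rtts):
    """Split cumulative RTTs into per-link contributions.

    partial_rtts[i] is the RTT from us (node 0) to hop i+1 of the circuit,
    so the list holds the values for 0-1, 0-1-2 and 0-1-2-3.
    """
    links = []
    previous = 0
    for rtt in partial_rtts:
        links.append(rtt - previous)
        previous = rtt
    return links


# Made-up RTTs in milliseconds for 0-1, 0-1-2 and 0-1-2-3.
print(per_link_rtts([50, 250, 500]))
```

With these example values the links 0-1, 1-2 and 2-3 would be attributed
50, 200 and 250 ms.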

> The other thing to consider is that this information is likely to get
> pretty large if every node participates. You should spend at least
> some time considering doing some form of eigenvector compression (SVD:
> http://en.wikipedia.org/wiki/Singular_value_decomposition, PCA:
> http://en.wikipedia.org/wiki/Principal_components_analysis), and how
> much this compresses the data and at what cost to accuracy and ability
> to detect liars.

True, if you consider a central repository for the data. In this v0
proposal we thought about making only local information available for
the others (see above).

> 1. Do we trust individual nodes to publish their peer latencies and
>    bandwidths?

I think that of course we cannot trust individual nodes.
At first we want to check whether the published information reflects
reality and whether it can be useful for clients at all. And then I think
this would be the point where we would need something like reputation?

> 2. If we do not trust individual nodes, how do we deal with the fact
>    that it will soon be very hard to collect these n^2 measurements from
>    the perspective of a central authority?
>    A. Can we divide them into tiers?
>    B. Or perhaps just truncate at some %age of the network where we just assign a constant value to the
>       bandwidth of all peers (perhaps node_capacity/num_peers). But what
>       about latency?
>    C. Or do we do all these measurements on the fly and keep long-term client
>       state around, so Tor gets faster the more you use it. This could
>       work really well for both bandwidth and latency in the 2-hop
>       "Censorship Resistance Only" mode I proposed in Proposal 112.
>       But this is particularly dangerous to anonymity, since clients
>       are likely to learn highly unique views of the network, and is
>       potentially vulnerable to gaming/selective service attacks.

We first want to test and find out exactly what information
helps most to improve the performance.
Then, once we know what is useful, we can think more about hiding
information in order to preserve as much anonymity as possible.
These are all very interesting, challenging and important questions
that have to be addressed, but for the moment I think 2C sounds good:
long-term client state that makes Tor faster the more you use it.

So for our purposes we would be pleased with a Tor/Tor control protocol
that is extended with the following capabilities:

- Two new configuration options that can be set via SETCONF:

  1. "opt"-entry for descriptor, example: SETCONF OptDescEntry="BandwidthInformer 9053"
  2. option to tell Tor that it should not construct any circuits by itself, similar
     to __LeaveStreamsUnattached, example: SETCONF __IdleCircuits=0

- One new command that connects over a given circuit to destIP, destPort specifying
  0, 1 or 2 to tell Tor which node of the circuit we want to address for exiting, e.g.:

  "CONNECT" SP circID SP ("0"|"1"|"2") SP destIP SP destPort

This would enable us to implement my favorite method of latency-testing,
but it surely needs modifications in Tor and the control protocol. So how
are such modifications usually done? Do we have to implement a patch for
Tor as a kind of proposal? What other extensions do you want the control
protocol to implement for your projects? Maybe we can develop it in a way
that everybody will like?
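
To illustrate the proposed CONNECT syntax from a controller's point of
view, here is a small sketch (Python). The command does not exist in Tor;
the function name and the example arguments are made up, and the only
thing taken from the existing control protocol is that command lines end
in CRLF:

```python
def format_connect(circ_id, exit_hop, dest_ip, dest_port):
    """Build a line for the *proposed* CONNECT control command.

    exit_hop is 0, 1 or 2 and selects the node of the circuit to exit from.
    """
    if exit_hop not in (0, 1, 2):
        raise ValueError("exit_hop must be 0, 1 or 2")
    if not 0 < dest_port < 65536:
        raise ValueError("dest_port out of range")
    return "CONNECT {} {} {} {}\r\n".format(circ_id, exit_hop, dest_ip, dest_port)


# Hypothetical circuit ID and destination, for illustration only.
print(repr(format_connect(7, 1, "192.0.2.10", 80)))
```

A latency test would then issue one such command per hop of the same
circuit and time the answers.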

Greetings,
Hannes

----- End forwarded message -----