
Re: On the performance scalability of Tor



Thus spake Robert Hogan (robert@xxxxxxxxxxxxxxx):

> On Wednesday 18 July 2007 15:58:17 Steven Murdoch wrote:
> > As always, comments and suggestions, either here on the list or on the
> > blog, are appreciated.

(Excellent observations, Steven. I think you're spot-on; I've long
suspected this sort of behavior as well. The converse is why I always
argue for making Tor more efficient to attract more users. Also, I
think Tor first needs to fix its load-balancing issues before more
nodes can really support more users to begin with.)

> It has always seemed to me that there is plenty of raw 'bandwidth' on the tor 
> network. I've just downloaded the tor tarball at a relatively nippy 17KB/s. 
> Not greased lightning by any means but clearly if it was just bandwidth at 
> issue the general browsing experience would be a lot different.

Well, also, the bandwidth of Tor fetches can vary considerably, and in
a balanced network this shouldn't be the case.  Currently, the middle
of the network is overloaded due to guard bug #440 and exit clipping.
I published a brief study of this in December, though at that point I
did not yet know the cause.

Nodes in the 35-45% bandwidth tier are much less reliable than nodes
in both the upper AND lower tiers. Note that the guard node cutoff is
about 50% by bandwidth.
http://archives.seul.org/or/talk/Dec-2006/msg00123.html

I think this, plus the exit issue, is one of the reasons why Tor
bandwidth performance is irregular. Another major reason is path
selection: crossing the Atlantic Ocean four times to retrieve a
document, for example.

> From a layman's point of view, opening a web page with tor seems to involve at 
> least 10 to 15 separate streams, usually over the same circuit. Once the 
> streams are up they are only up for a short time before a new one is created. 
> Just looking at the connection monitor on TorK it seems to me that half the 
> time is spent creating these streams and half the time (often a lot less) 
> actually using them.
> 
> I presume Tor is just reflecting the behaviour of privoxy and the browser 
> here, which is opening up new tcp sessions for numerous different requests to 
> the same destination. I understand that pipelining in firefox and polipo 
> mitigates this somewhat. Or does it? I'm not 100% sure on that score.
> 
> At any rate couldn't/shouldn't Tor take care of this? Why can't Tor maintain 
> and reuse a successfully created stream for all requests to an active 
> destination and let the exit break out the requests into their respective tcp 
> sessions? Put simply, if my understanding is correct Tor is respecting the 
> tcp architecture in the wrong place, at the client (creating new streams for 
> each tcp request/session) rather than the exit (creating new tcp sessions 
> where appropriate from the same stream).

The problem is the round-trip time needed to create the TCP
connection. A client has to tell the exit to create each TCP
connection somehow; the RELAY_BEGIN cell is how this is done. I
believe clients can and do send multiple RELAY_BEGINs in a row, so
it's not a chattiness issue. From a networking standpoint, sending
multiple requests in a row is effectively the same as sending a
single request with a bunch of "please connect" requests stacked
inside it. You just have to wait for the requests to cross n oceans
and 3 queues on the way out, and again on the way back. So long as
you are not waiting for them to be sent one at a time, you are going
about as fast as you can go.
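
To put toy numbers on that (nothing below is measured; a 3-hop
circuit RTT of 500ms and a dozen fetches per page are just
assumptions), compare waiting a round trip per stream versus firing
all the requests off at once:

  # Toy back-of-the-envelope model (not Tor code): setup latency for
  # opening N exit-side TCP connections one at a time versus issuing
  # all the RELAY_BEGIN-style requests back to back. All numbers are
  # illustrative assumptions, not measurements.

  CIRCUIT_RTT = 0.5   # seconds, assumed client<->exit round trip over 3 hops
  N_STREAMS = 12      # assumed number of fetches for one web page

  # One at a time: a full round trip of waiting per stream.
  sequential = N_STREAMS * CIRCUIT_RTT

  # Back to back: all requests share roughly one round trip of waiting,
  # since they cross the same path and the same queues together.
  batched = 1 * CIRCUIT_RTT

  print(f"sequential: {sequential:.1f}s, batched: {batched:.1f}s")
  # -> sequential: 6.0s, batched: 0.5s

Batching is what keeps the per-page setup cost near one circuit RTT
rather than a dozen.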

So pipelining may or may not help this aspect, at least. So long as
the circuit is reused, it probably makes little difference whether
there is a single RELAY_BEGIN that sets up one pipelined HTTP
connection, or a bunch of RELAY_BEGINs issued in parallel for a bunch
of HTTP requests. What probably matters more is how many concurrent
proxy requests your browser is willing to issue. Concurrency is the
main way to mitigate request/response latency.
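
If you want to see what "more concurrent proxy requests" looks like
mechanically, here's a minimal Python sketch along those lines; the
local proxy address (privoxy's default) and the URLs are stand-ins
for illustration, not anything Tor-specific:

  # Minimal sketch: several fetches in parallel through a local HTTP
  # proxy. The proxy address (privoxy's default) and the URLs are
  # assumptions for illustration.
  import concurrent.futures
  import urllib.request

  PROXY = {"http": "http://127.0.0.1:8118"}  # assumed local privoxy
  urls = ["http://example.com/"] * 8         # placeholder URLs

  opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))

  def fetch(url):
      # Each worker overlaps its request/response round trip with the
      # others instead of waiting for them in series.
      with opener.open(url, timeout=60) as resp:
          return url, resp.status

  with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
      for url, status in pool.map(fetch, urls):
          print(status, url)

Eight workers pay roughly one round trip of latency between them
instead of eight in series, which is the whole point.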

That said, the one case where pipelining WILL help more than just
multiple parallel HTTP requests is when there is high latency between
your exit and your TCP destination. High latency makes TCP slow start
a real bottleneck for short-lived connections like HTTP. So if you
exit out of Germany to fetch documents in the US, you will spend most
of your time waiting for TCP's congestion windows to grow large
enough to fill the bandwidth-delay product (BDP) of their links. If
you use pipelining, only one TCP connection, and thus only one TCP
slow start, is incurred for all of the fetches. That can be a huge
bonus for cross-oceanic webpage downloads.
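
For a rough sense of scale, here's the textbook slow-start arithmetic
with assumed numbers (a 150ms trans-Atlantic RTT and a 2Mbit/s path;
neither is a measurement):

  # Standard slow-start arithmetic with assumed numbers: how many round
  # trips before the congestion window can fill the bandwidth-delay
  # product (BDP) of a cross-oceanic exit-to-server link?
  import math

  rtt = 0.150          # seconds, assumed Germany<->US round trip
  bandwidth = 2e6      # bits/s, assumed available path bandwidth
  mss = 1460 * 8       # bits per segment
  init_cwnd = 2        # initial congestion window in segments (classic value)

  bdp_segments = bandwidth * rtt / mss  # segments in flight to fill the pipe
  rtts = math.ceil(math.log2(bdp_segments / init_cwnd))  # cwnd doubles per RTT

  print(f"BDP ~ {bdp_segments:.1f} segments; ~{rtts} RTTs "
        f"({rtts * rtt:.2f}s) of ramp-up per new connection")
  # -> BDP ~ 25.7 segments; ~4 RTTs (0.60s) of ramp-up per new connection

With those assumed numbers, a page that opens 10-15 fresh connections
pays that ramp-up over and over; a pipelined connection pays it once.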


P.S. Yes, I do know way too much about this sort of thing. :)

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs
