[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: Effect of Tor window size on performance



On Mon, Feb 16, 2009 at 12:26:32PM +0100, Csaba Kiraly wrote:
> >Hey, that's a really good point. We don't have to change much code at
> >all if we want to use a *smaller* package window than we are allowed. We
> >simply pretend that the package window started out smaller, and either
> >side can do that independently!
> >
> >Do you have a patch in mind for this? It would seem that if we
> >change init_circuit_base() so it sets circ->package_window to some
> >lower number x, and change connection_exit_begin_conn() so it sets
> >nstream->package_window to some even lower number y, that should be it.
> >The client side will send sendme's like normal, and the only difference
> >is that no more than y cells will ever be 'in flight' for a given stream.
> >  
> Its not even that "complicated", you already have the defines in or.h :-)
> #define CIRCWINDOW_START 1000
> #define CIRCWINDOW_INCREMENT 100
> #define STREAMWINDOW_START 500
> #define STREAMWINDOW_INCREMENT 50
> 
> In our test code we've also changed some other things to have a better 
> control of the test environment, but the modification needed is only to 
> change these.

I think just changing those defines won't work. The problem is that we
set both the package_window and the deliver_window to be that number.

And if we set the deliver_window to be 100, and the other side hasn't
applied the patch (so they set their package_window to be 1000), we will
start hanging up on their circuits because they're breaking the protocol.

Your patch should work in a Tor network consisting only of patched Tors.
But it shouldn't work in a hybrid network.

Unless I'm wrong. :)

> >1) What values of x and y are best? Presumably as we reduce them from
> >1000 and 500, the performance gets better, but at some point they become
> >so low that performance gets worse (because each round-trip through the
> >network takes extra time). As sample numbers, if we start x at 100 and
> >y at 50, then we need another round-trip for any stream that delivers
> >more than 24900 bytes, and/or for every 49800 bytes on the circuit.
> >Should our choices be influenced by the 'typical' Tor stream that the
> >network is seeing right now? (Not that we have those numbers, but maybe
> >we should get them.) What other factors are there?
> >  
> Note that changing y would also change our overhead. Currently with 100 
> it is 1% (1 send_me sent back for every 100 cells received).

I should clarify that if we don't want to change existing clients much,
we are limited in changes we can make.

Specifically, we can't choose a circuit package window over 1000 or
under 100, and the circuit sendme value must remain 100.

Similarly, we can't choose a stream package window over 500 or under 50,
and the stream sendme value must remain 50.

> Another thing to consider is the x/y ratio. Currently it is 10. Lets see 

No, currently the ratio is 1000/500 = 2. I think we're confusing our
variables below. :)

> some considerations:
> - 2 seems a bit low: Say A and B are the two ends of the circuit and 
> traffic flows mainly from A to B. Assuming that one-way delays are equal 
> in both directions (this is reasonable since we have asymmetry only in 
> the A-B traffic, but overlay links of the same path are used by other 
> circuits in the reverse direction as well), the send_me cell is sent 
> from B when half of the cells arrives. The send_me arrives back to A 
> when all the cells have arrived to B. At this point A can start pumping 
> in new cells, which might be a bit late.
> - 10 might be a bit large, I don't think we have such asymmetries in the 
> network, between OR nodes that justify such a high factor.

My current understanding of what happens is that most of the cells
in the window arrive at about the same time. That is, the exit relay
packages all 500 cells that are allowed in the stream window, and pushes
them down the circuit, and then goes silent. They arrive to the client
at about the same time, so the client sends 10 sendme cells back. The
sendme cells arrive at the exit relay at about the same time, so it
packages another 500 cells. Repeat.

My original goal with "fine-grained" sendmes was that the data cells
coming from the exit relay would be spread throughout the circuit, and
if we could get a sendme cell back to the exit relay before the last
data cells arrive, the flow would never have to slow down. I didn't
anticipate how much end-to-end latency there would be, though. Now that
cells get buffered for entire seconds at intermediate relays, my original
idea looks kind of stupid. (If they were to get split up in the network,
such that some were arriving when others were still stuck in the circuit,
it would look more like a good idea again. I wonder how to learn whether
that happens.)

So overall, we should be thinking of our total window size as a block
of cells that will go through the network all at once. The bigger the
size is, the more congestion it will produce with other blocks-of-cells.
Whereas the smaller the size is, the lower the maximum data rate can be.

And the fact that it takes more than one sendme cell to refill the window
is basically now a bug.

> I think 500 is a safe bet for x to start with, but from our measurements 
> it seems that even 200 would do fine with nice performance improvements. 
> Of course we are working on more measurements to be on the safe side.

If our goal is to reduce the amount of clutter inside the network, I'm
inclined to go for the smallest numbers we can: circuit window of 100
and stream window of 50 (which can be refilled by one sendme cell).

After all, trying to optimize bandwidth hasn't worked very well, because
there are people out there who want to use way more bandwidth than Tor
can provide. Maybe we should give optimizing latency a shot next.

So to think this through: at 50 cells per stream window, we're saying
that any file less than 25KB can be gotten in one 'go', and after that
we require one extra network round-trip per 25KB we want. We'd better be
cutting down latency a lot by this change, or the additional round-trips
will put any latency savings back in.

Can we come up with a number for average webpage size we want to optimize
for (and deviation around that average)? If we find that many people
want 30KB files, then choosing a window of 25KB isn't smart.

> >2) What are the effects of this patch in a hybrid Tor network? If some
> >exit relays use package windows of 1000, and newer ones use package
> >windows of 100, will the newer ones get 'clobbered' by the old ones?
> >That is, should we expect to see even more slow-down in the network
> >while the relays transition? Is there anything we can do about that,
> >since it will take a year or more for everybody to transition?
> >  
> We've already done measurements on what we call the asymmetric scenario, 
> i.e. where the client is the old one and the exit node is the new one. 
> It seems that performance improvements are there. We are currently 
> thinking of some "fairness" tests; we have some ideas, but it is not yet 
> exactly clear how to perform them. Anyway, from current numbers it seems 
> that the 'clobbered' ones would be the old ones, so at least we have one 
> more incentive for people to change version :-)

Ah, I think you misunderstand. I meant a network where some exit relays
use the new package window and some use the old one.

Imagine a network where one exit relay uses the new package window
(25KB/50KB), and all the other exit relays use the old package window
(250KB/500KB). Assume a set of clients, some of which are trying to fetch
huge files via Tor.

Anybody who picks the normal (old-style) exit relays will get 250KB of
data per round-trip. But anybody who picks the new exit relay gets only
25KB of data per round-trip.

If most exit relays are still sending back 250KB blocks, then anybody
who picks the new exit relay is going to get the worst of both worlds:
a) a Tor network with really high latency, and b) only 25KB at a time.

But I guess it's not as bad as it could be. It could be that the user
gets worse performance when she upgrades to a smaller circuit window, so
she would choose to never upgrade. (This situation will actually happen
with respect to uploading lots of data, since the client's package window
comes into play there. But I think that's rarer.)

--Roger