Thus spake Robert Hogan (robert@xxxxxxxxxxxxxxx):

> On Thursday 26 July 2007 11:57:44 Mike Perry wrote:
>
> You don't say so in your post, but I take it the percentile ranges
> were taken in descending order of advertised bw?

Correct. There is a graph that illustrates the distribution of
bandwidth by percentile:

http://fscked.org/transient/torgraphs/BwBar.pdf

> So if I understand this correctly:
>
> * Overall network bw is poorly utilized because high-capacity nodes
>   aren't participating in the large slice of the network's circuits
>   that their bw warrants.

Yeah.

> * Tor needs to be careful (and clip at some sane bw of X mb/s)
>   because otherwise too much traffic will concentrate on a small
>   group of nodes, harming anonymity.

Not to offend you, but I believe this argument is nonsense. A
high-capacity node should be utilized to the full extent of its
capacity. Artificially clipping capacity does not properly solve
*any* problem related to nodes lying about capacity, nor does it
"improve anonymity". If you try to "protect" anonymity this way,
people with high-capacity links will just mount Sybil attacks
instead.

In fact, such uniform balancing is actually extremely damaging to
anonymity, because it both artificially limits the size of the
userbase and causes exorbitant numbers of circuit failures, which
can be used to bias route selection (the last guard percentile, the
45-50 range, exhibited nearly a 40% circuit failure rate!).

> Given that a tor circuit is only as fast as its slowest node, isn't
> it just as likely that for some reasonable percentage of the time
> the user's circuits will still be problematically slow (e.g. fast
> guard, slow relay, fast exit)?

That is what load balancing is for, though. If you notice, that is
why I did all my userbase size calculations based on a 10KB/sec
stream rate, which is the average speed of the rest of the Tor
network. As I pointed out, the top 5% has room for SEVEN TIMES MORE
10KB/sec streams, since its average stream capacity is 70KB/sec.
Assuming fairness, a node with capacity C and N streams should give
each stream C/N. It is simple algebra to see that if C/N = 70, you
can instead have N' = 7N users at 10KB/sec.

As my calculations (which assume TCP fairness, a property that may
not hold over internal circuits) showed, I believe that properly
balancing the network should increase effective user capacity by
roughly 4X (X is current network size:
X*.45*(70/10) + X*.15*(30/10) + X*.25 = 4.3X). If it turns out this
is not the case because of fairness or other issues, we can take
more measurements once the changes are in place and go to work
devising a better weighting scheme. Right now, however, load
balancing has such fundamental problems that it is impossible to
make any other adjustments or measurements, other than to say "Oh
dear goddess is it horribly broken".

Incidentally, it is ALSO currently extremely difficult to detect
nodes that are maliciously failing circuits, or that are lying about
their bandwidth, because the network is so unbalanced. So saying
that we should "wait until we can detect liars" before we fix this
stuff will leave us in circular deadlock.
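To put rough numbers on the clipping argument, here is a minimal
sketch (Python; all bandwidths invented, and this is NOT Tor's
actual path-selection code) of how a clip shifts selection
probability away from high-capacity nodes, with the stream algebra
above as a closing comment:

    # Selection weighted by advertised bandwidth, optionally clipped.
    # Node bandwidths (KB/sec) are invented for illustration.
    NODES = {"top5pct": 7000, "middle": 300, "tail": 60}

    def selection_probs(bw, clip=None):
        """P(node is chosen) when selection is proportional to
        advertised bandwidth, optionally clipped at `clip` KB/sec."""
        w = {n: min(b, clip) if clip is not None else b
             for n, b in bw.items()}
        total = sum(w.values())
        return {n: round(v / total, 3) for n, v in w.items()}

    print(selection_probs(NODES))             # unclipped: ~95% to the fast node
    print(selection_probs(NODES, clip=1500))  # clipped: load shifts to slow nodes

    # Fairness algebra: a node averaging 70KB/sec per stream has room
    # for 70/10 = 7 times as many streams at 10KB/sec.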
> If that's true then increasing the clipping point alone will not
> improve overall performance, so in order to work does your proposal
> have to be two-pronged?
>
> ( 1 ) adopt two-hop paths as a user option, because those will
> mostly, but not always, be fast if ( 2 ) high-capacity nodes get
> their 'fair' share of the user's circuits once punitive clipping is
> removed. This is because guards have a guaranteed high-ish bw, and
> we've reduced our odds of getting a slow node on the rest of the
> circuit because we're only choosing one more rather than two more,
> AND we've allowed fast nodes to throw their weight around a bit
> more in the random selection.

No, two-hop paths are independent of this proposal. But I'm not
giving up on that one either. I believe 2-hop paths to be purely an
implementation issue. There is no theoretical reason why 2-hop paths
significantly degrade anonymity, but I do agree the devil is in the
implementation details.

> Is there more to it or am I missing something?
>
> As a side note:
> Wouldn't a useful metric be nodes' cpu usage or even capacity (e.g.
> make and ghz)? Is there any research or intuition on the
> relationship between cpu make/ghz'age and bw-capacity that
> illustrates how much of that bw is just notional once the node is
> participating in x circuits with y kb/s? Would it be worth
> including this in descriptors?

No, I think this is up to node operators: look at your CPU usage and
limit tor's bandwidth proportionally to achieve your desired load.
CPUs vary widely and can have various optimizations as well (SSL
accelerator cards, vector operations, etc.). If nodes properly
report their current capacity, it doesn't matter from a balancing
perspective whether a node is network-bound or CPU-bound (though
perhaps Johannes will discover that CPU load increases latency in
some weird ways).
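For concreteness, here is a hypothetical torrc snippet showing what
"limit tor's bandwidth proportionally" can look like in practice.
The numbers are invented; the options are the standard torrc
rate-limit settings:

    # Hypothetical example: suppose this box's CPU tops out around
    # 800KB/sec of relayed traffic. Cap the sustained rate there and
    # allow slightly larger bursts, so the advertised capacity
    # reflects what the node can actually push.
    BandwidthRate 800 KB
    BandwidthBurst 1024 KB
    # Optionally understate capacity in the descriptor if the link is
    # faster than what we want tor to handle:
    MaxAdvertisedBandwidth 800 KB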
--
Mike Perry
Mad Computer Scientist
fscked.org evil labs