On Wed, Jul 18, 2007 at 01:36:02AM -0700, Mike Perry wrote: [...] > Presently, Tor attempts to balance exit node usage thusly (from > path-spec.txt): > > For non-exit positions on "fast" circuits, we pick routers as > above, but we weight the clipped advertised bandwidth of > Exit-flagged nodes depending on the fraction of bandwidth > available from non-Exit nodes. Call the total clipped advertised > bandwidth for Exit nodes under consideration E, and the total > clipped advertised bandwidth for non-Exit nodes under > consideration N. If E<N/2, we do not consider Exit-flagged nodes. > Otherwise, we weight their bandwidth with the factor > (E-N/2)/(N+E-N/2) == (2E - N)/(2E + N). This ensures that > bandwidth is evenly distributed over nodes in 3-hop paths. [...] Man. That part of tor-spec.txt has to be just about the worst, most unclear, paragraph that I've written in ages. I'll try to analyze the formulas in question and show more about how I derived them. With any luck, if I'm making a mistake, somebody will be able to show me where it is. [I'm writing this parenthetical after having gone through the derivation. Summary: I got the same result as Mike.] (Most of this won't be news to most of you, but I want to be really explicit in order to try to expose the flaws in my logic.) (Oh yeah. You'll want to view this in a fixed-width font, or your head will hurt.) Our goal is to decide how to select Exit nodes as middlemen. For purposes of discussion, we'll assume that our goal is to use all of the bandwidth of all nodes equally. When we pick nodes, we weight the probability of choosing any given node by the node's bandwidth. To avoid overloading exits, when considering exits for middleman positions, we weight their bandwidth by some factor w. (As a simplifying assumption, assume that we choose the nodes in a path independently.) Let's suppose that we build an L-hop circuit, and send some amount of bytes B through that circuit. We will always pick an Exit as the last hop. For the other L-1 hops, we will pick an Exit with probability Pe = (w*E)/(w*E+N) and we will pick a non-Exit with probability Pn = N/(w*E+N). Thus, the expected number of bytes relayed by exits is: B (for the exit hop) + B * (L-1) * Pe (for all the other hops) Call this quantity BE. The expected number of bytes relayed by non-exits is: B * (L-1) * Pn Call this quantity BN. How should we choose w? We'd like to have the property that the ratio of expected usage between exit and non-exit is about the same as the ratio of capacity between exit and non-exit. In other words, we want BE/BN = E/N. Setting up the equation gives: B[ 1 + (L-1) * (w*E) / (w*E+N) ] E -------------------------------- = --- B[ (L-1) * N / (w*E+N) ] N The B's cancel each other. We then multiply the numerator and denominator of the left-hand fraction by (w*E+N) to get: w*E + N + (L-1)*(w*E) E --------------------- = --- (L-1)*N N Multiplying both sides of the equation by (L-1)*N gives: w*E + N + (L-1)*(w*E) = E(L-1) Collecting w-terms on one side and non-w terms on the other gives: w*E + (L-1)*(w*E) = E(L-1) - N Simplifying: w*E*(1+L-1) = E(L-1) - N w*E*L = E(L-1) - N E(L-1) - N w = ----------- E*L Note in practice that if this equation gives a value of w less than 0, it means that there is too little exit bandwidth for us to distribute bandwidth evenly over exit and non-exit nodes. Let's check for some boundary conditions and special cases. When N=(L-1)*E, we have just enough exit bandwidth to match the non-exit bandwidth, and w=0. For L=3 (as in Tor today, mostly), we have w = (2E-N)/3E. In terms of T, that's w = [2E - (T-E)] / 3E = (3E - T) / 3E = 1 - T/3E. Okay, that looks like the result Mike got. Hooray! So long as we're doing this, let's do it properly and expand our model a little, to consider cannibalized circuits and non-3-hop paths. These questions are mostly for Roger, but anybody else who has empirical data should step up too: 1) Roger, can you comment on the average path length in practice? Is it close enough to that we should just set L=3? 2) What fraction of circuits are cannibalized? If all L-hop circuits that already end at an Exit are (with probability Pc) cannibalized and extended to another Exit, we now have BE = (1+Pc)*B + B*(L-1) * Pe, which changes our analysis a little. Right. Time for me to sleep. Clearly, I need it. peace, -- Nick Mathewson
Attachment:
pgpReAKu26IFk.pgp
Description: PGP signature