For purposes of load balancing, exit and guard node node bandwidths are weighted by fractional values when they are probabilistically selected for circuit positions other than their primary purpose. This done is to govern scarcity and properly distribute load amongst the node classes. The exit weight and guard weight are both currently computed from the formulas in: http://archives.seul.org/or/dev/Jul-2007/msg00056.html and http://archives.seul.org/or/dev/Jul-2007/msg00021.html. We unfortunately made two oversights in the derivation of these formulas. Firstly, exit-only nodes cannot be placed in the Entry position, and Guard-only nodes cannot be placed in the Exit position. Secondly, Guard+Exit flagged nodes are not properly used in many cases of scarcity. These oversights are evident in measurement results and in review of the source code. While looking over data for Bug 1217 (https://bugs.torproject.org/flyspray/index.php?do=details&id=1217), I discovered that nodes that were flagged for both Guard and Exit seemed to be being rated as significantly less loaded by the bandwidth authorities than either Guards or Exits alone. I also noticed that Guard, Middle, and Exit capacity measurements were vastly different on average. These measurements were observed using the consensus statistic gathering script in TorFlow: https://svn.torproject.org/svn/torflow/trunk/NetworkScanners/statsplitter.py The source code actually has three implementations of the above formulas: one for guards, one for exits, and one for guard+exits (which is simply a multiplication of the exit and guard result). After spot-checking the source against different edge cases, it quickly became apparent that for the case when either or both Guards or Exits were scarce, we were not properly weighting Guard+Exit nodes in smartlist_choose_by_bandwidth(). This can be seen by considering what happens when exit_weight=0 and/or guard_weight=0 in that function: if (is_exit && is_guard) bw = ((uint64_t)(bandwidths[i] * exit_weight * guard_weight)); Because of the multiplication, the result is that when both guards and exits are scarce, Exit+Guard nodes cannot be used for either the Exit or the Guard position. Additionally, they are not considered at their full bandwidth weight for the exit position if exits but not guards are scarce. Similarly, they are not properly weighted for the guard position if guards but not exits are scarce. This is obviously wrong. To derive the correct formulas for weighting Guard, Exit, and Guard+Exit nodes, we actually need to consider four separate cases of Exit and Guard capacity: 1. Neither are scarce 2. Both Guard and Exit are scarce 3. One of Guard or Exit is scarce Case 1: Neither are scarce This is the most general case, and also the most common for the current Tor network. Let G be the total Guard-only capacity. Let M be the total capacity of all Middle-only nodes. Let E be the total Exit-only capacity. Let D be the total Exit+Guard capacity (Dual-flagged nodes) Let T be the total network capacity. Note that M, E, G, and D are disjoint, and thus: T = G + M + E + D If neither Guards nor Exits are scarce, we know that E > T/3 and G > T/3. Therefore the remainder of the bandwidth for middle nodes must be M+D < T/3. In this case, it is relatively easy to see that the Guard+Exit bandwidth could be devoted exclusively to the middle position. However, this will reduce entropy in the network, and exclude some potentially scarce, stable Exit+Guard nodes from ever being used as Exits. Let's instead divide the Exit+Guard bandwidth equally between the Guard, Middle, and Exit positions in this case. In order to do this, we want to define a set of constraints such that the amount of bandwidth from Guard (G), Middle (M), Exit (E), and Guard+Exit (D) nodes devoted to the entry, middle, and exit positions are equal. Let Wgd be the weight for choosing a Guard+Exit for the guard position. Let Wmd be the weight for choosing a Guard+Exit for the middle position. Let Wed be the weight for choosing a Guard+Exit for the exit position. Let Wme be the weight for choosing an Exit for the middle position. Let Wmg be the weight for choosing a Guard for the middle position. Let Wgg be the weight for choosing a Guard for the guard position. Let Wee be the weight for choosing an Exit for the exit position. The allocation equality condition is then: Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G (guard bw = middle bw) Wgg*G + Wgd*D = Wee*E + Wed*D (guard bw = exit bw) We also want to constrain the Guard+Exit weights to divide the bandwidth equally between the entry, middle, and exit positions: Wed*D + Wmd*D + Wgd*D = D (aka: Wed + Wmd + Wdg = 1) Wed = Wmd = Wgd = 1/3 Also, since the amount of Guard and Exit bandwidth placed on the middle node is also removed from the entry and exit positions: Wmg*G + Wgg*G = G (aka: Wgg = 1-Wmg) Wme*E + Wee*E = E (aka: Wee = 1-Wme) We can then simplify down to 2 equations with 2 unknowns: 1. (1-Wme)*E + D/3 = (1-Wmg)*G + D/3 2. (1-Wmg)*G + D/3 = M + D/3 + Wme*E + Wmg*G Solving these two equations gives: Wmg = (2G - E - M)/3G Wme = (2E - M - G)/3E Here is a Mathematica line to verify this from the general formula: Reduce[{Wgg*G + Wgd*D == M + Wmd*D + Wme*E + Wmg*G, Wgg*G + Wgd*D == Wee*E + Wed*D, Wed*D + Wmd*D + Wgd*D == D, Wed == Wmd, Wmd == Wgd, Wgd == 1/3, Wmg*G + Wgg*G == G, Wme*E + Wee*E == E}, {Wgg, Wgd, Wmd, Wme, Wmg, Wee, Wed}] Case 2: Both Guards and Exits are scarce If both exits and guards are scarce, we need to determine weightings to distribute the Exit+Guard bandwidth evenly between the guard and exit positions, and ignore the middle position. First, we should consider some subcases. Let R denote the more scarce class (Rare) between Guard vs Exit. Let S denote the less scarce class. Subcase a: R+D <= S For this case, we should simply devote all of D to the more scarce condition. This is because we don't have enough Exit+Guard bandwidth to make the more-scarce position have as much capacity as the less-scarce one. Subcase b: R+D > S For this case, we should divide D to make the two scarce classes equal. Using Case 1's General Form: Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G (guard bw = middle bw) Wgg*G + Wgd*D = Wee*E + Wed*D (guard bw = exit bw) Wed + Wmd + Wgd = 1 (Wed*D+Wmd*D+Wgd*D = D) Wgg = 1-Wmg (Wmg*G + Wgg*G = G) Wee = 1-Wme (Wme*E + Wee*E = E) We want guard bandwidth to equal exit bandwidth, and also: Wgg = Wee = 1 and Wme = Wmg = Wmd = 0. Therefore, we know we want the following two equations to hold: 1. G + Wgd*D = E + Wed*D (guard bw = exit bw) 2. Wed*D + Wgd*D = D (properly divide D) Solving for Wed by adding 1+2: 2*Wed*D + E + Wgd*D = G + Wgd*D + D 2*Wed*D = G + D - E Wed = (G+D-E)/2D Wed = (G-E)/2D + 1/2 Solving for Wgd by rewriting 2 as Wgd=1-Wed: Wgd = 1-(G+D-E)/2D Wgd = (E+D-G)/2D Wgd = (E-G)/2D + 1/2 Here is a Mathematica line to verify this: Reduce[{G + Wgd*D == E + Wed*D, Wed*D + Wgd*D == D}, {Wed, Wgd}] Case 3: One of Exit or Guard is scarce If only one of exits or guards are scarce, we want to proceed similarly as Case 2. Let S be the scarce class. This has two subcases: Subcase a: S+D < T/3 In this case, we can simply treat D as the scarce class, and attempt to balance the load between the non-scarce class and the middle nodes. For simplification's sake, lets say S=G. Using Case 1's General Form: Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G (guard bw = middle bw) Wgg*G + Wgd*D = Wee*E + Wed*D (guard bw = exit bw) Wed + Wmd + Wgd = 1 (Wed*D+Wmd*D+Wgd*D = D) Wgg = 1-Wmg (Wmg*G + Wgg*G = G) Wee = 1-Wme (Wme*E + Wee*E = E) We then want Wgg = Wgd = 1 and Wmd = Wed = Wmg = 0. We're now left with: G + D < M + Wme*E (guard bw < middle bw) G + D < Wee*E (guard bw < exit bw) M + Wme = Wee*E (middle bw = exit bw) Wee = 1-Wme (Wme*E + Wee*E = E) Which is only two equations with two unknowns. Solving for Wme: M + Wme = (1-Wme)*E M + Wme = E - Wme*E Wme + Wme*E = E - M (1+E)*Wme = E-M Wme = (E-M)/(1+E) Wee = (1+E)/(1+E) - (E-M)/(1+E) Wee = (1+M)/(1+E) Here is a Mathematica line to verify this: Reduce[{M + Wme == Wee*E, Wee == 1-Wme}, {Wme, Wee}] Subcase b: S+D >= T/3 In this case, the formerly scarce class is no longer scarce, and now we have a condition that is similar to Case 1 with the exception that we want to change the D weighting from 1/3 for position S to whatever quantity brings it up to T/3. For simplification's sake, lets say S=G: G+Wgd*D = T/3 (guard bw = T/3) Wgd = (T/3-G)/D Using Case 1's General Form: Wgg*G + Wgd*D = M + Wmd*D + Wme*E + Wmg*G (guard bw = middle bw) Wgg*G + Wgd*D = Wee*E + Wed*D (guard bw = exit bw) Wed + Wmd + Wgd = 1 (Wed*D+Wmd*D+Wgd*D = D) Wgg = 1-Wmg (Wmg*G + Wgg*G = G) Wee = 1-Wme (Wme*E + Wee*E = E) We then want Wgg = 1 and Wmg = 0. We can then evenly divide D between middle and exit positions: Wed = Wmd We're now left with: G + Wgd*D = M + Wed*D + Wme*E G + Wgd*D = (1-Wme)*E + Wed*D 2*Wed*D + Wgd*D = D Wgd = (T/3-G)/D This gives: Wed = 1/2 - (T/3-G)/2D Wme = (E-M)/2E Here is a Mathematica line to verify this: Reduce[{G + Wgd*D == M + Wed*D + Wme*E, G + Wgd*D == (1-Wme)*E + Wed*D, 2*Wed*D + Wgd*D == D, Wgd == (T/3-G)/D, T == G + M + E + D}, {Wed, Wme, Wgd}] Implementation Notes: The current smartlist_choose_by_bandwidth() calculates Guard, Exit, and Total bandwidth based on the smartlist passed in, which has already been restricted to viable nodes. We'll need to refactor this function to receive G, E, M, D and T as parameters independent of the current node selection list, which is a local position property independent from the global capacity and balance of the network. Furthermore, even these weights are approximate, because in actual operation not all Exit nodes are equal. In fact, some nodes lack an Exit flag, but still contribute Exit bandwidth for certain ports. We could envision dividing the classes such that E and D are counted only for the current exit port instead of purely based on flag, but this is not always known at time of circuit construction. The implementation should verify that the total weighted bandwidth available to each position is properly balanced according to the current scarcity case of the network. This will be difficult to do in a unit test, but there should be debug ifdefs on non-release Tors to check these conditions on the current consensus during selection. -- Mike Perry Mad Computer Scientist fscked.org evil labs
Attachment:
pgpgLoYfmgZsx.pgp
Description: PGP signature