> On 27 Feb 2017, at 11:35, Nick Mathewson <nickm@xxxxxxxxxxxx> wrote:
>>> Is there any reason to think this amount of smoothing would not
>>> be save?
>> ...
>> Given the existing variance is around 2%, and the existing rounding of
>> 0.5%, rounding by 5% seems quite sensible.
>>
>> (Typical variance *between* bandwidth authorities is much more than
>> this anyway.)

I was wrong: mikeperry raised some serious issues with this scheme.

Issue 1: Increased Inaccuracy

This scheme increases the inaccuracy of the bandwidth authority votes, compared to the bandwidth measurements. This effect is particularly pronounced for high-bandwidth relays, because the bandwidths are distributed according to a power law. (Mike, feel free to correct me if I've got this wrong.)

For example, the top relay is approximately 1% of the network, or 750 Mbps. The existing variance due to rounding is 0.5% of that, or 4 Mbps. The variance due to this scheme would be 5.5%, or 41 Mbps.

If the top 10 relays were rounded down, which would happen in approximately 1 in 2**10 consensuses (once every 6 weeks), this could distribute ~500 Mbps of client bandwidth to the rest of the relays arbitrarily and without warning.

This effect gets worse as relay performance increases due to multithreading or optimisation or processor improvements.

Issue 2: Different Levels of Inaccuracy

As a separate but related issue, rounding to 3 significant digits means that the error ratio is much higher when those digits are 100..., compared with when those digits are 999...

The maximum error ratios vary by 10x:

 100- 101  0.00499
 ...
 999-1000  0.00050

This is an existing bug, and contributes to network instability. We should fix it anyway, and this is a good opportunity to do so.

>>> Is there a greedier smoothing algorithm that would produce better
>>> results?
>> ...

To address issue 1, I suggest that we report higher bandwidths more accurately, and lower bandwidths less accurately.

Here's a sketch of how that could work:

Top 1%:   3+1 significant digits (900 geometric values per decile)
Next 9%:  2+1 significant digits (90 geometric values per decile)
Next 90%: 1+1 significant digits (9 geometric values per decile)
0-100:    set values: 0, 1, 5, 22 (4 geometric values for 4 deciles)

(0-100 is special, because the most common values are 0 and 20.)

To address issue 2, I suggest that we round bandwidths geometrically, rather than linearly.

Here's a sketch of how that could work when rounding 100-1000 to 2 significant digits:

Normal linear rounding yields 90 values:
100, 110, 120, ... , 980, 990, 1000
Using the linear midpoints (a+b)/2:
105, 115, ... , 985, 995

Geometric rounding yields 90 values (using an extra significant digit):
100, 103, 105, ... , 950, 975, 1000
Using the geometric midpoints sqrt(a*b):
101, 104, ... , 962, 987

The maximum geometric error ratios vary by up to 2x, and that's due to the rounding to 3 significant digits (otherwise, all the ratios would be equal):

 100- 103  0.01489
 ...
 975-1000  0.01274

But this is much better than linear rounding, where the ratios vary by up to 10x.

I've attached a script that calculates these geometric series: it's configured to calculate the 900 values for Top 1% rounding, their geometric error ratios, and their linear error differences.

(Changing precision to 2+1 and 1+1 gets you the 90 and 9 series; changing precision to 1, values to 3, and high to 100.0 gets you the tail-end series.)
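For illustration only, here's a very rough sketch of what the rounding itself could look like. The helper names and tier cut-offs below are made up for the sketch, and it skips the final step of rounding each allowed value to a limited number of significant digits (which is what the attached script does):

import math

def values_per_decile(rank_fraction):
    # Pick how many geometric values per decile a relay gets, based on
    # where it falls in the network. These cut-offs are just the tiers
    # sketched above; the 0-100 special case is left to the caller.
    if rank_fraction <= 0.01:
        return 900    # top 1%: 3+1 significant digits
    elif rank_fraction <= 0.10:
        return 90     # next 9%: 2+1 significant digits
    else:
        return 9      # next 90%: 1+1 significant digits

def round_geometric(bandwidth, values):
    # Round to the nearest of `values` geometrically spaced steps per
    # decile. Rounding to the nearest step in log space is the same as
    # cutting over at the geometric midpoint sqrt(a*b) of adjacent steps.
    if bandwidth <= 0:
        return 0
    exact_step = math.log10(float(bandwidth)) * values
    nearest_step = round(exact_step)
    return math.pow(10.0, nearest_step / float(values))

The up-to-2x variation in the error ratios above comes entirely from the significant-digit rounding step that this sketch leaves out.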
This would be relatively easy to implement in C, or it could be precalculated and stored using ~2 kilobytes (1003 * uint16_t).

And as we're also doing work on compression: there are also opportunities for table-driven compression. Each value can be specified using 10 bits for the table index, plus a few bits for the power of 10 of the multiplier:

Top: 750 Mbps * 1024 / 8 = 96000 KBps = 9599 * 10**1 (1 bit)
1%:  100 Mbps * 1024 / 8 = 12800 KBps =  129 * 10**2 (2 bits)
10%:  25 Mbps * 1024 / 8 =  3200 KBps =   36 * 10**2 (2 bits)
99%:   4 Mbps * 1024 / 8 =   512 KBps =   46 * 10**1 (1 bit)
End:             20 KBps =    20 KBps =   22 * 10**0 (1 bit)

6 extra bits (2 bytes total) would be sufficient to cover a 10,000-fold increase in maximum individual relay bandwidth (up to 8 Tbps).

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org
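To make the table-driven encoding concrete, here's a rough sketch of the packing. The function names, the exact bit layout, and the assumption that the input is already a table entry times a power of ten are all illustrative, not an existing tor interface:

def pack_bandwidth(rounded_kbps, table):
    # Pack an already-rounded bandwidth (KBps) as a table index plus a
    # power-of-10 exponent, so rounded_kbps == table[index] * 10**exponent.
    # Assumes `table` is the sorted list of ~1003 allowed values.
    exponent = 0
    value = rounded_kbps
    while value > table[-1]:
        value //= 10
        exponent += 1
    index = table.index(value)   # raises ValueError if not a table value
    # 10 bits cover the ~1003-entry table; the exponent gets the rest
    return (exponent << 10) | index

def unpack_bandwidth(packed, table):
    index = packed & 0x3ff       # low 10 bits: table index
    exponent = packed >> 10      # remaining bits: power of 10
    return table[index] * 10 ** exponent

With 10 bits of index and 6 bits of exponent, a packed value fits in the 2 bytes mentioned above.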
#!/usr/bin/python

import math

# define error functions and midpoint functions

def error_ratio(mid, minmax):
    '''The ratio between the rounded value and the actual value
       | n / r(n) - 1 |
    '''
    return abs(minmax / mid - 1.0)

def mid_min_error_ratio(l, h):
    '''Minimise the ratio between the rounded value and the actual value
    '''
    return math.sqrt(l * h)

def error_difference(mid, minmax):
    '''The difference between the rounded value and the actual value
       | n - r(n) |
    '''
    return abs(minmax - mid)

midpoint = mid_min_error_ratio
tests = [error_ratio, error_difference]

# Start with an interval representing the current rounding:
# we round to 3 significant figures, but we need an extra digit of precision
# to capture the values around the low end
precision = 3+1
low = math.pow(10.0, precision - 1)
high = low*10.0
values = (high - low) / 10.0

bounds = [low, high]

# Create at least as many values as are in the existing scheme
# (an appropriate midpoint function should be used to choose whether to round
# to the lower or higher value)
scale = (high-low)/values

# increase the values geometrically between low and high
step = math.pow(high/low, 1.0/values)
print step

next_bound = low/scale
bounds = []
while next_bound <= (high/scale + 1):
    bounds.append(next_bound)
    next_bound *= step

# round to the nearest integer
bounds = [int(round(b*scale, 0)) for b in bounds]

# increase the values linearly between low and high
#bounds = range(int(low), int(high)+1)

print len(bounds)

b_prev = None
for b in bounds:
    comparison = "{:.0f}".format(b)
    if b_prev is not None:
        for test in tests:
            comparison += " {:.5f}".format(test(midpoint(b_prev, b), b))
    print comparison
    b_prev = b

# chop off the upper bound: it belongs to the next decile
if bounds[-1] == high:
    del bounds[-1]

print bounds