
Re: [tor-dev] Proposal 276: Report bandwidth with lower granularity in consensus documents



> On 27 Feb 2017, at 11:35, Nick Mathewson <nickm@xxxxxxxxxxxx> wrote:

>>>  Is there any reason to think this amount of smoothing would not
>>>  be safe?
>> ...
>> Given the existing variance is around 2%, and the existing rounding of
>> 0.5%, rounding by 5% seems quite sensible.
>> 
>> (Typical variance *between* bandwidth authorities is much more than
>> this anyway.)

I was wrong: mikeperry raised some serious issues with this scheme.

Issue 1: Increased Inaccuracy

This scheme increases the inaccuracy of the bandwidth authority votes,
compared to the bandwidth measurements. This effect is particularly
pronounced for high-bandwidth relays, because the bandwidths are
distributed according to a power law.

(Mike, feel free to correct me if I've got this wrong.)

For example, the top relay is approximately 1% of the network, or 750
Mbps. The existing variance due to rounding is 0.5% of that, or 4 Mbps.

The variance due to this scheme would be 5.5% of that, or 41 Mbps.

If the top 10 relays were all rounded down at once, which would happen in
approximately 1 in 2**10 consensuses (about once every 6 weeks), this could
redistribute ~500 Mbps of client bandwidth to the rest of the relays,
arbitrarily and without warning.
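
(Here's the same arithmetic as a quick Python sketch, using the rough
figures above; the 750 Mbps and 1-in-2**10 numbers are estimates, not
measurements:)

top_relay_mbps = 750.0                 # assumed: top relay is ~1% of the network

# existing scheme: ~0.5% worst-case rounding error
existing_error_mbps = top_relay_mbps * 0.005
# this scheme: ~5% rounding on top of the existing 0.5%, so ~5.5%
proposed_error_mbps = top_relay_mbps * 0.055

print("existing error: %.0f Mbps" % existing_error_mbps)   # ~4 Mbps
print("proposed error: %.0f Mbps" % proposed_error_mbps)   # ~41 Mbps

# all of the top 10 relays rounded down in the same (hourly) consensus
consensuses = 2 ** 10
print("1 in %d consensuses, ~%.0f weeks" % (consensuses,
                                            consensuses / (24.0 * 7.0)))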

This effect gets worse as individual relay performance increases due to
multithreading, optimisation, or processor improvements.

Issue 2: Different Levels of Inaccuracy

As a separate but related issue, rounding to 3 significant digits
means that the error ratio is much higher when those digits are
100..., compared with when those digits are 999...

The maximum error ratios vary by 10x:
100- 101 0.00499
...
999-1000 0.00050

This is an existing bug, and contributes to network instability.
We should fix it anyway, and this is a good opportunity to do so.
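
(Those ratios use the same definitions as the attached script: the
worst-case value in each interval is its geometric midpoint, and the
error is measured against the rounded value:)

import math

def error_ratio(mid, rounded):
    '''|rounded/actual - 1| at the worst-case (midpoint) value'''
    return abs(rounded / mid - 1.0)

def midpoint(a, b):
    '''geometric midpoint, which minimises the worst-case ratio'''
    return math.sqrt(a * b)

# linear rounding to 3 significant digits, at each end of a decade
print("%.5f" % error_ratio(midpoint(100.0, 101.0), 101.0))     # 0.00499
print("%.5f" % error_ratio(midpoint(999.0, 1000.0), 1000.0))   # 0.00050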

>>>  Is there a greedier smoothing algorithm that would produce better
>>>  results?
>> ...

To address issue 1, I suggest that we report higher bandwidths more
accurately, and lower bandwidths less accurately.

Here's a sketch of how that could work:

Top   1%: 3+1 significant digits (900 geometric values per decile)
Next  9%: 2+1 significant digits  (90 geometric values per decile)
Next 90%: 1+1 significant digits   (9 geometric values per decile)
   0-100: set values: 0, 1, 5, 22  (4 geometric values for 4 deciles)
(0-100 is special, because the most common values are 0 and 20.)
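
A rough Python sketch of the tier selection (the function name, interface,
and exact cutoffs are mine, just to make the idea concrete):

LOW_BANDWIDTH_VALUES = [0, 1, 5, 22]   # set values for measurements of 0-100

def significant_digits_for_relay(rank_fraction, bandwidth):
    '''rank_fraction: 0.0 for the fastest relay, approaching 1.0 for the
       slowest; bandwidth in consensus weight units (KBps)'''
    if bandwidth <= 100:
        return None        # snap to the nearest LOW_BANDWIDTH_VALUES entry
    if rank_fraction < 0.01:
        return 3 + 1       # top 1%
    if rank_fraction < 0.10:
        return 2 + 1       # next 9%
    return 1 + 1           # remaining 90%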

To address issue 2, I suggest that we round bandwidths geometrically,
rather than linearly.

Here's a sketch of how that could work when rounding 100-1000 to
2 significant digits:

Normal linear rounding yields 90 values:
100,  110,  120, ... , 980,  990,  1000
Using the linear midpoints (a+b)/2:
   105,  115,    ...    , 985,  995

Geometric rounding yields 90 values (using an extra significant digit):
100,  103,  105, ... , 950,  975,  1000
Using the geometric midpoints sqrt(a*b):
   101,  104,    ...    , 962,  987

The maximum geometric error ratios vary by up to 2x, and that's due to
the rounding to 3 significant digits (otherwise, all the ratios would be
equal):
100- 103 0.01489
...
975-1000 0.01274

But this is much better than linear rounding, where the ratios vary by
up to 10x.
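
Here's a minimal sketch of rounding a single value onto one of those 90
geometric steps (the function and its defaults are mine, and it ignores
floating-point edge cases at the step boundaries):

import math

def round_geometric(x, low=100.0, high=1000.0, steps=90):
    '''Snap x to one of `steps` geometric steps between low and high,
       using the geometric midpoint sqrt(a*b) to pick a neighbour.'''
    ratio = math.pow(high / low, 1.0 / steps)
    i = int(math.floor(math.log(x / low, ratio)))   # step at or below x
    a = low * ratio ** i
    b = low * ratio ** (i + 1)
    return a if x < math.sqrt(a * b) else b

print(round_geometric(101.0))   # 100.0 (below the geometric midpoint ~101.3)
print(round_geometric(987.0))   # ~974.7, which rounds to 975 at 3 digits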

I've attached a script that calculates these geometric series: it's
configured to calculate the 900 values for Top 1% rounding, their
geometric error ratios, and their linear error differences.

(Changing precision to 2+1 and 1+1 gets you the 90 and 9 series,
changing precision to 1, values to 3, and high to 100.0 gets you
the tail-end series.)

This would be relatively easy to implement in C, or it could be
precalculated and stored using ~2 kilobytes (1003 * uint16_t).
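
For example, precalculating the table is a few lines of Python (the exact
layout below is illustrative, not part of the proposal):

import math
from array import array

def geometric_series(low, high, steps):
    '''steps+1 integer mantissas spaced geometrically from low to high'''
    ratio = math.pow(high / low, 1.0 / steps)
    return [int(round(low * ratio ** i)) for i in range(steps + 1)]

# each series drops its upper bound: it belongs to the next decile
table = [0, 1, 5, 22]                                 # 0-100 set values
table += geometric_series(10.0, 100.0, 9)[:-1]        # 1+1 significant digits
table += geometric_series(100.0, 1000.0, 90)[:-1]     # 2+1 significant digits
table += geometric_series(1000.0, 10000.0, 900)[:-1]  # 3+1 significant digits

packed = array('H', table)                            # 1003 * uint16_t
print("%d values, %d bytes" % (len(table), len(packed) * packed.itemsize))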

And as we're also doing work on compression:

There are also opportunities for table-driven compression: each value
can be specified using 10 bits for the table index, plus a few bits
for the power of 10 of the multiplier:

Top: 750 Mbps * 1024 / 8 = 96000 KBps = 9599 * 10**1 (1 bit)
 1%: 100 Mbps * 1024 / 8 = 12800 KBps =  129 * 10**2 (2 bits)
10%:  25 Mbps * 1024 / 8 =  3200 KBps =   36 * 10**2 (2 bits)
99%:   4 Mbps * 1024 / 8 =   512 KBps =   46 * 10**1 (1 bit)
End:             20 KBps =    20 KBps =   22 * 10**0 (1 bit)

6 extra bits (2 bytes total) would be sufficient to cover a 10,000-fold
increase in maximum individual relay bandwidth (up to 8 Tbps).
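
A sketch of how the packing and unpacking could look (the bit layout and
the tiny stand-in table are just for illustration):

def pack(index, exponent, exponent_bits=6):
    '''pack a 10-bit table index and a power-of-10 exponent into 16 bits'''
    return (index << exponent_bits) | exponent

def unpack(word, table, exponent_bits=6):
    '''recover the rounded bandwidth in KBps'''
    exponent = word & ((1 << exponent_bits) - 1)
    return table[word >> exponent_bits] * 10 ** exponent

# stand-in table entries, just enough for two of the examples above
table = {999: 9599, 132: 129}

print(unpack(pack(999, 1), table))   # 95990 KBps (the "Top" row, rounded)
print(unpack(pack(132, 2), table))   # 12900 KBps (the "1%" row, rounded)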

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org
------------------------------------------------------------------------
#!/usr/bin/python

import math

# define error functions and midpoint functions

def error_ratio(mid, minmax):
    '''The ratio between the rounded value and the actual value
       | r(n) / n - 1 |
    '''
    return abs(minmax / mid - 1.0)

def mid_min_error_ratio(l, h):
    '''Minimise the ratio between the rounded value and the actual value
    '''
    return math.sqrt(l * h)

def error_difference(mid, minmax):
    '''The difference between the rounded value and the actual value
       | n - r(n) |
    '''
    return abs(minmax - mid)

midpoint = mid_min_error_ratio
tests = [error_ratio, error_difference]

# Start with an interval representing the current rounding
# we round to 3 significant figures, but we need an extra digit of precision
# to capture the values around the low end
precision = 3+1
low = math.pow(10.0, precision - 1)
high = low*10.0
# Create at least as many values as are in the existing scheme
# (an appropriate midpoint function should be used to choose whether to round
# to the lower or higher value)
values = (high - low) / 10.0
scale = (high-low)/values

# increase the values geometrically between low and high
step = math.pow(high/low, 1.0/values)
print(step)
# work at the lower scale, then multiply back up and round to integers
bound = low/scale
bounds = []
while bound <= (high/scale + 1):
    bounds.append(bound)
    bound *= step

# round to the nearest integer
bounds = [int(round(b*scale, 0)) for b in bounds]

# increase the values linearly between low and high
#bounds = range(int(low), int(high)+1)

print(len(bounds))
b_prev = None
for b in bounds:
    comparison = "{:.0f}".format(b)
    if b_prev is not None:
        for test in tests:
            comparison += " {:.5f}".format(test(midpoint(b_prev, b), b))
    print(comparison)
    b_prev = b

# chop off the upper bound: it belongs to the next decile
if bounds[-1] == high:
    del bounds[-1]

print(bounds)

