Thus spake Nick Mathewson (nickm@xxxxxxxxxxxxxx):

> > The better fix is to allow certain authorities to specify that they are
> > voting on bandwidth "offsets": how much they think the weight should
> > be changed for the relay in question. We should put the offset vote in
> > the stanza for the relay in question, so a given authority can choose
> > which relays to express preferences for and which not.
>
> As Roger and I discussed on Friday, this seems needlessly complex.
> The semantics you're suggesting seem to be that an authority can say:
>
>     w Bandwidth=X
>
> if they're not too sure, and
>
>     w Bandwidth=Y Offset=Z
>
> if they're pretty sure that the real value of the bandwidth is Y+Z.
>
> This is a bit buggy, as noted. Suppose that there are 3 authorities,
> and they say about a single router:
>
>     Bw=1000 Offset=0    (Total=1000)
>     Bw=1500 Offset=0    (Total=1500)
>     Bw=1000 Offset=500  (Total=1500)
>
> Note that the algorithm described below will give a median Bw of 1000
> and a median offset of 0, producing a declared bandwidth of 1000. But
> if instead we had taken the median of the actual observed totals, we
> would have gotten a value of 1500.
>
> It makes more sense just to let bandwidth mean bandwidth. If we want
> to have measured bandwidth count for more than reported bandwidth,
> let's have an optional flag on the vote line that looks like:
>
>     w Bandwidth=X Measured=1
>
> This way the median actually -is- the median. See below for my
> suggested voting algorithm.

Ok, just FYI, the bandwidth measurements are actually computed as the
ratio of the average stream bandwidth through a node to the average
stream bandwidth observed for nodes of similar reported capacity (or
for the network as a whole). I'll be detailing the exact nature of
this computation in Proposal 161 today, but the end result of the
measurement is just a floating point value that is computed
independently of the reported bandwidth. It only becomes a bandwidth
value once we multiply it by a node's reported bandwidth. I'm guessing
the value we would use for this multiplication would be the reported
value we saw during the scan.

> > 4. Design
> >
> > First, we need a new consensus method to support this new calculation.
> >
> > Now v3 votes can have a new weight on the "w" line:
> >
> >     "Bandwidth_Offset=" INT.
> >
> > Once we're using the new consensus method, the new way to compute the
> > Bandwidth weight is by taking the old vote (explained in proposal 141:
> > median, then choose the lower number in the case of ties), and adding
> > or subtracting the median offset (using the offset closer to 0 in the
> > case of ties, and with a sum of 0 if the sum is negative).
> >
> > Then the actual consensus looks just the same as it did before,
> > so clients never have to know that this additional calculation is
> > happening.
>
> Here are some additional suggestions that came up as we were talking.
>
> * We'd like to avoid having little changes in measured bandwidth
>   result in changes to the consensus, since we'd like to be able to
>   transfer consensus diffs. Thus, let's round our votes to the
>   first N significant bits.
>
>   In other words, if we've observed a bandwidth of 28789 bytes for a
>   node, that's 111 0000 0111 0101. We round that down to
>   111 0000 0000 0000, and declare 28672.
>
>   This is better than rounding to the nearest 1k, since a 1k change is
>   very significant for low values, and relatively frequent for high
>   values.

Ok. If we are voting on bandwidths, should we do this rounding in the
Python measurement code, or in Tor itself?
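Putting the measurement description and the rounding suggestion
together, a minimal Python sketch might look like the following (the
function names, variable names, and the choice of N=3 bits are mine;
the real computation will be specified in Proposal 161):

    def measured_ratio(node_stream_bws, peer_stream_bws):
        # Ratio of the node's average stream bandwidth to the average
        # stream bandwidth of nodes with similar reported capacity.
        node_avg = sum(node_stream_bws) / len(node_stream_bws)
        peer_avg = sum(peer_stream_bws) / len(peer_stream_bws)
        return node_avg / peer_avg  # a unitless float

    def vote_bandwidth(ratio, reported_bw, nbits=3):
        # The ratio only becomes a bandwidth once it is multiplied by
        # the node's reported bandwidth (here, the value seen during
        # the scan).  Then round down to the first `nbits` significant
        # bits so that small fluctuations don't change the vote.
        bw = int(ratio * reported_bw)
        if bw <= 0:
            return 0
        shift = max(bw.bit_length() - nbits, 0)
        return (bw >> shift) << shift

    # Nick's example: 28789 is 0b111000001110101; keeping the top
    # three bits gives 0b111000000000000 = 28672.
    assert vote_bandwidth(1.0, 28789) == 28672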
> * It's really important that measurers try to normalize declared
>   bandwidth values to the same scale. Conceptually, a bandwidth as we
>   use it in the consensus is just a weight on an arbitrary scale. But
>   if different authorities are voting for bandwidths using different
>   scales -- for example, if one systematically underestimates by 10%
>   -- then the one with the middle _scale_ will pretty much
>   unilaterally set the weights for the routers.
>
>   I don't have a great solution for this one, other than to try to
>   make sure the bandwidth measurement algorithms don't have any
>   systemic bias depending on which authority is running them from
>   where on the Internet. We _could_ normalize everything to a
>   fraction, but that doesn't seem like it would give stable values,
>   and combining these values would seem hard.

I'm thinking for now we just distribute the scanners across different
geographical locations: one on the US west coast, one on the US east
coast, and one in Europe to start. My intuition is that the median
authority will vary from node to node, depending upon the geographical
location of that node. If this turns out not to be the case and the
measurements end up being biased for some reason, we can try to figure
that out at that point. But even so, the freedom of the middle value
should at least be bounded by the other two measurements' bias.
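To make the "middle scale" point concrete, here is a toy Python
illustration (the authority names, scale factors, and bandwidths are
invented): if each authority's measurements carry a consistent
multiplicative bias, the per-router median vote always comes from the
authority with the middle scale, no matter what the router's true
bandwidth is:

    import statistics

    true_bw = {"routerA": 1000, "routerB": 5000, "routerC": 250}
    # Hypothetical systematic biases: auth1 underestimates by 10%,
    # auth3 overestimates by 10%.
    scales = {"auth1": 0.9, "auth2": 1.0, "auth3": 1.1}

    for router, bw in true_bw.items():
        votes = sorted(bw * s for s in scales.values())
        # statistics.median always picks auth2's vote, so the
        # middle-scale authority alone sets every router's weight.
        print(router, votes, "->", statistics.median(votes))

--
Mike Perry
Mad Computer Scientist
fscked.org evil labs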