[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [tor-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics



#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------

Comment (by amj703):

 > Hmm, I believe that disallowing negative values by rounding them up to
 zero would bias the result unnecessarily. For example, imagine an extreme
 case where we picked a large `bin_size` value and an even larger `delta_f`
 value: most relays would observe values between 0 and `bin_size`, and they
 would add very large positive or negative noise values. In such a case we
 need both negative and positive values to come up with a result close to
 0.

 Sure, that makes sense as a reason to try and produce unbiased estimates
 for each relay’s value. It seems to me that there is another way to do
 this, which would involve adding the relay values first and then adjusting
 the result, given that the noise has already been summed and thus has zero
 bias (aka an expectation of zero). But the current method seems to be
 working just fine!

 > Hmm, I'm not so sure about this one, either. Remember that we
 implemented the code at relays to report the ''right side'' of a bin, not
 the ''center''. Take another example where a relay observes exactly 0
 events over two days: on day 1 it adds a small positive noise value and
 reports `bin_size` as number, and on day 2 it adds a small negative noise
 value and reports 0 as number. If we subtract `bin_size / 2` from those
 reported values, we'll end up with 0 on average, which would be correct.
 But if we didn't, we'd end up with `bin_size / 2` as average, which seems
 wrong.

 As another example, consider that on day 1 a positive noise value in
 (bin_size/2, bin_size] is added, which results in bin_size being reported,
 and on day 2 a large-magnitude negative value in [-bin_size, -bin_size/2)
 is reported, which results in -bin_size being reported. Then subtracting
 bin_size/2 from those reported values ends up with -bin_size/2 as the
 average, which seems wrong. I don’t think the “right side” rounding is
 happening with current use of the floor function, if it ever was. Maybe
 I’m wrong, but as I understand it Math.floorDiv((reportedNumber + binSize
 / 2) will round -0.75*binSize to -binSize.

 > Does this make sense? If so, I think I'd prefer to leave the general
 formula to remove noise unchanged and only focus on fixing the
 implementation bug related to integer truncation.

 It definitely makes sense to just focus on the immediate issue discovered.
 I was just thinking through how this was supposed to work again and
 thought I would let you know how it appeared to me.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs