[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [tor-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics



#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------

Comment (by amj703):

 > > Why not throw out the outliers, then add the remaining, then do the
 adjustment?
 >
 > The way we're determining whether a reported value was an outlier or not
 is by extrapolating all reported values to network totals and discarding
 the lowest 25% and highest 25% of ''extrapolated'' values. But
 extrapolating values requires us to make these adjustment first, or we'd
 extrapolate to the wrong network totals.

 It sounds to me like it doesn't make a difference if throw out the
 outliers before adjustment or after adjustment. The adjustment preserves
 the relative ordering of the values, and so the bottom (and top) 25% of
 data points will remain the same. So here's a suggestion: (1) extrapolate
 by dividing the raw measurement by the relay's weight; (2) remove the top
 and bottom 25% of extrapolated values; (3) add the remaining raw values;
 (4) adjust that result by rounding towards the closest bin edge,
 subtracting num_relay*bin_size/2, and replacing a negative value with 0;
 and then (5) extrapolate the result by dividing by the total weight of the
 included relays.

 > Here's another idea, though: what if we change the way how we're
 removing noise by ''only'' subtracting `bin_size / 2` to undo the binning
 step as good as we can and leave the Laplace noise alone. Basically, we'd
 only account for the fact that relays always round up to the next multiple
 of `bin_size`, but we wouldn't do anything about the positive or negative
 noise.

 This isn't really the best guess about the value at the relays before
 noise is added. While not performing the noise-removal step makes it
 easier to explain how the metrics numbers are produced (and to implement
 them), I think it makes it harder to understand what they mean. This is a
 judgement call that I can see both sides of, though!

 > Wait, we're talking about two different things:
 >
 >  1. Relays internally round ''up'' to the next multiple of `bin_size`.
 >  2. metrics-web contains that `removeNoise()` method that this ticket is
 all about.

 Ah, right, thanks for clarifying that.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs