[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[tor-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics



#26022: Fix a flaw in the noise-removing code in our onion service statistics
------------------------------------+--------------------------
     Reporter:  karsten             |      Owner:  metrics-team
         Type:  defect              |     Status:  new
     Priority:  Medium              |  Milestone:
    Component:  Metrics/Statistics  |    Version:
     Severity:  Normal              |   Keywords:
Actual Points:                      |  Parent ID:
       Points:                      |   Reviewer:
      Sponsor:                      |
------------------------------------+--------------------------
 I think I found a flaw in the aggregation part of our onion service
 statistics.

 For context, I'm quoting a paragraph from our technical report where we
 explain how we're [https://research.torproject.org/techreports
 /extrapolating-hidserv-stats-2015-01-31.pdf extrapolating network totals
 from hidden-service statistics]:

 ''"When processing hidden-service statistics, we need to handle the fact
 that they have been obfuscated by relays. As first step, we're attempting
 to remove the additive Laplace-distributed noise by rounding up or down to
 the nearest multiple of `bin_size`. The idea is that it's most likely that
 noise was added to the closest right side of a bin than to the right side
 of another bin. In step two, we're subtracting half of `bin_size`, because
 the relay added between 0 and `bin_size − 1` to the originally observed
 value."''

 In our Java code, these two steps turned into:

 {{{
   /** Removes noise from a reported stats value by rounding to the nearest
    * right side of a bin and subtracting half of the bin size. */
   private long removeNoise(long reportedNumber, long binSize) {
     long roundedToNearestRightSideOfTheBin =
         ((reportedNumber + binSize / 2) / binSize) * binSize;
     long subtractedHalfOfBinSize =
         roundedToNearestRightSideOfTheBin - binSize / 2;
     return subtractedHalfOfBinSize;
   }
 }}}

 I think that this code has a flaw: in `(reportedNumber + binSize / 2) /
 binSize`, we use [https://en.wikipedia.org/wiki/Truncation integer
 truncation]. However, as described in that Wikipedia article, ''"for
 negative numbers truncation does not round in the same direction as the
 floor function: truncation always rounds toward zero, the floor function
 rounds towards negative infinity."''

 I'm going to attach a graph (once this ticket exists) that visualizes the
 effect for reported values around zero. In short: the step function has
 one really long step, and that seems flawed or at least inconsistent.

 My suggestion would be to change that Java code and use `Math.floorDiv()`
 rather than integer truncation. Something like this:

 {{{
 diff --git
 a/src/main/java/org/torproject/metrics/stats/hidserv/Parser.java
 b/src/main/java/org/torproject/metrics/stats/hidserv/Parser.java
 index f2abc789..888c8959 100644
 --- a/src/main/java/org/torproject/metrics/stats/hidserv/Parser.java
 +++ b/src/main/java/org/torproject/metrics/stats/hidserv/Parser.java
 @@ -245,7 +245,7 @@ public class Parser {
     * right side of a bin and subtracting half of the bin size. */
    private long removeNoise(long reportedNumber, long binSize) {
      long roundedToNearestRightSideOfTheBin =
 -        ((reportedNumber + binSize / 2) / binSize) * binSize;
 +        Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize;
      long subtractedHalfOfBinSize =
          roundedToNearestRightSideOfTheBin - binSize / 2;
      return subtractedHalfOfBinSize;
 }}}

 See the [https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html
 #floorDiv-long-long- JavaDocs for that method] for details. Quoting:
 ''"Normal integer division operates under the round to zero rounding mode
 (truncation). This operation instead acts under the round toward negative
 infinity (floor) rounding mode. The floor rounding mode gives different
 results than truncation when the exact result is negative."''

 I'm yet unclear whether this might affect overall results. Right now,
 we're handling all negative reported values wrong by adding 1 `binSize` to
 them which we shouldn't do. Of course, negative reported values should be
 the exception, not the rule, but there's a reason why we're keeping them
 in the process, just like we're keeping values that we think are too large
 because of noise. We'll have to see.

 But regardless, I think we need to fix this. Opinions?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs