
Re: [tor-bugs] #26035 [Metrics/Statistics]: Streamline sample quantile types used in the various modules

From: "Tor Bug Tracker & Wiki" <blackhole@xxxxxxxxxxxxxx>
Date: Tue, 08 May 2018 15:34:47 -0000

#26035: Streamline sample quantile types used in the various modules
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  enhancement         |         Status:  new
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  Sponsor13
--------------------------------+------------------------------

Comment (by iwakeh):

 Long description, true.

 Let's look at the PostgreSQL implementation (I used the 9.6.8 source,
 ./src/backend/utils/adt/orderedsetaggs.c, assuming this code is not
 subject to frequent change).

 The computation of percentile_* as C-ish pseudo-code (which could easily
 be used for an implementation in Java or Python):
 {{{
 #!C
 /*
    All values are sorted and indexed according to the order.
    first and second are zero-based indices; val(k) is the value at index k;
    percentile is the wanted percentile as a fraction in [0, 1].
    N is the count of values.
 */

 /* percentile_cont: position within the zero-based index range 0..N-1 */
 position = percentile * (N - 1);
 first = floor(position);
 second = ceil(position);
 if (first == second) {
   /* the position falls exactly on an index, so that value is used */
   result = val(first);
 } else {
   /* otherwise interpolate linearly between the two neighbouring values */
   proportion = position - first;
   result = val(first) + (proportion * (val(second) - val(first)));
 }

 /*-----------------------------------*/
 /* For comparison percentile_disc, which counts rows from 1: */
 rownum = max(1, ceil(percentile * N));
 result = val(rownum - 1);
 }}}
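
 A minimal Java sketch of both calculations, in case it helps the
 discussion. The class and method names (Percentiles, percentileCont,
 percentileDisc) are hypothetical and not taken from any existing module.
 It assumes that the continuous variant interpolates over zero-based
 positions 0..N-1 and that the discrete variant counts rows from 1, which
 is my reading of orderedsetaggs.c and should be checked against the
 source before relying on it.

```java
import java.util.Arrays;

/** Sketch only: hypothetical helper mirroring PostgreSQL's percentile_* aggregates. */
final class Percentiles {

  private Percentiles() {
  }

  /** Continuous percentile: interpolates linearly between neighbouring values. */
  static double percentileCont(double[] values, double percentile) {
    double[] sorted = sortedCopy(values, percentile);
    // Assumption: fractional position within the zero-based range 0..N-1.
    double position = percentile * (sorted.length - 1);
    int first = (int) Math.floor(position);
    int second = (int) Math.ceil(position);
    if (first == second) {
      return sorted[first];
    }
    double proportion = position - first;
    return sorted[first] + proportion * (sorted[second] - sorted[first]);
  }

  /** Discrete percentile: always returns one of the input values. */
  static double percentileDisc(double[] values, double percentile) {
    double[] sorted = sortedCopy(values, percentile);
    // Assumption: one-based row number, at least 1 so that percentile 0 works.
    int rowNumber = (int) Math.max(1, Math.ceil(percentile * sorted.length));
    return sorted[rowNumber - 1];
  }

  /** Validates the arguments and returns a sorted copy, leaving the input untouched. */
  private static double[] sortedCopy(double[] values, double percentile) {
    if (values.length == 0) {
      throw new IllegalArgumentException("at least one value is required");
    }
    if (!(percentile >= 0.0 && percentile <= 1.0)) {
      throw new IllegalArgumentException("percentile must be in [0, 1]");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    return sorted;
  }
}
```

 For the values 1, 2, 3, 4 and percentile 0.5, percentileCont yields 2.5
 (halfway between 2 and 3), while percentileDisc yields 2, which is an
 actual input value.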

 For values where fractions make sense, e.g. seconds in onionperf
 results, the interpolating (continuous) method, percentile_cont, could be
 used. For everything else, e.g. user numbers, the simpler calculation
 (percentile_disc, the discrete method) could be used.

 The R-x classification is not a formal standard like a DIN norm, and I
 wouldn't rely on an implementation that claims a given type without
 checking its source.
 Thus, using our own implementation in Java might be less trouble and
 would reduce the number and size of dependencies.  Python will soon not
 be used anymore, and R is only used for the median, so we could either
 document the median calculation in R or implement it as the
 0.5-percentile.
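
 To illustrate the last point, here is a small Java sketch comparing the
 conventional median (middle value, or the mean of the two middle values
 for an even count, which is how I understand R's median to work) with the
 0.5-percentile computed by interpolation over zero-based positions
 0..N-1. All names (MedianCheck, conventionalMedian, percentileCont) are
 hypothetical, and the interpolation rule is an assumption based on my
 reading of the PostgreSQL source.

```java
import java.util.Arrays;

/** Sketch only: hypothetical check that the 0.5-percentile matches the median. */
final class MedianCheck {

  private MedianCheck() {
  }

  /** Middle value for odd counts, mean of the two middle values for even counts. */
  static double conventionalMedian(double[] values) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    if (n % 2 == 1) {
      return sorted[n / 2];
    }
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }

  /** Interpolating percentile; assumes a non-empty array and percentile in [0, 1]. */
  static double percentileCont(double[] values, double percentile) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    double position = percentile * (sorted.length - 1);
    int first = (int) Math.floor(position);
    int second = (int) Math.ceil(position);
    if (first == second) {
      return sorted[first];
    }
    return sorted[first] + (position - first) * (sorted[second] - sorted[first]);
  }
}
```

 With an odd count the position 0.5 * (N - 1) is a whole number, so the
 middle value is returned directly. With an even count it lies exactly
 halfway between the two middle indices, so the interpolation yields their
 mean.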

 Possible steps:
 * keep using PostgreSQL's percentile_cont
 * implement the equivalent Java functionality and replace commons-math, if
 this is the only functionality it is included for.
 * document the percentile calculation once, for Java and PostgreSQL
 together
 * adapt the R median calculation, if necessary

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
