
Re: [tor-dev] tor's definition of 'median'



Virgil's absolutely right. The median is the "middle" value of a _sorted_ set (0-based indexing, integer division):
- for an odd number of data points, it's the middle one: set[N/2]
- for an even number of data points, it's the average of the two in the middle: (set[N/2 - 1] + set[N/2]) / 2
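
As a rough illustration of the two cases above, here is a minimal Python sketch. The function name `median` and the test values are only for illustration (the four-element list reuses the Measured= votes quoted further down the thread); this is not tor's code.

```python
def median(values):
    """Standard median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    s = sorted(values)      # the definition assumes sorted data
    n = len(s)
    mid = n // 2            # integer division, 0-based index
    if n % 2 == 1:
        return s[mid]                   # odd N: the single middle element
    return (s[mid - 1] + s[mid]) / 2    # even N: mean of the two middle elements


print(median([3, 1, 2]))                     # 2
print(median([7570, 15500, 18100, 30500]))   # 16800.0
```

For even N the result is a float even when the inputs are integers, which may or may not be what an implementation wants.
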
Best regards,
Maciej

On Tue, Aug 11, 2015 at 3:44 PM, Virgil Griffith <i@xxxxxxxxx> wrote:
I mean the median.

From Wikipedia...

For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c) / 2.

-V

On Tue, Aug 11, 2015 at 9:29 PM John <oneofthem@xxxxxxxxxx> wrote:
I think you are confusing the median with the mean:

https://en.wikipedia.org/wiki/Median
https://en.wikipedia.org/wiki/Mean

Taking the median instead of the mean can be beneficial when your data
contains large outliers, which typically pull the mean far away from
the bulk of the values.
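
A small sketch of that effect, using Python's standard `statistics` module. The sample values are made up for illustration and are not taken from any tor measurement.

```python
from statistics import mean, median

# Hypothetical measurements; the last value is a deliberate outlier.
samples = [10, 11, 12, 13, 1000]

print(mean(samples))    # 209.2  (dragged upward by the outlier)
print(median(samples))  # 12     (unaffected by how large the outlier is)
```
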

-j

Virgil Griffith:
> Is there some implementation-specific reason not to use the standard
> mathematical definition of "median"? If not, I propose changing the
> implementation to match it.
>
> -V
>
> On Tue, Aug 11, 2015 at 2:44 AM Nick Mathewson <nickm@xxxxxxxxxxxx> wrote:
>
>> On Mon, Aug 10, 2015 at 1:11 PM, nusenu <nusenu@xxxxxxxxxxxxxxx> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA512
>>>
>>> Hi,
>>>
>>> https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2028
>>>
>>>> If 3 or more authorities provide a Measured= keyword for a router,
>>>> the authorities produce a consensus containing a "w" Bandwidth=
>>>> keyword equal to the median of the Measured= votes.
>>>
>>> a random sample from recent votes:
>>>
>>> grep 37.59.38.117 -A 3 *|grep Measured
>>> w Bandwidth=6869 Measured=7570
>>> w Bandwidth=6869 Measured=15500
>>> w Bandwidth=6869 Measured=18100
>>> w Bandwidth=6869 Measured=30500
>>>
>>> Tor says the median value is
>>> 15500
>>>
>>> 2015-08-10-16-00-00-consensus:
>>> w Bandwidth=15500
>>>
>>> but the median of these 4 values is actually:
>>> (18100+15500)/2 = 16800
>>> no?
>>>
>>> Does tor use a different definition of 'median' and simply always take
>>> the second ordered measurement vote out of 4 votes, or is there a bug
>>> in the spec or implementation?
>>
>> There's one misplaced throwaway sentence in dir-spec.txt:
>>
>> "All ties in computing medians are broken in favor of the smaller or
>> earlier item."
>>
>> We should bring this, and probably other things, into a "definitions"
>> section earlier in dir-spec.txt. Patches welcome. ;)
>>
>> --
>> Nick
>> _______________________________________________
>> tor-dev mailing list
>> tor-dev@xxxxxxxxxxxxxxxxxxxx
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>>
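
The tie-breaking sentence Nick quotes from dir-spec.txt appears to correspond to what Python's `statistics` module calls the "low median". The sketch below is written under that assumption and is not tor's actual implementation; it applies both definitions to the four Measured= votes from nusenu's message.

```python
from statistics import median, median_low

# The four Measured= votes quoted earlier in the thread.
votes = [7570, 15500, 18100, 30500]

print(median(votes))      # 16800.0 (mean of the two middle values)
print(median_low(votes))  # 15500   (the smaller of the two middle values)

# The same low median written as a single index into the sorted list:
# for even N this picks the lower of the two middle elements,
# for odd N it picks the single middle element.
s = sorted(votes)
print(s[(len(s) - 1) // 2])  # 15500
```

With an odd number of votes the two definitions agree, so the difference only shows up when an even number of authorities vote.
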