[tor-bugs] #18167 [Metrics]: Don't trust "bridge-ips" blindly for user number estimates
#18167: Don't trust "bridge-ips" blindly for user number estimates
-------------------------+-----------------
Reporter: karsten | Owner:
Type: defect | Status: new
Priority: Medium | Milestone:
Component: Metrics | Version:
Severity: Major | Keywords:
Actual Points: | Parent ID:
Points: | Sponsor:
-------------------------+-----------------
I think I found a bug in the user number estimates that led to the
[https://trac.torproject.org/projects/tor/ticket/13171#comment:14
confusion on #13171].
When I developed the
[https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf
algorithm for estimating user
numbers], bridges only reported how many directory requests they responded
to (`"dirreq-v3-resp"`), but not how these directory requests were
distributed to countries (`"dirreq-v3-reqs"`). What they did report was
how many different IP addresses by country connected to the bridge
(`"bridge-ips"`). The goal back then was to provide better user numbers
per country, so I put in the assumption that the geographic distributions
of directory responses and connecting IP addresses would be roughly the
same. And I think that assumption is still valid for most cases.
However, the meek version ''before'' the #13171 fix broke this assumption.
Here's an example from a meek bridge that didn't have this fix yet
(descriptor digest `462a2bcc..`):
{{{
extra-info UtahMeekBridge 88F745840F47CE0C6A4FE61D827950B06F9E4534
published 2015-12-09 22:53:48
dirreq-v3-resp ok=17656,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=6160,busy=0
bridge-ips de=16,cn=8,us=8
}}}
It's rather unlikely that 17656 responses were sent back to 32 or fewer IP
addresses. Still, following the assumption above, we're saying
that half of those 17656 responses were sent back to Germany and one
quarter each to China and the U.S.A., and that seems dangerously wrong.
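To make the arithmetic concrete, here is a minimal Python sketch of the proportional split that the assumption implies. The function name `split_by_bridge_ips` is made up for illustration and only covers the distribution step, not the full user number estimate:

```python
def split_by_bridge_ips(ok_responses, bridge_ips):
    """Attribute directory responses to countries in proportion to bridge-ips.

    Illustrative helper (not taken from the metrics code): `bridge_ips` maps
    country codes to the reported number of unique IP addresses.
    """
    total_ips = sum(bridge_ips.values())
    if total_ips == 0:
        return {}
    return {cc: ok_responses * n / total_ips for cc, n in bridge_ips.items()}


# Numbers from the UtahMeekBridge descriptor above.
estimate = split_by_bridge_ips(17656, {"de": 16, "cn": 8, "us": 8})
print(estimate)  # {'de': 8828.0, 'cn': 4414.0, 'us': 4414.0}
```

So 8828 of the responses would be attributed to Germany on the basis of 16 reported IP addresses.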
I'm going to attach a scatter plot in a minute,
`dirreq-resp-by-bridge-ips-2016-01-27.png`, that puts the numbers of
`"dirreq-v3-resp ok=..."` and `"bridge-ips"` in relation for statistics
reported between December 1, 2015 and last week. The two meek bridges
`88F7..` and `AA03..` stand out quite a bit there as clusters close to the
y axis.
I have a few possible fixes in mind. The first part would be to ignore
all statistics where, on average, each unique IP address was reported to
make, say, 10 directory requests or more. That would remove all dots to
the left of the dashed line in the graph.
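A rough Python sketch of that check could look like the following. The name `is_plausible` and the parameter `max_responses_per_ip` are hypothetical, and the threshold of 10 is only the "say, 10" from above, not a tuned value:

```python
def is_plausible(ok_responses, bridge_ips, max_responses_per_ip=10):
    """Return True if the response count is below the threshold per unique IP.

    Illustrative helper: `bridge_ips` maps country codes to reported unique
    IP address counts. A bridge reporting no IP addresses at all never passes,
    which is an assumption of this sketch.
    """
    total_ips = sum(bridge_ips.values())
    return ok_responses < max_responses_per_ip * total_ips


# The pre-fix meek descriptor: 17656 responses for 32 reported addresses.
print(is_plausible(17656, {"de": 16, "cn": 8, "us": 8}))  # False

# A made-up bridge with 500 responses for 120 reported addresses.
print(is_plausible(500, {"us": 80, "de": 40}))  # True
```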
The second part of the fix would be to switch from combining
`"dirreq-v3-resp"` and `"bridge-ips"` numbers and instead use reported
distributions of directory requests to countries (`"dirreq-v3-reqs"`).
These were not available 3.5 years ago, but
[https://trac.torproject.org/projects/tor/ticket/5824#comment:17 starting
roughly 2 years ago], more and more bridges have been publishing these
statistics.
Here's a descriptor (`fe171d40..`) that was published last week by the
same bridge as above, now named `MeekGoogle`, after it received the
meek-specific #13171 fix:
{{{
extra-info MeekGoogle 88F745840F47CE0C6A4FE61D827950B06F9E4534
published 2016-01-22 13:11:10
dirreq-v3-reqs us=7200,ru=1576,de=1520,[..],cn=88,[..]
dirreq-v3-resp ok=22016,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=6016,busy=0
bridge-ips us=3016,ru=632,gb=536,de=528,[..],cn=40,[..]
bridge-ip-versions v4=8752,v6=64
bridge-ip-transports <OR>=8,meek=8808
}}}
I'm attaching a second scatter plot,
`dirreq-resp-by-dirreq-reqs-2016-01-27.png`, that compares the numbers of
`"dirreq-v3-resp ok=..."` to `"dirreq-v3-reqs"`. The correlation is close to linear, which
makes sense, because the number of directory requests should roughly match
the number of directory responses. I think we can make the user number
estimates a bit more accurate by making this switch. We would still fall
back to `"bridge-ips"` if `"dirreq-v3-reqs"` is empty, but that would
mostly affect older statistics.
Part three of the plan would be to remove the `"bridge-ips"` line entirely
from little-t-tor, because we wouldn't use it anymore. It's worth noting
that we'd lose the ability to filter out meek bridges that don't have the
#13171 fix and that don't report usable `"dirreq-v3-reqs"` statistics. Or
rather, we wouldn't spot future meek-like bridges affected by a similar
bug.
Here's why. The first bridge descriptor above also contained a
`"dirreq-v3-reqs"` line that I left out before:
{{{
extra-info UtahMeekBridge 88F745840F47CE0C6A4FE61D827950B06F9E4534
published 2015-12-09 22:53:48
dirreq-v3-resp ok=17656,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=6160,busy=0
dirreq-v3-reqs us=17648,cn=8
bridge-ips de=16,cn=8,us=8
}}}
We wouldn't be able to filter out this bridge without the `"bridge-ips"`
line. We would have to assume that the vast majority of requests to this
bridge came from the U.S.A., and a tiny minority from China.
I think this is acceptable, because the purpose of statistics shouldn't be
to validate the correctness of other statistics.
To summarize my plan, here's what I'd like to do:
1. If a bridge reports both a `"dirreq-v3-resp"` and a `"bridge-ips"`
line, check if the number of `ok` responses is smaller than 10 times the
total number of reported IP addresses; if not, ignore the
directory-request statistics reported by this bridge.
2. If a bridge only reports a `"bridge-ips"` line and no
`"dirreq-v3-reqs"` line, assume that the country distributions are the
same, which is what we're doing right now.
3. If a bridge reports a `"dirreq-v3-reqs"` line, use that for user
number estimates and ignore the `"bridge-ips"` line in case it's present.
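The three steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function `country_shares`, the dict layout of `stats`, and the key `"dirreq-v3-resp-ok"` are made up here and do not reflect the actual metrics code.

```python
def country_shares(stats, max_responses_per_ip=10):
    """Return per-country shares for one bridge's statistics, or None to skip.

    `stats` is a hypothetical dict with optional keys "dirreq-v3-resp-ok"
    (int) and "dirreq-v3-reqs" / "bridge-ips" (country code -> count).
    """
    ok = stats.get("dirreq-v3-resp-ok")
    reqs = stats.get("dirreq-v3-reqs") or {}
    ips = stats.get("bridge-ips") or {}

    # Step 1: sanity-check responses against reported unique IP addresses.
    if ok is not None and ips:
        if not ok < max_responses_per_ip * sum(ips.values()):
            return None

    # Step 3 takes precedence over step 2: prefer "dirreq-v3-reqs" and only
    # fall back to "bridge-ips" if no request distribution was reported.
    source = reqs if reqs else ips
    total = sum(source.values())
    if total == 0:
        return None
    return {cc: n / total for cc, n in source.items()}


# The pre-fix UtahMeekBridge descriptor is dropped by step 1.
print(country_shares({
    "dirreq-v3-resp-ok": 17656,
    "dirreq-v3-reqs": {"us": 17648, "cn": 8},
    "bridge-ips": {"de": 16, "cn": 8, "us": 8},
}))  # None

# A made-up bridge without "dirreq-v3-reqs" falls back to "bridge-ips".
print(country_shares({
    "dirreq-v3-resp-ok": 100,
    "bridge-ips": {"us": 24, "de": 8},
}))  # {'us': 0.75, 'de': 0.25}
```

Note that step 1 only works as long as bridges keep reporting `"bridge-ips"`; without that line the first example would pass and be attributed almost entirely to the U.S.A.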
Hope this report was not too confusing. Feedback much appreciated.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18167>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online