Re: [tor-bugs] #3261 [Analysis]: Analyze how wrong our bridge usage statistics are
#3261: Analyze how wrong our bridge usage statistics are
----------------------+-----------------------------------------------------
Reporter: karsten | Owner:
Type: task | Status: new
Priority: major | Milestone:
Component: Analysis | Version:
Keywords: | Parent:
Points: | Actualpoints:
----------------------+-----------------------------------------------------
Comment(by karsten):
Replying to [comment:3 arma]:
> Assuming most bridge users find out about a bridge via one of the
> bridgedb mechanisms, I think we should look at 'fraction of bridges' as
> the primary question rather than 'fraction of bytes'. Bridgedb doesn't
> look at capacity after all when deciding what addresses to give out.
>
> So I would ask "Given this hour's networkstatus (written by Tonga), what
> fraction of the Running bridges never send us stats covering this hour?"
You're right. Unfortunately, I cannot change the analysis to include
network statuses, at least not easily. I'm only parsing bridge extra-info
descriptors, and even that keeps my machine busy for a few hours for a year
of data, let alone the time I'd have to spend on rewriting the analysis
code.
But I changed the analysis to look at bridge uptime seconds per day that
are covered by stats instead of written bytes. I'm adding up the seconds
for which bridges report usage statistics and the seconds for which they
report written or read bytes. The quotient of the two sums (stats seconds
divided by byte-history seconds) is the percentage we're looking for. This
analysis should be quite close to what you describe. At least it gives us
an idea of whether we're talking about 10, 30, 50, 70, or 90% here.
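As a rough illustration of the calculation described above (this is not the
actual analysis code), a minimal Python sketch might look like the following.
The function name stats_coverage_by_day and the input layout of
(day, stats_seconds, history_seconds) tuples are assumptions made for this
example; real bridge extra-info descriptors would first have to be parsed
into that form.

```python
from collections import defaultdict


def stats_coverage_by_day(records):
    """Return, per day, the fraction of uptime seconds covered by usage stats.

    `records` is a hypothetical, pre-parsed iterable of tuples
    (day, stats_seconds, history_seconds), one per bridge and day:
      - stats_seconds:   seconds for which the bridge reported usage statistics
      - history_seconds: seconds for which it reported written or read bytes
    """
    stats_sum = defaultdict(int)
    history_sum = defaultdict(int)
    for day, stats_seconds, history_seconds in records:
        stats_sum[day] += stats_seconds
        history_sum[day] += history_seconds
    # Quotient of the two sums; skip days without any byte-history seconds.
    return {
        day: stats_sum[day] / history_sum[day]
        for day in history_sum
        if history_sum[day] > 0
    }


# Made-up example: three bridges on one day, only one of them reports stats.
example = [
    ("2011-07-01", 86400, 86400),
    ("2011-07-01", 0, 86400),
    ("2011-07-01", 0, 43200),
]
coverage = stats_coverage_by_day(example)
```

Because both sums are weighted by seconds rather than bytes, a bridge that
is up all day counts the same regardless of how much traffic it carries.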
See the attached graph that I just updated. The upper part contains the
old approach where we weight by written bytes, and the lower part is the
new analysis that weights by uptime seconds. So, the fraction of bridges
reporting statistics was at 20% until August 2011 and then magically
increased to 40%.
> (Treating load as uniform across bridges is the wrong thing to do for
> users who learn their bridge through a non-bridgedb mechanism, like
> hearing from a friend what bridge they use. I wonder how we can estimate
> what fraction of bridge users learn about their bridge in what way. We
> could say that there probably aren't many such users because it involves
> manual interaction; or we could say that there aren't many users of the
> bridgedb approach because it gives out bridges that don't work in China so
> they're moot. I'm inclined toward the former.)
Do we have any data about users who learn about their bridges through a
non-BridgeDB mechanism? You mean public bridges, right? Because we don't
have statistics from private bridges, which is an unrelated problem. I
don't know what data to use here, so I'm going to ignore the fact that
non-BridgeDB bridge discovery mechanisms exist for now.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/3261#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online