Re: [tor-bugs] #6232 [Analysis]: Make entropy-over-time graph
#6232: Make entropy-over-time graph
-------------------------+--------------------------------------------------
Reporter: arma | Owner:
Type: enhancement | Status: needs_revision
Priority: normal | Milestone:
Component: Analysis | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Comment(by gsathya):
Replying to [comment:32 karsten]:
Excellent, more coding!
> A few comments after re-reading the whole ticket:
>
> - I wonder if entropies based on subsets of Exit and Guard flagged
> relays are correct. I spent yesterday afternoon on trying to learn how
> path selection really works
> ([https://trac.torproject.org/projects/tor/ticket/5755#comment:11 #5755]).
> I think we'll have to take bandwidth weights as reported in the footer
> section of a consensus into account, too. Those bandwidth weights
> influence, for example, how to weight the consensus weight of a relay with
> the Exit flag and a relay with Exit ''and'' Guard flag for the exit
> position. In a consensus published yesterday, the former was weighted
> with Wee=1.0, whereas the latter was weighted with Wed=0.4272. Similarly,
> bandwidth weights for the guard position were Wgd=0.2864 and Wgg=0.6446,
> so quite different. If we only look at the Exit ''or'' Guard flag of a
> relay, we might be quite off. But before we change anything here, I want
> to hear back from Mike or Roger if my understanding of path selection is
> correct.
>
> - The GeoIP database is part of the sources in metrics-tasks.git,
> right? Can we change that and have users provide their own geoip file?
> I'm worried that the current "a1" madness influences the results, and I'd
> like to swap the current database with the one from February which didn't
> have "a1" relays all over.
>
> - Can we add AS-based entropy values, too? There's an AS database from
> Maxmind that we could use here. Again, users could provide that database
> file, so there's no need to commit it to the Git repo.
Yep, all three of these can be done pretty easily.
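
To make the bandwidth-weight point concrete, here is a minimal sketch of how
position weights could feed into the entropy, assuming karsten's reading of
path selection above is right (it is still unconfirmed). Only the four weight
values come from the consensus quoted above. The function names, the toy relay
tuples, and the choice to ignore relays without the Exit flag are assumptions
made for illustration. The same grouping helper would work for AS numbers if
the key function looked up an AS instead of a country.

```python
import math
from collections import defaultdict

# Bandwidth weights quoted above from one consensus; they differ per consensus.
WEIGHTS = {"Wee": 1.0, "Wed": 0.4272, "Wgg": 0.6446, "Wgd": 0.2864}


def entropy(weights):
    """Shannon entropy in bits of the distribution given by raw weights."""
    weights = [w for w in weights if w > 0]
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights)


def exit_weight(relay, w=WEIGHTS):
    """Weight of a relay for the exit position (hypothetical helper).

    Simplification: relays without the Exit flag get weight 0 here.
    """
    consensus_weight, flags, _country = relay
    if "Exit" not in flags:
        return 0.0
    factor = w["Wed"] if "Guard" in flags else w["Wee"]
    return consensus_weight * factor


def grouped_entropy(relays, key, weight_fn):
    """Entropy where all relays sharing key(relay) count as one entity."""
    groups = defaultdict(float)
    for relay in relays:
        groups[key(relay)] += weight_fn(relay)
    return entropy(groups.values())


# Toy data: (consensus weight, flags, country code). Not real relays.
relays = [
    (100, {"Exit"}, "de"),
    (100, {"Exit", "Guard"}, "de"),
    (100, {"Exit"}, "us"),
]

# Entropy looking only at the Exit flag versus applying Wee/Wed.
flag_only = entropy([cw for cw, flags, _ in relays if "Exit" in flags])
weighted = entropy([exit_weight(r) for r in relays])
by_country = grouped_entropy(relays, key=lambda r: r[2], weight_fn=exit_weight)
```

With these toy relays the flag-only entropy comes out higher than the weighted
one, which is the kind of gap karsten is worried about.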
> - In the longer term, do we want to include family diversity? That
> metric would consider all relays in the same relay family as one entity,
> similar to how we consider all relays in the same country as one entity in
> the country diversity metric. I admit that it's hard to extract families
> using the current code, because we'd have to parse server descriptors for
> that, too. I'm also not certain that the results will be meaningful. So,
> longer-term.
>
> - A shorter-term goal could be to compute bandwidth diversity based on
> the relays' advertised bandwidths, not based on their consensus weights.
> Relays report their advertised bandwidth in their server descriptor; it's
> the minimum of bandwidth rate, burst, and observed bandwidth. We'll want
> to compute bandwidth diversity for all relays and for exit/guard subsets
> as well as location diversity. This is what Roger was referring to in the
> last but one paragraph of the ticket description. Again, I admit that
> it's non-trivial to extract advertised bandwidths, because we'll have to
> parse server descriptors. But it's easier to compute than relay families.
Actually, stem can parse server descriptors now, so this wouldn't be hard
at all. I can teach the script to use stem for both families and
advertised bandwidths.
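
A rough sketch of both computations, under stated assumptions: the `Desc`
tuple below is a hypothetical stand-in for a parsed server descriptor, not
stem's actual class. As far as I recall, stem's server descriptors expose
comparable fields as `average_bandwidth`, `burst_bandwidth`,
`observed_bandwidth`, and `family`, but treat those names as an assumption to
check against the stem docs. The sketch also assumes family entries are
already normalized to fingerprints and counts a family link only when both
relays list each other.

```python
import math
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a parsed server descriptor.
Desc = namedtuple("Desc", "fingerprint rate burst observed family")


def advertised_bandwidth(desc):
    """Minimum of bandwidth rate, burst, and observed bandwidth."""
    return min(desc.rate, desc.burst, desc.observed)


def entropy(weights):
    """Shannon entropy in bits of the distribution given by raw weights."""
    weights = [w for w in weights if w > 0]
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights)


def family_bandwidths(descs):
    """Sum advertised bandwidth per family, merging mutual declarations."""
    by_fp = {d.fingerprint: d for d in descs}
    parent = {fp: fp for fp in by_fp}

    def find(fp):
        # Union-find lookup with path halving.
        while parent[fp] != fp:
            parent[fp] = parent[parent[fp]]
            fp = parent[fp]
        return fp

    for d in descs:
        for other in d.family:
            # Merge only if both relays list each other.
            if other in by_fp and d.fingerprint in by_fp[other].family:
                parent[find(d.fingerprint)] = find(other)

    groups = defaultdict(float)
    for d in descs:
        groups[find(d.fingerprint)] += advertised_bandwidth(d)
    return groups


# Toy descriptors: A and B list each other; C lists A but A does not list C.
descs = [
    Desc("A", rate=100, burst=200, observed=50, family={"B"}),
    Desc("B", rate=100, burst=100, observed=150, family={"A"}),
    Desc("C", rate=300, burst=300, observed=100, family={"A"}),
]

per_relay = entropy([advertised_bandwidth(d) for d in descs])
per_family = entropy(family_bandwidths(descs).values())
```

Here A and B collapse into one entity while C stays on its own, so the
family-based entropy is lower than the per-relay one.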
> gsathya, are you up for more coding fun? Didn't you worry that this
> task might be too trivial for a thesis? Hah! :)
Heh indeed! Fun :)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6232#comment:34>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online