Re: [tor-bugs] #19118 [Metrics/Onionoo]: Add organization name to each relay
#19118: Add organization name to each relay
-----------------------------+-----------------------------------
Reporter: virgil | Owner: karsten
Type: enhancement | Status: needs_information
Priority: Medium | Milestone:
Component: Metrics/Onionoo | Version:
Severity: Normal | Resolution:
Keywords: hardening | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+-----------------------------------
Comment (by virgil):
> The CAIDA data doesn't contain IP address ranges, so we'll have to keep
using !MaxMind data in addition to CAIDA data. Okay. But that means that
CAIDA's comprehensiveness in terms of number of ASNs is meaningless to us,
because we're limited to whatever ASNs are in !MaxMind data. (0)
You're right. We still only start with the IP#, and it would be a pain to
implement a method to learn the AS numbers. Okay, that kills any utility
of CAIDA having more ASs.
> !MaxMind contains 67 of its 2833 ASNs (not sure where your 53k number
comes from) that CAIDA does not know about. Right now we'd have
organization names for these ASNs, but once we switch over to using
CAIDA's organization names we'd provide less information there. And I'm
not willing to provide !MaxMind data if CAIDA doesn't have anything for a
given ASN, because nobody will understand that, nor do I want to provide
both organization names. This is a serious problem that I don't know how
to work around cleanly. (-1)
> CAIDA data is only updated every three months, !MaxMind provides a new
update every month. It already happens that people ping me because
!MaxMind's data is old, and that's only going to get worse with CAIDA.
Somewhat related, !MaxMind has been providing ASN data for many years now
without major issues whereas CAIDA apparently started providing data only
2 years ago. (-1)
[http://www.cidr-report.org/cgi-bin/plota?file=%2fvar%2fdata%2fbgp%2fas2.0
%2fbgp-as-count%2etxt&descr=Unique%20ASes&ylabel=Unique%20ASes&with=step
The 53k figure is actually correct]. Additionally, I would never wholly
replace !MaxMind data with CAIDA---the fields convey very different
things. !MaxMind says which organization is the registered owner, while
CAIDA does some cleverness to learn the parent organization.
These are very different. I would propose that there be a new field,
called something like !`parent_organization`, for each relay, which is
populated by CAIDA [when it exists]. I claim this sets both of the above
(-1)s to (0).
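
To make the proposal concrete, here is a minimal Python sketch of the behaviour I have in mind, not actual Onionoo code. The function name `relay_details`, the plain-dict representation, and the use of `as_number`/`as_name` as the existing !MaxMind-derived fields are assumptions for illustration. The only point is that `parent_organization` is added when CAIDA has an entry and is left out otherwise.

```python
from typing import Dict


def relay_details(as_number: str, as_name: str,
                  caida_parent_orgs: Dict[str, str]) -> Dict[str, str]:
    """Build the AS-related fields for one relay (hypothetical helper).

    as_number / as_name stand in for the existing MaxMind-derived fields;
    caida_parent_orgs maps an AS number to CAIDA's parent organization.
    """
    details = {"as_number": as_number, "as_name": as_name}
    parent = caida_parent_orgs.get(as_number)
    if parent:
        # Only present when CAIDA knows the AS; never a MaxMind fallback.
        details["parent_organization"] = parent
    return details
```

An ASN that !MaxMind knows but CAIDA does not keeps exactly the fields it has today, so nothing is lost relative to the current output.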
> We'd still need to write, review, and test code to handle CAIDA's data
format. This could become a neutral if somebody submits a good patch, but
please only do that if that makes the overall sum positive, or that patch
might not get accepted. (-1)
The CAIDA format is a standard CSV.
https://commons.apache.org/proper/commons-csv/ (0)
> Operating an Onionoo server becomes a bit harder with an additional data
source to update. We want more people to run Onionoo servers at some
point, so we should make that process easier not harder. (-1)
This is indeed an issue. It seems entirely reasonable to me that if
someone doesn't want to handle the CAIDA data, their instance simply won't
have the !`parent_organization` field. Totally cool with that. (0?)
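
A rough sketch of what "optional" could look like, again in Python and only as an illustration: the loader returns an empty mapping when the operator has not fetched the CAIDA file, so no relay gets the field. The function name `load_parent_orgs` and the column names `asn` and `parent_organization` are assumptions, not CAIDA's actual header.

```python
import csv
import os
from typing import Dict


def load_parent_orgs(path: str) -> Dict[str, str]:
    """Return {asn: parent organization}, or {} if the file is absent.

    The column names 'asn' and 'parent_organization' are hypothetical.
    """
    if not os.path.isfile(path):
        # Operator opted out of the extra data source: no field for anyone.
        return {}
    mapping: Dict[str, str] = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            asn = (row.get("asn") or "").strip()
            parent = (row.get("parent_organization") or "").strip()
            if asn and parent:
                mapping[asn] = parent
    return mapping
```

With this shape, skipping the CAIDA download is not an error condition; the server just produces the same documents it does today.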
> !MaxMind indeed contains similar but not equivalent organization names
which should be exactly the same. However, the actual number is lower than
what your pairwise comparison implies, and somebody measuring organization
diversity could always use a similarity metric as yours when looking at
these strings. Anyway, CAIDA is indeed better here than !MaxMind. (1)
So I actually low-balled this for you.
Here are the actual numbers.
* # of ASNs for which MM's organizations are different, yet CAIDA's
'parent organization' is the same: 3299
* # of ASNs for which MM's organizations are _very_ different, yet CAIDA's
'parent organization' is the same: 1935
I attach a list of those 1935 pairs as
''[https://trac.torproject.org/projects/tor/attachment/ticket/19118/MMs_very_diff.txt
MMs_very_diff.txt].''
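
For anyone who wants to run this kind of comparison themselves, here is a hedged Python sketch of one way to do it. This is not the script that produced the numbers above. It counts pairs of ASNs that share a CAIDA parent organization but carry different !MaxMind names. Using `difflib.SequenceMatcher` and a 0.5 cutoff to decide what counts as "very different" is an assumption for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Tuple


def count_split_pairs(maxmind_names: Dict[str, str],
                      caida_parents: Dict[str, str],
                      very_different_below: float = 0.5) -> Tuple[int, int]:
    """Count ASN pairs with the same CAIDA parent but different MaxMind names.

    Returns (different, very_different). The similarity measure and the
    threshold are illustrative choices, not the original method.
    """
    by_parent = defaultdict(list)
    for asn, parent in caida_parents.items():
        if asn in maxmind_names:
            by_parent[parent].append(asn)

    different = 0
    very_different = 0
    for asns in by_parent.values():
        for a, b in combinations(sorted(asns), 2):
            name_a = maxmind_names[a].lower()
            name_b = maxmind_names[b].lower()
            if name_a == name_b:
                continue
            different += 1
            if SequenceMatcher(None, name_a, name_b).ratio() < very_different_below:
                very_different += 1
    return different, very_different
```

Note that the grouping comes entirely from CAIDA; the string similarity is only used to describe how far apart the !MaxMind names are within a group.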
Two AS-ORG names being similar is neither sufficient nor necessary for two
ASs to be correctly grouped under the same parent organization. We totally
tried to learn these relationships from the !MaxMind data, and failed. I was
in the process of deriving my own method from the academic literature
until I found the CAIDA data, which did everything I needed.
I have no stake in this. We tried to use something like !MaxMind for
Roster, failed, but then discovered CAIDA worked. You then requested that
we move as much functionality into Onionoo as possible. So this is me trying
to do that. It's of course totally fine to say that this is too niche a
need to be worth including in Onionoo. In which case, Roster will just
continue to use its own database for this, which is totally cool. I'm
just trying to, as you requested, upload the goods we found to
the Onionoo Mothership. This is me exerting effort to be a good uploader of
candidate good things to Onionoo.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/19118#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs