Re: [tor-bugs] #2763 [Metrics]: Do we collect descriptors that don't get into the consensus?
#2763: Do we collect descriptors that don't get into the consensus?
---------------------+------------------------------------------------------
 Reporter:  arma     |         Owner:  karsten
     Type:  task     |        Status:  assigned
 Priority:  normal   |     Milestone:
Component:  Metrics  |       Version:
 Keywords:           |        Parent:
   Points:           |  Actualpoints:
---------------------+------------------------------------------------------
Comment(by karsten):
Replying to [comment:5 karsten]:
> Replying to [comment:4 nickm]:
> > I wonder if this approach might be insufficient for your requirements.
> > It will tell you about descriptors that the authorities have accepted
> > and have decided to keep. It ''won't'' tell us about descriptors that
> > the authorities immediately rejected, or ones that they decided (for
> > whatever reason) to drop or replace.
> >
> > Do we care about those factors?
> That's a fine question. I can't say. I guess Sebastian or arma have an
> answer. From a metrics POV, we're only interested in the descriptors
> that are referenced from consensuses and maybe votes. But I understand
> the need to collect unreferenced descriptors for debugging purposes.
>
> What reasons are there for an authority to reject or drop a descriptor?
> a) the descriptor is unparseable and b) the changes are only cosmetic
> come to mind. I'm somewhat concerned about a) here. If we want to
> include descriptors that the directory authorities cannot parse, I'll
> have to improve the metrics code for parsing descriptors. I'd prefer
> not to include descriptors from case a), though. Descriptors from case
> b) should be fine to archive. Are there other reasons for the
> authorities to drop or reject descriptors?
Without more information about which descriptors people want to collect,
I'll assume that whatever we learn by downloading /tor/server/all.z and
/tor/extra/all once per day is sufficient. Please let me know if it's
not.
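For concreteness, here's roughly what such a fetch looks like in Java
(class and method names are made up; this assumes the ".z" payload is a
single zlib stream, which holds for server descriptors but, as described
below, not for extra-infos):
{{{
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.InflaterInputStream;

public class AuthorityFetch {

  /* Fetch one compressed directory document, e.g.
   * "/tor/server/all.z", from one authority and return the
   * decompressed bytes. */
  public static byte[] fetchCompressed(String authorityAddress,
      String resource) throws IOException {
    URL url = new URL("http://" + authorityAddress + resource);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    InputStream in = new InflaterInputStream(conn.getInputStream());
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    int read;
    while ((read = in.read(buffer)) > 0) {
      out.write(buffer, 0, read);
    }
    in.close();
    return out.toByteArray();
  }
}
}}}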
> > As for the information about download size: you can make it much
> > smaller. First, instead of downloading "all", download "all.z".
> Right. We should do that for all downloads, I guess.
I added ".z" to all URLs except for extra-info descriptors. It seems that
directory authorities compress extra-info descriptors individually and
then concatenate the compressed results. I know that this is permitted
by the specification. Unfortunately, I cannot handle that easily in
Java. After spending two hours on this problem, I decided that developer
time is more valuable than bandwidth and removed the ".z" for extra-info
descriptors. Everything else works fine with ".z". I'm happy to accept a
patch if someone wants to look closer at the Java problem.
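For anyone who wants to pick this up: the core of the problem is that
java.util.zip.InflaterInputStream stops at the end of the first zlib
stream. Something along these lines might work, driving the Inflater by
hand and resetting it whenever a stream ends with input left over. It's
untested against real authority output, so treat it as a starting point
for a patch rather than a drop-in fix:
{{{
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class ConcatenatedZlib {

  /* Decompress a byte array that may contain several zlib streams
   * concatenated back to back, as the authorities appear to produce
   * for /tor/extra/all.z. */
  public static byte[] inflateConcatenated(byte[] compressed)
      throws DataFormatException {
    ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] buffer = new byte[8192];
    while (true) {
      int written = inflater.inflate(buffer);
      if (written > 0) {
        decompressed.write(buffer, 0, written);
      }
      if (inflater.finished()) {
        int remaining = inflater.getRemaining();
        if (remaining == 0) {
          break;  /* consumed all input */
        }
        /* Another zlib stream follows: reset and feed it the
         * unconsumed tail of the input. */
        inflater.reset();
        inflater.setInput(compressed,
            compressed.length - remaining, remaining);
      } else if (written == 0 && inflater.needsInput()) {
        break;  /* truncated input; stop rather than loop forever */
      }
    }
    inflater.end();
    return decompressed.toByteArray();
  }
}
}}}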
> > Second, instead of downloading all extra-info descriptors, read
> > through the descriptors in /tor/server/all.z to see which ones you
> > are missing, and download only those. I'd bet these approaches
> > combined would save 60-80% of the expected download size.
> Okay, that should work. Is once per day enough?
I tried downloading /tor/server/all.z and all the extra-info descriptors
referenced from there, and then downloaded /tor/extra/all. The latter
gave me new descriptors that were not referenced from the server
descriptors I had. We're trying to collect all descriptors in the
network, so I enabled downloading both /tor/server/all.z and
/tor/extra/all once per day.
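For the record, the digest-based approach nickm suggested would look
roughly like this (the helper names and the digest store are made up,
and the 96-per-request batch size is a guess, not something I checked
against the authorities):
{{{
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ExtraInfoCheck {

  /* Scan server descriptors (the decompressed /tor/server/all.z
   * payload) for "extra-info-digest" lines and return the digests we
   * don't have yet; knownDigests stands in for whatever descriptor
   * store we end up using. */
  public static List<String> findMissingDigests(
      String serverDescriptors, Set<String> knownDigests) {
    List<String> missing = new ArrayList<String>();
    for (String line : serverDescriptors.split("\n")) {
      if (line.startsWith("opt ")) {
        line = line.substring(4);
      }
      if (line.startsWith("extra-info-digest ")) {
        String[] parts = line.split(" ");
        if (parts.length > 1 && !knownDigests.contains(parts[1])) {
          missing.add(parts[1]);
        }
      }
    }
    return missing;
  }

  /* Turn missing digests into request URLs of the form
   * "/tor/extra/d/<digest>+<digest>+...". Batch size chosen
   * arbitrarily; anything the authorities accept would do. No ".z"
   * here, for the concatenation reason given above. */
  public static List<String> buildRequests(List<String> missing) {
    List<String> requests = new ArrayList<String>();
    int batchSize = 96;
    for (int i = 0; i < missing.size(); i += batchSize) {
      StringBuilder sb = new StringBuilder("/tor/extra/d/");
      for (int j = i; j < Math.min(i + batchSize, missing.size());
          j++) {
        if (j > i) {
          sb.append("+");
        }
        sb.append(missing.get(j));
      }
      requests.add(sb.toString());
    }
    return requests;
  }
}
}}}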
As the next steps, I'm going to check whether we still need to import
gabelmoo's cached-* files and how we can add a per-authority timeout so
that extremely slow authorities don't delay the entire download run.
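The timeout part should be straightforward with plain java.net; a
sketch with placeholder values:
{{{
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutFetch {

  /* Open a connection that fails fast instead of stalling the whole
   * download run on one slow authority. The 30- and 60-second values
   * are placeholders, not measured choices. */
  public static HttpURLConnection openWithTimeouts(String urlString)
      throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(30 * 1000);  /* fail if TCP connect stalls */
    conn.setReadTimeout(60 * 1000);     /* fail if reads stall */
    return conn;
  }
}
}}}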
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2763#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online