Re: [tor-bugs] #20234 [Metrics/CollecTor]: Define CollecTor's file-structure protocol 1.0
#20234: Define CollecTor's file-structure protocol 1.0
-------------------------------+--------------------------------
Reporter: karsten | Owner: iwakeh
Type: enhancement | Status: needs_revision
Priority: High | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------+--------------------------------
Changes (by karsten):
* status: needs_review => needs_revision
Comment:
Thanks for starting this! Here are some answers and some feedback:
- It makes sense to specify the web-visible directories in this protocol,
but what's the reason for also specifying the web-invisible `out/`
directory there? If the audience is developers who rely on the directory
structure provided via HTTP, I'd say it's fine and even better to leave
out that last directory. And if the audience is operators and
contributors, then we might have to include even more directories,
including the `stats/` directory and others. For comparison, the Onionoo
protocol specification doesn't say anything about the `status/` directory
which would be important for operators and contributors but which Onionoo
client developers don't need to worry about.
- "Shouldn't 'exit-list' be changed to 'exit-lists'?" -- Yes, we can do
that. In fact, I had this on my local TODO list for years and only
recently dropped it, because meh, but if you also found this confusing,
then it gets above the meh threshold again. Let's do it.
- "Shouldn't there be different markers for different torperf sources?"
-- Maybe, but I'd rather not touch anything with the label Torperf
on it unless it breaks apart or explodes. Let's wait for the switch to
OnionPerf and do something reasonable there.
- "The 'compression-type' is one element of "xz", "gz", or "zip". XXXX
Is this true?" -- No, the only compression type that is currently in use
is "xz". We did use "bz2" until a few years ago, but we recompressed all
tarballs, because "xz" compresses much better. Of course, there's no
guarantee that we'll stick with "xz" forever, so it might be fine to
mention all possible compression types there.
- Section 2.4 says that server descriptors are sorted into tarballs by
download date. That's not true: we sort them by published date, just as
we do for extra-info descriptors.
- In Section 4.1.1, you ask: "Shouldn't the seconds be dropped?" -- No,
because it's just a coincidence that the seconds are always zero. That's
because the new scheduler is super precise compared to the cron-based
scheduling, which put a 01 or 02 there at times.
- Also in Section 4.1.1, "Why not group extra-info according to published
time?" -- I don't understand that question. Can you rephrase?
- In Section 4.2.1, "What is the reason _not_ to group according to
published time?" -- This question is closely related to my recent thoughts on
appending multiple votes to a single file:
https://trac.torproject.org/projects/tor/ticket/20228#comment:2.
Basically, if we were to store server descriptors and extra-info
descriptors in hourly files, I'd expect that we'd update a couple of those
files during a single update run. (In fact, see the command and output
below.) And a client who wants to stay up to date would have to download
all files that have changed. Therefore it's much easier to append
everything we learn in a single execution to a single file.
{{{
wget -O - https://collector.torproject.org/recent/relay-descriptors/server-descriptors/2016-09-28-09-05-00-server-descriptors \
  | grep "^published " | cut -c1-23 | sort | uniq -c
1 published 2016-09-28 04 # <- this comes quite late
7 published 2016-09-28 07 # <- these, too
786 published 2016-09-28 08 # <- one would only expect those
16 published 2016-09-28 09 # <- and maybe a few of those
3 published 2016-09-28 10 # <- hello, future
1 published 2016-09-28 11 # <- and future
1 published 2016-09-28 16 # <- and future
1 published 2016-09-28 18 # <- hello, wrong clock
}}}
- I didn't look at Section 5 yet, because it's not yet clear whether that
section belongs in the protocol.
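Coming back to the hourly-files point from Section 4.2.1: here is a rough sketch of how one could count the hourly files a single update run would touch if descriptors were grouped by published hour. The function name `count_touched_hourly_files` and the three sample lines are made up for illustration; only the `published` line prefix and the `cut -c1-23` trick come from the command above.

```shell
#!/bin/sh
# Hypothetical helper: reads descriptor text on stdin and prints how many
# distinct "published" hours occur in it, i.e. how many hourly files one
# run would have to update under a group-by-published-hour layout.
count_touched_hourly_files() {
  # "published YYYY-MM-DD HH" is exactly 23 characters long.
  grep "^published " | cut -c1-23 | sort -u | wc -l | tr -d ' '
}

# Made-up sample input: three descriptors published in two different hours.
printf '%s\n' \
  'published 2016-09-28 08:12:45' \
  'published 2016-09-28 08:59:01' \
  'published 2016-09-28 09:00:30' \
  | count_touched_hourly_files
```

Fed with the file from the wget command above, this would print 8, one for each distinct hour in that output, whereas the current layout only ever adds a single new file per run.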
Again, thanks for writing this document!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20234#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online