Re: [tor-bugs] #17321 [CollecTor]: Index to better support downloaders
#17321: Index to better support downloaders
-----------------------------+-----------------
Reporter: atagar | Owner:
Type: enhancement | Status: new
Priority: major | Milestone:
Component: CollecTor | Version:
Resolution: | Keywords:
Actual Points: | Parent ID:
Points: | Sponsor:
-----------------------------+-----------------
Comment (by karsten):
Thanks for starting this discussion. I spent some time thinking about
this over the past few days, too, and while I don't have a complete plan in
mind yet, I'd like to share some ideas:
You placed your example index.json somewhere in the middle of the
directory tree and listed all directly contained directories and files.
That's how Apache's index.html works. But that also means that tools like
Stem would need to navigate through the directory tree and read multiple
of these index.json files. And it means that CollecTor would have to
rewrite all these index.json files after an update. While this could
work, it's somewhat complex.
How about we write a single index.json, say
https://collector.torproject.org/index.json, that contains all directories
and files in the directory tree? This would make processing a lot easier.
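To make the single-file idea a bit more concrete, here is a minimal Python sketch of how such an index could be produced and consumed. The function names (`build_index`, `iter_file_urls`) and the JSON field names (`path`, `files`, `directories`, `size`) are assumptions made up for illustration, not an agreed format.

```python
import os


def build_index(root_dir, base_url):
    """Return one nested dict describing everything below root_dir.

    The field names used here are hypothetical, not a decided format.
    """

    def describe_dir(dir_path, name):
        entry = {"path": name, "directories": [], "files": []}
        for child in sorted(os.listdir(dir_path)):
            child_path = os.path.join(dir_path, child)
            if os.path.isdir(child_path):
                entry["directories"].append(describe_dir(child_path, child))
            else:
                entry["files"].append(
                    {"path": child, "size": os.path.getsize(child_path)}
                )
        return entry

    # Only the root entry carries the full URL; children use relative names.
    return describe_dir(root_dir, base_url)


def iter_file_urls(entry, prefix=""):
    """Yield the full URL of every file listed in the index."""
    base = prefix + entry["path"]
    if not base.endswith("/"):
        base += "/"
    for file_entry in entry["files"]:
        yield base + file_entry["path"]
    for sub_entry in entry["directories"]:
        yield from iter_file_urls(sub_entry, base)
```

A downloader would then fetch one file, parse it with `json.load`, and loop over `iter_file_urls(index)` instead of walking the directory tree request by request.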
The obvious downside is that this file could grow quite big. I'm listing
all directories and the number of contained files here:
{{{
   0 https://collector.torproject.org/
   0 https://collector.torproject.org/archive/
  89 https://collector.torproject.org/archive/bridge-descriptors/
  52 https://collector.torproject.org/archive/bridge-pool-assignments/
  68 https://collector.torproject.org/archive/exit-lists/
   1 https://collector.torproject.org/archive/relay-descriptors/
  96 https://collector.torproject.org/archive/relay-descriptors/consensuses/
  98 https://collector.torproject.org/archive/relay-descriptors/extra-infos/
  21 https://collector.torproject.org/archive/relay-descriptors/microdescs/
 117 https://collector.torproject.org/archive/relay-descriptors/server-descriptors/
  76 https://collector.torproject.org/archive/relay-descriptors/statuses/
  40 https://collector.torproject.org/archive/relay-descriptors/tor/
  96 https://collector.torproject.org/archive/relay-descriptors/votes/
  75 https://collector.torproject.org/archive/torperf/
   0 https://collector.torproject.org/recent/
   0 https://collector.torproject.org/recent/bridge-descriptors/
  72 https://collector.torproject.org/recent/bridge-descriptors/extra-infos/
  72 https://collector.torproject.org/recent/bridge-descriptors/server-descriptors/
  72 https://collector.torproject.org/recent/bridge-descriptors/statuses/
  72 https://collector.torproject.org/recent/exit-lists/
   0 https://collector.torproject.org/recent/relay-descriptors/
  72 https://collector.torproject.org/recent/relay-descriptors/consensuses/
  72 https://collector.torproject.org/recent/relay-descriptors/extra-infos/
   0 https://collector.torproject.org/recent/relay-descriptors/microdescs/
  72 https://collector.torproject.org/recent/relay-descriptors/microdescs/consensus-microdesc/
  72 https://collector.torproject.org/recent/relay-descriptors/microdescs/
  72 https://collector.torproject.org/recent/relay-descriptors/server-descriptors/
 576 https://collector.torproject.org/recent/relay-descriptors/votes/
  37 https://collector.torproject.org/recent/torperf/
2090 (total)
}}}
If we assume that each directory or file requires 200 characters/bytes in
the index.json, that's an uncompressed file size of roughly 413 KiB (2090
files plus 29 directories, so 2119 entries in total). We can probably save
a bit here by removing whitespace, not repeating the
https://collector.torproject.org/ part over and over, etc. What do you
think, is that still reasonable?
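For anyone who wants to check the arithmetic, here is a small Python sketch of the estimate. The per-directory file counts are copied from the listing above, and the 200 bytes per entry is the same rough assumption as in the text. The prefix saving at the end further assumes that every entry would otherwise spell out the full base URL, which is only one possible layout.

```python
# File counts per directory, copied from the listing above
# (one number per listed directory, in the same order).
FILE_COUNTS = [
    0, 0, 89, 52, 68, 1, 96, 98, 21, 117, 76, 40, 96, 75,      # root and archive/
    0, 0, 72, 72, 72, 72, 0, 72, 72, 0, 72, 72, 72, 576, 37,   # recent/
]
BYTES_PER_ENTRY = 200  # rough assumption from the text
PREFIX = "https://collector.torproject.org/"

files = sum(FILE_COUNTS)
directories = len(FILE_COUNTS)
entries = files + directories
size_bytes = entries * BYTES_PER_ENTRY

print("files:", files)
print("directories:", directories)
print("estimated size: %.1f KiB" % (size_bytes / 1024))

# Upper bound on what dropping the repeated prefix could save, assuming
# every entry would otherwise contain the full base URL.
prefix_savings = len(PREFIX) * entries
print("possible prefix savings: %.1f KiB" % (prefix_savings / 1024))
```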
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/17321#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online