Re: [tor-bugs] #13600 [Onionoo]: Improve bulk imports of descriptor archives
#13600: Improve bulk imports of descriptor archives
-----------------------------+-----------------
Reporter: karsten | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Onionoo | Version:
Resolution: | Keywords:
Actual Points: | Parent ID:
Points: |
-----------------------------+-----------------
Comment (by leeroy):
Thank you for clearing up the slight differences mentioned. I was hoping
those were minor. There were other differences, but they were clearly
trivial (like omission of rdns, or use of the IP for unresolved rdns). I'll
take a look at the code again in NodeStatus.
__Input validation:__ Excellent, I was thinking this too! If extra
validation is going to be performed, it's also worth looking into
streaming data directly from the archives. I suspect this would be a
significant advantage, since extra space would no longer be needed for
the uncompressed tarball.
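To illustrate the streaming idea, here is a minimal sketch in Python using only the standard library. It is not Onionoo or metrics-lib code (both are Java), and the function name `iter_archive_members` is hypothetical. It assumes the archive is a compressed tarball and reads it strictly sequentially, so the uncompressed tarball never has to exist on disk.

```python
import io
import tarfile


def iter_archive_members(fileobj):
    """Yield (name, content) for each regular file in a compressed tarball.

    Mode "r|*" makes tarfile treat the input as a forward-only stream and
    detect the compression (gzip, bzip2, xz) transparently. Only one member
    is held in memory at a time.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            # In stream mode a member must be read before advancing.
            content = archive.extractfile(member).read()
            yield member.name, content


if __name__ == "__main__":
    # Build a tiny in-memory .tar.xz purely for demonstration.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        data = b"example descriptor\n"
        info = tarfile.TarInfo("descriptors/example")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    for name, content in iter_archive_members(buf):
        print(name, len(content))
```

Validation could then be applied to each member as it is yielded, though where exactly that would hook into the importer is left open here.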
__Parsing archives:__ Sounds good. I was thinking of at least warning the
operator about an accumulation of archives, but with #16424 this isn't as
much of a problem.
__Importing multiple months:__ I was testing this while also looking into
reproducing the smaller directory for parsed data. I got the out-of-memory
heap error while using --update-only with '''two''' months. It occurred at
approx. 80% (based on time), during consensus parsing (based on the stack
trace). So parsing is itself very sensitive to heap memory. I have some
thoughts on how to solve this. Besides the disk-based data structures to
reduce heap dependency, I'll take another look at metrics-lib to see if it
can benefit from lexer-parser improvements. The heap dependency during
parsing could be reduced, while increasing ease of maintenance, by using a
grammar-based recognizer, streaming reads (from archives), and lock-free
(CAS) lists. Done right, this creates a parse stage that scales with I/O
and combines parse and write, reducing the heap requirement.
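A rough sketch of the "combine parse and write" part of this idea, in Python with only the standard library. It does not implement the grammar-based recognizer or the lock-free lists, and it is not how metrics-lib works today. The names `iter_entries` and `parse_and_write` are hypothetical, and the assumption that each entry starts with an "r " line is a simplification of the consensus format. The point is only that heap use stays bounded by one entry when each entry is written out as soon as it is parsed.

```python
import io


def iter_entries(lines, start_keyword="r "):
    """Yield one entry (a list of lines) at a time from a line stream.

    Assumption: every entry begins with a line starting with start_keyword.
    Lines before the first such line (e.g. a header) are skipped.
    """
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(start_keyword):
            if current is not None:
                yield current
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        yield current


def parse_and_write(source, sink):
    """Parse entries from source and write a summary of each to sink.

    Each entry is written before the next one is read, so no list of all
    parsed entries is ever built in memory. Returns the number of entries.
    """
    count = 0
    for entry in iter_entries(source):
        nickname = entry[0].split(" ")[1]  # second token of the "r " line
        sink.write(f"{nickname}\t{len(entry)}\n")
        count += 1
    return count


if __name__ == "__main__":
    source = io.StringIO("header\nr Alpha x\ns Fast\nr Beta y\ns Stable\n")
    sink = io.StringIO()
    print(parse_and_write(source, sink), "entries written")
```

In a real importer the sink would be the on-disk status or document files rather than an in-memory buffer.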
__Parsing archives:__ Due to the out-of-memory error I restarted this test
using a smaller data set. I also hope it's harmless, but having seen it, I
don't want to rule it out until I can prove it. I'll notify you here once
I know for sure.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13600#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online