Re: [tor-bugs] #20395 [Metrics/metrics-lib]: metrics-lib should be able to handle large descriptor files
#20395: metrics-lib should be able to handle large descriptor files
---------------------------------+-----------------------------------
Reporter: iwakeh | Owner: karsten
Type: defect | Status: new
Priority: Medium | Milestone: metrics-lib 2.0.0
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+-----------------------------------
Comment (by iwakeh):
I hope I didn't overlook anything:
`DescriptorFile#getDescriptors()` and
`DescriptorParser#parseDescriptors()` don't access files. They receive
Descriptor objects or bytes and have to keep the bytes, but these
methods don't cause an OutOfMemoryError (OOM) unless their caller hands
them too much data.
The problem lies in the implementation of
`DescriptorReaderImpl$DescriptorReaderRunnable` (which - as an aside -
should be a separate class). There the `readFile` method attempts to read
an entire file into memory and runs out of memory when the file is huge.
`DescriptorReaderRunnable` should check the file size before opening a
file in order to handle files according to their size. The OOM is caused
by reading the entire file into memory and then creating all the
Descriptor objects from it, also in memory (possibly copying the raw
bytes, I didn't verify). Memory usage could be reduced
1. by only reading parts of the huge file at a time and also
2. by not adding the bytes to the descriptor objects and instead simply
keeping the file path and the position inside the file in memory.
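To make point 1 a bit more concrete, here is a minimal sketch of checking
the size first and reading a large file in bounded chunks. It is not taken
from metrics-lib: the class `SizeAwareReader`, the `ChunkHandler` callback
and the threshold value are all hypothetical names and numbers chosen for
illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of metrics-lib. */
final class SizeAwareReader {

  /** Assumed threshold for "small enough to read in one piece". */
  static final long MAX_IN_MEMORY_BYTES = 64L * 1024L * 1024L;

  /** Hypothetical callback that receives one chunk at a time. */
  interface ChunkHandler {
    void handle(long offsetInFile, byte[] buffer, int length);
  }

  /** Checks the file size without opening or reading the file. */
  static boolean fitsInMemory(Path file, long maxBytes) {
    try {
      return Files.size(file) <= maxBytes;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Reads the file with a single reusable buffer of chunkSize bytes, so
   * memory use is bounded by chunkSize rather than by the file size.
   * Returns the total number of bytes read.
   */
  static long readInChunks(Path file, int chunkSize, ChunkHandler handler) {
    byte[] buffer = new byte[chunkSize];
    long offset = 0L;
    try (InputStream in = Files.newInputStream(file)) {
      int read;
      while ((read = in.read(buffer)) > 0) {
        // The buffer is reused, so the handler must not keep a reference.
        handler.handle(offset, buffer, read);
        offset += read;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return offset;
  }
}
```

A caller would use `fitsInMemory` to decide between the existing
read-everything path and `readInChunks`. The handler gets the offset of
each chunk, which is what point 2 needs in order to remember positions
instead of bytes.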
Assumptions:
* Descriptor objects without bytes occupy far less space than the
current Descriptor objects do, even when there are many of them
* the files containing the descriptors remain available for as long as
there are Descriptor objects referring to them
A sketch of changes:
* Introduce descriptors that either hold their bytes in-memory or have a
file path and in-file position(s) for accessing raw bytes, but don't store
the bytes.
* `DescriptorImpl` parses bytes and produces a list of the adapted
Descriptor objects.
* `DescriptorReaderRunnable` needs to read a certain chunk of a large
file, parse enough to determine where the next descriptor starts, and
also provide the parser with the beginning and end positions in the file.
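The sketch of changes above could look roughly like the following. This is
only an illustration under assumptions: `RawBytesSource`, `InMemoryBytes`,
`FileBackedBytes` and `DescriptorBoundaries` are hypothetical names, not
existing metrics-lib classes, and the boundary search assumes that every
descriptor begins with a known keyword at the start of a line and that the
chunk itself begins at a line start.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over where the raw bytes live. */
interface RawBytesSource {
  byte[] rawBytes();
}

/** Variant 1: holds the bytes in memory. */
final class InMemoryBytes implements RawBytesSource {
  private final byte[] bytes;
  InMemoryBytes(byte[] bytes) { this.bytes = bytes; }
  @Override public byte[] rawBytes() { return bytes; }
}

/** Variant 2: holds only path and positions, reads the bytes on demand. */
final class FileBackedBytes implements RawBytesSource {
  private final Path file;
  private final long start; // inclusive
  private final long end;   // exclusive

  FileBackedBytes(Path file, long start, long end) {
    this.file = file;
    this.start = start;
    this.end = end;
  }

  @Override public byte[] rawBytes() {
    // Assumes a single descriptor is small enough for one byte array and
    // that the file still exists (see the assumptions above).
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      byte[] bytes = new byte[(int) (end - start)];
      raf.seek(start);
      raf.readFully(bytes);
      return bytes;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

/** Hypothetical helper for finding descriptor starts inside a chunk. */
final class DescriptorBoundaries {

  /**
   * Returns the absolute file positions of all lines in the chunk that
   * begin with the keyword. A keyword cut in half by the chunk end is
   * not detected here; a real reader would carry the tail over.
   */
  static List<Long> startOffsets(byte[] chunk, int length,
      long chunkOffsetInFile, byte[] keyword) {
    List<Long> starts = new ArrayList<>();
    for (int i = 0; i + keyword.length <= length; i++) {
      boolean atLineStart = i == 0 || chunk[i - 1] == '\n';
      if (atLineStart && matches(chunk, i, keyword)) {
        starts.add(chunkOffsetInFile + i);
      }
    }
    return starts;
  }

  private static boolean matches(byte[] data, int from, byte[] keyword) {
    for (int j = 0; j < keyword.length; j++) {
      if (data[from + j] != keyword[j]) {
        return false;
      }
    }
    return true;
  }
}
```

Each pair of consecutive start offsets then gives the begin and end
positions for one `FileBackedBytes` instance. The parser would still see
the bytes of the current chunk, but the resulting objects keep only a path
and two longs.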
This stays very close to the current implementation; the details need
some more work, and it might be necessary to change more.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20395#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs