[tor-bugs] #25329 [Metrics/Library]: Enable metrics-lib to process large (> 2G) logfiles
#25329: Enable metrics-lib to process large (> 2G) logfiles
---------------------------------+--------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Library | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID: #25317
Points: | Reviewer:
Sponsor: |
---------------------------------+--------------------------
Metrics-lib receives compressed logs, usually of sizes below 600 kB. These
can be dealt with in memory; this ticket is about handling the logs
that decompress to larger files (approx. 2G).
Commons Compress doesn't provide methods for determining the decompressed
content size (as the command-line tool xz does). Other compression types
that metrics-lib supports have this option, but using it would also
require more changes.
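Without a size lookup, the decompressed size can still be found by
streaming through the data once with a small fixed buffer, so the full
content never sits in memory. The following is only a sketch of that idea:
it uses the JDK's gzip classes as a stand-in for xz, and the class and
method names are hypothetical, not part of metrics-lib. With Commons
Compress, the same loop should work on an XZCompressorInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper, not part of metrics-lib. */
final class DecompressedSizeSketch {

  /** Counts the bytes a decompressing stream yields, using an 8 KiB buffer. */
  static long countBytes(InputStream decompressed) throws IOException {
    byte[] buffer = new byte[8192];
    long total = 0L;
    int read;
    while ((read = decompressed.read(buffer)) != -1) {
      total += read;
    }
    return total;
  }

  /** Gzip-compresses {@code length} zero bytes; only used to build sample input. */
  static byte[] gzipZeros(int length) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      gzip.write(new byte[length]);
    }
    return out.toByteArray();
  }

  /** Returns the decompressed size of a gzip-compressed byte array. */
  static long decompressedSize(byte[] compressed) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return countBytes(in);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] compressed = gzipZeros(10 * 1024 * 1024);
    System.out.println("compressed: " + compressed.length
        + " bytes, decompressed: " + decompressedSize(compressed) + " bytes");
  }
}
```

The cost is a full decompression pass just to learn the size, which is one
reason a size check up front is unattractive.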
Compression can be very effective, so a cut-off based on compressed size
is somewhat arbitrary. An example for xz compression: a log that
decompresses to 3G has a compressed input array length of 589492; with
extreme compression it even shrinks to a length of 405480. On the other
hand, a file that decompresses to 64M can have an input array length of
509212.
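The effect is easy to reproduce on a small scale. The sketch below is an
illustration only: it uses the JDK's gzip classes rather than xz, and the
inputs (a block of zero bytes and a block of seeded pseudo-random bytes)
are made up for the demonstration. The much larger but repetitive input
ends up with the smaller compressed length.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demonstration class, not part of metrics-lib. */
final class CompressedSizeSketch {

  /** Returns the length of {@code data} after gzip compression. */
  static int gzipLength(byte[] data) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      gzip.write(data);
    }
    return out.size();
  }

  /** Builds reproducible pseudo-random bytes, which barely compress. */
  static byte[] randomBytes(int length, long seed) {
    byte[] data = new byte[length];
    new Random(seed).nextBytes(data);
    return data;
  }

  public static void main(String[] args) throws IOException {
    byte[] repetitive = new byte[8 * 1024 * 1024]; // 8 MiB of zeros
    byte[] random = randomBytes(64 * 1024, 25329L); // 64 KiB of noise
    System.out.println("8 MiB repetitive input compresses to "
        + gzipLength(repetitive) + " bytes");
    System.out.println("64 KiB random input compresses to "
        + gzipLength(random) + " bytes");
  }
}
```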
Handling larger log files with metrics-lib will require some interface
changes. Here is a suggestion:
{{{
public interface LogDescriptor extends Descriptor {
/**
- * Returns the decompressed raw descriptor bytes of the log.
+ * Returns the compressed raw descriptor bytes of the log.
+ *
+ * <p>For access to the log's decompressed bytes
+ * use method {@code decompressedByteStream}.</p>
+ *
* @since 2.2.0
*/
public byte[] getRawDescriptorBytes();
/**
+ * Returns the decompressed raw descriptor bytes of the log as stream.
+ *
+ * @since 2.2.0
+ */
+ public InputStream decompressedByteStream();
+
}}}
I think this might be easiest to understand and use; and of course the
implementation wouldn't need to process large and 'normal' logs
differently. It also avoids having to decide how to find out whether a
file is large or not.
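To illustrate how a caller might use the two proposed methods, here is a
minimal sketch. Everything in it is an assumption made for illustration:
LogSketch only mirrors the two method signatures from the suggestion, the
gzip-backed implementation stands in for whatever metrics-lib would
actually do, and the sample log content is invented. The object holds only
the compressed bytes, and the caller reads the decompressed content line
by line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Stand-in for the two proposed methods; not the metrics-lib interface. */
interface LogSketch {
  byte[] getRawDescriptorBytes();

  InputStream decompressedByteStream();
}

/** Hypothetical gzip-backed implementation that keeps only compressed bytes. */
final class GzipLogSketch implements LogSketch {
  private final byte[] compressed;

  GzipLogSketch(byte[] compressed) {
    this.compressed = compressed.clone();
  }

  @Override
  public byte[] getRawDescriptorBytes() {
    return compressed.clone();
  }

  @Override
  public InputStream decompressedByteStream() {
    try {
      // A fresh stream per call; decompression happens as the caller reads.
      return new GZIPInputStream(new ByteArrayInputStream(compressed));
    } catch (IOException e) {
      // The proposed signature declares no checked exception, so wrap it.
      throw new UncheckedIOException(e);
    }
  }

  /** Builds a sample log with {@code lineCount} identical, invented lines. */
  static GzipLogSketch ofLines(int lineCount) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
      byte[] line = "sample log line\n".getBytes(StandardCharsets.US_ASCII);
      for (int i = 0; i < lineCount; i++) {
        gzip.write(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new GzipLogSketch(out.toByteArray());
  }

  /** Counts lines without materializing the decompressed log in memory. */
  static long countLines(LogSketch log) {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        log.decompressedByteStream(), StandardCharsets.US_ASCII))) {
      long lines = 0L;
      while (reader.readLine() != null) {
        lines++;
      }
      return lines;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    GzipLogSketch log = ofLines(1_000_000);
    System.out.println("compressed bytes: " + log.getRawDescriptorBytes().length);
    System.out.println("lines: " + countLines(log));
  }
}
```

The caller's code is the same whether the log decompresses to a few
kilobytes or to several gigabytes.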
Thoughts?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25329>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online