
Re: [tor-bugs] #16424 [metrics-lib]: Support parsing of .xz compressed tarballs



#16424: Support parsing of .xz compressed tarballs
-----------------------------+---------------------
     Reporter:  karsten      |      Owner:  karsten
         Type:  enhancement  |     Status:  new
     Priority:  normal       |  Milestone:
    Component:  metrics-lib  |    Version:
   Resolution:               |   Keywords:
Actual Points:               |  Parent ID:
       Points:               |
-----------------------------+---------------------

Comment (by leeroy):

 Which other improvements do you mean: the parsing improvement or reading
 from the archive? If it's the parsing improvement, that's a surprise. As for
 reading from archives, I'll paraphrase in pseudo-pseudocode what metrics-lib
 does versus what I do in the benchmark.

 __Metrics-lib__

  1. (8k reads) get BufferedInputStream(get TarArchiveInputStream(get
 FileInputStream from archive))
  1. (set position) get the tar entry corresponding to a contained descriptor
 file
  1. construct an unbounded ByteArrayOutputStream of unknown size, which the
 JVM has to manage as it grows
  1. read 1k from the BufferedInputStream into a separate 1k buffer
  1. copy this 1k buffer into the ByteArrayOutputStream
  1. repeat (4) and (5) until all bytes of the tar entry are read
  1. the ByteArrayOutputStream now holds a copy of the tar entry; convert it
 to a byte array (which makes another copy)
  1. parse the copy
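 A minimal Java sketch of the metrics-lib read path above, written against
 plain java.io so it runs without commons-compress. It assumes the stream is
 already positioned at one entry and returns -1 at that entry's end, as
 TarArchiveInputStream does after getNextTarEntry(). The class and method
 names are hypothetical and this is not metrics-lib's actual code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the metrics-lib read path: the entry length is not
// used, so bytes are copied 1 KiB at a time into a growable buffer.
class UnboundedEntryRead {

    // Assumes 'entryStream' returns -1 at the end of the current entry, as
    // TarArchiveInputStream does after getNextTarEntry().
    static byte[] readEntry(InputStream entryStream) throws IOException {
        // Starts small and reallocates (copying its contents) as it grows.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024]; // the separate 1k buffer
        int n;
        while ((n = entryStream.read(chunk)) != -1) {
            baos.write(chunk, 0, n); // copy each chunk into the growable buffer
        }
        // toByteArray() returns a fresh copy of everything written so far.
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] entry = new byte[5000];
        for (int i = 0; i < entry.length; i++) {
            entry[i] = (byte) i;
        }
        // BufferedInputStream stands in for the 8k buffered layer.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(entry));
        byte[] copy = readEntry(in);
        System.out.println("read " + copy.length + " bytes");
    }
}
```

 Every byte is copied at least twice here: once into the growing buffer
 (plus whatever the buffer's own reallocations copy) and once more by
 toByteArray().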

 __Benchmark__

  1. (8k reads) get ArchiveInputStream(get BufferedInputStream(get
 FileInputStream from archive))
  1. (set position) get the archive entry corresponding to a contained
 descriptor file
  1. construct a bounded byte array from the known size of the archive
 entry; it is a plain array of primitives that the JVM only has to
 garbage-collect
  1. read the archive entry into the byte array
  1. parse the byte array
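 For comparison, a sketch of the benchmark's bounded read under the same
 assumption about the stream. The size parameter stands in for the size
 recorded in the entry header (for example ArchiveEntry.getSize() in Apache
 Commons Compress). Names are hypothetical, and this is not the actual
 Benchmark16424 code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the benchmark's read path: the entry size is known
// up front, so one exactly sized array is filled directly.
class BoundedEntryRead {

    // 'size' stands in for the size recorded in the entry header, e.g.
    // ArchiveEntry.getSize() in Apache Commons Compress (assumed known, >= 0).
    static byte[] readEntry(InputStream entryStream, long size) throws IOException {
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IOException("cannot read entry of size " + size + " into one array");
        }
        byte[] buf = new byte[(int) size]; // allocated once, no resizing
        int off = 0;
        // read() may return fewer bytes than requested, so loop until full.
        while (off < buf.length) {
            int n = entryStream.read(buf, off, buf.length - off);
            if (n == -1) {
                throw new EOFException("entry ended after " + off + " of " + size + " bytes");
            }
            off += n;
        }
        return buf; // handed straight to the parser, no second copy
    }

    public static void main(String[] args) throws IOException {
        byte[] entry = new byte[5000];
        for (int i = 0; i < entry.length; i++) {
            entry[i] = (byte) i;
        }
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(entry));
        byte[] data = readEntry(in, entry.length);
        System.out.println("read " + data.length + " bytes");
    }
}
```

 On Java 9 and later, InputStream.readNBytes(buf, 0, buf.length) can replace
 the manual loop, though it returns a short count at end of stream instead of
 throwing.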

 Maybe it doesn't matter much, which is why I'll next be testing
 metrics-lib's performance. I'll post some results once I've had a chance to
 run the tests on metrics-lib. The cost of the read is mostly noticeable only
 for large archive entries. If you want to try it out, put the partial
 2015-07.tar.xz archives in the same folder as the benchmark, make sure you
 have metrics-lib plus its dependencies (Apache Commons Codec and Apache
 Commons Compress), then compile and run, making sure to set your classpath:

 javac -cp '/usr/share/java/*:.:descriptor.jar' Benchmark16424.java

 java -cp '/usr/share/java/*:.:descriptor.jar' Benchmark16424

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/16424#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs