
Re: [tor-dev] Comparing Stem, metrics-lib, and zoossh



Hi Karsten, I implemented Stem counterparts of these (see attached). On
one hand the code is delightfully simple, but on the other the
measurements I got were quite a bit slower. I'm curious to see what you
get when running them on the same host where you took your measurements.

Cheers! -Damian


On Thu, Jan 7, 2016 at 8:02 AM, Karsten Loesing <karsten@xxxxxxxxxxxxxx> wrote:
>
> On 03/01/16 21:25, Damian Johnson wrote:
>> Nice! Few questions...
>>
>> * Where are the metrics-lib scripts you used for the benchmarks?
>> Should be easy for me to write Stem counterparts once I know what
>> we're running. I'll be including our demo scripts with the
>> benchmarks later, so if possible comments would be nice so they're
>> good examples for newcomers to our libraries.
>
> I'm planning to clean up this code before committing it to a real
> repository, but here's the unclean version in a pastebin:
>
> http://pastebin.com/PEXD09jF
>
>> * Which exact tarballs are you parsing? It would be useful if we
>> ran all our benchmarks on the same host with the same data.
>
> I'm using tarballs from CollecTor, except for microdescriptors which
> I'm processing as described below.
>
> Agreed about running this on the same host in the future.
>
>> * Please take note somewhere of the metrics-lib commit id used
>> since I'll want to include that later when we add the results to
>> our site.
>
> Good idea.
>
> For now, I think I'll wait for you to write similar benchmarks for
> Stem to learn whether I need to write any more for metrics-lib.  And
> then I'll clean up things more on my side and commit them somewhere
> more serious than pastebin.
>
>> Sorry I didn't get to this for the task exchange. Been focusing on
>> Nyx so quite a few things have fallen off my radar.
>
> Sure, no worries at all. Very much looking forward to your results!
>
> All the best,
> Karsten
>
>
>>
>> Cheers! -Damian
>>
>>
>> On Sun, Jan 3, 2016 at 9:56 AM, Karsten Loesing
>> <karsten@xxxxxxxxxxxxxx> wrote:
>>
>> Hi Damian,
>>
>> I'm digging out this old thread, because I think it's still
>> relevant.
>>
>> I started writing some performance evaluations for metrics-lib and
>> got some early results.  All examples read a monthly tarball from
>> CollecTor and do something trivial with each contained descriptor
>> that requires parsing them.  Here are the average processing times
>> by type:
>>
>> server-descriptors-2015-11.tar.xz: 0.334261 ms
>> server-descriptors-2015-11.tar: 0.285430 ms
>> extra-infos-2015-11.tar.xz: 0.274610 ms
>> extra-infos-2015-11.tar: 0.215500 ms
>> consensuses-2015-11.tar.xz: 255.760446 ms
>> consensuses-2015-11.tar: 246.713092 ms
>> microdescs-2015-11.tar.xz[*]: 0.099397 ms
>> microdescs-2015-11.tar[*]: 0.066566 ms
>>
>> [*] The microdescs* tarballs contain microdesc consensuses and
>> microdescriptors, but I only cared about the latter; what I did was
>> extract the tarballs, delete the microdesc consensuses, and re-create
>> and re-compress the tarballs.
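
For reference, a rough Python sketch of one way to do that preprocessing
step without fully extracting the tarball; the 'consensus-microdesc'
substring used to spot microdesc consensuses is an assumption about the
tarball layout and may need adjusting:

import tarfile

def strip_microdesc_consensuses(src_path, dst_path):
  # Copy a microdescs tarball, dropping members that look like microdesc
  # consensuses so that only microdescriptors remain.
  with tarfile.open(src_path) as src, tarfile.open(dst_path, 'w') as dst:
    for member in src:
      if 'consensus-microdesc' in member.name:
        continue  # assumed marker for microdesc consensus files

      fileobj = src.extractfile(member) if member.isfile() else None
      dst.addfile(member, fileobj)

strip_microdesc_consensuses('microdescs-2015-11.tar', 'microdescs-2015-11-micro-only.tar')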
>>
>> These evaluations were all run on a Core i7 with 2GHz using an SSD
>> as storage.
>>
>> Any surprises in these results so far?
>>
>> Would you want to move forward with the comparison and also
>> include Stem?  (And, Philipp, would you want to include Zoossh?)
>>
>> All the best, Karsten
>>
>>
>> On 01/10/15 09:28, Karsten Loesing wrote:
>>>>> Hello Philipp and iwakeh, hello list,
>>>>>
>>>>> Damian and I sat down yesterday at the dev meeting to talk
>>>>> about doing a comparison of the various descriptor-parsing
>>>>> libraries with respect to capabilities, run-time performance,
>>>>> memory usage, etc.
>>>>>
>>>>> We put together a list of things we'd like to compare and
>>>>> tests we'd like to run that we thought we'd want to share
>>>>> with you. Damian and I will both be working on these for
>>>>> metrics-lib for a short while and then switch to Stem.
>>>>> Please feel free to join us in this effort. The result is
>>>>> supposed to live on Stem's home page unless somebody comes up
>>>>> with a better place.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> All the best, Damian and Karsten
>>>>>
>>>>>
>>>>> On 30/09/15 10:57, Karsten Loesing wrote:
>>>>>> 1. capabilities
>>>>>> - supported descriptor types
>>>>>> - all the ones on CollecTor's formats.html
>>>>>> - hidden service descriptors (have an agreed @type for that)
>>>>>> - getting/producing descriptors
>>>>>> - reading from file/directory
>>>>>> - reading from tarballs
>>>>>> - reading from CollecTor's .xz-compressed tarballs
>>>>>> - fetching from CollecTor
>>>>>> - downloading from directories (authorities or mirrors)
>>>>>> - generating (for unit test)
>>>>>> - recognizing @type annotation
>>>>>> - inferencing from file name
>>>>>> - keeping reading history
>>>>>> - user documentation
>>>>>> - validation (format, crypto, successful sanitization)
>>>>>> - packages available
>>>>>> - how much usage by (large) applications
>>>>>
>>>>>> 2. performance (CPU time, memory overhead)
>>>>>> - compression: .xz-compressed tarballs/decompressed tarballs/plain-text
>>>>>> - descriptor type: consensus, server descriptor, extra-info descriptor, microdescriptors
>>>>>> - validation: on or off (allows lazy loading; see the Stem sketch after the attached script)
>>>>>
>>>>>> 3. tests by descriptor type
>>>>>> - @type server-descriptor 1.0
>>>>>> - Stem's "List Outdated Relays"
>>>>>> - average advertised bandwidth
>>>>>> - fraction of relays that can exit to port 80
>>>>>> - @type extra-info 1.0
>>>>>> - sum of all written and read bytes from write-history/read-history
>>>>>> - number of countries from which v3 requests were received
>>>>>> - @type network-status-consensus-3
>>>>>> - average number of relays with Exit flag
>>>>>> - @type network-status-vote-3
>>>>>> - Stem's "Votes by Bandwidth Authorities"
>>>>>> - @type dir-key-certificate-3
>>>>>> - @type network-status-microdesc-consensus-3 1.0
>>>>>> - @type microdescriptor 1.0
>>>>>> - look at single microdesc cons and microdescs, compile list of extended families
>>>>>> - fraction of relays that can exit to port 80
>>>>>> - @type network-status-2 1.0
>>>>>> - @type directory 1.0
>>>>>> - @type bridge-network-status
>>>>>> - @type bridge-server-descriptor
>>>>>> - @type bridge-server-descriptor 1.0
>>>>>> - @type bridge-extra-info 1.3
>>>>>> - @type bridge-pool-assignment
>>>>>> - @type tordnsel 1.0
>>>>>> - @type torperf 1.0
>>>>>
>>>>>> 4. action items
>>>>>> - get in touch with Dererk for packaging metrics-lib for Debian
>>>>>
>>>>>
>>>>>
>>
>>
>
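
# Attached script: Stem counterparts of the metrics-lib benchmarks discussed
# above. Each function reads a CollecTor tarball, parses every descriptor in
# it, and reports total and per-descriptor processing time.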
import time

from stem.descriptor import DocumentHandler
from stem.descriptor.reader import DescriptorReader

def measure_average_advertised_bandwidth(path):
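  # Average advertised bandwidth across a tarball of server descriptors,
  # taking min(average, burst, observed) for each descriptor.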
  start_time = time.time()
  total_bw, count = 0, 0

  with DescriptorReader(path) as reader:
    for desc in reader:
      total_bw += min(desc.average_bandwidth, desc.burst_bandwidth, desc.observed_bandwidth)
      count += 1

  runtime = time.time() - start_time
  print("Finished measure_average_advertised_bandwidth('%s')" % path)
  print('  Total time: %i seconds' % runtime)
  print('  Processed server descriptors: %i' % count)
  print('  Average advertised bandwidth: %i' % (total_bw / count))
  print('  Time per server descriptor: %0.5f seconds' % (runtime / count))
  print('')

def measure_countries_v3_requests(path):
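  # Number of distinct countries from which v3 directory requests were
  # received, over a tarball of extra-info descriptors (dir_v3_requests maps
  # country codes to rounded request counts).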
  start_time = time.time()
  countries, count = set(), 0

  with DescriptorReader(path) as reader:
    for desc in reader:
      if desc.dir_v3_requests:
        countries.update(desc.dir_v3_requests.keys())

      count += 1

  runtime = time.time() - start_time
  print("Finished measure_countries_v3_requests('%s')" % path)
  print('  Total time: %i seconds' % runtime)
  print('  Processed extrainfo descriptors: %i' % count)
  print('  Number of countries: %i' % len(countries))
  print('  Time per extrainfo descriptor: %0.5f seconds' % (runtime / count))
  print('')

def measure_average_relays_exit(path):
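  # Average number of relays with the Exit flag per consensus, parsing each
  # consensus in the tarball as a whole document.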
  start_time = time.time()
  total_relays, exits, consensuses = 0, 0, 0

  with DescriptorReader(path, document_handler = DocumentHandler.DOCUMENT) as reader:
    for consensus in reader:
      for desc in consensus.routers.values():
        if 'Exit' in desc.flags:
          exits += 1

        total_relays += 1

      consensuses += 1

  runtime = time.time() - start_time
  print("Finished measure_average_relays_exit('%s')" % path)
  print('  Total time: %i seconds' % runtime)
  print('  Processed %i consensuses with %i router status entries' % (consensuses, total_relays))
  print('  Total exits: %i (%0.2f%%)' % (exits, 100.0 * exits / total_relays))
  print('  Time per consensus: %0.5f seconds' % (runtime / consensuses))
  print('')

def measure_fraction_relays_exit_80_microdescriptors(path):
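  # Fraction of microdescriptors whose exit policy allows exiting to port 80.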
  start_time = time.time()
  exits, count = 0, 0

  with DescriptorReader(path) as reader:
    for desc in reader:
      if desc.exit_policy.can_exit_to(port = 80):
        exits += 1

      count += 1

  runtime = time.time() - start_time
  print("Finished measure_fraction_relays_exit_80_microdescriptors('%s')" % path)
  print('  Total time: %i seconds' % runtime)
  print('  Processed microdescriptors: %i' % count)
  print('  Total exits to port 80: %i (%0.2f%%)' % (exits, 100.0 * exits / count))
  print('  Time per microdescriptor: %0.5f seconds' % (runtime / count))
  print('')

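# Decompressed November 2015 CollecTor tarballs used for these runs.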
measure_average_advertised_bandwidth('/home/atagar/Desktop/server-descriptors-2015-11.tar')
measure_countries_v3_requests('/home/atagar/Desktop/extra-infos-2015-11.tar')
measure_average_relays_exit('/home/atagar/Desktop/consensuses-2015-11.tar')
measure_fraction_relays_exit_80_microdescriptors('/home/atagar/Desktop/microdescs-2015-11.tar')
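
For the "validation: on or off" point in item 2 of the quoted criteria list, a
minimal sketch of how one of these runs could be repeated with validation
enabled, reusing the server descriptor tarball path above; its timings are not
part of the results in this thread:

import time

from stem.descriptor.reader import DescriptorReader

def time_parse(path, validate):
  # Parse every descriptor in the tarball, timing the full pass.
  start_time, count = time.time(), 0

  with DescriptorReader(path, validate = validate) as reader:
    for desc in reader:
      count += 1

  return time.time() - start_time, count

for validate in (False, True):
  runtime, count = time_parse('/home/atagar/Desktop/server-descriptors-2015-11.tar', validate)
  print('validate=%s: %i descriptors in %0.1f seconds' % (validate, count, runtime))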

_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev