Commit 3dac7c51 authored by Damian Johnson

Tor descriptor lazy loading

I've been wanting to do this for years.

Until now, reading a descriptor parsed every field in it. This is necessary
when we're validating it, but users usually don't care about validation and
only want an attribute or two.

When parsing without validation we now lazy load the document, meaning we
parse fields on demand rather than everything upfront. This greatly improves
our performance for reading descriptors without validation...

  Server descriptors: 27% faster
  Extrainfo descriptors: 71% faster
  Microdescriptors: 43% faster
  Consensus: 37% faster

This comes at a small cost when we read with validation (roughly 7-17% slower
in the results below), but not enough to be a concern. As an added benefit,
it makes our code a lot more maintainable too!

  https://trac.torproject.org/projects/tor/ticket/14011
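
--------------------------------------------------------------------------------
Lazy loading sketch
--------------------------------------------------------------------------------

To illustrate the on-demand parsing described above, here is a minimal sketch.
It is not stem's actual code: the LazyDescriptor class, the PARSERS table, the
parse_count counter, and the two-line sample descriptor are all hypothetical
names made up for this example. The idea is that attribute access triggers the
parser for just that field, and validation simply forces every parser to run.

```python
# Hypothetical illustration of lazy loading, not stem's real internals.

def _parse_nickname(lines):
  # "router <nickname> <address> <ORPort> ..." -> nickname
  return lines['router'].split()[0]

def _parse_average_bandwidth(lines):
  # "bandwidth <avg> <burst> <observed>" -> average rate as an int
  return int(lines['bandwidth'].split()[0])

class LazyDescriptor(object):
  # attribute name => function that parses it from the raw keyword lines
  PARSERS = {
    'nickname': _parse_nickname,
    'average_bandwidth': _parse_average_bandwidth,
  }

  def __init__(self, raw_content, validate = False):
    self._lines = {}
    self.parse_count = 0  # how many fields we've actually parsed

    # Cheap step done upfront: split the content into 'keyword value' pairs.
    for line in raw_content.splitlines():
      if line:
        keyword, _, value = line.partition(' ')
        self._lines[keyword] = value

    # Validation needs every field, so parse them all eagerly.
    if validate:
      for attr in self.PARSERS:
        getattr(self, attr)

  def __getattr__(self, name):
    # Only invoked when the attribute hasn't been set yet.
    parser = self.PARSERS.get(name)

    if parser is None:
      raise AttributeError(name)

    self.parse_count += 1
    value = parser(self._lines)
    setattr(self, name, value)  # cache so later lookups skip __getattr__
    return value

# Made-up sample content for the example.
SAMPLE = 'router caerSidi 71.35.133.197 9001 0 0\nbandwidth 153600 256000 104590'

lazy_desc = LazyDescriptor(SAMPLE)
print(lazy_desc.parse_count)  # nothing parsed until we ask for a field
print(lazy_desc.nickname)
print(lazy_desc.parse_count)
```

Without validation only the fields a caller touches are ever parsed, which is
where the savings come from. With validation every parser runs upfront, plus
the small overhead of the indirection.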

--------------------------------------------------------------------------------
Benchmarking script
--------------------------------------------------------------------------------

import time

from stem.descriptor import parse_file

# Read every descriptor with validation, which parses all fields upfront.

start_time, fingerprints = time.time(), []

for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = True):
  fingerprints.append(desc.fingerprint)

count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors with validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))

# Read them again without validation, only touching the fingerprint.

start_time, fingerprints = time.time(), []

for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = False):
  fingerprints.append(desc.fingerprint)

count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors without validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))

--------------------------------------------------------------------------------
Results
--------------------------------------------------------------------------------

Please keep in mind these are just the results on my system. Yours will, of
course, vary with your hardware and background load...

Server descriptors:

  before: read 6679 descriptors with validation, took 10.71 seconds (0.00160 seconds per descriptor)
  before: read 6679 descriptors without validation, took 4.46 seconds (0.00067 seconds per descriptor)

  after: read 6679 descriptors with validation, took 11.48 seconds (0.00172 seconds per descriptor)
  after: read 6679 descriptors without validation, took 3.25 seconds (0.00049 seconds per descriptor)

Extrainfo descriptors:

  before: read 6677 descriptors with validation, took 7.91 seconds (0.00119 seconds per descriptor)
  before: read 6677 descriptors without validation, took 7.64 seconds (0.00114 seconds per descriptor)

  after: read 6677 descriptors with validation, took 8.91 seconds (0.00133 seconds per descriptor)
  after: read 6677 descriptors without validation, took 2.22 seconds (0.00033 seconds per descriptor)

Microdescriptors:

  before: read 10526 descriptors with validation, took 2.41 seconds (0.00023 seconds per descriptor)
  before: read 10526 descriptors without validation, took 2.34 seconds (0.00022 seconds per descriptor)

  after: read 10526 descriptors with validation, took 2.74 seconds (0.00026 seconds per descriptor)
  after: read 10526 descriptors without validation, took 1.34 seconds (0.00013 seconds per descriptor)

Consensus:

  before: read 6688 descriptors with validation, took 2.11 seconds (0.00032 seconds per descriptor)
  before: read 6688 descriptors without validation, took 2.04 seconds (0.00030 seconds per descriptor)

  after: read 6688 descriptors with validation, took 2.47 seconds (0.00037 seconds per descriptor)
  after: read 6688 descriptors without validation, took 1.28 seconds (0.00019 seconds per descriptor)
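
--------------------------------------------------------------------------------
Deriving the percentages
--------------------------------------------------------------------------------

The summary percentages can be reproduced from the timings above. This sketch
assumes 'faster' means the relative drop in runtime when reading without
validation; the percent_change helper and the timings table are names made up
for this example, while the numbers are copied from the results.

```python
# (before, after) runtimes in seconds, copied from the results above.
timings = [
  ('Server descriptors', (10.71, 11.48), (4.46, 3.25)),
  ('Extrainfo descriptors', (7.91, 8.91), (7.64, 2.22)),
  ('Microdescriptors', (2.41, 2.74), (2.34, 1.34)),
  ('Consensus', (2.11, 2.47), (2.04, 1.28)),
]

def percent_change(before, after):
  # Percent reduction in runtime, negative if the runtime grew.
  return (before - after) / before * 100

for name, validated, unvalidated in timings:
  speedup = percent_change(*unvalidated)
  slowdown = -percent_change(*validated)
  print('%s: %0.0f%% faster without validation, %0.0f%% slower with validation' % (name, speedup, slowdown))
```
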

Parents: 92dd4648 6484250c