Tor descriptor lazy loading (3dac7c51) · Commits · The Tor Project / Network Health / stem

Commit 3dac7c51 authored Jan 25, 2015 by Damian Johnson
Tor descriptor lazy loading

I've been wanting to do this for years.

Until now, reading a descriptor parsed every field in it. That's necessary if
we're validating it, but users usually don't care about validation and only
want an attribute or two.

When parsing without validation we now lazy load the document, meaning we
parse fields on-demand rather than everything upfront. This greatly reduces
the time it takes to read descriptors without validation...

  Server descriptors: 27% faster
  Extrainfo descriptors: 71% faster
  Microdescriptors: 43% faster
  Consensus: 37% faster

This comes at a small cost when we read with validation (roughly 7-17% slower
in the results below), but not enough to be a concern. As an added benefit
this makes our code a lot more maintainable too!

  https://trac.torproject.org/projects/tor/ticket/14011
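
To illustrate the idea of parsing fields on-demand, here is a minimal sketch.
It is not stem's actual implementation: LazyDescriptor, PARSERS, and
parsed_fields are hypothetical names, and the sample descriptor content is made
up. It assumes each attribute maps to one keyword line, and uses Python's
__getattr__ hook so a field is only parsed the first time it's read.

```python
# Illustrative sketch only. LazyDescriptor, PARSERS, and parsed_fields are
# hypothetical names, not stem's classes or attributes.

def _parse_router(line):
  # "router <nickname> <address> <or_port> ..." => nickname
  return line.split()[1]

def _parse_fingerprint(line):
  # "fingerprint AAAA BBBB ..." => fingerprint with the spaces removed
  return ''.join(line.split()[1:])

# attribute name => (descriptor keyword, function that parses that line)
PARSERS = {
  'nickname': ('router', _parse_router),
  'fingerprint': ('fingerprint', _parse_fingerprint),
}


class LazyDescriptor(object):
  def __init__(self, raw_content, validate = False):
    # Cheap upfront work: only index the lines by their keyword.
    self._lines = {}

    for line in raw_content.splitlines():
      if line.strip():
        self._lines.setdefault(line.split()[0], line)

    self.parsed_fields = []  # records which attributes were actually parsed

    if validate:
      # Validation needs every field, so parse everything eagerly.
      for attr in PARSERS:
        getattr(self, attr)

  def __getattr__(self, name):
    # Only invoked when normal lookup fails, so the field isn't parsed yet.
    if name not in PARSERS:
      raise AttributeError(name)

    keyword, parser = PARSERS[name]
    line = self._lines.get(keyword)
    value = parser(line) if line is not None else None

    self.parsed_fields.append(name)
    setattr(self, name, value)  # cache so later reads skip __getattr__
    return value


# Made-up sample content, just enough to exercise the two parsers.
SAMPLE = 'router example 127.0.0.1 9001 0 0\nfingerprint A756 9A83 B570\n'

desc = LazyDescriptor(SAMPLE)
print(desc.fingerprint)    # parses only the fingerprint line
print(desc.parsed_fields)  # the nickname has not been parsed
```

Without validation only the requested attribute gets parsed, and caching the
value with setattr means repeated reads cost nothing extra. With validation
everything is parsed upfront and the extra indirection is pure overhead, which
is one plausible reason validated reads get a little slower.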

--------------------------------------------------------------------------------
Benchmarking script
--------------------------------------------------------------------------------

import time

from stem.descriptor import parse_file

# First pass: read with validation, which parses every field upfront.

start_time, fingerprints = time.time(), []

for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = True):
  fingerprints.append(desc.fingerprint)

count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors with validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))

# Second pass: read without validation, so only the fingerprint gets parsed.

start_time, fingerprints = time.time(), []

for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = False):
  fingerprints.append(desc.fingerprint)

count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors without validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))

--------------------------------------------------------------------------------
Results
--------------------------------------------------------------------------------

Please keep in mind these are just the results on my system. Timings will, of
course, vary with your system and background load...

Server descriptors:

  before: read 6679 descriptors with validation, took 10.71 seconds (0.00160 seconds per descriptor)
  before: read 6679 descriptors without validation, took 4.46 seconds (0.00067 seconds per descriptor)

  after: read 6679 descriptors with validation, took 11.48 seconds (0.00172 seconds per descriptor)
  after: read 6679 descriptors without validation, took 3.25 seconds (0.00049 seconds per descriptor)

Extrainfo descriptors:

  before: read 6677 descriptors with validation, took 7.91 seconds (0.00119 seconds per descriptor)
  before: read 6677 descriptors without validation, took 7.64 seconds (0.00114 seconds per descriptor)

  after: read 6677 descriptors with validation, took 8.91 seconds (0.00133 seconds per descriptor)
  after: read 6677 descriptors without validation, took 2.22 seconds (0.00033 seconds per descriptor)

Microdescriptors:

  before: read 10526 descriptors with validation, took 2.41 seconds (0.00023 seconds per descriptor)
  before: read 10526 descriptors without validation, took 2.34 seconds (0.00022 seconds per descriptor)

  after: read 10526 descriptors with validation, took 2.74 seconds (0.00026 seconds per descriptor)
  after: read 10526 descriptors without validation, took 1.34 seconds (0.00013 seconds per descriptor)

Consensus:

  before: read 6688 descriptors with validation, took 2.11 seconds (0.00032 seconds per descriptor)
  before: read 6688 descriptors without validation, took 2.04 seconds (0.00030 seconds per descriptor)

  after: read 6688 descriptors with validation, took 2.47 seconds (0.00037 seconds per descriptor)
  after: read 6688 descriptors without validation, took 1.28 seconds (0.00019 seconds per descriptor)
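
The percentages in the summary follow from the timings above as the relative
reduction in runtime. A short sketch of that arithmetic, with the timings
copied from these results (percent_change is a hypothetical helper name):

```python
# Timings in seconds copied from the results above, as (before, after).
WITHOUT_VALIDATION = {
  'Server descriptors': (4.46, 3.25),
  'Extrainfo descriptors': (7.64, 2.22),
  'Microdescriptors': (2.34, 1.34),
  'Consensus': (2.04, 1.28),
}

WITH_VALIDATION = {
  'Server descriptors': (10.71, 11.48),
  'Extrainfo descriptors': (7.91, 8.91),
  'Microdescriptors': (2.41, 2.74),
  'Consensus': (2.11, 2.47),
}

def percent_change(before, after):
  # Positive means the read got faster, negative means it got slower.
  return 100.0 * (before - after) / before

for name in ('Server descriptors', 'Extrainfo descriptors', 'Microdescriptors', 'Consensus'):
  faster = percent_change(*WITHOUT_VALIDATION[name])
  slower = -percent_change(*WITH_VALIDATION[name])
  print('%s: %0.0f%% faster without validation, %0.0f%% slower with validation' % (name, faster, slower))
```

This reproduces the 27%, 71%, 43%, and 37% improvements without validation,
and shows validated reads getting 7%, 13%, 14%, and 17% slower respectively.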
parents 92dd4648 6484250c