[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #9380 [BridgeDB]: BridgeDB should use stem for parsing descriptors according to torspec
#9380: BridgeDB should use stem for parsing descriptors according to torspec
-------------------------+-------------------------------------------------
Reporter: sysrqb | Owner: isis
Type: | Status: needs_revision
enhancement | Milestone:
Priority: normal | Version:
Component: | Keywords: stem,bridgedb-0.2.x,bridgedb-
BridgeDB | parsers
Resolution: | Parent ID:
Actual Points: |
Points: |
-------------------------+-------------------------------------------------
Comment (by atagar):
Hi isis, here's a quick code review for the file you pointed me toward
([https://gitweb.torproject.org/user/isis/bridgedb.git/blob/refs/heads/fix/9380-stem_r3:/lib/bridgedb/parse/descriptors.py
descriptors.py])...
----
{{{
if not descriptors.startswith("published"):
precise = datetime.datetime.now().isoformat(sep=chr(0x20))
timestamp = precise.rsplit('.', 1)[0]
descriptors = "published {t}\n{d}".format(t=timestamp, d=descriptors)
}}}
Interesting, any reason to do 'sep=chr(0x20)' rather than 'sep=' !''?
Regardless, this would be simpler as...
{{{
if not descriptors.startswith("published"):
published_line = "published %s\n" %
datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
descriptor = published_line + descriptor
}}}
----
{{{
routers = networkstatus.BridgeNetworkStatusDocument(descriptors,
validate=validate)
}}}
This is something we've discussed several times but are you sure
BridgeNetworkStatusDocument is what you want? Your earlier comment
indicates that these files are simply a list of router status entries (no
consensus header or footer). If that's the case then you could simplify
this whole function to be somethng like...
{{{
def parseNetworkStatusFile(filename, validate=True):
with open(filename) as descriptor_file:
return list(stem.descriptor.router_status_entry._parse_file(
descriptor_file,
validate,
entry_class =
stem.descriptor.router_status_entry.RouterStatusEntryV2,
))
}}}
This will provide you a list of RouterStatusEntryV2 entries. I think you
really want RouterStatusEntryV3, but your current code is providing you
RouterStatusEntryV2 instances (maybe a bug).
----
{{{
routers = [router for router in document]
return routers
}}}
This could also be...
{{{
return list(document)
}}}
----
{{{
for descriptor in descriptors:
router = descriptors.pop(descriptors.index(descriptor))
}}}
This has a couple issues...
* You're modifying a list while you iterate over it. This is a bad idea
in just about any language (iirc java would raise a
ConcurrentModificationException).
* Doing this is both pointless and very, very inefficient. You're doing a
O(n) iteration over the descripters then doing a O(n) lookup to find the
index then another O(n) modification. In other words this is O(n^2) twice.
This would be both simpler and more efficient as...
{{{
while descriptors:
router = descriptors.pop()
}}}
That said this whole function can simplified as...
{{{
def deduplicate(descriptors):
# Dictionary of {fingerprint: descriptor} for the newest descriptor with
a
# given fingerprint.
fp_to_descriptor = {}
duplicates = []
for desc in descriptors:
if desc.fingerprint in fp_to_descriptor:
conflict = fp_to_descriptor[desc.fingerprint]
if desc.published > conflict.published:
fp_to_descriptor[desc.fingerprint] = desc
duplicates.append(conflict)
else:
duplicates.append(desc)
else:
fp_to_descriptor[desc.fingerprint] = desc
return fp_to_descriptor.values()
}}}
----
{{{
descriptors.extend([router for router in document])
}}}
Again, there's no reason to do list comprehension to just convert an
iterable to a list. Calling...
{{{
[router for router in document]
}}}
... is the same as...
{{{
list(document)
}}}
Cheers! -Damian
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9380#comment:16>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs