[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #6569 [Stem]: Stem network status document parsing
#6569: Stem network status document parsing
--------------------+-------------------------------------------------------
Reporter: neena | Owner: neena
Type: task | Status: needs_review
Priority: normal | Milestone:
Component: Stem | Version:
Keywords: | Parent:
Points: | Actualpoints:
--------------------+-------------------------------------------------------
Comment(by atagar):
Hi Ravi. Just some first thoughts so far from my bus ride...
> + file_parser = lambda f: stem.descriptor.networkstatus.parse_file(f,
True, "microdesc")
Very neat, I've never seen this trick before. I like it.
> + desc =
stem.descriptor.networkstatus.NetworkStatusDocument(descriptor_file.read())
Again this is frontloading the read rather than passing the file object
itself to the NetworkStatusDocument. This means that the memory usage is,
at a minimum, as large as the whole consensus file. If we process
individual consensus entries then that'll be reduced to the size of an
individual consensus entry.
I realize that this is a large change but it's likely necessary to avoid
swapping on some systems.
> + if " " in line:
> + keyword = line.split(" ", 1)[0]
> + if keyword == "opt":
> + keyword = line.split(" ", 2)[1]
> + else: keyword = line.strip()
I probably would have done the same though, on reflection, this can
actually be reduced to...
{{{
if line.startswith("opt "):
line = line[4:]
keyword = line.split(" ", 1)[0].strip()
}}}
... this is because split doesn't need the whitespace to have a first
element...
{{{
>>> line = "foobar"
>>> line.split(" ", 1)[0]
'foobar'
}}}
> +try:
> + from cStringIO import StringIO
> +except:
> + from StringIO import StringIO
Out of curiosity when is cStringIO unavailable? Iirc we're using it
elsewhere too so if it, say, isn't available on python 2.5 then that would
be good to know.
> +Flag = stem.util.enum.Enum(
> + ("AUTHORITY", "Authority"),
> + ("BADEXIT", "BadExit"),
> + ("EXIT", "Exit"),
> + ("FAST", "Fast"),
> + ("GUARD", "Guard"),
> + ("HSDIR", "HSDir"),
> + ("NAMED", "Named"),
> + ("RUNNING", "Running"),
> + ("STABLE", "Stable"),
> + ("UNNAMED", "Unnamed"),
> + ("V2DIR", "V2Dir"),
> + ("VALID", "Valid"),
> + )
> +
> +Flag = stem.util.enum.Enum(*[(flag.upper(), flag) for flag in
["Authority", "BadExit", "Exit", "Fast", ...
Duplicate assignment. Personally I think the first is more readable though
up to you. If you like the second better then please put the flag list
into a temporary variable first to avoid having a single enormous line.
Also, minor nitpick: when doing a multi-line block please place the last
parentheses at the same indentation as the block. Ie...
{{{
Flag = stem.util.enum.Enum(
("AUTHORITY", "Authority"),
...
)
}}}
... mostly just for consistency and to give a little better visual que
that it's the end of the block.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6569#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs