[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #6569 [Stem]: Stem network status document parsing
#6569: Stem network status document parsing
--------------------+-------------------------------------------------------
Reporter: neena | Owner: neena
Type: task | Status: needs_revision
Priority: normal | Milestone:
Component: Stem | Version:
Keywords: | Parent:
Points: | Actualpoints:
--------------------+-------------------------------------------------------
Changes (by atagar):
* status: needs_review => needs_revision
Comment:
Hi Ravi. Great to see that this is progressing!
> stem/descriptor/__init__.py
> + import stem.descriptor.networkstatus_descriptor
Wrong import and mix of new/old usage. I'm guessing that the tests don't
cover parse_file()? If it is being tested then this might be due to a
phantom networkstatus_descriptor.pyc (I hate it when that happens).
> +def _strptime(string, validate = True, optional = False):
> + try:
> + return datetime.datetime.strptime(string, "%Y-%m-%d %H:%M:%S")
> + except ValueError, exc:
> + if validate or not optional: raise exc
Please end with "else: return None". I know this is the default behavior
when there isn't a return, but explicit is better than implicit. :)
> +class DescriptorParser:
> + """
> + Helper class to parse documents.
> +
> + :var str line: current line to be being parsed
> + :var list lines: list of remaining lines to be parsed
> + """
Needs to extend object. I'm also kinda worried about how it holds on to a
copy of both _raw_content and lines. Ideally it would have neither. I
realize that the Descriptor class has a _raw_content attribute for its
__str__() method. In DescriptorParser's case though _raw_content is
presently unused and it should be lazily fetched since we're dealing with
*far* larger amounts of data here. My suggestion would be for the
constructor to accept a file object, then read from it on demand. This
will let us have constant memory usage.
Speaking of which does this now work when processing a full consensus? I'm
not spotting the test we talked about for this though maybe I'm missing
it...
This class reminds me a lot of the ControlLine class. I didn't need this
style of helper for the server or extrainfo descriptor classes... are you
sure that it's really necessary here? The descriptors do a lot of this
(first pass parsing, 'opt' prefix handling, etc) via the
_get_descriptor_components() helper, and your read_block() method reminds
me a bit of the _get_pseudo_pgp_block() helper.
This style of helper class generally hurts code readability. I made the
ControlLine class because controller responses are generally structured
like a stack, and I wanted to avoid the numerous regexes that TorCtl uses
to handle it. That said, I'd love to get rid of the ControlLine class if
we had a good alternative.
So in summary:
* I'd suggest lazily evaluating what you need from a file object. See the
server or extrainfo descriptor's parse_file() functions - they read from a
file line by line and yield descriptors so its memory usage isn't greater
than a single descriptor.
* I'd also suggest replacing this helper class with helper functions that
give you back python primitives. This module has several such helpers that
we might be able to expand to meet your purposes.
Thoughts?
> stem/descriptor/networkstatus.py
> + :var bool is_valid: **\*** router is valid
> + :var bool is_guard: **\*** router is suitable for use as an entry
guard
> + :var bool is_named: **\*** router is named
Interesting, I hadn't thought of doing a series of boolean flags for this.
However, I'd suggest adding a Flags enum and making this a tuple instead.
Users can do the same with fewer methods and also query the
RouterDescriptor for all of its flags (which you can't do with these
booleans). This would look like...
{{{
Flag = stem.util.enum.Enum(
("GUARD", "Guard"),
("NAMED", "Named"),
...
)
}}}
... then used like...
{{{
>>> my_router_descriptor = RouterDescriptor(...)
>>> print Flag.NAMED in my_router_descriptor.flags
True
>>> for flag in my_router_descriptor.flags:
... print flag
Named
Guard
Valid
}}}
I'll hold off on reviewing the rest for the moment since the
DescriptorParser might lead to quite a bit of refactoring (sorry about
that!).
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6569#comment:2>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs