[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #13125 [Tor]: Review the guardiness python script of #9321
#13125: Review the guardiness python script of #9321
------------------------+--------------------------------
Reporter: asn | Owner:
Type: defect | Status: needs_review
Priority: normal | Milestone: Tor: 0.2.6.x-final
Component: Tor | Version:
Resolution: | Keywords: tor-guard tor-auth
Actual Points: | Parent ID: #9321
Points: |
------------------------+--------------------------------
Comment (by asn):
Replying to [comment:12 nickm]:
>
Thanks for the review. I fixed some but not all stuff and pushed the
changes to my branch.
Some comments:
> * Hmmmmm. I worry slightly about using stem for parsing here
> ... remind me, does stem accept unrecognized elements and fields and
> so forth in documents, or does it reject those? I thought
> it sometimes needed to get patched when we added new fields. If
> that's right, this will make us fail when the format evolves.
>
> * If I'm wrong about stem's behavior, great.
>
> * If I'm right, we need to give it a way to be more tolerant.
>
> * Regardless, we should make sure that we can recover from
> misformatted cnsensuses.
>
Good point. As Damian said, stem is filtering unknown lines and putting
them in a special list. I can do some testing wrt soon, to make sure it
works fine.
> * The output format should be made extensible so that it can handle more
than
> one ID format. Soon, Ed25519+RSA identity pairs will exist. After
> that, Ed25519-alone will exist.
>
> * The schema should also probably handle this situation.
>
That's also a good point.
Currently, I'm using the fingerprint of the guard as its unique
identifier. The problem I guess is that in the future, each guard will
have 2 such fingerprints. And in the future future, it will have one but
not the one it has now.
How should this be fixed I wonder? Should I change the schema and output
file format to support multiple fingerprints? So that we include both RSA
and Ed25519?
To do so, I will have to change the output file format quite a bit
again... Both Python and Tor code will need to change, and unittests will
need to be rewritten. Should I change the output file format to be yaml or
something? :/ But then how will Tor parse it...
> * Probably this needs a LICENSE or COPYING file. And a license.
>
Done. Used unlicense.
> * Do we verify consensuses now?
>
No we don't. #11045 is the corresponding ticket. Damian asked me to take
care of that ticket, which might happen but not any time very soon.
However, I don't worry *too* much about this. The cron script for now,
will just copy the consensus from Tor's data directory to the
guardfraction directory every hour. So the consensus should be authentic.
I'm also going to suggest to dirauth operators to just run the cron script
for a month or so before enabling the code, to have a sufficiently
populated guardfraction db.
> * README should explain how and what to backup.
>
Hm, what do you mean backup?
> * The cron script probably shouldn't be called cron.sh, right?
>
True. Renamed to `guardfraction_cron.sh`.
> * Is the cron script smart enough to exit if something fails?
>
Done. Now it's smarter!
> * In guardfraction.py ... double-check that max_days is positive? And
not huge?
>
Done.
> * Check that we're not going backwards in time?
>
Hm, how should I do that?
Should I take the latest consensus from the database, and make sure that
the current time is later than the `valid-after` of that consensus?
> * Do you need an index on consensus.consensus_date to make delete and
count fast enough?
>
I think that's a good idea since it's used in multiple `WHERE` clauses,
but I'm not very good at databases. I added the index in any case.
> * find_missing_hours_from_list() isn't actually right: We don't
guarantee a one-hour timer. Instead you need to check for gaps in the
(valid-after, fresh-until) series of times. But this would mean that just
counting consensuses isn't right. Maybe each consensus needs a duration-
fresh field.
>
Yeah this was a hard problem. The current implementation is a rough hack,
to give dirauth operators a rough idea of which consensuses are missing...
> * How does the nested query in read_db_file() perform? Is it even
right? I don't think the parentheses balance.
I ```think``` it's right. At least its output seems to be correct, and the
unittests test it.
Which parentheses don't match?
Thanks for the code review again! We can discuss this during the next tor
meeting if you want :)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13125#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs