Re: [tor-bugs] #10680 [Analysis]: Obtain attributes of current public bridges
#10680: Obtain attributes of current public bridges
--------------------------+------------------------------
     Reporter:  sysrqb    |          Owner:
         Type:  task      |         Status:  new
     Priority:  normal    |      Milestone:
    Component:  Analysis  |        Version:
   Resolution:            |       Keywords:  bridgedb-parsers
Actual Points:            |      Parent ID:
       Points:            |
--------------------------+------------------------------
Comment (by karsten):
Looks like a fine start! I'll comment on the output csv files first:
- Instead of "date", can you change the first column to
"status_published" and put in the publication time of the bridge network
status? I'll aggregate that file using R, similar to how I'm aggregating
the advbwdist-validafter.csv file here:
https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/modules/advbwdist/aggregate.R
- The "ec2bridge" column in the current servers.csv is actually a boolean
type, not a number type. Whenever there's a "t" in that column, the
"bridges" column contains the number of bridges that are in the EC2 cloud.
What you're doing is combining two dimensions, version and ec2bridge, by
reporting how many of the EC2 bridges are running Linux. The current
servers.csv does not combine dimensions, so there's just one line for the
number of Linux bridges and one line for the number of EC2 bridges. That's
sufficient for most use cases, so I'd say let's not combine dimensions for
now.
- The column headers should not be repeated for every bridge status. You
could check if the output csv file exists and only write the header line
if it doesn't.
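
A minimal sketch of the header check suggested in the last point, using only
the Python standard library. The function name append_rows and every column
except "status_published" are hypothetical and only illustrate the idea; the
real column set is whatever the script already writes.

```python
import csv
import os

# Hypothetical column names; only "status_published" is taken from the
# comment above, the rest are placeholders.
FIELDNAMES = ["status_published", "platform", "ec2bridge", "bridges"]


def append_rows(path, rows):
    """Append rows to the CSV at path, writing the header only once."""
    # Write the header only if the file is missing or still empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows once per bridge status then yields a single header line
followed by the rows of all statuses, which is the shape an aggregation
script would expect.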
Regarding options to run your script: I'd appreciate a default mode of
operation that processes only those bridge statuses that it did not
process in an earlier run. I think stem has an option to keep a parse
history of some kind that you might be able to use here. Note that you'll
have to re-read server descriptors and extra-info descriptors in any case,
because they might be referenced from many statuses.
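
A minimal sketch of such a default mode, assuming each bridge status is one
file in a directory and that a plain text file with one processed file name
per line is an acceptable parse history. All function names here are
hypothetical. stem's DescriptorReader also accepts a persistence_path
argument that looks like a candidate for this, but whether it fits this
script has not been checked here, so the sketch sticks to the standard
library.

```python
import os


def load_history(history_path):
    """Return the set of status file names processed in earlier runs."""
    if not os.path.exists(history_path):
        return set()
    with open(history_path) as history_file:
        return {line.strip() for line in history_file if line.strip()}


def unprocessed_statuses(status_dir, history_path):
    """Return, sorted, the status file names not yet in the history."""
    done = load_history(history_path)
    return sorted(name for name in os.listdir(status_dir) if name not in done)


def record_processed(history_path, name):
    """Append one processed status file name to the history."""
    with open(history_path, "a") as history_file:
        history_file.write(name + "\n")
```

Only the statuses go through this filter. Server descriptors and extra-info
descriptors would still be read in full on every run, for the reason given
above.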
And finally, here are some quick comments on the code, though I can do
another, more thorough code review later:
- Bridge has quite a few attributes that we won't need. For example,
os_version isn't something we include in the output. And we wouldn't
include versions of other Tor-speaking programs like nTor anytime soon
(but rather count them as "other" versions). Oh, and there are no usable
contact lines in bridge descriptors, so we don't need the contact
attribute. I guess what I'm saying is that this is dead code that
shouldn't be there. YAGNI.
- I didn't see where you store the bridge status publication time in
Bridge.
- Both __init__ and set_descriptor_details could accept stem objects
rather than several single parameters.
- unpadded_base64_to_base_16 looks like something that stem should do for
you. If it doesn't, you should ask atagar to implement it in stem.
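
For reference, the conversion in the last point needs only a few lines of
standard-library Python. This is a sketch of the general technique, not of
the function under review and not of anything in stem; the name
unpadded_base64_to_hex is hypothetical.

```python
import base64
import binascii


def unpadded_base64_to_hex(value):
    """Decode an unpadded base64 string and return upper-case hex."""
    # base64 input must be a multiple of four characters long, so restore
    # the "=" padding that was stripped.
    padding = "=" * (-len(value) % 4)
    raw = base64.b64decode(value + padding)
    return binascii.hexlify(raw).decode("ascii").upper()
```

A 20-byte fingerprint is 27 characters in unpadded base64 and comes out as
40 hex characters.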
I didn't make it further through the code yet, but I'm happy to do another
review soon. Let me know!
Thanks!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/10680#comment:17>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online