[tor-bugs] #21588 [- Select a component]: Rewrite the censorship detector used by the Tor Metrics website in Java
#21588: Rewrite the censorship detector used by the Tor Metrics website in Java
--------------------------------------+-----------------
Reporter: karsten | Owner:
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: - Select a component | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
--------------------------------------+-----------------
The censorship detector written by George Danezis in 2011 is the only part
of the Tor Metrics website that is written in Python. We should consider
rewriting it in Java in order to integrate it more closely into the rest
of the Tor Metrics website code. This is also related to #19754.
iwakeh, do you want to comment on whether this makes sense before
somebody else picks this up?
(The following thoughts depend on whether we reach consensus in the
metrics team that this is even a good idea.)
The first step of this rewrite should be to create a minimal setup for
running the Python code that doesn't require setting up your own instance
of the Tor Metrics website. I'll attach a compressed version of the input file
`userstats-detector.csv` to this ticket. Running the Python version
should be as simple as downloading that attachment and the two Python
files `detector.py` and `country_info.py` from
[https://gitweb.torproject.org/metrics-web.git/tree/modules/clients
metrics-web's clients module] and running:
{{{
unxz userstats-detector.csv.xz
python detector.py
}}}
The Python script should run for a few minutes and produce several files,
including `userstats-ranges.csv`, which is the only output file we care
about:
{{{
date,country,minusers,maxusers
2011-09-08,a1,559.698186453,1399.64885163
2011-09-09,a1,469.497090181,1451.46081727
2011-09-11,a1,639.857484235,1457.19233381
2011-09-12,a1,597.260782974,1312.46735446
[...]
}}}
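As a first building block for the Java side, here is a minimal sketch of a reader for this output format. It assumes exactly the four columns and header shown above and ISO dates; the class name `UserStatsRanges`, the nested `Range` class, and the `parse` method are hypothetical and not part of metrics-web.
{{{
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the userstats-ranges.csv format (sketch only). */
final class UserStatsRanges {

  static final String HEADER = "date,country,minusers,maxusers";

  /** One output row: date, country code, and lower/upper user estimates. */
  static final class Range {
    final LocalDate date;
    final String country;
    final double minUsers;
    final double maxUsers;

    Range(LocalDate date, String country, double minUsers, double maxUsers) {
      this.date = date;
      this.country = country;
      this.minUsers = minUsers;
      this.maxUsers = maxUsers;
    }
  }

  private UserStatsRanges() {}

  /** Parses the whole CSV text, rejecting a wrong header or malformed rows. */
  static List<Range> parse(String csvText) {
    String[] lines = csvText.split("\\R");
    if (lines.length == 0 || !HEADER.equals(lines[0].trim())) {
      throw new IllegalArgumentException("Missing or unexpected header line");
    }
    List<Range> ranges = new ArrayList<>();
    for (int i = 1; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.isEmpty()) {
        continue; // tolerate blank lines
      }
      String[] parts = line.split(",", -1);
      if (parts.length != 4) {
        throw new IllegalArgumentException(
            "Malformed line " + (i + 1) + ": " + line);
      }
      ranges.add(new Range(LocalDate.parse(parts[0]), parts[1],
          Double.parseDouble(parts[2]), Double.parseDouble(parts[3])));
    }
    return ranges;
  }
}
}}}
Failing loudly on an unexpected header is deliberate in this sketch: if the Python output format changes while the rewrite is in progress, a silent misparse would make any later comparison meaningless.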
Step two could be to throw out any code that is not required to produce
this output file. Ideally, this would happen in one or more separate
commits.
Step three would be to look at the external dependencies required to
rewrite the remaining code in Java. I haven't looked into this at all yet,
so maybe this is doable without adding external dependencies, which would
be best.
But if external dependencies are necessary, maybe there's something in
Apache Commons that we can use here. In any case, adding external
dependencies requires discussion on this ticket.
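To illustrate that basic statistics do not by themselves justify a new dependency, here is a sketch of dependency-free helpers in plain Java. Which statistics detector.py actually needs is not established here and has to be checked against the Python code; the class `BasicStats` and its methods are hypothetical. The percentile uses linear interpolation between the closest ranks, which is the same convention as NumPy's default.
{{{
import java.util.Arrays;

/** Hypothetical dependency-free statistics helpers (sketch only). */
final class BasicStats {

  private BasicStats() {}

  /** Arithmetic mean; rejects empty input. */
  static double mean(double[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("empty input");
    }
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  /** Sample standard deviation, with n - 1 in the denominator. */
  static double sampleStdDev(double[] values) {
    if (values.length < 2) {
      throw new IllegalArgumentException("need at least two values");
    }
    double m = mean(values);
    double squares = 0.0;
    for (double v : values) {
      squares += (v - m) * (v - m);
    }
    return Math.sqrt(squares / (values.length - 1));
  }

  /** Percentile p in [0, 100], linearly interpolated between closest ranks. */
  static double percentile(double[] values, double p) {
    if (values.length == 0 || p < 0.0 || p > 100.0) {
      throw new IllegalArgumentException("empty input or p out of range");
    }
    double[] sorted = values.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    double rank = p / 100.0 * (sorted.length - 1);
    int lower = (int) Math.floor(rank);
    int upper = (int) Math.ceil(rank);
    double fraction = rank - lower;
    return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
  }
}
}}}
If the detector turns out to need something harder, such as inverse distribution functions, that would be the point where a library like Apache Commons Math becomes worth discussing on this ticket.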
Step four would be to do the rewrite and to verify that it produces
roughly the same results (we're cutting off decimal places, for example).
There's a guide on coding style
[wiki:org/teams/MetricsTeam/MetricsJavaStyleGuide#CodingStyle here].
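Checking "roughly the same results" could be automated with a small comparison tool. The sketch below indexes two `userstats-ranges.csv` outputs by date and country and reports rows present in only one of them, as well as values that differ by more than an absolute tolerance. It assumes the first line of each file is the header; the class `RangesComparison` and its methods are hypothetical.
{{{
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical checker comparing two userstats-ranges.csv outputs (sketch). */
final class RangesComparison {

  private static final String[] COLUMNS = {"minusers", "maxusers"};

  private RangesComparison() {}

  /** Maps "date,country" to {minusers, maxusers}, skipping the header line. */
  static Map<String, double[]> index(String csvText) {
    Map<String, double[]> rows = new LinkedHashMap<>();
    String[] lines = csvText.split("\\R");
    for (int i = 1; i < lines.length; i++) {
      String[] parts = lines[i].trim().split(",", -1);
      if (parts.length != 4) {
        continue; // skip blank or malformed lines in this sketch
      }
      rows.put(parts[0] + "," + parts[1], new double[] {
          Double.parseDouble(parts[2]), Double.parseDouble(parts[3])});
    }
    return rows;
  }

  /** Lists rows missing on either side and values off by more than tolerance. */
  static List<String> compare(String expectedCsv, String actualCsv,
      double tolerance) {
    Map<String, double[]> expected = index(expectedCsv);
    Map<String, double[]> actual = index(actualCsv);
    List<String> differences = new ArrayList<>();
    for (Map.Entry<String, double[]> entry : expected.entrySet()) {
      double[] other = actual.get(entry.getKey());
      if (other == null) {
        differences.add("missing in actual: " + entry.getKey());
        continue;
      }
      for (int j = 0; j < COLUMNS.length; j++) {
        if (Math.abs(entry.getValue()[j] - other[j]) > tolerance) {
          differences.add(entry.getKey() + " " + COLUMNS[j] + ": "
              + entry.getValue()[j] + " vs. " + other[j]);
        }
      }
    }
    for (String key : actual.keySet()) {
      if (!expected.containsKey(key)) {
        differences.add("missing in expected: " + key);
      }
    }
    return differences;
  }
}
}}}
An absolute tolerance is the simplest choice when the expected differences come from cut-off decimal places; for countries with very large user numbers, a relative tolerance might be the better fit.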
Step five would be to review the new code and integrate it into
metrics-web.
All in all, I could imagine that steps 1 to 4 might be an interesting task
for a new volunteer. Optimistically adding the `metrics-help` keyword.
But let's first discuss whether this rewrite makes sense, or whether
there's a better plan to do it!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21588>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs