[tor-bugs] #2718 [Metrics]: Analyze Tor usage data for ways to automatically detect country-wide blockings
#2718: Analyze Tor usage data for ways to automatically detect country-wide
blockings
---------------------+------------------------------------------------------
Reporter: karsten | Owner: karsten
Type: task | Status: new
Priority: normal | Milestone:
Component: Metrics | Version:
Keywords: | Parent:
Points: | Actualpoints:
---------------------+------------------------------------------------------
Every now and then, there are country-wide blockings of Tor. In most
cases we learn about these events from users telling us that Tor has
stopped working for them. This may work okay, but given that we already
have usage data per country, we should be able to detect blockings
ourselves, preferably automatically and with as few false positives as
possible.
I already spent some time on a censorship detector that takes our usage
data as input and tells us whenever the usage on a given day falls outside
an expected interval. But I'm afraid I don't know enough math to push
this further, at least not without reading more about time series
analysis. Maybe someone wants to pick this up?
Here's where I am:
We take our estimated daily user numbers as input. Our goal is to issue
a warning whenever the estimated user number from a given country drops
below a predicted value. This predicted value is not static but should
depend on previous values, which is why we should use time series
analysis. We want to model the user numbers for days 1..n-1, predict a
value for day n, and warn if the actual value for day n is lower than the
predicted value minus some error.
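To make the warning rule concrete, here is a minimal sketch in Python. It is not the R code mentioned in this ticket and it does not use a real time series model: as a stand-in, the prediction for day n is simply the mean of the previous `window` days, and the error is `k` times their standard deviation. The function name `detect_drops`, the parameters `window` and `k`, and the numbers in `series` are all made up for illustration.

```python
from statistics import mean, stdev


def detect_drops(users, window=7, k=2.0):
    """Return the day indices whose value falls below the expected interval.

    Assumption for illustration only: the prediction for day n is the mean
    of the previous `window` days, and the error margin is k times their
    sample standard deviation. A real detector would use a proper time
    series model instead.
    """
    warnings = []
    for n in range(window, len(users)):
        history = users[n - window:n]      # days n-window .. n-1
        predicted = mean(history)          # predicted value for day n
        margin = k * stdev(history)        # "some error"
        if users[n] < predicted - margin:  # only drops trigger a warning
            warnings.append(n)
    return warnings


# Hypothetical daily user estimates for one country; the last day drops.
series = [1000, 1020, 990, 1010, 1005, 995, 1015, 1000, 1010, 1000, 300]
print(detect_drops(series))
```

Only values below the interval are reported, since a sudden increase in users is not a sign of blocking. Note that once a drop enters the history window, it inflates the standard deviation and makes further warnings less likely, which is one reason a better model is needed.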
I read some stuff about time series analysis and settled on the ARIMA
model. Thankfully, the ARIMA model is already implemented in R.
I'm going to upload some R code to the
[http://gitweb.torproject.org/metrics-tasks.git metrics-tasks] repository
once I have a ticket number (see comment below). The R code generates a
PDF that shows on which days we'd receive a warning. I'm also going to
attach the PDF to this ticket.
Here's how you can run the R code yourself:
{{{
$ wget https://metrics.torproject.org/csv/direct-users.csv
$ R --slave -f detect-censorship.R
}}}
Possible next steps are a) finding good parameters for the ARIMA model, b)
trying other time series models, and c) extending the approach to bridge
users. Once we have a useful approach for estimated daily user numbers,
we should d) try to get rid of day-based statistics, which have a delay of
1 to 2 days, and make the approach work for directory request stats and
connecting bridge user stats to get results more quickly. The final step
is to e) integrate the R code with the metrics website and execute it
every few hours.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2718>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online