
[tor-dev] Looking at Bridge users oddities



Hi,

Jake pointed out to me that there has been a huge spike in Tor bridge
users in Italy over the past week
(https://metrics.torproject.org/users.html?graph=bridge-users&start=2011-02-19&end=2011-06-17&country=it&dpi=72#bridge-users).

This prompted me to hack up a quick little script to analyse the Tor
bridge usage data and see whether any conclusions can be drawn from the
patterns that emerge in it.
I don't want to offer a personal opinion at this point on what this data
might mean, but I think spikes in bridge usage could be used to draw
conclusions about socio-political events.

The script looks for spikes in bridge usage: you give it a time frame
and a factor, and it reports every pair of dates within that time frame
whose user counts differ by more than that factor. As already said, it's
just a quick hack, so don't expect it to be fast or even well written :P.
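
To make the check concrete, here is a minimal sketch of the comparison for a single pair of counts. It is written for Python 3 and is not the attached script itself; the names `spike_factor` and `is_spike` are made up for this example, and the default `minimum` of 40 simply mirrors the threshold used in the script.

```python
def spike_factor(users_a, users_b, minimum=40):
    """Return larger count / smaller count, or None when either count
    is at or below `minimum` (small counts give noisy ratios)."""
    if users_a <= minimum or users_b <= minimum:
        return None
    return max(users_a, users_b) / min(users_a, users_b)


def is_spike(users_a, users_b, factor=3, minimum=40):
    """True when the two counts differ by more than `factor`,
    regardless of which one is larger."""
    ratio = spike_factor(users_a, users_b, minimum)
    return ratio is not None and ratio > factor
```

For example, `is_spike(4370, 198, factor=4)` holds because 4370 / 198 is roughly 22, while a pair such as `(30, 500)` is skipped because one count is below the minimum.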

A couple of neat things that came out:

factor | date1 date2 1: users at date1 2: users at date2 (country code)

23.0512820513 | 2011-06-10 2011-05-29 1: 1798 2: 78 (es)

22.0707070707 | 2011-06-10 2011-06-05 1: 4370 2: 198 (it)

4.14 | 2011-06-11 2011-06-04 1: 207 2: 50 (tw)

5.13333333333 | 2011-06-11 2011-06-05 1: 231 2: 45 (ar)

5.42857142857 | 2011-06-11 2011-06-05 1: 836 2: 154 (br)

4.38181818182 | 2011-06-11 2011-06-01 1: 241 2: 55 (il)

16.5208333333 | 2011-02-15 2011-01-18 1: 48 2: 793 (de)

15.4285714286 | 2010-08-23 2010-07-26 1: 648 2: 42 (au)

13.7916666667 | 2011-03-11 2011-02-15 1: 1655 2: 120 (sa)
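
For further filtering, the lines above can be split back into their fields. The following is only a sketch for Python 3, assuming exactly the output format shown; `ALERT_RE`, `parse_alert` and the `direction` field are illustrative names, not part of the script.

```python
import re

# Matches lines such as:
#   22.0707070707 | 2011-06-10 2011-06-05 1: 4370 2: 198 (it)
ALERT_RE = re.compile(
    r"^(?P<factor>\d+(?:\.\d+)?) \| "
    r"(?P<date1>\d{4}-\d{2}-\d{2}) (?P<date2>\d{4}-\d{2}-\d{2}) "
    r"1: (?P<users1>\d+) 2: (?P<users2>\d+) \((?P<country>\w+)\)$"
)


def parse_alert(line):
    """Split one alert line into a dict, or return None if it does not match."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    users1 = int(match.group("users1"))
    users2 = int(match.group("users2"))
    return {
        "factor": float(match.group("factor")),
        "date1": match.group("date1"),
        "date2": match.group("date2"),
        "users1": users1,
        "users2": users2,
        "country": match.group("country"),
        # date1 is the later of the two dates in the lines above, so more
        # users at date1 means usage went up.
        "direction": "increase" if users1 > users2 else "decrease",
    }
```

Applied to the list above, every line comes out as an increase except the (de) one, where the count fell from 793 to 48.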

A one-liner to get some interesting stats, sorted by factor:

python bridge-user-alert.py -f 4 | sort -g
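
The script's -s and -e options describe dates as ordinal day numbers, which are not very readable. A small Python 3 sketch for converting in both directions; `to_ordinal` and `from_ordinal` are hypothetical helper names, and they rely only on the standard `datetime` module.

```python
from datetime import date, datetime


def to_ordinal(day):
    """Convert a YYYY-MM-DD string to an ordinal day number
    (0001-01-01 is day 1)."""
    return datetime.strptime(day, "%Y-%m-%d").date().toordinal()


def from_ordinal(number):
    """Convert an ordinal day number back to a YYYY-MM-DD string."""
    return date.fromordinal(number).strftime("%Y-%m-%d")
```

With these, the script's default range of 734000 to 734300 reads as 2010-08-16 to 2011-06-12.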

Hope somebody finds this little research useful,

Cheers,
Art.
#!/usr/bin/env python

import csv
from datetime import date
from optparse import OptionParser

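class BridgeCSV:
	def __init__(self):
		# bridge-users.csv is expected in the current directory; its first
		# row holds the column names (the date, then one column per country).
		fp = open('bridge-users.csv', 'r')
		self.bridgeReader = csv.reader(fp,delimiter=',')
		self.names = next(self.bridgeReader)
		# Counts at or below this value are ignored as too noisy.
		self.minimum = 40
		self.data = []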

	def read(self):
		for row in self.bridgeReader:
			self.data.append(row)

	def get_date(self, date):
		for row in self.data:
			if(row[0] == date):
				return row
		return False

	def anomaly(self, row1, row2, fact):
		# Report every country whose user count differs between the two
		# rows by more than a factor of `fact`, in either direction.
		if row1 is False or row2 is False:
			return
		for i in range(1, len(row1)):
			if row1[i] == "NA" or row2[i] == "NA":
				continue
			# Skip small counts, where ratios are mostly noise.
			if int(row1[i]) <= self.minimum or int(row2[i]) <= self.minimum:
				continue
			users1, users2 = float(row1[i]), float(row2[i])
			ratio = max(users1, users2) / min(users1, users2)
			if ratio > float(fact):
				print("%s | %s %s 1: %s 2: %s (%s)" % (ratio,row1[0],row2[0],row1[i],row2[i],self.names[i]))

	def print_row(self, row):
		if row is False:
			print("error: no data found!")
			return False
		for i in range(0,len(row)):
			print("%s: %s" % (self.names[i],row[i]))

a = BridgeCSV()
a.read()

parser = OptionParser()
# type= makes optparse convert command-line values, which otherwise
# arrive as strings and break date.fromordinal() and range().
parser.add_option("-s", "--start", dest="startdate", type="int",
                  help="Start date as an ordinal day number (0001-01-01 is day 1)", default=734000)
parser.add_option("-e", "--end", dest="enddate", type="int",
                  help="End date as an ordinal day number (0001-01-01 is day 1)", default=734300)
parser.add_option("-p", "--period", dest="period", type="int",
                  help="Time frame in days to use when looking for anomalies", default=30)
parser.add_option("-f", "--factor", dest="factor", type="float",
                  help="The factor to use when detecting anomalies; a lower factor is more sensitive. Increases and decreases are both reported", default=3)

(options, args) = parser.parse_args()


print("Analysing from %s to %s" % (date.fromordinal(options.startdate).strftime("%Y-%m-%d"),date.fromordinal(options.enddate).strftime("%Y-%m-%d")))
print("using a factor of %s and a timeframe of %s" % (options.factor, options.period))

# Compare each day in the range against each of the `period` days before it.
for i in range(options.startdate,options.enddate):
	date1 = date.fromordinal(i).strftime("%Y-%m-%d")
	for j in range(i-options.period,i):
		date2 = date.fromordinal(j).strftime("%Y-%m-%d")
		a.anomaly(a.get_date(date1),a.get_date(date2),options.factor)
#a.print_row(a.get_date("2010-12-10"))



