[tor-bugs] #5047 [Obfsproxy]: Implement basic usage statistics in obfsproxy
#5047: Implement basic usage statistics in obfsproxy
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  asn
     Type:  enhancement  |         Status:  new
 Priority:  normal       |      Milestone:
Component:  Obfsproxy    |        Version:
 Keywords:               |         Parent:
   Points:               |   Actualpoints:
-------------------------+--------------------------------------------------
We should implement some basic usage statistics in obfsproxy so that we
can learn about its usage until Tor itself supports obfsproxy statistics
(#5040). Once Tor supports these statistics, the implementation in
obfsproxy can be removed. Both Tor's and obfsproxy's statistics should be
equivalent or at least easily comparable.
The idea is to have obfsproxy log incoming connections in a privacy-aware
way and provide a simple script to convert these logs into a format that
can be published without issues. Bridge operators can periodically run
the script and send the output to the Tor developers, who publish and
analyze it. The implementation in obfsproxy should be quite simple in
order not to break too much stuff. The conversion script should be dead
simple, so that bridge operators can understand what's going on.
Here's a possible approach:
We want to count daily connections by country and daily unique IP
addresses by country. Similar to other statistics in Tor, we want to
aggregate data over 24-hour periods, resolve IP addresses to country
codes, and round up frequencies to multiples of 8.
1. When obfsproxy starts, it does three things: a) generate a secret
string S that it only keeps in memory; b) note the timestamp TS when it
started; c) create a buffer B with a capacity of 100 log messages.
2. Whenever obfsproxy receives a client connection, it runs steps 3 to 5:
3. It checks whether at least 24 hours have passed since TS. If so, it
shuffles all log messages in buffer B, appends them to a file on disk, and
empties B. It also increments TS in 24-hour steps until TS is less than
24 hours in the past.
4. It checks whether B is full, i.e., contains 100 messages. If so, it
flushes B and appends messages to a file on disk in random order.
5. It creates a new log message containing a) timestamp TS (which is NOT
the current timestamp!), b) the country code of the connecting IP as
resolved by a GeoIP database, and c) the hashed IP address using secret S,
i.e., `H(IP || S)` with a cryptographic hash function of the implementor's
choice. An example log message would be
`"2012-02-07 14:01:04 de 1234567890123456789012345678901234567890"`.
6. When obfsproxy stops, it does NOT flush the contents of B to disk. It
forgets about S, possibly in a cryptographically secure manner.
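For illustration, steps 1 to 6 could look roughly like the following
Python sketch. It is not a proposed patch. The class name, the
`resolve_country` callback (a stand-in for a real GeoIP lookup), the
injectable `now` clock, and the choice of SHA-256 for H are all
assumptions made for this example. With SHA-256 the hash field is 64 hex
characters rather than the 40 shown in the example message above.

```python
import hashlib
import os
import random
import time

DAY = 24 * 60 * 60       # length of one statistics period, in seconds
BUFFER_CAPACITY = 100    # buffer size proposed in this ticket


class BufferedConnectionLog:
    """Hypothetical sketch of the buffered, privacy-aware connection log."""

    def __init__(self, path, resolve_country, now=time.time):
        self.path = path                        # log file on disk (assumed)
        self.resolve_country = resolve_country  # stand-in for GeoIP lookup
        self.now = now                          # injectable clock for testing
        self.secret = os.urandom(32)            # step 1a: S, memory only
        self.period_start = int(now())          # step 1b: TS
        self.buffer = []                        # step 1c: B

    def _flush(self):
        # Shuffle first so the file does not reveal the arrival order.
        random.SystemRandom().shuffle(self.buffer)
        with open(self.path, "a") as f:
            for line in self.buffer:
                f.write(line + "\n")
        self.buffer = []

    def log_connection(self, ip):
        # Step 3: if the period is over, flush B and advance TS in
        # 24-hour steps until it is less than 24 hours in the past.
        if self.now() - self.period_start >= DAY:
            self._flush()
            while self.now() - self.period_start >= DAY:
                self.period_start += DAY
        # Step 4: flush when B is full.
        if len(self.buffer) >= BUFFER_CAPACITY:
            self._flush()
        # Step 5: log TS (not the current time), country code, H(IP || S).
        stamp = time.strftime("%Y-%m-%d %H:%M:%S",
                              time.gmtime(self.period_start))
        digest = hashlib.sha256(ip.encode() + self.secret).hexdigest()
        country = self.resolve_country(ip)
        self.buffer.append("%s %s %s" % (stamp, country, digest))

    def stop(self):
        # Step 6: discard B without writing it, and drop the reference to S.
        # Python cannot guarantee that S is wiped from memory.
        self.buffer = []
        self.secret = None
```

A caller would create one `BufferedConnectionLog` at startup, call
`log_connection(ip)` for every incoming client connection, and call
`stop()` on shutdown.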
The buffer has two functions here. First, it removes the original order
of connections, which may still be meaningful if it contains connections
from countries with few connections. Second, the buffer protects the
timing of single client connections that occur when obfsproxy is
terminated and restarted shortly after a 24-hour interval ends. The
buffer size of 100 was arbitrarily chosen to avoid memory problems on
heavily used bridges. Higher numbers are preferred, but if that makes
things more complicated, 100 should be a large enough number.
The log messages still reveal too much information to be published. They
shouldn't contain IP hashes, and frequencies still need to be rounded up
to the next multiple of 8. The following bash script, which probably
requires a lot more comments, converts a log message file into a format
that can be published by bridge operators.
{{{
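#!/bin/bash
# Usage: script [LOGFILE]; reads "data" in the current directory by default.
# Each input line is "DATE TIME COUNTRY HASH"; the hash never reaches the output.
data="${1:-data}"
echo "Daily rounded total requests by country"
# Drop the hash, count lines per date/time/country, round up to a multiple of 8.
cut -d" " -f1-3 "$data" | sort | uniq -c | \
awk '{printf "%s %s %s %d\n", $2, $3, $4, 8*(int(($1+7)/8))}'
echo "Daily rounded unique IPs by country"
# Collapse repeated hashes first so each IP counts once per period, then count.
sort -u "$data" | cut -d" " -f1-3 | uniq -c | \
awk '{printf "%s %s %s %d\n", $2, $3, $4, 8*(int(($1+7)/8))}'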
}}}
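To make the arithmetic easier to check, here is a small Python sketch
that computes the same two tables from a list of log lines. It is meant
only as a cross-check for the shell pipeline, not as a replacement. The
function names and the sample lines (with shortened, made-up hashes) are
hypothetical.

```python
from collections import Counter


def round_up_to_8(n):
    """Round n up to the next multiple of 8 (1 -> 8, 8 -> 8, 9 -> 16)."""
    return 8 * ((n + 7) // 8)


def summarize(lines):
    """Return (requests, unique_ips), keyed by (date, time, country)."""
    requests = Counter()
    seen = set()
    for line in lines:
        date, clock, country, ip_hash = line.split()
        requests[(date, clock, country)] += 1
        seen.add((date, clock, country, ip_hash))
    # Each distinct hash counts once per period and country.
    uniques = Counter(key[:3] for key in seen)
    return ({k: round_up_to_8(v) for k, v in requests.items()},
            {k: round_up_to_8(v) for k, v in uniques.items()})


# Made-up sample: 9 connections from 2 distinct "de" clients, 1 from "us".
sample = (["2012-02-07 14:01:04 de aaaa"] * 5 +
          ["2012-02-07 14:01:04 de bbbb"] * 4 +
          ["2012-02-07 14:01:04 us cccc"])

requests, unique_ips = summarize(sample)
for key in sorted(requests):
    print(" ".join(key), requests[key], unique_ips[key])
# prints:
# 2012-02-07 14:01:04 de 16 8
# 2012-02-07 14:01:04 us 8 8
```

Here the 9 requests from "de" round up to 16, while the 2 unique hashes
round up to 8, so the published numbers reveal neither exact count.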
Note that the approach taken here was designed to keep the changes to
obfsproxy small. Of course, we could implement everything in obfsproxy
and write nice files that bridge operators can mail to the Tor devs
directly. That would be an implementation similar to what Tor does for
the various statistics. The buffered logging approach seemed to be a good
compromise between not logging sensitive data and not adding too much
code. Whether that is true is a question for the obfsproxy developers.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5047>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online