Re: [tor-talk] Exit Traffic classification and discrimination

To: tor-talk@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [tor-talk] Exit Traffic classification and discrimination
From: "Fabio Pietrosanti (naif) - lists" <lists@xxxxxxxxxxxxxxx>
Date: Tue, 9 Feb 2016 22:48:00 +0100

On 2/2/16 1:50 PM, Roger Dingledine wrote:
> On Sun, Jan 31, 2016 at 03:42:51PM +0100, Fabio Pietrosanti (naif) - lists wrote:
>> But 90% of my resources (given the previous hypotetical assumption)
>> would be happily pumping non-abuse-generating Tor exit traffic.
>>
>> Does anyone ever done some kind of testing or analysis about that kind
>> of approach?
> 
> Well, the first question there is to learn whether your assumption
> about destinations is actually true -- is most Tor traffic going to a
> small number of sites, or are many Tor destinations in the "long tail"?
> 
> I spoke to Tariq Elahi at length about exactly this research question,
> because they want to run some exit relays and try to answer it. They had
> some good plans for how to do it safely -- use Privex to combine views
> from several exits so you can't go back and learn which exit saw which
> destination, write nothing to disk except the final answer, etc.

I think a cheap approach is possible, at least for the AS-aware part of
the question.

Let's say we build a set of iptables rules that send to a dedicated
chain all traffic going to well-known high-traffic destinations, based
on the netblocks announced by their own Autonomous Systems.

For example, say I want to know from my Tor exit how many connections
are made to Twitter and how much traffic goes to Twitter.

We learn that Twitter owns AS54888, AS35995 and AS13414.

We get their IP address netblocks with:

$ whois -h whois.radb.net '!gAS54888'
A182
199.96.56.0/21 199.96.56.0/24 199.96.57.0/24 199.96.58.0/24
199.96.59.0/24 199.96.60.0/24 199.96.61.0/24 103.252.114.0/23
185.45.6.0/23 104.244.43.0/24 199.96.56.0/23 199.96.60.0/23

$ whois -h whois.radb.net '!gAS35995'
8.25.194.0/23 8.25.196.0/23 192.133.78.0/23 8.25.194.0/24 8.25.195.0/24
8.25.196.0/24 8.25.197.0/24 185.45.4.0/24 103.252.112.0/23 185.45.4.0/23

$ whois -h whois.radb.net '!gAS13414'
199.96.57.0/24 199.16.156.0/22 199.59.148.0/22 192.133.76.0/22
192.133.76.0/23 199.96.59.0/24 199.96.58.0/24 199.96.63.0/24
199.96.56.0/21 103.252.112.0/22 103.252.114.0/23 185.45.4.0/23
199.96.62.0/23 199.96.58.0/23 185.45.6.0/23 192.44.68.0/23
192.48.236.0/23 69.12.56.0/21 104.244.40.0/21 104.244.42.0/24
103.252.112.0/23 104.244.43.0/24 185.45.5.0/24 185.45.4.0/24
199.96.56.0/24 202.160.128.0/22 202.160.128.0/24 202.160.129.0/24
202.160.130.0/24 202.160.131.0/24 188.64.224.0/24 188.64.225.0/24
188.64.226.0/24 188.64.227.0/24 188.64.228.0/24 188.64.229.0/24
188.64.230.0/24 188.64.231.0/24 188.64.224.0/21 199.16.156.0/22
199.96.57.0/24 199.96.63.0/24 192.133.76.0/22 8.25.194.0/23
199.96.61.0/24 192.133.78.0/23 199.96.59.0/24 8.25.196.0/23
199.96.60.0/24 199.96.56.0/24 199.96.58.0/24 199.59.148.0/22
199.96.56.0/21 192.133.76.0/23 8.25.195.0/24 8.25.194.0/24 8.25.197.0/24
8.25.196.0/24 103.252.114.0/23 103.252.112.0/23 103.252.112.0/22
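
The lists above contain many repeated netblocks, so it may help to
normalise them before building rules. A minimal sketch, assuming the
whois output is plain whitespace-separated IPv4 prefixes plus status
lines such as the "A182" shown above; extract_netblocks is a
hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: read raw whois/RADB output on stdin and print one
# unique IPv4 CIDR netblock per line. Anything that does not look like
# a.b.c.d/len (for example the "A182" status line) is dropped.
extract_netblocks() {
    tr -s ' \t' '\n\n' |
        grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$' |
        LC_ALL=C sort -u
}

# Usage, with the three AS numbers from this mail:
#   for as in AS54888 AS35995 AS13414; do
#       whois -h whois.radb.net "!g$as"
#   done | extract_netblocks > twitter-netblocks.txt
```

This only removes exact duplicates. Overlapping prefixes (for example
199.96.56.0/21 and 199.96.56.0/24) are kept as they are.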

Then we create one rule per netblock to send all traffic going to that
destination into a chain, the same way it's done for iptables traffic
accounting:
http://www.catonmat.net/blog/traffic-accounting-with-iptables/

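We put the netblocks in the file twitter-netblocks.txt, create the chain,
and then run the loop below (the "echo" only prints the commands; remove
it to actually install the rules):
$ iptables -N twitter
$ for i in `cat twitter-netblocks.txt` ; do echo iptables -I OUTPUT -p tcp \
  -d $i -m state --state NEW,ESTABLISHED -j twitter ; done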

That way there would be a chain named "twitter" that all outgoing
Twitter traffic passes through inside the Linux kernel.

By running the following command it would be possible to know how many
bytes of traffic went to Twitter:
$ iptables -L OUTPUT -n -v -x | grep twitter | awk '{ print $2 }' |
  awk '{ sum+=$1 } END { print sum }'
1596
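
The same sum can be done in a single awk pass that matches on the
target column instead of grepping the whole line. A rough sketch,
assuming the usual "iptables -L -n -v -x" layout (packets, bytes,
target, ...); sum_chain_bytes is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: sum the byte counters of every rule whose target
# is the given chain. Reads `iptables -L OUTPUT -n -v -x` output on
# stdin, where column 1 is packets, column 2 is bytes and column 3 is
# the rule target.
sum_chain_bytes() {
    awk -v chain="$1" '$3 == chain { sum += $2 } END { printf "%.0f\n", sum }'
}

# Usage (as root):
#   iptables -L OUTPUT -n -v -x | sum_chain_bytes twitter
```

Matching on column 3 avoids counting a rule that merely has "twitter"
somewhere else in its line.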

Knowing the total of Tor exit traffic alone (which today can't be
measured with iptables because of the missing
https://trac.torproject.org/projects/tor/ticket/17975 or
https://trac.torproject.org/projects/tor/ticket/18142), together with
the amount of Twitter traffic from the Linux kernel's iptables chain
accounting, would make it possible to say that X% of the traffic was
Twitter.
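
The X% figure is then simple arithmetic. A small sketch, assuming the
total exit byte count is obtained by some other means (it is not
available from iptables today, as said above); percent_of_total is a
hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print what percentage of <total> bytes the given
# <part> represents, with two decimals. Prints "n/a" when the total is
# zero or negative, to avoid a division by zero.
percent_of_total() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * part / total
    }'
}

# Usage: percent_of_total <twitter_bytes> <total_exit_bytes>
```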

This process could be scripted for all the ASes of:
Google (17 AS)
Facebook (1 AS)
Twitter (3 AS)
Microsoft (28 AS)
Yahoo (59 AS)
Wikipedia (3 AS)
LinkedIn (9 AS)
GitHub (1 AS)
Cloudflare (5 AS)

It would then be possible to extract how much traffic goes to those
companies/services, as a percentage of the total amount of Tor exit
traffic.
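
Scripting this for several services could look roughly like the sketch
below. It assumes one file per service named <service>-netblocks.txt
(a hypothetical layout) with one netblock per line, and it only prints
the commands so they can be reviewed before being run;
gen_accounting_rules is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: for each service name given as an argument, print
# the iptables commands that create a chain named after the service and
# send outgoing TCP traffic for each of its netblocks to that chain.
# Nothing is applied; the commands are only written to stdout.
gen_accounting_rules() {
    for service in "$@"; do
        echo "iptables -N $service"
        while read -r net || [ -n "$net" ]; do
            [ -n "$net" ] || continue   # skip empty lines
            echo "iptables -I OUTPUT -p tcp -d $net -m state --state NEW,ESTABLISHED -j $service"
        done < "${service}-netblocks.txt"
    done
}

# Usage: review the output first, then pipe it to a root shell:
#   gen_accounting_rules twitter github
#   gen_accounting_rules twitter github | sh
```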

A smart Linux hacker with some iptables and shell scripting skills could
automate the process, including the extraction of this information,
within a weekend of fun.

What would be kept in the Linux kernel counters is the total number of
bytes exchanged for each of the netblocks defined in the ASes of the
"destinations", which then must be summed.

No source IP addresses, and no timing of when a new session was
established, would be logged.

Technically it's likely (but must be tested) that iptables could easily
hold some tens of thousands of entries, representing all the netblocks
of the autonomous systems of all the destinations that someone would
like to measure this way, thanks to the Linux kernel's optimizations.
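
If one rule per netblock turns out to be too slow, a possible
alternative (not tested here) is to keep the netblocks in an ipset of
type hash:net and match the whole set with a single rule per service.
As far as I can tell this would also avoid counting a packet twice when
it matches two overlapping prefixes, because only one rule is hit. A
sketch that only prints the commands, with the same hypothetical
<service>-netblocks.txt layout and a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: print the commands that load one service's
# netblocks into an ipset (type hash:net) and add a single iptables rule
# matching the whole set. Nothing is applied; commands go to stdout.
gen_ipset_rules() {
    service=$1
    echo "ipset create $service hash:net"
    while read -r net || [ -n "$net" ]; do
        [ -n "$net" ] || continue   # skip empty lines
        # -exist makes ipset ignore entries that are already in the set
        echo "ipset -exist add $service $net"
    done < "${service}-netblocks.txt"
    echo "iptables -N $service"
    echo "iptables -I OUTPUT -p tcp -m set --match-set $service dst -m state --state NEW,ESTABLISHED -j $service"
}

# Usage: gen_ipset_rules twitter        (review the output)
#        gen_ipset_rules twitter | sh   (apply it, as root)
```

With this layout the byte counter of the single OUTPUT rule is already
the per-service total, so no summing over netblocks is needed.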

-naif