[tor-dev] Export BridgeDB usage statistics

To: tor-dev@xxxxxxxxxxxxxxxxxxxx
Subject: [tor-dev] Export BridgeDB usage statistics
From: Philipp Winter <phw@xxxxxxxxx>
Date: Tue, 23 Apr 2019 17:50:02 -0700

Hi Karsten,

I'm working on <https://bugs.torproject.org/9316>, which will make
BridgeDB export usage statistics.  I would like these statistics to be
public, privacy-preserving, and -- ideally -- added to Tor Metrics.  I
wanted to hear your thoughts on 1) what statistics we should collect,
2) how we can collect these statistics safely, and 3) what format these
statistics should have.

Broadly speaking, these statistics should answer the following
questions:

  * How many requests does BridgeDB see per day?
  * What obfuscation protocols are the most popular?
  * What bridge distribution mechanisms are the most popular?
  * From what countries do we see the most bridge requests?
  * How many BridgeDB requests succeed, and how many fail?
  * How many requests does BridgeDB see from Yahoo/Gmail/Riseup?
  * How many HTTPS requests are coming from proxies?
  * How many requests are suspicious, and likely issued by bots?

Each request to BridgeDB carries with it some information, which allows
us to answer the above questions.  I suggest that we collect the
following:

  * The distribution mechanism.  Currently, this is HTTPS, email, or
    Moat.

  * The requested transport.  Currently, this is vanilla, fte, obfs3,
    obfs4, or scramblesuit.

  * The request's origin.  For Moat and HTTPS, it's the two-letter
    country code, e.g., IT for Italy.  For email, it's the user's email
    domain (Gmail, Yahoo, or Riseup).

  * Whether the request was successful or unsuccessful, i.e., resulted
    in BridgeDB handing out bridges or not.

  * Whether the request was issued by a user or a bot.
    David suggested heuristics that would allow us to estimate whether
    a request came from a bot:
    <https://bugs.torproject.org/9316#comment:19>.  I like these
    suggestions but I'm not sure yet how to encode them -- it's more
    complex than a simple binary flag.
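
To make the list above concrete, here is a rough Python sketch of how
one request could be reduced to a bucket key and counted.  All names
(RequestKey, record_request, the field names) are hypothetical and not
taken from BridgeDB's code, and the bot signal is simplified to a
boolean even though the real heuristics are likely more complex.

```python
from collections import Counter
from typing import NamedTuple


class RequestKey(NamedTuple):
    """Hypothetical bucket key; one field per collected property."""
    mechanism: str          # "https", "email", or "moat"
    transport: str          # "vanilla", "fte", "obfs3", "obfs4", "scramblesuit"
    country_or_domain: str  # e.g. "it" for HTTPS/Moat, "yahoo.com" for email
    success: bool           # True if BridgeDB handed out bridges
    is_bot: bool            # simplification: the real signal may not be binary


# Maps each bucket key to the number of requests seen so far.
counts = Counter()


def record_request(key: RequestKey) -> None:
    """Increment the counter of the bucket this request falls into."""
    counts[key] += 1
```

Only the key and a counter are kept; nothing else about the request
(IP address, full email address) needs to be stored for this purpose.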

The combination of these statistics results in ~16,800 buckets (3
mechanisms * 5 transports * ~280 ISO country codes * 2 success states *
2 bot states).  We only need to export statistics with non-empty
buckets.  To protect users whose request is the only one in a given
bucket (e.g., there may be only one user in Turkmenistan who
successfully requested an FTE bridge over HTTPS on 2019-04-02), we
should bin the statistics by rounding them up to the next multiple of,
say, 10.  We should further export statistics infrequently -- maybe once
a day.
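
The arithmetic and the binning above can be sketched in a few lines of
Python.  The constants are the estimates from this email (including the
approximate 280 country codes), and round_up is a hypothetical helper
name, not an existing BridgeDB function.

```python
# Estimates taken from the paragraph above.
MECHANISMS = 3        # HTTPS, email, Moat
TRANSPORTS = 5        # vanilla, fte, obfs3, obfs4, scramblesuit
COUNTRY_CODES = 280   # approximate number of country codes
SUCCESS_STATES = 2    # successful, unsuccessful
BOT_STATES = 2        # user, bot

# Upper bound on the number of distinct buckets.
MAX_BUCKETS = (MECHANISMS * TRANSPORTS * COUNTRY_CODES
               * SUCCESS_STATES * BOT_STATES)


def round_up(count: int, bin_size: int = 10) -> int:
    """Round count up to the next multiple of bin_size.

    Exact multiples stay unchanged; integer arithmetic avoids
    floating-point rounding issues.
    """
    return -(-count // bin_size) * bin_size
```

With a bin size of 10, a bucket holding a single request is exported
as 10, and so is a bucket holding ten requests, so an observer cannot
tell the two apart.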

Here's an example of a simple CSV format that takes into account the
above:

  timestamp,mechanism,transport,country|domain,success,count,origin
  1555977600,https,vanilla,it,successful,40,user
  1555977600,https,obfs4,ca,unsuccessful,10,user
  1555977600,email,vanilla,yahoo.com,successful,50,user
  ...
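
For illustration, the following Python sketch serializes a dictionary
of per-bucket counts into the format above.  The function name
export_csv, the tuple layout of the keys, and the "bot" label for the
last column are assumptions of mine; only the header and the
"successful"/"unsuccessful"/"user" values come from the example.

```python
import csv
import io

HEADER = ["timestamp", "mechanism", "transport", "country|domain",
          "success", "count", "origin"]


def round_up(count, bin_size=10):
    """Round count up to the next multiple of bin_size."""
    return -(-count // bin_size) * bin_size


def export_csv(timestamp, counts, bin_size=10):
    """Return the statistics as CSV text, one line per non-empty bucket.

    counts maps (mechanism, transport, country_or_domain, success,
    is_bot) tuples to raw request counts.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for key, n in sorted(counts.items()):
        if n <= 0:
            continue  # empty buckets are not exported
        mechanism, transport, place, success, is_bot = key
        writer.writerow([
            timestamp, mechanism, transport, place,
            "successful" if success else "unsuccessful",
            round_up(n, bin_size),
            "bot" if is_bot else "user",  # "bot" label is an assumption
        ])
    return out.getvalue()
```

Rows are sorted by key so that the output does not reveal the order in
which requests arrived.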

What are your thoughts?

Thanks,
Philipp