[tor-commits] [torspec/master] Close proposal 166 and make xxx-geoip-survey-plan obsolete
commit 6501e1e80a6eb44aa1ff089ced2870b6728865a8
Author: Nick Mathewson <nickm@xxxxxxxxxxxxxx>
Date: Wed Mar 2 11:20:33 2011 -0500
Close proposal 166 and make xxx-geoip-survey-plan obsolete
Karsten confirms that 166 is implemented, and xxx-geoip-survey-plan is
superseded by this tech report:
https://metrics.torproject.org/papers/countingusers-2010-11-30.pdf
---
proposals/000-index.txt | 4 +-
proposals/166-statistics-extra-info-docs.txt | 2 +-
proposals/ideas/old/xxx-geoip-survey-plan.txt | 137 +++++++++++++++++++++++++
proposals/ideas/xxx-geoip-survey-plan.txt | 137 -------------------------
4 files changed, 140 insertions(+), 140 deletions(-)
diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index 48ec6a8..91c2f27 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -86,7 +86,7 @@ Proposals by number:
163 Detecting whether a connection comes from a client [OPEN]
164 Reporting the status of server votes [OPEN]
165 Easy migration for voting authority sets [OPEN]
-166 Including Network Statistics in Extra-Info Documents [ACCEPTED]
+166 Including Network Statistics in Extra-Info Documents [CLOSED]
167 Vote on network parameters in consensus [CLOSED]
168 Reduce default circuit window [OPEN]
169 Eliminate TLS renegotiation for the Tor connection handshake [SUPERSEDED]
@@ -137,7 +137,6 @@ Proposals by status:
140 Provide diffs between consensuses [for 0.2.2.x]
147 Eliminate the need for v2 directories in generating v3 directories [for 0.2.1.x]
157 Make certificate downloads specific [for 0.2.1.x]
- 166 Including Network Statistics in Extra-Info Documents [for 0.2.2]
172 GETINFO controller option for circuit information
173 GETINFO Option Expansion
174 Optimistic Data for Tor: Server Side
@@ -179,6 +178,7 @@ Proposals by status:
148 Stream end reasons from the client side should be uniform [in 0.2.1.9-alpha]
150 Exclude Exit Nodes from a circuit [in 0.2.1.3-alpha]
152 Optionally allow exit from single-hop circuits [in 0.2.1.6-alpha]
+ 166 Including Network Statistics in Extra-Info Documents [for 0.2.2]
167 Vote on network parameters in consensus [in 0.2.2]
SUPERSEDED:
112 Bring Back Pathlen Coin Weight
diff --git a/proposals/166-statistics-extra-info-docs.txt b/proposals/166-statistics-extra-info-docs.txt
index ab2716a..8b0c6a1 100644
--- a/proposals/166-statistics-extra-info-docs.txt
+++ b/proposals/166-statistics-extra-info-docs.txt
@@ -3,7 +3,7 @@ Title: Including Network Statistics in Extra-Info Documents
Author: Karsten Loesing
Created: 21-Jul-2009
Target: 0.2.2
-Status: Accepted
+Status: Closed
Change history:
diff --git a/proposals/ideas/old/xxx-geoip-survey-plan.txt b/proposals/ideas/old/xxx-geoip-survey-plan.txt
new file mode 100644
index 0000000..49c6615
--- /dev/null
+++ b/proposals/ideas/old/xxx-geoip-survey-plan.txt
@@ -0,0 +1,137 @@
+
+
+Abstract
+
+ This document explains how to tell about how many Tor users there
+ are, and how many there are in which country. Statistics are
+ involved.
+
+Motivation
+
+ There are a few reasons we need to keep track of which countries
+ Tor users (in aggregate) are coming from:
+
+ - Resource allocation. Knowing about underserved countries with
+ lots of users can let us know about where we need to direct
+ translation and outreach efforts.
+
+ - Anticensorship. Sudden drops in usage on a national basis can
+ indicate the arrival of a censorious firewall.
+
+ - Sponsor outreach and self-evaluation. Many people and
+ organizations who are interested in funding The Tor Project's
+ work want to know that we're successfully serving parts of the
+ world they're interested in, and that efforts to expand our
+ userbase are actually succeeding. So do we.
+
+Goals
+
+ We want to know approximately how many Tor users there are, and which
+ countries they're in, even in the presence of a hypothetical
+ "directory guard" feature. Some uncertainty is okay, but we'd like
+ to be able to put a bound on the uncertainty.
+
+ We need to make sure this information isn't exposed in a way that
+ helps an adversary.
+
+Methods for current clients:
+
+ Every client downloads network status documents. There are
+ currently three methods (one hypothetical) for clients to get them.
+ - 0.1.2.x clients (and earlier) fetch a v2 networkstatus
+ document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
+ minutes].
+
+ - 0.2.0.x clients fetch a v3 networkstatus consensus document
+ at a random interval between when their current document is no
+ longer freshest, and when their current document is about to
+ expire.
+
+ [In both of the above cases, clients choose a running
+ directory cache at random with odds roughly proportional to
+ its bandwidth. If they're just starting, they know a XXXX FIXME -NM]
+
+ - In some future version, clients will choose directory caches
+ to serve as their "directory guards" to avoid profiling
+ attacks, similarly to how clients currently start all their
+ circuits at guard nodes.
+
+ We assume that a directory cache can tell which of these three
+ categories a client is in by the format of its status request.
+
+ A directory cache can be made to count distinct client IP
+ addresses that make a certain request of it in a given timeframe,
+ and total requests made to it over that timeframe. For the first
+ two cases, a cache can get a picture of the overall
+ number and countries of users in the network by dividing the IP
+ count by the probability with which they (as a cache) would be
+ chosen. Assuming that our listed bandwidth is such that we expect
+ to be chosen with probability P for any given request, and we've
+ been counting IPs for long enough that we expect the average
+ client to have made N requests, they will have visited us at least
+ once with probability P' = 1-(1-P)^N, and so we divide the IP
+ counts we've seen by P' for our estimate. To estimate total
+ number of clients of a given type, determine how many requests a
+ client of that type will make over that time, and assume we'll
+ have seen P of them.
+
+ Both of these numbers are useful: the IP counts will give the
+ total number of IPs connecting to the network, and the request
+ counts will give the total number of users on the network at any
+ given time.
+
+ Notes:
+ - [Over H hours, the N for V2 clients is 2*H, and the N for V3
+ clients is currently around H/2 or H/3.]
+
+ - (We should only count requests that we actually intend to answer;
+ 503 requests shouldn't count.)
+
+ - These measurements should also be taken at a directory
+ authority if possible: their picture of the network is skewed
+ by clients that fetch from them directly. These clients,
+ however, are all the clients that are just bootstrapping
+ (assuming that the fallback-consensus feature isn't yet used
+ much).
+
+ - These measurements also overestimate the V2 download rate if
+ some downloads fail and clients retry them later after backing
+ off.
+
+Methods for directory guards:
+
+ If directory guards are in use, directory guards get a picture of
+ all those users who chose them as a guard when they were listed
+ as a good choice for a guard, and who are also on the network
+ now. The cleanest data here will come from nodes that were listed
+ as good new-guards choices for a while, and have not been so for a
+ while longer (to study decay rates); nodes that have been listed
+ as good new-guard choices consistently for a long time (to get a
+ sample of the network); and nodes that have been listed as good
+ new-guard choices only recently (to get a sample of new users and
+ users whose guards have died out.)
+
+ Since directory guards are currently unspecified, we'll need to
+ make some guesses about how they'll turn out to work. Here are
+ a couple of approaches that could work.
+ - We could have clients pick completely new directory guards on
+ a rolling basis every two months or so. This would ensure
+ that staying as a guard for a while would be sufficient to
+ see a sample of users. This is potentially advantageous for
+ load-balancing the network as well, though it might lose some
+ of the benefits of directory guard. We need to quantify the
+ impact of this; it might not actually make stuff worse in
+ practice, if most guards don't stay good guards for a month
+ or two.
+
+ - We could try to collect statistics at several directory
+ guards and combine their statistics, but we would need to make
+ sure that for all time, at least one of the directory guards
+ had been recommended as a good choice for new guards. By
+ looking at new-IP rates for guards, we could get an idea of
+ user uptake; by looking at old-IP decay rates, we could get
+ an idea of turnover. This approach would entail significant
+ complexity, and we'd probably need to record more information
+ than we'd really like to.
+
+
diff --git a/proposals/ideas/xxx-geoip-survey-plan.txt b/proposals/ideas/xxx-geoip-survey-plan.txt
deleted file mode 100644
index 49c6615..0000000
--- a/proposals/ideas/xxx-geoip-survey-plan.txt
+++ /dev/null
@@ -1,137 +0,0 @@
-
-
-Abstract
-
- This document explains how to tell about how many Tor users there
- are, and how many there are in which country. Statistics are
- involved.
-
-Motivation
-
- There are a few reasons we need to keep track of which countries
- Tor users (in aggregate) are coming from:
-
- - Resource allocation. Knowing about underserved countries with
- lots of users can let us know about where we need to direct
- translation and outreach efforts.
-
- - Anticensorship. Sudden drops in usage on a national basis can
- indicate the arrival of a censorious firewall.
-
- - Sponsor outreach and self-evaluation. Many people and
- organizations who are interested in funding The Tor Project's
- work want to know that we're successfully serving parts of the
- world they're interested in, and that efforts to expand our
- userbase are actually succeeding. So do we.
-
-Goals
-
- We want to know approximately how many Tor users there are, and which
- countries they're in, even in the presence of a hypothetical
- "directory guard" feature. Some uncertainty is okay, but we'd like
- to be able to put a bound on the uncertainty.
-
- We need to make sure this information isn't exposed in a way that
- helps an adversary.
-
-Methods for current clients:
-
- Every client downloads network status documents. There are
- currently three methods (one hypothetical) for clients to get them.
- - 0.1.2.x clients (and earlier) fetch a v2 networkstatus
- document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
- minutes].
-
- - 0.2.0.x clients fetch a v3 networkstatus consensus document
- at a random interval between when their current document is no
- longer freshest, and when their current document is about to
- expire.
-
- [In both of the above cases, clients choose a running
- directory cache at random with odds roughly proportional to
- its bandwidth. If they're just starting, they know a XXXX FIXME -NM]
-
- - In some future version, clients will choose directory caches
- to serve as their "directory guards" to avoid profiling
- attacks, similarly to how clients currently start all their
- circuits at guard nodes.
-
- We assume that a directory cache can tell which of these three
- categories a client is in by the format of its status request.
-
- A directory cache can be made to count distinct client IP
- addresses that make a certain request of it in a given timeframe,
- and total requests made to it over that timeframe. For the first
- two cases, a cache can get a picture of the overall
- number and countries of users in the network by dividing the IP
- count by the probability with which they (as a cache) would be
- chosen. Assuming that our listed bandwidth is such that we expect
- to be chosen with probability P for any given request, and we've
- been counting IPs for long enough that we expect the average
- client to have made N requests, they will have visited us at least
- once with probability P' = 1-(1-P)^N, and so we divide the IP
- counts we've seen by P' for our estimate. To estimate total
- number of clients of a given type, determine how many requests a
- client of that type will make over that time, and assume we'll
- have seen P of them.
-
- Both of these numbers are useful: the IP counts will give the
- total number of IPs connecting to the network, and the request
- counts will give the total number of users on the network at any
- given time.
-
- Notes:
- - [Over H hours, the N for V2 clients is 2*H, and the N for V3
- clients is currently around H/2 or H/3.]
-
- - (We should only count requests that we actually intend to answer;
- 503 requests shouldn't count.)
-
- - These measurements should also be taken at a directory
- authority if possible: their picture of the network is skewed
- by clients that fetch from them directly. These clients,
- however, are all the clients that are just bootstrapping
- (assuming that the fallback-consensus feature isn't yet used
- much).
-
- - These measurements also overestimate the V2 download rate if
- some downloads fail and clients retry them later after backing
- off.
-
-Methods for directory guards:
-
- If directory guards are in use, directory guards get a picture of
- all those users who chose them as a guard when they were listed
- as a good choice for a guard, and who are also on the network
- now. The cleanest data here will come from nodes that were listed
- as good new-guards choices for a while, and have not been so for a
- while longer (to study decay rates); nodes that have been listed
- as good new-guard choices consistently for a long time (to get a
- sample of the network); and nodes that have been listed as good
- new-guard choices only recently (to get a sample of new users and
- users whose guards have died out.)
-
- Since directory guards are currently unspecified, we'll need to
- make some guesses about how they'll turn out to work. Here are
- a couple of approaches that could work.
- - We could have clients pick completely new directory guards on
- a rolling basis every two months or so. This would ensure
- that staying as a guard for a while would be sufficient to
- see a sample of users. This is potentially advantageous for
- load-balancing the network as well, though it might lose some
- of the benefits of directory guard. We need to quantify the
- impact of this; it might not actually make stuff worse in
- practice, if most guards don't stay good guards for a month
- or two.
-
- - We could try to collect statistics at several directory
- guards and combine their statistics, but we would need to make
- sure that for all time, at least one of the directory guards
- had been recommended as a good choice for new guards. By
- looking at new-IP rates for guards, we could get an idea of
- user uptake; by looking at old-IP decay rates, we could get
- an idea of turnover. This approach would entail significant
- complexity, and we'd probably need to record more information
- than we'd really like to.
-
-
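
Editorial illustration (not part of the commit): the estimate described under "Methods for current clients" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not code from Tor. The function names are hypothetical, P is assumed to be the cache's per-request selection probability, and N is assumed to be the expected number of requests per client over the measurement window (per the Notes, about 2*H for v2 clients over H hours).

```python
def visit_probability(p: float, n: float) -> float:
    """Chance that a client contacted this cache at least once.

    Implements P' = 1 - (1 - P)^N from the plan, where p is the
    probability of this cache being chosen for a single request and
    n is the expected number of requests per client in the window.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - p) ** n


def estimate_total_ips(seen_ips: int, p: float, n: float) -> float:
    """Scale the distinct IPs seen at one cache up to the whole network."""
    return seen_ips / visit_probability(p, n)


def estimate_clients_from_requests(seen_requests: int, p: float, n: float) -> float:
    """Estimate clients from request counts.

    Assumes this cache saw a fraction p of all requests, and that each
    client made n requests during the window (an illustrative reading
    of "assume we'll have seen P of them").
    """
    if not 0.0 < p <= 1.0 or n <= 0:
        raise ValueError("p must be in (0, 1] and n must be positive")
    return seen_requests / (p * n)


if __name__ == "__main__":
    # Hypothetical numbers: a cache chosen 1% of the time, counting
    # v2 clients for 24 hours (N = 2 * 24 = 48 requests per client).
    p, n = 0.01, 2 * 24
    print(visit_probability(p, n))
    print(estimate_total_ips(1000, p, n))
    print(estimate_clients_from_requests(5000, p, n))
```

With these hypothetical inputs P' comes out at about 0.38, so a cache that saw 1000 distinct IPs would estimate the network-wide count by dividing 1000 by that value. None of the caveats listed in the Notes (503 responses, bootstrapping clients that fetch from authorities, retried v2 downloads) are modeled here.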