[or-cvs] r15142: Update geoip proposal draft to more closely match reality , (in tor/trunk: . doc/spec/proposals/ideas)
Author: nickm
Date: 2008-06-11 16:44:22 -0400 (Wed, 11 Jun 2008)
New Revision: 15142
Modified:
tor/trunk/
tor/trunk/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt
Log:
r16178@tombo: nickm | 2008-06-11 16:33:06 -0400
Update geoip proposal draft to more closely match reality, and include slightly better ideas about dir guards.
Property changes on: tor/trunk
___________________________________________________________________
svk:merge ticket from /tor/trunk [r16178] on 49666b30-7950-49c5-bedf-9dc8f3168102
Modified: tor/trunk/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt
===================================================================
--- tor/trunk/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt 2008-06-11 20:44:17 UTC (rev 15141)
+++ tor/trunk/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt 2008-06-11 20:44:22 UTC (rev 15142)
@@ -22,8 +22,7 @@
organizations who are interested in funding The Tor Project's
work want to know that we're successfully serving parts of the
world they're interested in, and that efforts to expand our
- userbase are actually succeeding. So, when you come right
- down to it, do we.
+ userbase are actually succeeding. So do we.
Goals
@@ -35,7 +34,7 @@
We need to make sure this information isn't exposed in a way that
helps an adversary.
-Methods:
+Methods for current clients:
Every client downloads network status documents. There are
currently three methods (one hypothetical) for clients to get them.
@@ -48,8 +47,9 @@
longer freshest, and when their current document is about to
expire.
- [In both of the above cases, clients choose a directory cache at
- random with odds roughly proportional to its bandwidth.]
+ [In both of the above cases, clients choose a running
+ directory cache at random with odds roughly proportional to
+ its bandwidth.]
- In some future version, clients will choose directory caches
to serve as their "directory guards" to avoid profiling
@@ -60,8 +60,9 @@
categories a client is in by the format of its status request.
A directory cache can be made to count distinct client IP
- addresses that make a certain request of it in a given timeframe.
- For the first two cases, a cache can get a picture of the overall
+ addresses that make a certain request of it in a given timeframe,
+ and total requests made to it over that timeframe. For the first
+ two cases, a cache can get a picture of the overall
number and countries of users in the network by dividing the IP
count by the probability with which they (as a cache) would be
chosen. Assuming that our listed bandwidth is such that we expect
@@ -69,8 +70,30 @@
been counting IPs for long enough that we expect the average
client to have made N requests, they will have visited us at least
once with probability P' = 1-(1-P)^N, and so we divide the IP
- counts we've seen by P' for our estimate.
+ counts we've seen by P' for our estimate. To estimate total
+ number of clients of a given type, determine how many requests a
+ client of that type will make over that time, and assume we'll
+ have seen P of them.
+ Both of these numbers are useful: the IP counts will give the
+ total number of IPs connecting to the network, and the request
+ counts will give the total number of users on the network at any
+ given time.
+
+ Notes:
+ - [Over H hours, the N for V2 clients is 2*H, and the N for V3
+ clients is currently around N/2 or N/3. [***FIGURE THIS
+ OUT***XXXX]]
+
+ - (We should only count requests that we actually intend to answer;
+ 503 requests shouldn't count.)
+
+ - These measurements *shouldn't* be taken at directory
+ authorities: their picture of the network is too skewed by the
+ special cases in which clients fetch from them directly.
+
+Methods for directory guards:
+
If directory guards are in use, directory guards get a picture of
all those users who chose them as a guard when they were listed
as a good choice for a guard, and who are also on the network
@@ -82,7 +105,27 @@
new-guard choices only recently (to get a sample of new users and
users whose guards have died out.)
- Note that these measurements *shouldn't* be taken at directory
- authorities: their picture of the network is too skewed by the
- special cases in which clients fetch from them directly.
+ Since directory guards are currently unspecified, we'll need to
+ make some guesses about how they'll turn out to work. Here are
+ a couple of approaches that could work.
+ - We could have clients pick completely new directory guards on
+ a rolling basis every two months or so. This would ensure
+ that staying as a guard for a while would be sufficient to
+ see a sample of users. This is potentially advantageous for
+ load-balancing the network as well, though it might lose some
+ of the benefits of directory guards. We need to quantify the
+ impact of this; it might not actually make stuff worse in
+ practice, if most guards don't stay good guards for a month
+ or two.
+ - We could try to collect statistics at several directory
+ guards and combine their statistics, but we would need to make
+ sure that for all time, at least one of the directory guards
+ had been recommended as a good choice for new guards. By
+ looking at new-IP rates for guards, we could get an idea of
+ user uptake; by looking at old-IP decay rates, we could get
+ an idea of turnover. This approach would entail significant
+ complexity, and we'd probably need to record more information
+ than we'd really like to.
+
+
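
The estimate described in the "Methods for current clients" text above can be sketched in Python. This is a minimal illustration under stated assumptions, not code from Tor: the function names are hypothetical, P is taken to be the probability that a client picks this cache for a single request, and N the number of requests an average client makes during the measurement window. The last function reflects one reading of "assume we'll have seen P of them" and may not be what the proposal intends.

```python
def prob_seen_at_least_once(p: float, n: int) -> float:
    """P' = 1 - (1 - P)^N: chance a client visited this cache at least once."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def estimate_total_ips(observed_ips: int, p: float, n: int) -> float:
    """Scale the distinct-IP count seen at this cache up to the whole network."""
    p_seen = prob_seen_at_least_once(p, n)
    if p_seen == 0.0:
        raise ValueError("cache is never chosen; no estimate is possible")
    return observed_ips / p_seen


def estimate_clients_from_requests(observed_requests: int, p: float, n: int) -> float:
    """Assumed reading: this cache sees a fraction P of all requests, and each
    client of the given type makes N requests over the window."""
    if p <= 0.0 or n <= 0:
        raise ValueError("p and n must be positive")
    return observed_requests / (p * n)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(estimate_total_ips(observed_ips=75, p=0.5, n=2))
    print(estimate_clients_from_requests(observed_requests=100, p=0.5, n=2))
```

Dividing by P' rather than P matters once N grows: a client that makes many requests is likely to reach any given cache eventually, so the raw IP count needs less scaling than the raw request count does.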