
[or-cvs] r12566: draft of a proposal: Fetching GeoIP databases for clients, r (tor/trunk/doc/spec/proposals)



Author: arma
Date: 2007-11-24 10:28:08 -0500 (Sat, 24 Nov 2007)
New Revision: 12566

Added:
   tor/trunk/doc/spec/proposals/126-geoip-reporting.txt
Modified:
   tor/trunk/doc/spec/proposals/000-index.txt
   tor/trunk/doc/spec/proposals/123-autonaming.txt
Log:
draft of a proposal: Fetching GeoIP databases for clients, relays, and bridges


Modified: tor/trunk/doc/spec/proposals/000-index.txt
===================================================================
--- tor/trunk/doc/spec/proposals/000-index.txt	2007-11-24 12:22:56 UTC (rev 12565)
+++ tor/trunk/doc/spec/proposals/000-index.txt	2007-11-24 15:28:08 UTC (rev 12566)
@@ -48,6 +48,7 @@
 123  Naming authorities automatically create bindings [OPEN]
 124  Blocking resistant TLS certificate usage [ACCEPTED]
 125  Behavior for bridge users, bridge relays, and bridge authorities [OPEN]
+126  Fetching GeoIP databases for clients, relays, and bridges [OPEN]
 
 
 Proposals by status:
@@ -63,6 +64,7 @@
    121  Hidden Service Authentication
    123  Naming authorities automatically create bindings
    125  Behavior for bridge users, bridge relays, and bridge authorities
+   126  Fetching GeoIP databases for clients, relays, and bridges
  ACCEPTED:
    105  Version negotiation for the Tor protocol
    124  Blocking resistant TLS certificate usage

Modified: tor/trunk/doc/spec/proposals/123-autonaming.txt
===================================================================
--- tor/trunk/doc/spec/proposals/123-autonaming.txt	2007-11-24 12:22:56 UTC (rev 12565)
+++ tor/trunk/doc/spec/proposals/123-autonaming.txt	2007-11-24 15:28:08 UTC (rev 12566)
@@ -1,4 +1,4 @@
-Filename: xxx-autonaming.txt
+Filename: 123-autonaming.txt
 Title: Naming authorities automatically create bindings
 Version: $Revision$
 Last-Modified: $Date$
@@ -52,3 +52,4 @@
 
  This automaton does not necessarily need to live in the Tor code, it
  can do its job just as well when it's an external tool.
+

Added: tor/trunk/doc/spec/proposals/126-geoip-reporting.txt
===================================================================
--- tor/trunk/doc/spec/proposals/126-geoip-reporting.txt	                        (rev 0)
+++ tor/trunk/doc/spec/proposals/126-geoip-reporting.txt	2007-11-24 15:28:08 UTC (rev 12566)
@@ -0,0 +1,124 @@
+Filename: 126-geoip-fetching.txt
+Title: Fetching GeoIP databases for clients, relays, and bridges
+Version: $Revision: 11988 $
+Last-Modified: $Date: 2007-10-16 12:59:42 -0400 (Tue, 16 Oct 2007) $
+Author: Roger Dingledine
+Created: 2007-11-24
+Status: Open
+
+1. Background and motivation
+
+  Right now we can keep a rough count of Tor users, both total and by
+  country, by watching connections to a single directory mirror. Being
+  able to get usage estimates is useful both for our funders (to
+  demonstrate progress) and for our own development (so we know how
+  quickly we're scaling and can design accordingly, and so we know which
+  countries and communities to focus on more). This need for information
+  is the only reason we haven't deployed "directory guards" (think of
+  them like entry guards but for directory information; in practice,
+  it would seem that Tor clients should simply use their entry guards
+  as their directory guards).
+
+  With the move toward bridges, we will no longer be able to track Tor
+  clients that use bridges, since they use their bridges as directory
+  guards. Further, we need to be able to learn which bridges stop seeing
+  use from certain countries (and are thus likely blocked), so we can
+  avoid giving them out to other users in those countries.
+
+  Right now we support GeoIP lookups through Vidalia: Vidalia draws relays
+  and circuits on its 'network map', and it performs anonymized GeoIP
+  lookups to its central servers to know where to put the dots. Vidalia
+  caches answers it gets -- to reduce delay, to reduce overhead on
+  the network, and to reduce anonymity issues where users reveal their
+  behavior through which IP addresses they ask about.
+
+  But with the advent of bridges, Tor clients are asking about IP
+  addresses that aren't in the main directory. In particular, bridge
+  users tell the central Vidalia servers about each bridge as they
+  discover it and their Vidalia tries to map it.
+
+  Also, we wouldn't mind letting Vidalia do a GeoIP lookup on the client's
+  own IP address, so it can provide a more useful map.
+
+  Also, Vidalia's central servers leave users open to partitioning
+  attacks, even if they can't target specific users. Further, as we
+  start using GeoIP results for more operational or security-relevant
+  goals, such as avoiding or including particular countries in circuits,
+  it becomes more important that users can't be singled out in terms of
+  their IP-to-country mapping beliefs.
+
+  This proposal describes a way for Tor relays, bridges, and clients to
+  download a local copy of a GeoIP database, so they can do local private
+  queries. Thus we can avoid sending detailed queries to central servers.
+
+2. Publishing and caching the GeoIP database
+
+  We assume that we use a free GeoIP db, like ip2country. We will need
+  to standardize on its format; see Section 5.
+
+  Each v3 directory authority should put a copy of the "geoip" file in
+  its datadirectory. Then its votes should include a hash of this file,
+  and the resulting consensus directory should specify the consensus hash.
+
+  There should be a new URL for fetching this geoip db (by "current.z"
+  for testing purposes, and by hash.z for typical downloads). Authorities
+  should fetch and serve the one listed in the consensus, even when they
+  vote for their own. This would argue for storing the cached version
+  in a better filename than "geoip".
+
+  Directory mirrors should keep a copy of this file available via the
+  same URLs.
+
+  We assume that the file would change at most a few times a month. Should
+  Tor ship with a bootstrap geoip file?
+
+3. Clients use it for Vidalia
+
+  Tor fetches the geoip file as above, and puts it in Tor's DataDirectory.
+  Then we could have a status event that tells controllers that a new
+  geoip file has arrived.
+
+  Then Vidalia would either read the file directly, or we would add
+  a control protocol interface for querying. Since Tor probably needs
+  to parse the file itself (see Section 4 below), offering the control
+  interface is probably cleanest.
+
+  There should be a config option to disable updating the geoip file,
+  in case users want to use their own file (e.g. they have a proprietary
+  GeoIP file they prefer to use). In that case we leave it up to the
+  user to update their geoip file out-of-band.
+
+4. Bridges use it for usage summaries
+
+  Once bridges have a GeoIP database locally, they can start to publish
+  sanitized summaries of client usage -- how many users they see and from
+  what countries. This might also be a more useful way for ordinary Tor
+  relays to convey the level of usage they see.
+
+  But safely summarizing this information without opening too many
+  anonymity leaks seems hard, so I'm going to leave it for a different
+  proposal.
+
+5. Which db to use?
+
+  A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
+  bytes. This isn't so bad. But we can easily cut it down further; some
+  sample lines are:
+    "205500992","208605279","US","USA","UNITED STATES"
+    "208605280","208605311","CA","CAN","CANADA"
+    "208605312","210784255","US","USA","UNITED STATES"
+  My guess is that compression will remove most of the redundancy, so we
+  can stick with the default format.
+  http://ip-to-country.webhosting.info/node/view/5
+
+  The MaxMind GeoLite Country database is also about 500KB compressed.
+  http://www.maxmind.com/app/geolitecountry
+
+  The MaxMind GeoLite City database gives more fine-grained detail, such
+  as geo coordinates and city name. Vidalia currently makes use of this
+  information. On the other hand it's 16MB compressed, which would seem
+  to be out of our reach.
+  http://www.maxmind.com/app/geolitecity
+
+  What other options are there?
+
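
Note on Section 5 of the proposal: the three sample ip-to-country.csv
lines appear to be sorted, non-overlapping ranges of IPv4 addresses
written as decimal integers, followed by country codes and the country
name. Under that assumption, a local private lookup could be sketched
roughly as below. This is only an illustration in Python, not Tor or
Vidalia code; load_ranges, lookup, SAMPLE, and TABLE are hypothetical
names, and the sample data is just the three lines quoted in the
proposal.

```python
import bisect
import csv
import ipaddress

# The three sample rows quoted in Section 5 of the proposal.
SAMPLE = '''"205500992","208605279","US","USA","UNITED STATES"
"208605280","208605311","CA","CAN","CANADA"
"208605312","210784255","US","USA","UNITED STATES"
'''


def load_ranges(lines):
    """Parse CSV rows into (starts, ranges), sorted by range start.

    Assumes each row is: start, end, two-letter code, and further
    columns we ignore. Rows with fewer than three fields are skipped.
    """
    ranges = sorted(
        (int(row[0]), int(row[1]), row[2])
        for row in csv.reader(lines)
        if len(row) >= 3
    )
    starts = [start for start, _end, _cc in ranges]
    return starts, ranges


def lookup(table, ip):
    """Return the two-letter country code for ip, or None if unmapped.

    ip may be a dotted-quad string or an integer.
    """
    starts, ranges = table
    addr = int(ipaddress.IPv4Address(ip))
    # Find the last range whose start is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    _start, end, cc = ranges[i]
    # addr may fall in a gap after the end of that range.
    return cc if addr <= end else None


TABLE = load_ranges(SAMPLE.splitlines())

if __name__ == "__main__":
    print(lookup(TABLE, "12.111.16.100"))  # CA
    print(lookup(TABLE, "8.8.8.8"))        # None (not in the sample)
```

Keeping only the range bounds and the two-letter code in memory drops
the redundant three-letter code and country name columns, which is the
kind of trimming Section 5 says is possible but probably unnecessary
once the file is compressed.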