Hi everybody,

as you may or may not know, Tor uses a GeoIP database to resolve client IP addresses to country codes for statistics. That's how we get the data to plot usage-by-country graphs such as these:

http://metrics.torproject.org/recurring-users-graphs.html

However, our GeoIP database was last updated on June 6, 2009, because a subsequent update from our GeoIP database provider (ip-to-country) was broken and declared most of the US IP addresses as unassigned. After waiting for a few weeks for the database provider to fix this, we gave up and kept on using the June 6, 2009 database.

Another issue with our current GeoIP database is that it didn't resolve a single observed IP address to Tunisia, for example. But we know we have users in Tunisia, so this cannot be true.

Time to look for alternatives. The alternatives I investigated are:

- Update to the most recent ip-to-country database from April 26, 2010, available at
  http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip

- Switch to the Host IP database from April 26, 2010, available at
  http://www.hostip.info/faq.html

- Switch to Maxmind's free GeoLite Country database from April 1, 2010, available at
  http://www.maxmind.com/app/geolitecountry

In addition to these databases, I looked at two more sources, mostly for comparison, not for shipping them with Tor:

- Maxmind's commercial GeoIP Country database, last updated on April 20, 2010

- Jake's blockfinder tool
  http://github.com/ioerror/blockfinder

In particular, I looked at IP address ranges assigned to Tunisia, because our current database fails entirely here and because Tunisia has a nice small number of IP ranges in all the databases that makes it easy to handle for this comparison.

I attached the analysis to this mail (32K), so that it's in the mail archives. The PDF has IP address ranges in its rows and country codes resolved by the various databases in its columns. Larger IP address ranges are bold and bigger.
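
For readers who have not worked with such a database: resolving an address is essentially a search over sorted address ranges. The following is only a minimal sketch of that idea, not Tor's actual code. The helper names (cidr_row, resolve) are made up for illustration, and the table holds just the two large Tunisian ranges named in this mail, whereas a real database has many thousands of rows.

```python
import bisect
import ipaddress


def cidr_row(cidr, country):
    """Turn a CIDR block into a (first, last, country) row of integers."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.broadcast_address), country)


# Illustrative table only (an assumption for this sketch): the two large
# Tunisian ranges mentioned in this mail, sorted by first address.
RANGES = sorted([
    cidr_row("41.224.0.0/13", "TN"),
    cidr_row("196.203.0.0/16", "TN"),
])
STARTS = [row[0] for row in RANGES]


def resolve(ip):
    """Return the country code for ip, or None if no range covers it."""
    addr = int(ipaddress.ip_address(ip))
    # Find the last range whose first address is <= addr.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0 and RANGES[i][0] <= addr <= RANGES[i][1]:
        return RANGES[i][2]
    return None


if __name__ == "__main__":
    for ip in ("196.203.1.1", "41.231.255.255", "41.232.0.0"):
        print(ip, resolve(ip))
```

An address that falls into no range comes back as None, which corresponds to the "unresolved" bucket in our statistics. A database that lacks a country's ranges therefore undercounts that country without any visible error.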

Here are some observations:

- Our current database (ip-to-country 6/3/09) resolves almost zero IP addresses to Tunisia, just one /29, one /30, and one /28. No wonder we're seeing no users from Tunisia. Let's give up on this database.

- The most recent ip-to-country database from 4/26/10 has the very same IP address ranges for Tunisia. No need to upgrade, IMO.

- Host IP disagrees most with the other databases. Host IP has lots of /24 ranges that it thinks are Tunisia, but which no other database agrees with. Unlike the two Maxminds and blockfinder, Host IP fails to identify the two largest ranges, 41.224.0.0/13 and 196.203.0.0/16, as Tunisia. I should also say that /24 is the smallest unit that Host IP knows, which is quite imprecise. I think Host IP is out, too.

- blockfinder knows about the largest ranges only, which are all at least /24 in size. It agrees with both Maxmind databases in all these ranges.

- Both Maxmind databases have quite a few smaller IP address ranges that none or few of the other databases know. One of them is a /26, the others are /28 or smaller.

- Commercial and free Maxmind have almost the same ranges for Tunisia. One example of a difference is 196.203.0.0/16, which is split into 5 ranges in the commercial database covering 65470 addresses (compared to 65536 in the free database), which is an overlap of 99.899%.

From this comparison it looks like the free Maxmind database is a far better choice than ip-to-country or Host IP.

Jake was kind enough to run his fast directory mirror "trusted" with the free Maxmind database for 24 hours. I compared the output of these 24 hours with the 24 hours before and made these observations:

- With the free Maxmind database we suddenly have 320 requests from Tunisia in 24 hours. Before that, we had constantly zero requests from Tunisia. Yay!

- With the new database we only have 8 unresolved IP addresses.
  Before that, trusted was unable to resolve 9024 requests with the ip-to-country database, which put "unresolved" in fourth place, right after US with 17808 requests, DE with 11336 requests, and KR with 9368 requests. That's a lot of unresolved IP addresses.

- Interestingly, trusted resolves 7368 requests to Nigeria with the Maxmind database, compared to only 8 with ip-to-country. I wonder if this can be correct. Is Tor this well-known in Nigeria?

In summary, I think we should ship Maxmind's free database with Tor, if the license permits it. I'm hoping to get much better usage statistics for smaller countries like Tunisia from the free Maxmind database. Also, Maxmind has more users and is therefore less likely to ship a broken update and not notice for weeks.

Thoughts?

Best,
--Karsten
Attachment:
geoipdb-comparison-TN-2010-04-29.pdf
Description: Adobe PDF document