
GeoIP database comparison



Hi everybody,

as you may or may not know, Tor uses a GeoIP database to resolve client
IP addresses to country codes for statistics. That's how we get the data
to plot usage-by-country graphs such as these:

  http://metrics.torproject.org/recurring-users-graphs.html
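
For readers who haven't looked at how such a lookup works, here is a
minimal sketch of resolving an address against a table of IP ranges.
It is not Tor's actual code. The names build_table and lookup, the
"??" marker for unresolved addresses, and the assumption that ranges
don't overlap are all mine; the two sample ranges are the large
Tunisian networks discussed further down.

```python
import bisect
import ipaddress

def build_table(cidr_to_country):
    """Turn {CIDR: country code} into a sorted list of (first, last, cc)."""
    rows = []
    for cidr, cc in cidr_to_country.items():
        net = ipaddress.ip_network(cidr)
        rows.append((int(net.network_address), int(net.broadcast_address), cc))
    rows.sort()
    return rows

def lookup(table, address, unresolved="??"):
    """Return the country code for address, or the unresolved marker.

    Assumes the ranges in table do not overlap.
    """
    ip = int(ipaddress.ip_address(address))
    starts = [row[0] for row in table]
    # Index of the last range that starts at or before ip.
    i = bisect.bisect_right(starts, ip) - 1
    if i >= 0 and table[i][0] <= ip <= table[i][1]:
        return table[i][2]
    return unresolved

# Sample data only: two ranges taken from the comparison in this mail.
TABLE = build_table({"41.224.0.0/13": "TN", "196.203.0.0/16": "TN"})

print(lookup(TABLE, "196.203.1.1"))
print(lookup(TABLE, "8.8.8.8"))
```

Any address that falls outside all known ranges comes back as
unresolved, which is what happens to Tunisian clients if the database
lacks their ranges.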

However, our GeoIP database was last updated on June 6, 2009, because a
subsequent update from our GeoIP database provider (ip-to-country) was
broken and declared most of the US IP addresses as unassigned. After
waiting for a few weeks for the database provider to fix this, we gave
up and kept on using the June 6, 2009 database.

Another issue with our current GeoIP database is that it doesn't
resolve a single observed IP address to Tunisia, for example. But we
know we have users in Tunisia, so the database must be missing them.

Time to look for alternatives. The alternatives I investigated are:

- Update to the most recent ip-to-country database from April 26, 2010,
available at
http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip

- Switch to the Host IP database from April 26, 2010, available at
http://www.hostip.info/faq.html

- Switch to Maxmind's free GeoLite Country database from April 1, 2010,
available at http://www.maxmind.com/app/geolitecountry

In addition to these databases, I looked at two more sources, mostly for
comparison, not for shipping them with Tor:

- Maxmind's commercial GeoIP Country database, last updated on April 20,
2010

- Jake's blockfinder tool http://github.com/ioerror/blockfinder

In particular, I looked at IP address ranges assigned to Tunisia,
because our current database fails entirely here and because Tunisia has
a nice small number of IP ranges in all the databases that makes it easy
to handle for this comparison. I attached the analysis to this mail
(32K), so that it's in the mail archives.

The PDF has IP address ranges in its rows and country codes resolved by
the various databases in its columns. Larger IP address ranges are bold
and bigger. Here are some observations:

- Our current database (ip-to-country 6/3/09) resolves almost no IP
addresses to Tunisia, just one /29, one /30, and one /28. No wonder
we're seeing no users from Tunisia. Let's give up on this database.

- The most recent ip-to-country database from 4/26/10 has the very same
IP address ranges for Tunisia. No need to upgrade, IMO.

- Host IP disagrees most with the other databases. Host IP has lots of
/24 ranges that it thinks are Tunisia, but which no other database
agrees with. Unlike both Maxmind databases and blockfinder, Host IP
fails to identify the two largest ranges, 41.224.0.0/13 and
196.203.0.0/16, as Tunisia. I should also say that /24 is the smallest
unit that Host IP knows, which is quite coarse. I think Host IP is out,
too.

- blockfinder knows about the largest ranges only, which are all at
least /24 in size. It agrees with both Maxmind databases in all these
ranges.

- Both Maxmind databases have quite a few smaller IP address ranges that
few or none of the other databases know about. One of them is a /26,
the others are /28 or smaller.

- Commercial and free Maxmind have almost the same ranges for Tunisia.
One example of a difference is 196.203.0.0/16: the commercial database
splits it into 5 ranges covering 65470 addresses, compared to 65536 in
the free database. That is an overlap of 99.899%.
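
To put the prefix lengths above into numbers, here is a small sketch
that computes range sizes and the overlap figure. The function name
size is my own; the 65470 figure is taken from the comparison as
stated, not recomputed from the commercial database.

```python
import ipaddress

def size(cidr):
    """Number of addresses in an IPv4 range given in CIDR notation."""
    return ipaddress.ip_network(cidr).num_addresses

# Everything the old ip-to-country database assigns to Tunisia:
# one /29, one /30, and one /28.
old_total = sum(2 ** (32 - prefix) for prefix in (29, 30, 28))
print("old database, Tunisia:", old_total, "addresses")

# The two largest Tunisian ranges that Host IP misses.
for cidr in ("41.224.0.0/13", "196.203.0.0/16"):
    print(cidr, size(cidr), "addresses")

# Overlap between commercial and free Maxmind for 196.203.0.0/16.
free = size("196.203.0.0/16")
commercial = 65470  # sum of the 5 commercial ranges, as reported above
overlap = 100 * commercial / free
print(f"overlap: {overlap:.3f}%")
```

The point of the first number is the order of magnitude: the old
database covers a few dozen Tunisian addresses, while a single /13
alone holds more than half a million.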

From this comparison it looks like the free Maxmind database is a far
better choice than ip-to-country or Host IP.

Jake was kind enough to run his fast directory mirror, trusted, with the
free Maxmind database for 24 hours. I compared the output of these 24
hours with that of the preceding 24 hours and made these observations:

- With the free Maxmind database we suddenly have 320 requests from
Tunisia in 24 hours. Before that, we consistently had zero requests from
Tunisia. Yay!

- With the new database we only have 8 unresolved IP addresses. Before
that, trusted was unable to resolve 9024 requests with the ip-to-country
database, which made "unresolved" the fourth-largest group, right after
US with 17808 requests, DE with 11336 requests, and KR with 9368
requests. That's a lot of unresolved IP addresses.

- Interestingly, trusted resolves 7368 requests to Nigeria with the
Maxmind database, compared to only 8 with ip-to-country. I wonder if
this can be correct. Is Tor this well-known in Nigeria?
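
The before/after comparison can be sketched as a simple diff of
per-country counts. This is only an illustration using the three pairs
of numbers quoted above; the function name changes and the "??" key
for unresolved requests are my own, and I'm assuming the two
"unresolved" figures are comparable counts.

```python
# Counts per 24 hours on trusted, as quoted in this mail.
# "??" is a made-up key standing for unresolved requests.
before = {"TN": 0, "NG": 8, "??": 9024}    # ip-to-country
after = {"TN": 320, "NG": 7368, "??": 8}   # free Maxmind

def changes(before, after):
    """Return {country: after - before} for every country in either dict."""
    keys = sorted(set(before) | set(after))
    return {cc: after.get(cc, 0) - before.get(cc, 0) for cc in keys}

# Print the largest absolute changes first.
delta = changes(before, after)
for cc in sorted(delta, key=lambda k: -abs(delta[k])):
    print(f"{cc}: {before.get(cc, 0)} -> {after.get(cc, 0)} ({delta[cc]:+d})")
```

Sorting by absolute change makes jumps like the one for Nigeria stand
out, so they can be checked by hand before we trust them.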

In summary, I think we should ship Maxmind's free database with Tor, if
the license permits it. I'm hoping to get much better usage statistics
for smaller countries like Tunisia from the free Maxmind database. Also,
Maxmind has more users and is therefore less likely to ship a broken
update and not notice for weeks.

Thoughts?

Best,
--Karsten

Attachment: geoipdb-comparison-TN-2010-04-29.pdf
Description: Adobe PDF document