[tor-commits] [onionoo/master] Use recent GeoIP database without A1 entries.
commit 95623efb0e415d1c9c9fa176a967f1a05f942b45
Author: Karsten Loesing <karsten.loesing@xxxxxxx>
Date: Mon Feb 11 08:12:39 2013 +0100
Use recent GeoIP database without A1 entries.
The IP-to-city database to be deployed with Onionoo needs to have its "A1"
("Anonymous Proxy") entries fixed just like Tor's IP-to-country file. See
Tor's src/config/README.geoip for detailed information.
- Ship with a variant of Tor's deanonymind.py that removes A1 entries from
IP-to-city databases. Also ship with a custom geoip-manual for manual
replacements.
- Use our own GeoIP file parser, because MaxMind's library doesn't work
with .csv files. On the plus side this removes a dependency and makes
it easier to build Onionoo. On the minus side it adds a bunch of new
code.
- Update index.html to say that some _name entries may be missing if
empty.
- Update .gitignore and INSTALL.
---
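The automatic replacement rule that deanonymind.py applies to A1 entries
can be sketched roughly as follows. This is a minimal Python 3
illustration, not code from this commit: the function name
merge_single_a1 and the tuple layout are assumptions made for the example,
and 242 is simply the default of the script's -b option. An A1 entry is
rewritten only if it directly touches both neighbors and both neighbors
carry the same block number.

```python
A1_BLOCK = 242  # default of deanonymind.py's -b option (assumed A1 block)


def merge_single_a1(prev, a1, nxt, a1_block=A1_BLOCK):
    """Return the A1 entry rewritten with the neighbors' block number,
    or the unchanged entry if the merge rule does not apply.

    Each entry is a hypothetical (start_num, end_num, block_number)
    tuple of ints, mirroring one line of the blocks .csv file.
    """
    if a1[2] != a1_block:
        return a1  # not an A1 entry, nothing to do
    touches_prev = prev[1] + 1 == a1[0]   # prev ends directly before a1
    touches_next = a1[1] + 1 == nxt[0]    # next starts directly after a1
    same_block = prev[2] == nxt[2]        # neighbors agree on the block
    if touches_prev and touches_next and same_block:
        return (a1[0], a1[1], prev[2])
    return a1
```

Entries that fail any of the three conditions stay as they are and have to
be handled through geoip-manual.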
.gitignore | 29 ++-
INSTALL | 79 +++++--
geoip/deanonymind.py | 175 ++++++++++++
geoip/geoip-manual | 354 +++++++++++++++++++++++++
src/org/torproject/onionoo/CurrentNodes.java | 364 +++++++++++++++++++++++---
src/org/torproject/onionoo/Main.java | 3 +-
web/index.html | 6 +-
7 files changed, 936 insertions(+), 74 deletions(-)
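The custom parser in CurrentNodes.java first turns relay addresses into
numbers and then resolves them in a single pass over the sorted blocks
file. The Python 3 sketch below illustrates that idea under stated
assumptions; ipv4_to_number and resolve_blocks are hypothetical names, and
the blocks are passed in as already-parsed tuples rather than read from a
.csv file.

```python
def ipv4_to_number(address):
    """Convert a dotted-quad IPv4 string to an int, or None if invalid."""
    parts = address.split('.')
    if len(parts) != 4:
        return None
    number = 0
    for part in parts:
        # Accept only plain ASCII digits with a value of 0..255.
        if not (part.isascii() and part.isdigit()) or int(part) > 255:
            return None
        number = number * 256 + int(part)
    return number


def resolve_blocks(address_numbers, blocks):
    """Map each address number to the block number of its range.

    blocks must be (start_num, end_num, block_number) tuples sorted by
    start_num with non-overlapping ranges. Addresses that fall into no
    range are left out of the result.
    """
    pending = sorted(set(address_numbers))
    resolved = {}
    i = 0
    for start_num, end_num, block_number in blocks:
        # Drop addresses that lie before this range; no range covers them.
        while i < len(pending) and pending[i] < start_num:
            i += 1
        # Assign every address that lies inside this range.
        while i < len(pending) and pending[i] <= end_num:
            resolved[pending[i]] = block_number
            i += 1
        if i == len(pending):
            break  # all addresses handled, stop reading blocks
    return resolved
```

Because both the addresses and the blocks are sorted, each is visited at
most once, so the blocks file never has to be held in memory as a whole.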
diff --git a/.gitignore b/.gitignore
index 40f5895..ac44c7d 100755
--- a/.gitignore
+++ b/.gitignore
@@ -1,16 +1,21 @@
-relay-search-data.csv
-in/
-status/
-lib/
+.classpath
+.project
classes/
-out/
-onionoo.war
-etc/web.xml
etc/context.xml
-GeoIP.dat
-GeoIPASNum.dat
-GeoLiteCity.dat
+etc/web.xml
+geoip/Automatic-GeoLiteCity-Blocks.csv
+geoip/GeoIPASNum2.csv
+geoip/GeoIPASNum2.zip
+geoip/GeoLiteCity-Blocks.csv
+geoip/GeoLiteCity-Location.csv
+geoip/GeoLiteCity-latest.zip
+geoip/Manual-GeoLiteCity-Blocks.csv
+geoip/iso3166.csv
+geoip/region.csv
+in/
+lib/
log
-.classpath
-.project
+onionoo.war
+out/
+status/
diff --git a/INSTALL b/INSTALL
index 0e6269d..b3d5d0a 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,9 +1,14 @@
Clone the Onionoo server repository
-----------------------------------
-Clone the Onionoo server repository into /srv/onionoo/.
+Create working directory /srv/onionoo/, make it writable for the metrics
+user, and clone the Onionoo server repository into it. Commands prefixed
+with # are meant to be run by root, commands with $ by user metrics:
-$ git clone git://github.com/kloesing/Onionoo /srv/onionoo/
+# mkdir /srv/onionoo
+# chown metrics:metrics /srv/onionoo
+$ git clone https://git.torproject.org/onionoo.git /srv/onionoo/
+$ cd /srv/onionoo
Install Java 1.5 or higher, ant 1.8 or higher, and Tomcat 6
@@ -20,13 +25,13 @@ Provide required .jar files
---------------------------
Download or build the following .jar files and put them in the lib/
-directory using the given filename (or update build.xml if filenames are
-different):
+directory:
-- Apache Commons Codec 1.4, lib/commons-codec-1.4.jar
-- Servlet API, e.g., from Tomcat 6, lib/servlet-api.jar
-- Maxmind GeoIP Java API, lib/maxmindgeoip.jar
-- Tor Metrics Descriptor Library, lib/descriptor.jar
+- Apache Commons Codec 1.4
+- Apache Commons Compress 1.4.1
+- Apache Commons Lang 2.6
+- Servlet API, e.g., from Tomcat 6
+- Tor Metrics Descriptor Library, metrics-lib
Attempt to compile the Java sources to make sure that everything works
correctly:
@@ -37,14 +42,50 @@ $ ant compile
Download GeoIP and ASN database files
-------------------------------------
-Download the GeoLite City database from Maxmind and put it in
-/srv/onionoo/GeoLiteCity.dat. If no such file is found, relay IP
-addresses will not be resolved to country codes, latitudes, and
-longitudes.
+Onionoo uses an IP-to-city database and an IP-to-ASN database to provide
+additional information about a relay's location.
-Also download the GeoLite ASN database from Maxmind and put it in
-/srv/onionoo/GeoIPASNum.dat. If no such file is found, relay IP
-addresses will not be resolved to AS numbers and names.
+The IP-to-city database to be deployed with Onionoo needs to have its "A1"
+("Anonymous Proxy") entries fixed just like Tor's IP-to-country file. See
+Tor's src/config/README.geoip for detailed information.
+
+First, change to the geoip/ directory:
+
+$ cd geoip/
+
+Download the most recent MaxMind GeoLite City database and unzip it in the
+current directory, junking paths:
+
+$ wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
+$ unzip -j GeoLiteCity-latest.zip
+
+Run deanonymind.py in the local directory:
+
+$ python deanonymind.py
+
+Review the output to learn about applied automatic/manual changes and
+watch out for any warnings. Possibly edit geoip-manual to make
+more/fewer/different manual changes and re-run deanonymind.py. To look at
+automatic and manual changes, run:
+
+$ diff -U1 GeoLiteCity-Blocks.csv Automatic-GeoLiteCity-Blocks.csv
+$ diff -U1 Automatic-GeoLiteCity-Blocks.csv Manual-GeoLiteCity-Blocks.csv
+
+Download MaxMind's country and region codes files to the current
+directory:
+
+$ wget http://dev.maxmind.com/static/csv/codes/iso3166.csv
+$ wget http://dev.maxmind.com/static/csv/codes/maxmind/region.csv
+
+Download the most recent MaxMind ASN database file and unzip it in the
+current directory:
+
+$ wget http://www.maxmind.com/download/geoip/database/asnum/GeoIPASNum2.zip
+$ unzip GeoIPASNum2.zip
+
+Change back to the root working directory:
+
+$ cd ../
Test the rsync of descriptors from metrics.torproject.org
@@ -57,10 +98,10 @@ $ rsync -arz metrics.torproject.org::metrics-recent in
The result should be around 1G of data in the in/ directory, as of January
2012.
-(If you want to pre-populate the bandwidth data with archived data,
-download the tarballs from https://metrics.torproject.org/data.html and
-process them one after the other. There is no requirement to process data
-in any given order.)
+(If you want to pre-populate bandwidth and weights data with archived
+data, download the tarballs from https://metrics.torproject.org/data.html
+and process them one after the other. There is no requirement to process
+data in any given order.)
Test the hourly data processing process
diff --git a/geoip/deanonymind.py b/geoip/deanonymind.py
new file mode 100755
index 0000000..9ac3568
--- /dev/null
+++ b/geoip/deanonymind.py
@@ -0,0 +1,175 @@
+#!/usr/bin/env python
+import optparse
+import os
+import sys
+import zipfile
+
+"""
+Take a MaxMind GeoLite City blocks file as input and replace A1 entries
+with the block number of the preceding entry iff the preceding
+(subsequent) entry ends (starts) directly before (after) the A1 entry and
+both preceding and subsequent entries contain the same block number.
+
+Then apply manual changes, either replacing A1 entries that could not be
+replaced automatically or overriding previously made automatic changes.
+"""
+
+def main():
+    options = parse_options()
+    assignments = read_file(options.in_maxmind)
+    assignments = apply_automatic_changes(assignments,
+            options.block_number)
+    write_file(options.out_automatic, assignments)
+    manual_assignments = read_file(options.in_manual, must_exist=False)
+    assignments = apply_manual_changes(assignments, manual_assignments)
+    write_file(options.out_manual, assignments)
+
+def parse_options():
+    parser = optparse.OptionParser()
+    parser.add_option('-i', action='store', dest='in_maxmind',
+            default='GeoLiteCity-Blocks.csv', metavar='FILE',
+            help='use the specified MaxMind GeoLite City blocks .csv '
+            'file as input [default: %default]')
+    parser.add_option('-b', action='store', dest='block_number',
+            type='int', default=242, metavar='NUM',
+            help='replace entries with this block number [default: '
+            '%default]')
+    parser.add_option('-g', action='store', dest='in_manual',
+            default='geoip-manual', metavar='FILE',
+            help='use the specified .csv file for manual changes or to '
+            'override automatic changes [default: %default]')
+    parser.add_option('-a', action='store', dest='out_automatic',
+            default="Automatic-GeoLiteCity-Blocks.csv", metavar='FILE',
+            help='write full input file plus automatic changes to the '
+            'specified .csv file [default: %default]')
+    parser.add_option('-m', action='store', dest='out_manual',
+            default='Manual-GeoLiteCity-Blocks.csv', metavar='FILE',
+            help='write full input file plus automatic and manual '
+            'changes to the specified .csv file [default: %default]')
+    (options, args) = parser.parse_args()
+    return options
+
+def read_file(path, must_exist=True):
+    if not os.path.exists(path):
+        if must_exist:
+            print 'File %s does not exist. Exiting.' % (path, )
+            sys.exit(1)
+        else:
+            return
+    csv_file = open(path)
+    csv_content = csv_file.read()
+    csv_file.close()
+    assignments = []
+    for line in csv_content.split('\n'):
+        stripped_line = line.strip()
+        if len(stripped_line) > 0 and not stripped_line.startswith('#'):
+            assignments.append(stripped_line)
+    return assignments
+
+def apply_automatic_changes(assignments, block_number):
+    print '\nApplying automatic changes...'
+    result_lines = []
+    prev_line = None
+    a1_lines = []
+    block_number_str = '"%d"' % (block_number, )
+    for line in assignments:
+        if block_number_str in line:
+            a1_lines.append(line)
+        else:
+            if len(a1_lines) > 0:
+                new_a1_lines = process_a1_lines(prev_line, a1_lines, line)
+                for new_a1_line in new_a1_lines:
+                    result_lines.append(new_a1_line)
+                a1_lines = []
+            result_lines.append(line)
+            prev_line = line
+    if len(a1_lines) > 0:
+        new_a1_lines = process_a1_lines(prev_line, a1_lines, None)
+        for new_a1_line in new_a1_lines:
+            result_lines.append(new_a1_line)
+    return result_lines
+
+def process_a1_lines(prev_line, a1_lines, next_line):
+    if not prev_line or not next_line:
+        return a1_lines # Can't merge first or last line in file.
+    if len(a1_lines) > 1:
+        return a1_lines # Can't merge more than 1 line at once.
+    a1_line = a1_lines[0].strip()
+    prev_entry = parse_line(prev_line)
+    a1_entry = parse_line(a1_line)
+    next_entry = parse_line(next_line)
+    touches_prev_entry = int(prev_entry['end_num']) + 1 == \
+            int(a1_entry['start_num'])
+    touches_next_entry = int(a1_entry['end_num']) + 1 == \
+            int(next_entry['start_num'])
+    same_block_number = prev_entry['block_number'] == \
+            next_entry['block_number']
+    if touches_prev_entry and touches_next_entry and same_block_number:
+        new_line = format_line_with_other_country(a1_entry, prev_entry)
+        print '-%s\n+%s' % (a1_line, new_line, )
+        return [new_line]
+    else:
+        return a1_lines
+
+def parse_line(line):
+    if not line:
+        return None
+    keys = ['start_num', 'end_num', 'block_number']
+    stripped_line = line.replace('"', '').strip()
+    parts = stripped_line.split(',')
+    entry = dict((k, v) for k, v in zip(keys, parts))
+    return entry
+
+def format_line_with_other_country(original_entry, other_entry):
+    return '"%s","%s","%s"' % (original_entry['start_num'],
+            original_entry['end_num'], other_entry['block_number'], )
+
+def apply_manual_changes(assignments, manual_assignments):
+    if not manual_assignments:
+        return assignments
+    print '\nApplying manual changes...'
+    manual_dict = {}
+    for line in manual_assignments:
+        start_num = parse_line(line)['start_num']
+        if start_num in manual_dict:
+            print ('Warning: duplicate start number in manual '
+                   'assignments:\n %s\n %s\nDiscarding first entry.' %
+                   (manual_dict[start_num], line, ))
+        manual_dict[start_num] = line
+    result = []
+    for line in assignments:
+        entry = parse_line(line)
+        start_num = entry['start_num']
+        if start_num in manual_dict:
+            manual_line = manual_dict[start_num]
+            manual_entry = parse_line(manual_line)
+            if entry['end_num'] == manual_entry['end_num']:
+                if len(manual_entry['block_number']) == 0:
+                    print '-%s' % (line, ) # only remove, don't replace
+                else:
+                    new_line = format_line_with_other_country(entry,
+                            manual_entry)
+                    print '-%s\n+%s' % (line, new_line, )
+                    result.append(new_line)
+                del manual_dict[start_num]
+            else:
+                print ('Warning: only partial match between '
+                       'original/automatically replaced assignment and '
+                       'manual assignment:\n %s\n %s\nNot applying '
+                       'manual change.' % (line, manual_line, ))
+                result.append(line)
+        else:
+            result.append(line)
+    if len(manual_dict) > 0:
+        print ('Warning: could not apply all manual assignments: %s' %
+                ('\n '.join(manual_dict.values()), ))
+    return result
+
+def write_file(path, assignments):
+    out_file = open(path, 'w')
+    out_file.write('\n'.join(assignments))
+    out_file.close()
+
+if __name__ == '__main__':
+    main()
+
diff --git a/geoip/geoip-manual b/geoip/geoip-manual
new file mode 100644
index 0000000..6188957
--- /dev/null
+++ b/geoip/geoip-manual
@@ -0,0 +1,354 @@
+# This file contains manual overrides of A1 entries (and possibly others)
+# in MaxMind's GeoLite City database. Use deanonymind.py in the same
+# directory to process this file when producing a new geoip file. See
+# INSTALL for details.
+
+# From geoip-manual (country):
+# Remove MaxMind entry 0.116.0.0-0.119.255.255 which MaxMind says is AT,
+# but which is part of reserved range 0.0.0.0/8. -KL 2012-06-13
+"7602176","7864319",""
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"135013632","135013887","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"520493568","520494079","77"
+
+# From geoip-manual (country):
+# NL, because previous MaxMind entry 31.171.128.0-31.171.133.255 is NL,
+# and RIR delegation files say 31.171.128.0-31.171.135.255 is NL.
+# -KL 2012-11-27
+"531334656","531335167","161"
+
+# From geoip-manual (country):
+# EU, because next MaxMind entry 37.139.64.1-37.139.64.9 is EU, because
+# RIR delegation files say 37.139.64.0-37.139.71.255 is EU, and because it
+# just makes more sense for the next entry to start at .0 and not .1.
+# -KL 2012-11-27
+"629882880","629882880","3"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"644048128","644048383","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"644121856","644122111","223"
+
+# From geoip-manual (country):
+# CH, because previous MaxMind entry 46.19.141.0-46.19.142.255 is CH, and
+# RIR delegation files say 46.19.136.0-46.19.143.255 is CH.
+# -KL 2012-11-27
+"773033728","773033983","44"
+
+# From geoip-manual (country):
+# GB, because next MaxMind entry 46.166.129.0-46.166.134.255 is GB, and
+# RIR delegation files say 46.166.128.0-46.166.191.255 is GB.
+# -KL 2012-11-27
+"782663680","782663935","77"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"786817152","786817215","195"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"846537728","846537983","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"846542848","846543103","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1077383168","1077384191","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1077840384","1077840639","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1083264384","1083264447","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1083264464","1083264511","223"
+
+# From geoip-manual (country):
+# US, though could as well be CA. Previous MaxMind entry
+# 64.237.32.52-64.237.34.127 is US, next MaxMind entry
+# 64.237.34.144-64.237.34.151 is CA, and RIR delegation files say the
+# entire block 64.237.32.0-64.237.63.255 is US. -KL 2012-11-27
+"1089282688","1089282703","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1093730816","1093731071","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1095314944","1095314944","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1109848832","1109849087","39"
+
+# From geoip-manual (country):
+# US, though could as well be UY. Previous MaxMind entry
+# 67.15.170.0-67.15.182.255 is US, next MaxMind entry
+# 67.15.183.128-67.15.183.159 is UY, and RIR delegation files say the
+# entire block 67.15.0.0-67.15.255.255 is US. -KL 2012-11-27
+"1125103360","1125103487","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 67.43.145.0-67.43.155.255 is US, and RIR
+# delegation files say 67.43.144.0-67.43.159.255 is US.
+# -KL 2012-11-27
+"1126928384","1126928639","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1126931456","1126931711","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1138622208","1138622463","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1145334528","1145335039","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1159676928","1159677183","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1160905216","1160905471","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1170375168","1170375679","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 70.159.21.51-70.232.244.255 is US,
+# because next MaxMind entry 70.232.245.58-70.232.245.59 is A2 ("Satellite
+# Provider") which is a country information about as useless as A1, and
+# because RIR delegation files say 70.224.0.0-70.239.255.255 is US.
+# -KL 2012-11-27
+"1189672192","1189672249","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 70.232.246.0-70.240.141.255 is US,
+# because previous MaxMind entry 70.232.245.58-70.232.245.59 is A2
+# ("Satellite Provider") which is a country information about as useless
+# as A1, and because RIR delegation files say 70.224.0.0-70.239.255.255 is
+# US. -KL 2012-11-27
+"1189672252","1189672447","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1249050624","1249051135","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1249051904","1249052671","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1249091584","1249092607","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286389760","1286390271","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286390528","1286390783","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286391296","1286391807","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286393856","1286394623","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286395392","1286396159","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286398976","1286399487","223"
+
+# From geoip-manual (country):
+# GB, despite neither previous (GE) nor next (LV) MaxMind entry being GB,
+# but because RIR delegation files agree with both previous and next
+# MaxMind entry and say GB for 91.228.0.0-91.228.3.255. -KL 2012-11-27
+"1541668864","1541669887","77"
+
+# From geoip-manual (country):
+# GB, because next MaxMind entry 91.232.125.0-91.232.125.255 is GB, and
+# RIR delegation files say 91.232.124.0-91.232.125.255 is GB.
+# -KL 2012-11-27
+"1541962752","1541963007","77"
+
+# From geoip-manual (country):
+# GB, despite neither previous (RU) nor next (PL) MaxMind entry being GB,
+# but because RIR delegation files agree with both previous and next
+# MaxMind entry and say GB for 91.238.214.0-91.238.215.255.
+# -KL 2012-11-27
+"1542379008","1542379519","77"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1632587008","1632587263","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1673576896","1673576959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1795558656","1795558911","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1933909760","1933910015","17"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"2360215808","2360216063","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 173.0.16.0-173.0.65.255 is US, and RIR
+# delegation files say 173.0.0.0-173.0.15.255 is US. -KL 2012-11-27
+"2902458368","2902462463","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"2918536448","2918536703","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 176.67.84.0-176.67.84.79 is US, and RIR
+# delegation files say 176.67.80.0-176.67.87.255 is US. -KL 2012-11-27
+"2957201408","2957202431","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 176.67.84.192-176.67.85.255 is US,
+# and RIR delegation files say 176.67.80.0-176.67.87.255 is US.
+# -KL 2012-11-27
+"2957202944","2957203455","223"
+
+# From geoip-manual (country):
+# EU, despite neither previous (RU) nor next (UA) MaxMind entry being EU,
+# but because RIR delegation files agree with both previous and next
+# MaxMind entry and say EU for 193.200.150.0-193.200.150.255.
+# -KL 2012-11-27
+"3251148288","3251148543","3"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3341849376","3341853471","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3341873152","3341875199","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 199.96.68.0-199.96.87.127 is US, and
+# RIR delegation files say 199.96.80.0-199.96.87.255 is US.
+# -KL 2012-11-27
+"3344979840","3344979967","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3346193920","3346194431","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3355430912","3355432959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3450078464","3450079487","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3483239424","3483239679","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3483240704","3483240959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3483247360","3483247871","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3485724672","3485728767","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3500664576","3500664831","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3500666752","3500666879","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 209.58.176.144-209.59.31.255 is US,
+# and RIR delegation files say 209.59.32.0-209.59.63.255 is US.
+# -KL 2012-11-27
+"3510312960","3510321151","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3519352832","3519352959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3519354048","3519354111","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3519355392","3519355519","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3520644608","3520644863","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3520656384","3520656639","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3632994048","3632994303","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3633782528","3633782783","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3633823488","3633823743","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3634982400","3634982655","223"
+
+# From geoip-manual (country):
+# FR, because previous MaxMind entry 217.15.166.0-217.15.166.255 is FR,
+# and RIR delegation files contain a block 217.15.160.0-217.15.175.255
+# which, however, is EU, not FR. But merging with next MaxMind entry
+# 217.15.176.0-217.15.191.255 which is KZ and which fully matches what
+# the RIR delegation files say seems unlikely to be correct.
+# -KL 2012-11-27
+"3641681664","3641683967","75"
+
diff --git a/src/org/torproject/onionoo/CurrentNodes.java b/src/org/torproject/onionoo/CurrentNodes.java
index 9e5d0db..487cf4d 100644
--- a/src/org/torproject/onionoo/CurrentNodes.java
+++ b/src/org/torproject/onionoo/CurrentNodes.java
@@ -11,13 +11,17 @@ import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
import java.util.Iterator;
-import java.util.Locale;
+import java.util.Map;
+import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TimeZone;
import java.util.TreeMap;
import java.util.TreeSet;
+import java.util.regex.Pattern;
import org.torproject.descriptor.BridgeNetworkStatus;
import org.torproject.descriptor.Descriptor;
@@ -27,10 +31,6 @@ import org.torproject.descriptor.DescriptorSourceFactory;
import org.torproject.descriptor.NetworkStatusEntry;
import org.torproject.descriptor.RelayNetworkStatusConsensus;
-import com.maxmind.geoip.Location;
-import com.maxmind.geoip.LookupService;
-import com.maxmind.geoip.regionName;
-
/* Store relays and bridges that have been running in the past seven
* days. */
public class CurrentNodes {
@@ -343,53 +343,341 @@ public class CurrentNodes {
}
}
- public void lookUpCountries() {
- File geoLiteCityDatFile = new File("GeoLiteCity.dat");
- if (!geoLiteCityDatFile.exists()) {
- System.err.println("No GeoLiteCity.dat file in /.");
+ public void lookUpCitiesAndASes() {
+
+ /* Make sure we have all required .csv files. */
+ File[] geoLiteCityBlocksCsvFiles = new File[] {
+ new File("geoip/Manual-GeoLiteCity-Blocks.csv"),
+ new File("geoip/Automatic-GeoLiteCity-Blocks.csv"),
+ new File("geoip/GeoLiteCity-Blocks.csv")
+ };
+ File geoLiteCityBlocksCsvFile = null;
+ for (File file : geoLiteCityBlocksCsvFiles) {
+ if (file.exists()) {
+ geoLiteCityBlocksCsvFile = file;
+ break;
+ }
+ }
+ if (geoLiteCityBlocksCsvFile == null) {
+ System.err.println("No *GeoLiteCity-Blocks.csv file in geoip/.");
+ return;
+ }
+ File geoLiteCityLocationCsvFile =
+ new File("geoip/GeoLiteCity-Location.csv");
+ if (!geoLiteCityLocationCsvFile.exists()) {
+ System.err.println("No GeoLiteCity-Location.csv file in geoip/.");
+ return;
+ }
+ File iso3166CsvFile = new File("geoip/iso3166.csv");
+ if (!iso3166CsvFile.exists()) {
+ System.err.println("No iso3166.csv file in geoip/.");
+ return;
+ }
+ File regionCsvFile = new File("geoip/region.csv");
+ if (!regionCsvFile.exists()) {
+ System.err.println("No region.csv file in geoip/.");
+ return;
+ }
+ File geoIPASNum2CsvFile = new File("geoip/GeoIPASNum2.csv");
+ if (!geoIPASNum2CsvFile.exists()) {
+ System.err.println("No GeoIPASNum2.csv file in geoip/.");
+ return;
+ }
+
+ /* Obtain a map from relay IP address strings to numbers. */
+ Map<String, Long> addressStringNumbers = new HashMap<String, Long>();
+ Pattern ipv4Pattern = Pattern.compile("^[0-9\\.]{7,15}$");
+ for (Node relay : this.currentRelays.values()) {
+ String addressString = relay.getAddress();
+ long addressNumber = -1L;
+ if (ipv4Pattern.matcher(addressString).matches()) {
+ String[] parts = addressString.split("\\.", 4);
+ if (parts.length == 4) {
+ addressNumber = 0L;
+ for (int i = 0; i < 4; i++) {
+ addressNumber *= 256L;
+ int octetValue = -1;
+ try {
+ octetValue = Integer.parseInt(parts[i]);
+ } catch (NumberFormatException e) {
+ }
+ if (octetValue < 0 || octetValue > 255) {
+ addressNumber = -1L;
+ break;
+ }
+ addressNumber += octetValue;
+ }
+ }
+ }
+ if (addressNumber >= 0L) {
+ addressStringNumbers.put(addressString, addressNumber);
+ }
+ }
+ if (addressStringNumbers.isEmpty()) {
+ System.err.println("No relay IP addresses to resolve to cities or "
+ + "ASN.");
return;
}
+
+ /* Obtain a map from IP address numbers to blocks. */
+ Map<Long, Long> addressNumberBlocks = new HashMap<Long, Long>();
try {
- LookupService ls = new LookupService(geoLiteCityDatFile,
- LookupService.GEOIP_MEMORY_CACHE);
- for (Node relay : currentRelays.values()) {
- Location location = ls.getLocation(relay.getAddress());
- if (location != null) {
- relay.setLatitude(String.format(Locale.US, "%.6f",
- location.latitude));
- relay.setLongitude(String.format(Locale.US, "%.6f",
- location.longitude));
- relay.setCountryCode(location.countryCode.toLowerCase());
- relay.setCountryName(location.countryName);
- relay.setRegionName(regionName.regionNameByCode(
- location.countryCode, location.region));
- relay.setCityName(location.city);
+ SortedSet<Long> sortedAddressNumbers = new TreeSet<Long>(
+ addressStringNumbers.values());
+ long firstAddressNumber = sortedAddressNumbers.first();
+ BufferedReader br = new BufferedReader(new FileReader(
+ geoLiteCityBlocksCsvFile));
+ String line;
+ long previousStartIpNum = -1L;
+ while ((line = br.readLine()) != null) {
+ if (!line.startsWith("\"")) {
+ continue;
+ }
+ String[] parts = line.replaceAll("\"", "").split(",", 3);
+ if (parts.length != 3) {
+ System.err.println("Illegal line '" + line + "' in "
+ + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ try {
+ long startIpNum = Long.parseLong(parts[0]);
+ if (startIpNum <= previousStartIpNum) {
+ System.err.println("Line '" + line + "' not sorted in "
+ + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ previousStartIpNum = startIpNum;
+ while (firstAddressNumber < startIpNum &&
+ firstAddressNumber != -1L) {
+ sortedAddressNumbers.remove(firstAddressNumber);
+ if (sortedAddressNumbers.isEmpty()) {
+ firstAddressNumber = -1L;
+ } else {
+ firstAddressNumber = sortedAddressNumbers.first();
+ }
+ }
+ long endIpNum = Long.parseLong(parts[1]);
+ while (firstAddressNumber <= endIpNum &&
+ firstAddressNumber != -1L) {
+ long blockNumber = Long.parseLong(parts[2]);
+ addressNumberBlocks.put(firstAddressNumber, blockNumber);
+ sortedAddressNumbers.remove(firstAddressNumber);
+ if (sortedAddressNumbers.isEmpty()) {
+ firstAddressNumber = -1L;
+ } else {
+ firstAddressNumber = sortedAddressNumbers.first();
+ }
+ }
+ if (firstAddressNumber == -1L) {
+ break;
+ }
+ }
+ catch (NumberFormatException e) {
+ System.err.println("Number format exception while parsing line "
+ + "'" + line + "' in "
+ + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
}
}
- ls.close();
+ br.close();
} catch (IOException e) {
- System.err.println("Could not look up countries for relays.");
+ System.err.println("I/O exception while reading "
+ + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+ return;
+ }
+
+ /* Obtain a map from relevant blocks to location lines. */
+ Map<Long, String> blockLocations = new HashMap<Long, String>();
+ try {
+ Set<Long> blockNumbers = new HashSet<Long>(
+ addressNumberBlocks.values());
+ BufferedReader br = new BufferedReader(new FileReader(
+ geoLiteCityLocationCsvFile));
+ String line;
+ while ((line = br.readLine()) != null) {
+ if (line.startsWith("C") || line.startsWith("l")) {
+ continue;
+ }
+ String[] parts = line.replaceAll("\"", "").split(",", 9);
+ if (parts.length != 9) {
+ System.err.println("Illegal line '" + line + "' in "
+ + geoLiteCityLocationCsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ try {
+ long locId = Long.parseLong(parts[0]);
+ if (blockNumbers.contains(locId)) {
+ blockLocations.put(locId, line);
+ }
+ }
+ catch (NumberFormatException e) {
+ System.err.println("Number format exception while parsing line "
+ + "'" + line + "' in "
+ + geoLiteCityLocationCsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ }
+ br.close();
+ } catch (IOException e) {
+ System.err.println("I/O exception while reading "
+ + geoLiteCityLocationCsvFile.getAbsolutePath() + ".");
+ return;
}
- }
- public void lookUpASes() {
- File geoIPASNumDatFile = new File("GeoIPASNum.dat");
- if (!geoIPASNumDatFile.exists()) {
- System.err.println("No GeoIPASNum.dat file in /.");
+ /* Read country names to memory. */
+ Map<String, String> countryNames = new HashMap<String, String>();
+ try {
+ BufferedReader br = new BufferedReader(new FileReader(
+ iso3166CsvFile));
+ String line;
+ while ((line = br.readLine()) != null) {
+ String[] parts = line.replaceAll("\"", "").split(",", 2);
+ if (parts.length != 2) {
+ System.err.println("Illegal line '" + line + "' in "
+ + iso3166CsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ countryNames.put(parts[0].toLowerCase(), parts[1]);
+ }
+ br.close();
+ } catch (IOException e) {
+ System.err.println("I/O exception while reading "
+ + iso3166CsvFile.getAbsolutePath() + ".");
return;
}
+
+ /* Read region names to memory. */
+ Map<String, String> regionNames = new HashMap<String, String>();
+ try {
+ BufferedReader br = new BufferedReader(new FileReader(
+ regionCsvFile));
+ String line;
+ while ((line = br.readLine()) != null) {
+ String[] parts = line.replaceAll("\"", "").split(",", 3);
+ if (parts.length != 3) {
+ System.err.println("Illegal line '" + line + "' in "
+ + regionCsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ regionNames.put(parts[0].toLowerCase() + ","
+ + parts[1].toLowerCase(), parts[2]);
+ }
+ br.close();
+ } catch (IOException e) {
+ System.err.println("I/O exception while reading "
+ + regionCsvFile.getAbsolutePath() + ".");
+ return;
+ }
+
+ /* Obtain a map from IP address numbers to ASN. */
+ Map<Long, String> addressNumberASN = new HashMap<Long, String>();
try {
- LookupService ls = new LookupService(geoIPASNumDatFile);
- for (Node relay : currentRelays.values()) {
- String org = ls.getOrg(relay.getAddress());
- if (org != null && org.indexOf(" ") > 0 && org.startsWith("AS")) {
- relay.setASNumber(org.substring(0, org.indexOf(" ")));
- relay.setASName(org.substring(org.indexOf(" ") + 1));
+ SortedSet<Long> sortedAddressNumbers = new TreeSet<Long>(
+ addressStringNumbers.values());
+ long firstAddressNumber = sortedAddressNumbers.first();
+ BufferedReader br = new BufferedReader(new FileReader(
+ geoIPASNum2CsvFile));
+ String line;
+ long previousStartIpNum = -1L;
+ while ((line = br.readLine()) != null) {
+ String[] parts = line.replaceAll("\"", "").split(",", 3);
+ if (parts.length != 3) {
+ System.err.println("Illegal line '" + line + "' in "
+ + geoIPASNum2CsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ try {
+ long startIpNum = Long.parseLong(parts[0]);
+ if (startIpNum <= previousStartIpNum) {
+ System.err.println("Line '" + line + "' not sorted in "
+ + geoIPASNum2CsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
+ }
+ previousStartIpNum = startIpNum;
+ while (firstAddressNumber < startIpNum &&
+ firstAddressNumber != -1L) {
+ sortedAddressNumbers.remove(firstAddressNumber);
+ if (sortedAddressNumbers.isEmpty()) {
+ firstAddressNumber = -1L;
+ } else {
+ firstAddressNumber = sortedAddressNumbers.first();
+ }
+ }
+ long endIpNum = Long.parseLong(parts[1]);
+ while (firstAddressNumber <= endIpNum &&
+ firstAddressNumber != -1L) {
+ if (parts[2].startsWith("AS") &&
+ parts[2].split(" ", 2).length == 2) {
+ addressNumberASN.put(firstAddressNumber, parts[2]);
+ }
+ sortedAddressNumbers.remove(firstAddressNumber);
+ if (sortedAddressNumbers.isEmpty()) {
+ firstAddressNumber = -1L;
+ } else {
+ firstAddressNumber = sortedAddressNumbers.first();
+ }
+ }
+ if (firstAddressNumber == -1L) {
+ break;
+ }
+ }
+ catch (NumberFormatException e) {
+ System.err.println("Number format exception while parsing line "
+ + "'" + line + "' in "
+ + geoIPASNum2CsvFile.getAbsolutePath() + ".");
+ br.close();
+ return;
}
}
- ls.close();
+ br.close();
} catch (IOException e) {
- System.err.println("Could not look up ASes for relays.");
+ System.err.println("I/O exception while reading "
+ + geoIPASNum2CsvFile.getAbsolutePath() + ".");
+ return;
+ }
+
+ /* Finally, set relays' city and ASN information. */
+ for (Node relay : currentRelays.values()) {
+ String addressString = relay.getAddress();
+ if (addressStringNumbers.containsKey(addressString)) {
+ long addressNumber = addressStringNumbers.get(addressString);
+ if (addressNumberBlocks.containsKey(addressNumber)) {
+ long blockNumber = addressNumberBlocks.get(addressNumber);
+ if (blockLocations.containsKey(blockNumber)) {
+ String[] parts = blockLocations.get(blockNumber).
+ replaceAll("\"", "").split(",", -1);
+ String countryCode = parts[1].toLowerCase();
+ relay.setCountryCode(countryCode);
+ if (countryNames.containsKey(countryCode)) {
+ relay.setCountryName(countryNames.get(countryCode));
+ }
+ String regionCode = countryCode + ","
+ + parts[2].toLowerCase();
+ if (regionNames.containsKey(regionCode)) {
+ relay.setRegionName(regionNames.get(regionCode));
+ }
+ if (parts[3].length() > 0) {
+ relay.setCityName(parts[3]);
+ }
+ relay.setLatitude(parts[5]);
+ relay.setLongitude(parts[6]);
+ }
+ }
+ if (addressNumberASN.containsKey(addressNumber)) {
+ String[] parts = addressNumberASN.get(addressNumber).split(" ", 2);
+ relay.setASNumber(parts[0]);
+ relay.setASName(parts[1]);
+ }
+ }
}
}
diff --git a/src/org/torproject/onionoo/Main.java b/src/org/torproject/onionoo/Main.java
index 41af72c..e3e7c5b 100644
--- a/src/org/torproject/onionoo/Main.java
+++ b/src/org/torproject/onionoo/Main.java
@@ -14,8 +14,7 @@ public class Main {
cn.readRelaySearchDataFile(new File("out/summary"));
cn.readRelayNetworkConsensuses();
cn.setRelayRunningBits();
- cn.lookUpCountries();
- cn.lookUpASes();
+ cn.lookUpCitiesAndASes();
cn.readBridgeNetworkStatuses();
cn.setBridgeRunningBits();
diff --git a/web/index.html b/web/index.html
index 5087a01..4c3491c 100755
--- a/web/index.html
+++ b/web/index.html
@@ -153,17 +153,17 @@ database.</li>
resolving the relay's first onion-routing IP address.
Optional field.
Omitted if the relay IP address could not be found in the GeoIP
-database.</li>
+database, or if the GeoIP database did not contain a country name.</li>
<li><b>"region_name":</b> Region name as found in a GeoIP database by
resolving the relay's first onion-routing IP address.
Optional field.
Omitted if the relay IP address could not be found in the GeoIP
-database.</li>
+database, or if the GeoIP database did not contain a region name.</li>
<li><b>"city_name":</b> City name as found in a
GeoIP database by resolving the relay's first onion-routing IP address.
Optional field.
Omitted if the relay IP address could not be found in the GeoIP
-database.</li>
+database, or if the GeoIP database did not contain a city name.</li>
<li><b>"latitude":</b> Latitude as found in a GeoIP database by resolving
the relay's first onion-routing IP address.
Optional field.
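
For readers following the diff above: the block lookup and the ASN lookup in lookUpCitiesAndASes() appear to rely on the same idea. The relay address numbers and the .csv ranges are both sorted, so a single pass over the file is enough to match them. The following is a minimal standalone sketch of that sweep, not the committed code. The class name RangeSweepSketch, the method matchAddresses, and the long[] {start, end, value} representation of a range are assumptions made for illustration. The commit itself reads ranges line by line from a .csv file and removes matched entries from a TreeSet copy rather than using an iterator.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;

/* Hypothetical helper, not part of Onionoo. */
final class RangeSweepSketch {

  /* Maps each address number to the value of the range containing it.
   * Assumes sortedRanges is ordered by start and does not overlap,
   * with each entry laid out as {startIpNum, endIpNum, value}. */
  static Map<Long, Long> matchAddresses(SortedSet<Long> addressNumbers,
      List<long[]> sortedRanges) {
    Map<Long, Long> result = new HashMap<Long, Long>();
    Iterator<Long> it = addressNumbers.iterator();
    if (!it.hasNext()) {
      /* Nothing to look up. */
      return result;
    }
    long current = it.next();
    for (long[] range : sortedRanges) {
      long start = range[0];
      long end = range[1];
      long value = range[2];
      /* Skip addresses that lie before this range; because ranges are
       * sorted, no later range can contain them either. */
      while (current < start) {
        if (!it.hasNext()) {
          return result;
        }
        current = it.next();
      }
      /* Record every address that falls inside this range. */
      while (current <= end) {
        result.put(current, value);
        if (!it.hasNext()) {
          return result;
        }
        current = it.next();
      }
    }
    return result;
  }
}
```

One difference from the diff is deliberate: the sketch returns an empty map when there are no address numbers. SortedSet.first() throws NoSuchElementException on an empty set, so code that calls first() up front needs at least one address to be present.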