[tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format
#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
Reporter: rransom | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Tor Relay | Version:
Keywords: | Points:
Parent: |
-------------------------+--------------------------------------------------
Our current text-based GeoIP file (as of commit
e9803aa71003079cc00a8b3c80324581758a36be; built from the January 2011
MaxMind GeoLite Country dataset) is 3460049 bytes long (955382 bytes
gzipped). In MaxMind's binary format, the February 2011 dataset is 1126966
bytes long and gzips to about half that size. We can do much better than
that, and without having to use (or reverse-engineer and clone) their LGPL
library.
The January 2011 GeoLite database contains 138658 data lines, each of
which assigns a run of consecutive IPs to a single country. These runs
have 4070 distinct lengths and are assigned to 241 distinct countries.
Even if we double the number of runs to account for IPs that are not
contained in any run (which we should treat as a run assigned to 'no
country'), and pad each run to a 3-byte field, the mapping itself fits in
at most 813 KiB (2 × 138658 runs × 3 bytes = 831948 bytes), plus a
run-length table and a country table totalling under 17 KiB. If we want to
keep the database in its packed form, whether in memory or on disk, we can
add a random-access index consisting of one 4-byte starting IP for each
768-byte (256-run) block in just over 4 KiB.
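A minimal Python sketch of the packed layout and block index described
above. It relies on assumptions that the ticket does not spell out. First,
the runs arrive as (length, country) pairs that already tile the whole IPv4
space in order, with gaps included as 'no country' runs (here `None`).
Second, each 3-byte entry holds a 16-bit index into the run-length table and
an 8-bit index into the country table; 241 countries plus 'no country' fit
in 8 bits. Third, the index stores the first IP of every 256-run block. The
names `pack`, `lookup` and `RUNS_PER_BLOCK` are hypothetical, not existing
Tor code.

```python
import bisect
import struct

# Hypothetical layout: 256 runs x 3 bytes = one 768-byte block.
RUNS_PER_BLOCK = 256
ENTRY = struct.Struct(">HB")  # 16-bit run-length index, 8-bit country index


def pack(runs):
    """Pack (length, country) runs that tile the IPv4 space in order.

    country is a country code, or None for 'no country'.
    Returns (lengths, countries, packed, block_starts).
    """
    lengths = sorted({length for length, _ in runs})
    # Index 0 is reserved for 'no country'.
    countries = [None] + sorted({cc for _, cc in runs if cc is not None})
    if len(lengths) > 1 << 16 or len(countries) > 1 << 8:
        raise ValueError("tables too large for a 3-byte entry")
    len_idx = {n: i for i, n in enumerate(lengths)}
    cc_idx = {c: i for i, c in enumerate(countries)}

    packed = bytearray()
    block_starts = []  # first IP covered by each 256-run block
    ip = 0
    for i, (length, cc) in enumerate(runs):
        if i % RUNS_PER_BLOCK == 0:
            block_starts.append(ip)
        packed += ENTRY.pack(len_idx[length], cc_idx[cc])
        ip += length
    return lengths, countries, bytes(packed), block_starts


def lookup(ip, lengths, countries, packed, block_starts):
    """Return the country code covering ip, or None for 'no country'."""
    # Binary search the block index, then scan at most one block.
    block = bisect.bisect_right(block_starts, ip) - 1
    start = block_starts[block]
    offset = block * RUNS_PER_BLOCK * ENTRY.size
    end = min(offset + RUNS_PER_BLOCK * ENTRY.size, len(packed))
    for pos in range(offset, end, ENTRY.size):
        li, ci = ENTRY.unpack_from(packed, pos)
        start += lengths[li]  # start is now one past the end of this run
        if ip < start:
            return countries[ci]
    raise ValueError("ip is not covered by the packed runs")
```

For example, `pack([(16, None), (16, "US"), (32, "DE"), (2**32 - 64, None)])`
yields 12 packed bytes, and `lookup(20, ...)` on the result returns `"US"`.
A lookup never unpacks the mapping. It does one binary search over the block
starts and then sums lengths over at most 256 three-byte entries.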
813 KiB is probably a wild overestimate for the size of the mapping. I
haven't checked how many 'fake runs' we would need to add, but I expect
there are far fewer unassigned runs than runs assigned to a country. Nor am
I relying on any fancy encoding that would fit each run in fewer than 3
bytes.
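The 'fake run' count left unchecked above is cheap to compute. Here is a
minimal Python sketch. It assumes that the text file has one
`first_ip,last_ip,CC` line per range, with IPv4 addresses written as
integers and `#` comments; verify this against the actual file. The helper
name `geoip_stats` is hypothetical. At most one gap precedes each range, and
at most one follows the last. So the number of 'no country' runs is at most
the number of data lines plus one, and doubling the run count is a safe
bound, give or take a single run.

```python
def geoip_stats(lines):
    """Summarize assumed 'first_ip,last_ip,CC' lines (IPs as integers).

    Returns the data-line count, distinct run lengths and countries,
    the number of 'no country' gap runs, and the packed mapping size
    at 3 bytes per run.
    """
    ranges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        first, last, cc = line.split(",")[:3]
        ranges.append((int(first), int(last), cc))
    ranges.sort()  # assumes ranges do not overlap

    gaps = 0
    next_ip = 0
    for first, last, _ in ranges:
        if first > next_ip:
            gaps += 1  # unassigned IPs before this range
        next_ip = last + 1
    if next_ip <= 0xFFFFFFFF:
        gaps += 1  # unassigned IPs after the last range

    return {
        "data_lines": len(ranges),
        # Counts data-line lengths only; gap runs may add new lengths.
        "distinct_lengths": len({last - first + 1 for first, last, _ in ranges}),
        "countries": len({cc for _, _, cc in ranges}),
        "gap_runs": gaps,
        "mapping_bytes": 3 * (len(ranges) + gaps),
    }
```

Running this over the real file would replace the 813 KiB bound with the
actual mapping size for 3-byte entries.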
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs