Re: [tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format
#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
Reporter: rransom | Owner: endian7000
Type: enhancement | Status: assigned
Priority: normal | Milestone: Tor: 0.2.3.x-final
Component: Tor Relay | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Changes (by rransom):
* milestone: Tor: unspecified => Tor: 0.2.3.x-final
Comment:
Replying to [comment:5 endian7000]:
> I wrote a simple encoder with these results:
>
> 317,108 bytes: geoip-encoded
Great! That's better than I expected!
Keep in mind that if you want to use the dataset efficiently without
unpacking all of it into a larger in-memory form, you will need to make
some changes:
* Put the country-code and run-length maps in separate files, rather than
sticking all of the data in a single file. These files will be easiest to
use if their elements have constant length (so all we need to do is mmap
them or read them into buffers, then access them as arrays).
* Generate an index file. (The index format I suggested requires that
each run be encoded with the same length. With your variable-length
encoding, each index entry will need to record the position of a block of
runs in addition to the IP address of the first run in that block.)
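
As a rough sketch of the kind of lookup such an index would allow: binary
search the index for the block that could contain the address, then walk
only that block's runs. Everything specific here is an assumption made for
illustration, not the actual geoip-packing format or anything in geoip.c:
the `geoip_index_entry_t` layout, the LEB128-style varint for run lengths,
the one-byte country index after each run, and the function names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index entry: one per block of runs. */
typedef struct {
  uint32_t first_ip; /* IP address of the first run in the block */
  uint32_t offset;   /* byte position of the block in the run data */
} geoip_index_entry_t;

/* Decode one LEB128-style varint (an assumed encoding) and advance *pos. */
static uint32_t
read_varint(const uint8_t *data, size_t len, size_t *pos)
{
  uint32_t value = 0;
  int shift = 0;
  while (*pos < len && shift < 32) {
    uint8_t b = data[(*pos)++];
    value |= (uint32_t)(b & 0x7f) << shift;
    if (!(b & 0x80))
      break;
    shift += 7;
  }
  return value;
}

/* Return the country index for ip, or -1 if no run covers it. */
static int
geoip_lookup(const geoip_index_entry_t *idx, size_t n_idx,
             const uint8_t *runs, size_t runs_len, uint32_t ip)
{
  size_t lo = 0, hi = n_idx;

  /* Binary search for the last block whose first_ip is <= ip. */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (idx[mid].first_ip <= ip)
      lo = mid + 1;
    else
      hi = mid;
  }
  if (lo == 0)
    return -1; /* ip is below the first block */

  const geoip_index_entry_t *e = &idx[lo - 1];
  size_t pos = e->offset;
  /* The block ends where the next block starts, or at end of data. */
  size_t end = (lo < n_idx) ? idx[lo].offset : runs_len;
  if (end > runs_len)
    end = runs_len;

  /* Walk the variable-length runs inside this one block only. */
  uint64_t start = e->first_ip;
  while (pos < end) {
    uint32_t run_len = read_varint(runs, end, &pos);
    if (pos >= end)
      break; /* truncated run: no country byte */
    uint8_t country = runs[pos++];
    if (ip < start + run_len)
      return country;
    start += run_len;
  }
  return -1;
}
```

The returned value would then index into the constant-length country-code
array from the first item, so both the index and the country map can be
mmapped and used as plain arrays, and only one block of runs is decoded
per lookup.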
But this would be quite an improvement even if you just modified our
current geoip.c to read a dataset from your format.
> 205,123 bytes: geoip-encoded.gz
That's even better!
> https://github.com/andrewschaaf/geoip-packing
>
> Does this look good? I can write a C decoder.
Great! Start by reading src/or/geoip.c and its header file on
[https://gitweb.torproject.org/tor.git/shortlog/refs/heads/master master],
since that's where this change is likely to be merged to.
> Also, I'm new to Tor development (thanks, Chuck Schumer!) -- how does
> one submit pull requests for the Tor repo? I'm only familiar with
> contributing to projects via GitHub.
Post the Git URL and branch name on the relevant Trac ticket, and set the
ticket's status to `needs_review`. I include a link to Gitweb for my
branches as well (see #3000 for an example), but that's optional.
Also see the [WikiFormatting] page for how to escape a URL so Trac will
let us select it (to paste it into a terminal).
Also, please avoid committing large binaries into Git branches; we will
want to translate the GeoIP database into binary form either during the
usual `make` process or when building a tarball. (Yes, 200 kB is large
for a binary file in Git.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs