
Re: [tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format



#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
 Reporter:  rransom      |          Owner:  endian7000        
     Type:  enhancement  |         Status:  assigned          
 Priority:  normal       |      Milestone:  Tor: 0.2.3.x-final
Component:  Tor Relay    |        Version:                    
 Keywords:               |         Parent:                    
   Points:               |   Actualpoints:                    
-------------------------+--------------------------------------------------
Changes (by rransom):

  * milestone:  Tor: unspecified => Tor: 0.2.3.x-final


Comment:

 Replying to [comment:5 endian7000]:
 > I wrote a simple encoder with these results:
 >
 > 317,108 bytes: geoip-encoded

 Great!  That's better than I expected!

 Keep in mind that if you want to use the dataset efficiently without
 unpacking all of it into a larger in-memory form, you will need to make
 some changes:

  * Put the country-code and run-length maps in separate files, rather than
 sticking all of the data in a single file.  These files will be easiest to
 use if their elements have constant length (so all we need to do is mmap
 them or read them into buffers, then access them as arrays).
  * Generate an index file.  (The index format I suggested requires that
 each run be encoded with the same length.  With your variable-length
 encoding, each index entry will also need to record the byte position of
 its block of runs, in addition to the IP address of the first run in the
 block.)
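 The index suggestion can be made concrete with a minimal C sketch.  The layout is an assumption for illustration, not necessarily what geoip-packing produces: each run is a LEB128-style varint giving the number of addresses it covers, followed by one byte indexing a fixed-width country-code table, and the index is an array of hypothetical `block_index_t` entries holding each block's first IPv4 address and its byte offset in the runs data.  Either array could come from `mmap` or a plain read.  The names `block_index_t`, `read_varint`, and `compact_lookup` are made up and do not exist in geoip.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index entry: one per block of variable-length runs. */
typedef struct {
  uint32_t first_ip; /* first IPv4 address (host order) covered by the block */
  uint32_t offset;   /* byte offset of the block's first run in the runs data */
} block_index_t;

/* Decode an unsigned LEB128-style varint at buf[*pos], advancing *pos.
 * Returns 0 on success, -1 if the input ends mid-varint or is too long. */
static int read_varint(const uint8_t *buf, size_t len, size_t *pos,
                       uint64_t *out)
{
  uint64_t v = 0;
  int shift = 0;
  while (*pos < len && shift < 64) {
    uint8_t b = buf[(*pos)++];
    v |= (uint64_t)(b & 0x7f) << shift;
    if (!(b & 0x80)) {
      *out = v;
      return 0;
    }
    shift += 7;
  }
  return -1;
}

/* Return the country-table index for ip, or -1 if no run covers it.
 * Each run is: varint (number of addresses), then one country-index byte. */
static int compact_lookup(const block_index_t *idx, size_t n_blocks,
                          const uint8_t *runs, size_t runs_len, uint32_t ip)
{
  assert(idx != NULL || n_blocks == 0);

  /* Binary search for the last block whose first_ip <= ip. */
  size_t lo = 0, hi = n_blocks;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (idx[mid].first_ip <= ip)
      lo = mid + 1;
    else
      hi = mid;
  }
  if (lo == 0)
    return -1; /* ip precedes the first block */

  const block_index_t *blk = &idx[lo - 1];
  /* A block ends where the next one starts, or at the end of the data. */
  size_t end = (lo < n_blocks) ? idx[lo].offset : runs_len;
  size_t pos = blk->offset;
  uint64_t cur = blk->first_ip; /* 64-bit so the running sum cannot wrap */

  /* Decode runs sequentially until one covers ip. */
  while (pos < end) {
    uint64_t run_len;
    if (read_varint(runs, end, &pos, &run_len) < 0 || pos >= end)
      return -1; /* truncated run */
    uint8_t country = runs[pos++];
    if ((uint64_t)ip < cur + run_len)
      return country;
    cur += run_len;
  }
  return -1;
}
```

 The binary search narrows each lookup to one block, so the sequential decode touches at most one block's worth of runs; the number of runs per block trades index size against scan length.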

 But this would be quite an improvement even if you just modified our
 current geoip.c to read a dataset from your format.

 > 205,123 bytes: geoip-encoded.gz

 That's even better!

 > https://github.com/andrewschaaf/geoip-packing
 >
 > Does this look good? I can write a C decoder.

 Great!  Start by reading src/or/geoip.c and its header file on
 [https://gitweb.torproject.org/tor.git/shortlog/refs/heads/master master],
 since that's where this change is likely to be merged to.

 > Also, I'm new to Tor development (thanks, Chuck Schumer!) -- how does
 one submit pull requests for the Tor repo? I'm only familiar with
 contributing to projects via GitHub.

 Post the Git URL and branch name on the relevant Trac ticket, and set the
 ticket's status to `needs_review`.  I include a link to Gitweb for my
 branches as well (see #3000 for an example), but that's optional.

 Also see the [WikiFormatting] page for how to escape a URL so that Trac
 will let us select it (to paste it into a terminal).

 Also, please avoid committing large binaries into Git branches; we will
 want to translate the GeoIP database into binary form either during the
 usual `make` process or when building a tarball.  (Yes, 200 kB is large
 for a binary file in Git.)

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs