Re: [tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format
#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
Reporter: rransom | Owner: endian7000
Type: enhancement | Status: needs_review
Priority: normal | Milestone: Tor: 0.2.3.x-final
Component: Tor Relay | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Comment(by rransom):
Replying to [comment:11 nickm]:
> Replying to [comment:9 rransom]:
> > Replying to [comment:7 nickm]:
> > > I'm not so sure that having this stuff in separate run-length and cc
> > > files will actually be needed; endianness issues will keep us from
> > > reading any portable file into an array-of-country verbatim, I think.
> >
> > The country codes are two-character ASCII strings, and are thus
> > endianness-independent. The run lengths are integers, but could be
> > encoded in big-endian form everywhere.
>
> I thought that the whole point of endian7000's idea was that a lot of
> the savings came from variable-length run-length encoding. In the
> database I'm looking at, there are 4212 distinct run-length encodings.
> Lots of the win comes from encoding the more frequent run-lengths as a
> single byte and the less frequent ones as two bytes.
>
> To quantify: 136810 of the runs in my geoip file would have their
> lengths represented as one byte in the var-length encoding, whereas
> 11586 would take two bytes. Using a fixed-width two-byte encoding for
> run lengths would add another 133K to the file size.
The mapping of run-length codes to run lengths should be stored in a
separate file, in which the run lengths are fixed-width big-endian
integers; each run-length code should be an index into that array.
The mapping of country identifiers to two-character ISO country codes
should be stored the same way, as a separate array indexed by country
identifier. The list of runs should be stored in endian7000's
variable-length-record format.
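A minimal C sketch of reading the three-part layout proposed above. It assumes (hypothetically) that run lengths are 4-byte big-endian integers, that each run record is one country-index byte followed by a run-length code (indices below 128 in one byte with the high bit clear, otherwise two bytes carrying a 15-bit index), and that the runs tile the IPv4 space in order starting at 0.0.0.0. `geoip_db_t`, `geoip_lookup`, and `read_be32` are illustrative names, not Tor's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit big-endian integer byte by byte, so the result is the
 * same on little- and big-endian hosts. */
static uint32_t read_be32(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical in-memory view of the three files. */
typedef struct {
  const uint8_t *run_lengths; /* n_run_lengths entries, 4 bytes each, big-endian */
  size_t n_run_lengths;
  const char *countries;      /* n_countries entries, 2 ASCII bytes each */
  size_t n_countries;
  const uint8_t *runs;        /* records: country index, then run-length code */
  size_t runs_len;
} geoip_db_t;

/* Looks up the IPv4 address addr (host byte order) by scanning the runs.
 * On success, copies the two-letter code into cc (NUL-terminated) and
 * returns 0; returns -1 if addr lies past the last run or the data is
 * malformed. */
static int geoip_lookup(const geoip_db_t *db, uint32_t addr, char cc[3])
{
  uint64_t start = 0; /* first address covered by the current run */
  size_t pos = 0;
  while (pos < db->runs_len) {
    uint8_t country = db->runs[pos++];
    uint16_t code;
    if (pos >= db->runs_len)
      return -1; /* record truncated before its run-length code */
    if (db->runs[pos] & 0x80) {
      if (pos + 1 >= db->runs_len)
        return -1; /* two-byte code truncated */
      code = (uint16_t)(((db->runs[pos] & 0x7F) << 8) | db->runs[pos + 1]);
      pos += 2;
    } else {
      code = db->runs[pos];
      pos += 1;
    }
    if (country >= db->n_countries || code >= db->n_run_lengths)
      return -1; /* index points outside one of the tables */
    uint64_t len = read_be32(db->run_lengths + 4 * (size_t)code);
    if (addr < start + len) {
      /* Country codes are plain ASCII: no byte-order conversion needed. */
      memcpy(cc, db->countries + 2 * (size_t)country, 2);
      cc[2] = '\0';
      return 0;
    }
    start += len;
  }
  return -1;
}
```

Because `read_be32` assembles each value from individual bytes, the run-length table could be read from a memory-mapped file on any host without byte-swapping, and the country table needs no conversion since each entry is two ASCII bytes. The linear scan keeps the sketch short; a production version would probably want an index over the runs.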
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs