Re: [tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format
#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
Reporter: rransom | Owner: endian7000
Type: enhancement | Status: needs_review
Priority: normal | Milestone: Tor: 0.2.3.x-final
Component: Tor Relay | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Comment(by nickm):
Replying to [comment:9 rransom]:
> Replying to [comment:7 nickm]:
> > I'm not so sure that having this stuff in separate run-length and cc
> > files will actually be needed; endianness issues will keep us from reading
> > any portable file into an array-of-country verbatim, I think.
>
> The country codes are two-character ASCII strings, and are thus
> endianness-independent. The run lengths are integers, but could be
> encoded in big-endian form everywhere.
I thought that the whole point of endian7000's idea was that a lot of the
savings came from variable-length run-length encoding. In the database I'm
looking at, there are 4212 distinct run lengths. Lots of the win
comes from encoding the more frequent run lengths as a single byte and the
less frequent ones as two bytes.
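For illustration only, here is a minimal Python sketch of one way such a
scheme could work. The layout is an assumption, not endian7000's actual
format: distinct run lengths are ranked by frequency, the 128 most frequent
get a one-byte code (high bit clear), and the rest get a two-byte code (high
bit set, 15-bit index). The names build_table, encode, and decode are
hypothetical. Because the codes are written byte by byte, the result does not
depend on host endianness.

```python
from collections import Counter

# Assumption: codes 0..127 fit in one byte with the high bit clear.
ONE_BYTE_CODES = 128


def build_table(run_lengths):
    """Return the distinct run lengths, most frequent first."""
    counts = Counter(run_lengths)
    # Sort by descending count, then by value so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [length for length, _ in ranked]


def encode(run_lengths, table):
    """Encode each run length as a one- or two-byte index into table."""
    index = {length: i for i, length in enumerate(table)}
    out = bytearray()
    for length in run_lengths:
        i = index[length]
        if i < ONE_BYTE_CODES:
            out.append(i)  # frequent run length: a single byte
        else:
            j = i - ONE_BYTE_CODES
            if j >= 1 << 15:
                raise ValueError("too many distinct run lengths")
            out.append(0x80 | (j >> 8))  # high bit marks a two-byte code
            out.append(j & 0xFF)
    return bytes(out)


def decode(data, table):
    """Invert encode(): turn the byte string back into run lengths."""
    out = []
    pos = 0
    while pos < len(data):
        first = data[pos]
        if first < 0x80:
            out.append(table[first])
            pos += 1
        else:
            j = ((first & 0x7F) << 8) | data[pos + 1]
            out.append(table[j + ONE_BYTE_CODES])
            pos += 2
    return out
```

With 4212 distinct run lengths, a 15-bit index for the two-byte codes is more
than enough; the split point of 128 is just the simplest choice for the sketch.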
To quantify: 136810 of the runs in my geoip file would have their lengths
represented as one byte in the var-length encoding, whereas 11586 would
take two bytes. Using a fixed-width two-byte encoding for run lengths
would cost one extra byte for each of those 136810 runs, adding about
another 133K to the file size.
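The arithmetic behind that figure can be checked with a few lines of Python.
The run counts are the ones quoted above; the function name encoded_sizes is
made up for this sketch, and only the bytes spent on run lengths are counted
(not country codes or any lookup table).

```python
ONE_BYTE_RUNS = 136810  # runs whose length fits a one-byte code
TWO_BYTE_RUNS = 11586   # runs whose length needs a two-byte code


def encoded_sizes(one_byte_runs, two_byte_runs):
    """Return (variable_width_bytes, fixed_two_byte_bytes) for the run lengths."""
    variable = one_byte_runs * 1 + two_byte_runs * 2
    fixed = (one_byte_runs + two_byte_runs) * 2
    return variable, fixed


variable, fixed = encoded_sizes(ONE_BYTE_RUNS, TWO_BYTE_RUNS)
extra = fixed - variable

print(variable)                # 159982
print(fixed)                   # 296792
print(extra)                   # 136810
print(round(extra / 1024, 1))  # 133.6
```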
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs