[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format
#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
Reporter: rransom | Owner: endian7000
Type: enhancement | Status: needs_review
Priority: normal | Milestone: Tor: 0.2.3.x-final
Component: Tor Relay | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Comment(by rransom):
Replying to [comment:8 nickm]:
> Other thoughts: elsewhere we have used the country code "??" for an
unassigned mapping.
>
> We don't have a consistent internal uint8->country mapping in Tor: that
needs to get saved too.
>
> Right now the db has 246 country codes; it's be nice to have a good
answer here for what we would do if we had to go over 255.
We could use a variable-length integer (possibly encoded as one of a small
number of escape values followed by a protobuf variable-length integer).
> One possibility for making this format self-synchronizing is to reserve
the run-length index 0 and the country code 0 such that neither
corresponds to a run length or to a country. There are probably better
ways too.
As I explained in my previous comment, we need an index format anyway, so
making the run-list format self-synchronizing won't help us.
> Another useful observation: there seem to be less than 32 run lengths
that ever occur more than once. Not sure what we can use that for,
though.
I suspect that we could get a space win by constraining run lengths to a
small set such that every run length in the dataset can be expressed as a
sum of encodable run lengths (possibly the powers of two, although we
could automatically search for a better set) and Huffman-coding or
arithmetic-coding the run records; since we have to scan each chunk of
runs linearly anyway, this wouldn't make our speed ''much'' worse. I
think the added code complexity (in both the encoder and decoder) makes
fancy coding schemes a bad idea, though.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs