Re: [tor-bugs] #2506 [Tor Relay]: Design and implement a more compact GeoIP file format
#2506: Design and implement a more compact GeoIP file format
-------------------------+--------------------------------------------------
Reporter: rransom | Owner: endian7000
Type: enhancement | Status: needs_review
Priority: normal | Milestone: Tor: 0.2.3.x-final
Component: Tor Relay | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
Comment(by rransom):
Replying to [comment:7 nickm]:
> I'm not so sure that having this stuff in separate run-length and cc
> files will actually be needed; endianness issues will keep us from reading
> any portable file into an array-of-country verbatim, I think.
The country codes are two-character ASCII strings, and are thus
endianness-independent. The run lengths are integers, but could be
encoded in big-endian form everywhere.
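
As a rough illustration of the point above, here is a minimal Python sketch of byte-order-independent encoding. The 32-bit width for run lengths and all function names are assumptions made for this example; the ticket does not specify them.

```python
import struct

def pack_run_lengths(run_lengths):
    """Encode run lengths as consecutive big-endian unsigned 32-bit integers.

    The 32-bit width is an assumption for this sketch.
    """
    return b"".join(struct.pack(">I", n) for n in run_lengths)

def unpack_run_lengths(data):
    """Decode big-endian 32-bit integers; the result does not depend on
    the byte order of the host doing the reading."""
    count = len(data) // 4
    return list(struct.unpack(">%dI" % count, data[:count * 4]))

def pack_country_codes(codes):
    """Concatenate two-character ASCII country codes; no byte order applies."""
    return b"".join(code.encode("ascii") for code in codes)

def unpack_country_codes(data):
    """Split a byte string back into two-character country codes."""
    return [data[i:i + 2].decode("ascii") for i in range(0, len(data), 2)]
```

The ">" prefix in the struct format forces big-endian layout with no padding, so the same bytes decode identically on any host.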
> Let's see how tricky the read code is before we decide that
> complexifying the format is worth it in order to make the read code
> simpler.
I think putting each separate array in a separate file would give us a
simpler format than putting all of the arrays in a single file.
> I'm also not clear how best to read this format quickly on the fly:
> unpacking it all into ram seems like a lose if we don't have to; a
> workable index format would be neat (and would be much easier for
> fixed-length or self-synchronizing records).
We would need an index format even with fixed-length records -- each
record corresponds to a wildly varying amount of IP address space. We
look up the country associated with an IP address, so we need to read the
database in order from some starting point for which we know both the
starting IP address and the offset into the packed dataset. I suggest an
index format consisting of (IP address, offset) pairs as fixed-length
records; we can look up an IP address by performing a binary search
through the index, then searching linearly through the runs in the piece
of the packed dataset starting at the specified offset. Specifying the
offset has the additional advantage that (if we know how to find the end
of the index array) we can later put the packed dataset in the same file
as (and following) the index.
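
The lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the 8-byte index entry (32-bit start address, 32-bit offset), the fixed 6-byte run record (32-bit run length plus two-character country code), the "stride" parameter, and the names build and lookup are all hypothetical. The same index would work with variable-length run records, since the scan only needs to know where to start.

```python
import struct

INDEX_ENTRY = struct.Struct(">II")  # (start IP address, offset into packed runs)
RUN_RECORD = struct.Struct(">I2s")  # (run length, country code); assumed layout

def build(runs, stride):
    """Pack (run_length, country_code) runs that cover the address space
    starting at 0, emitting one index entry every `stride` runs."""
    index, data, ip = bytearray(), bytearray(), 0
    for i, (length, cc) in enumerate(runs):
        if i % stride == 0:
            index += INDEX_ENTRY.pack(ip, len(data))
        data += RUN_RECORD.pack(length, cc.encode("ascii"))
        ip += length
    return bytes(index), bytes(data)

def lookup(index, data, ip):
    """Return the country code for `ip`, or None if no run covers it."""
    entries = len(index) // INDEX_ENTRY.size
    if entries == 0:
        return None
    # Binary search for the last index entry whose start address is <= ip.
    lo, hi = 0, entries
    while hi - lo > 1:
        mid = (lo + hi) // 2
        start, _ = INDEX_ENTRY.unpack_from(index, mid * INDEX_ENTRY.size)
        if start <= ip:
            lo = mid
        else:
            hi = mid
    start, offset = INDEX_ENTRY.unpack_from(index, lo * INDEX_ENTRY.size)
    if ip < start:
        return None
    # Linear scan through the runs beginning at the indexed offset.
    while offset + RUN_RECORD.size <= len(data):
        length, cc = RUN_RECORD.unpack_from(data, offset)
        if ip < start + length:
            return cc.decode("ascii")
        start += length
        offset += RUN_RECORD.size
    return None
```

For example, `index, data = build([(10, "US"), (5, "DE"), (20, "FR"), (1, "JP")], stride=2)` produces two index entries, and `lookup(index, data, 12)` walks the runs from the first entry until it reaches the one containing address 12. Because the offsets are relative to the start of the packed runs, the runs could follow the index in the same file as long as the reader knows where the index ends.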
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2506#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online