[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

A little more exact specification of the hash-table



Actually, now I'm thinking about it, is it you (Christian) or me who is
going to do the hash-table thingie? I am under the impression it is me, but
I don't remember this explicitly being stated. Just an effort to eliminate
possible duplicate effort.

Anyways, I thougth I'd write up a little more specific specification of the
hash-table and its capabilities. Its only a draft. The hash-table is from
here on referred to as "the container". The data it contains, dir and file
data, is from here on referred to as "entries". The class ppf_URLInfo is
simply referred to as "a path".

** Access
The container allows access to its entries by means of a key. This key is a
path. The container will search incrementally for an entry, searching
containers that are also entries in the container, as specified by the path.

The container does not allow for the capability to iterate over all entries.

The container searches first by means of a hash-table, then uses a binary
search algorithm to narrow the search.

The container guarantees that access will, in the best, average and worst
case, have a performance of O(log n).

** Serialization
The container can tell how much space it would take up were it to be asked
to serialize itself and its entries to a target in the state it currently
is.

[this depends a little on the IO utilised, so either this:]
The container supports serialization of itself and its entries to any target
that fully implements the interface of the standard library class ostream,
and from any source that fully implements the interface of the standard
library class istream.
[or this:]
The container supports serialization of itself and its entries to any target
handle opened via the ppfOpen() function, and from any source handle opened
via the ppfOpen() function.
[one of those]

The container guarantees that its serialized output will be of this format,
and can only serialize itself from input that takes this format:

4 bytes           - 32 bit unsigned integer, having a value equal to N
4 bytes           - 32 bit unsigned integer, having a value of S
size of N entries   - N entries, sorted in ascending order (small-to-large)
as based on the value returned by the modulus operator applied to the
hash-value of each entry as the first operand, and the value S as the second
operand (ie, hash % S). The hash-value stored, however, is not this
modulus-returned value, but the actual hash-value of each entry, as produced
by the hash-function. For those entries where this modulus-returned value
match, they are sorted in ascending order in accordance with their
hash-value, and if this also matches, then in accordance to the outcome of a
comparision of their respective names.

Here, of course, N is the amount of entries, and S is the table-size used by
the table that produced the serialized representation.

If the table-size desired by a container that is serializing from sotrage,
is different from S, performance will be noticeably impacted, though the
container will be able to do the conversion.

The container guarantees that serialization will, in the best, average and
worst case, have performance of O(n).

** Internal representation

The container is represented internally by two arrays. The first array is as
large as the table-size times 8 (ie, 8 bytes per position). The second array
is as large as the amount of entries times the size of each entry.

The first array holds a pointer and an unsigned integer value at each
position. The pointer points to somewhere in the second array, or is equal
to 0 (NULL). The integer holds a value here called D. The entry pointed to
(if any), and the next D-1 entries, all share the characteristic that the
value returned by the modulus operator given each entries hash-value as its
first value, and the table-size as the second (ie, hash % table-size) is
equal to the offset into the first array the position in the first array
has.

The entries in this sub-part of the second array are sorted in ascending
order as based on their hash-value. In cases the hash-value matches, the
name of the respective entries are used as the key for the comparision.

--

Did I forget something?