[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: A little more exact specification of the hash-table



Bjarke Hammersholt Roune wrote:

>Actually, now I'm thinking about it, is it you (Christian) or me who is
>going to do the hash-table thingie? I am under the impression it is me, but
>I don't remember this explicitly being stated. Just an effort to eliminate
>possible duplicate effort.

Er, you.

>Anyways, I thougth I'd write up a little more specific specification of the
>hash-table and its capabilities. Its only a draft. The hash-table is from
>here on referred to as "the container". The data it contains, dir and file
>data, is from here on referred to as "entries". The class ppf_URLInfo is
>simply referred to as "a path".
>
>** Access
>The container allows access to its entries by means of a key. This key is a
>path. The container will search incrementally for an entry, searching
>containers that are also entries in the container, as specified by the path.
>
>The container does not allow for the capability to iterate over all entries.

Wrong. Iteration has to be possible for ppfReadDir () to work.
A seperate Iterator class should be best for this.

>The container guarantees that access will, in the best, average and worst
>case, have a performance of O(log n).

Best case is O(1) with hashes IIRC.

Another thing: ppfFILE contains a pointer to the respective ppf_File object,
so the address of that must not change due to container resorting etc.

>** Serialization
>The container can tell how much space it would take up were it to be asked
>to serialize itself and its entries to a target in the state it currently
>is.

Bad. That depends on the PakFile format (e.g. with Format 1 it has to be
possible to add single entries later on).
BTW don't forget that datatype sizes in PakFiles might be different from
the ones of the runtime system (offsets/sizes/times are always 64Bit
(little endian) in Paks).

>[this depends a little on the IO utilised, so either this:]
>The container supports serialization of itself and its entries to any target
>that fully implements the interface of the standard library class ostream,
>and from any source that fully implements the interface of the standard
>library class istream.
>[or this:]
>The container supports serialization of itself and its entries to any target
>handle opened via the ppfOpen() function, and from any source handle opened
>via the ppfOpen() function.
>[one of those]

Bad. Again problems with the different Pak formats. Not only serialization
is required but also updating of an existing in-pak-structure.

>** Internal representation
>
>The container is represented internally by two arrays. The first array is as
>large as the table-size times 8 (ie, 8 bytes per position). The second array
>is as large as the amount of entries times the size of each entry.
>
>The first array holds a pointer and an unsigned integer value at each

sizeof (void *) == 8 on e.g. Alpha and on some systems sizeof (int) == 8.

Don't guarantee a fixed array size. It's unneccessary anyway.

>position. The pointer points to somewhere in the second array, or is equal
>to 0 (NULL). The integer holds a value here called D. The entry pointed to
>(if any), and the next D-1 entries, all share the characteristic that the
>value returned by the modulus operator given each entries hash-value as its
>first value, and the table-size as the second (ie, hash % table-size) is
>equal to the offset into the first array the position in the first array
>has.
>
>The entries in this sub-part of the second array are sorted in ascending
>order as based on their hash-value. In cases the hash-value matches, the
>name of the respective entries are used as the key for the comparision.
>
>--
>
>Did I forget something?

Perhaps the other three cases in which we require dynamically resizable
hashes ? ;)


	Christian
--

Drive A: not responding...Formatting C: instead