[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Sv: A little more exact specification of the hash-table



>>The container does not allow for the capability to iterate over all
entries.
>>
>Wrong. Iteration has to be possible for ppfReadDir () to work.
>A seperate Iterator class should be best for this.
>
Ok, an external iterator it is.

>>The container guarantees that access will, in the best, average and worst
>>case, have a performance of O(log n).
>
>Best case is O(1) with hashes IIRC.
>
I hadn't thought of that. It's not exactly O(1), though, but rather O(c)
where c is a constant. Ie, constant-time performance.

>Another thing: ppfFILE contains a pointer to the respective ppf_File
object,
>so the address of that must not change due to container resorting etc.
>
Ok.

>>** Serialization
>>The container can tell how much space it would take up were it to be asked
>>to serialize itself and its entries to a target in the state it currently
>>is.
>
>Bad. That depends on the PakFile format (e.g. with Format 1 it has to be
>possible to add single entries later on).
>
That's why it says "in the state it currently is". Why wouldn't the
container (or, if the container isn't responsible for this blank-space
behavior, the calling code) be aware of this and include it in the resulting
value? You just need to tell it "I need info for format 0" or "I need info
for format 1".

>BTW don't forget that datatype sizes in PakFiles might be different from
>the ones of the runtime system (offsets/sizes/times are always 64Bit
>(little endian) in Paks).
>
I'll take note of that.

>>[this depends a little on the IO utilised, so either this:]
>>The container supports serialization of itself and its entries to any
target
>>that fully implements the interface of the standard library class ostream,
>>and from any source that fully implements the interface of the standard
>>library class istream.
>>[or this:]
>>The container supports serialization of itself and its entries to any
target
>>handle opened via the ppfOpen() function, and from any source handle
opened
>>via the ppfOpen() function.
>>[one of those]
>>
>Bad. Again problems with the different Pak formats. Not only serialization
>is required but also updating of an existing in-pak-structure.
>
How would you implement that? Keep a change log? Keep two in-memory
structures, one altered and one not, and compute the difference and only
write this? Considering that updating would only be time-critical with
format 1 paks. Those have very small file counts, so updating performance
shouldn't matter that much.

Serializing will also update an existing in-pak-structure. Format 1 paks
need special writing code anyway.

>Don't guarantee a fixed array size. It's unneccessary anyway.
>
Agreed.

>>Did I forget something?
>>
>Perhaps the other three cases in which we require dynamically resizable
>hashes ? ;)
>
Yes. I think we need two container classes for this. One for format 0, and
another one for the others. Or, perhaps that's not really needed. Random
data can be converted to the highly optimised internal used in the container
outlined here. Not very complicated to do. However, I think that is a little
ineffecient to do for large directories that are only used sparingly.

Imagine a directory cached from the internet. We don't have the hash-values
for each file-/dir-name, so we have to compute those for each string. Then
we need to sort the entries in ascending order based on their hash-value
modulus'ed with the string size (which we also need to compute, btw).

Now, all this work, and then the client really only needs to access 3 files,
and it doesn't even get to do that, because rigth after the first access,
the internet directory is changed... :(