[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Sv: A little more exact specification of the hash-table



Bjarke Hammersholt Roune wrote:

>>Best case is O(1) with hashes IIRC.
>>
>I hadn't thought of that. It's not exactly O(1), though, but rather O(c)
>where c is a constant. Ie, constant-time performance.

i.e. O(1) ;)
The O notation always means "in the order of ...", i.e. 
O(1) is constant (any constant, no matter how big),
O(n) is linear (i.e. 1*n or 1000000000*n, doesn't matter)

etc. 

>>>** Serialization
>>>The container can tell how much space it would take up were it to be asked
>>>to serialize itself and its entries to a target in the state it currently
>>>is.
>>
>>Bad. That depends on the PakFile format (e.g. with Format 1 it has to be
>>possible to add single entries later on).
>>
>That's why it says "in the state it currently is". Why wouldn't the
>container (or, if the container isn't responsible for this blank-space
>behavior, the calling code) be aware of this and include it in the resulting
>value? You just need to tell it "I need info for format 0" or "I need info
>for format 1".

Right. But with Format 1 e.g. it isn't guaranteed that all entries of a
given directory are stored en bloc (i.e. entries added later on are
somewhere else.)

>>>[this depends a little on the IO utilised, so either this:]
>>>The container supports serialization of itself and its entries to any
>target
>>>that fully implements the interface of the standard library class ostream,
>>>and from any source that fully implements the interface of the standard
>>>library class istream.
>>>[or this:]
>>>The container supports serialization of itself and its entries to any
>target
>>>handle opened via the ppfOpen() function, and from any source handle
>opened
>>>via the ppfOpen() function.
>>>[one of those]
>>>
>>Bad. Again problems with the different Pak formats. Not only serialization
>>is required but also updating of an existing in-pak-structure.
>>
>How would you implement that? Keep a change log? Keep two in-memory
>structures, one altered and one not, and compute the difference and only

Nope. Write on access.
As soon as a new file is added the neccessary info is written to the pak,
and the ppfWrite () calls to that file also "directly" write to the Pak and
update the entry properties (mtime, size). Otherwise we'd have silly limits
on file size as we'd need to store everything in RAM until the thing is
finally serialized.

>Serializing will also update an existing in-pak-structure. Format 1 paks
>need special writing code anyway.

Right.

>>>Did I forget something?
>>>
>>Perhaps the other three cases in which we require dynamically resizable
>>hashes ? ;)
>>
>Yes. I think we need two container classes for this. One for format 0, and
>another one for the others. Or, perhaps that's not really needed. Random
>data can be converted to the highly optimised internal used in the container
>outlined here. Not very complicated to do. However, I think that is a little
>ineffecient to do for large directories that are only used sparingly.
>
>Imagine a directory cached from the internet. We don't have the hash-values
>for each file-/dir-name, so we have to compute those for each string. Then
>we need to sort the entries in ascending order based on their hash-value
>modulus'ed with the string size (which we also need to compute, btw).
>
>Now, all this work, and then the client really only needs to access 3 files,
>and it doesn't even get to do that, because rigth after the first access,
>the internet directory is changed... :(

Unimportant. Reading ftp dirs is slow anyway:
(1) get the dir listing over the net
(2) parse the text and extract file names and their attributes
(3) construct the proper file and dir objects

This will take far longer than reorganizing the hash


	Christian
--

Drive A: not responding...Formatting C: instead