
Re: Yet another take on ppfDirTree



>Well, but resizing arrays is expensive. Wait - actually it isn't, I mean,
>it *is* expensive but we don't have to do it ;)
>
:)

>The nice thing is that (for Format 0 Paks, i.e. the only place where the
>hash construction speed really matters) we know exactly how many entries
>the hash (in total) will have. That means we can just allocate an array for
>all these entries and (given that the entries are stored in the Pak sorted
>by hash key, which is only natural) we can just use an algo like that:
>
>EntryArray = new Entry [count]; // the array holding all entries
>Hashtable = new  Entry* [HTSize]; // hash table
>
>[...]
>
Actually, I think there's a way that is both simpler and faster:

--
int entryCount = ReadUInt();
int hashTableSize = ReadUInt();

Entry* pEntryArray = new Entry[entryCount];
Entry** pHashTable = new Entry*[hashTableSize];

// read all entries in one go, then the offset table in place
ReadBytes(pEntryArray, entryCount * sizeof(Entry));
ReadBytes(pHashTable, hashTableSize * 4); // 4-byte offsets (32 bit)

// turn each stored offset into a pointer into the entry array
Entry** pTerm = pHashTable + hashTableSize;
for (Entry** pPos = pHashTable; pPos != pTerm; ++pPos)
    *pPos = pEntryArray + (ppSizeT)*pPos;

--

And that's all. First, the sizes are read. Then the complete array of
entries is read. Then the data describing how the array of entries is
divided up into sub-arrays is read. Let me give an example:

hashtable:

position 1 (offset 0):   5 entries
position 2 (offset 1):   2 entries
position 3 (offset 2):   0 entries
position 4 (offset 3):   10 entries

For such a hash table, this is the data that would be stored:

17 (number of entries)
4 (size of the hashtable)
17xEntry (you get ONE guess :)
0 5 7 7

The last 4 numbers are the interesting part. The first number is 0. This
means that the first array of entries begins at offset 0. The next number
is 5, so the second array of entries begins at offset 5. The next number is
7, so the third array begins at offset 7. The next number is also 7, so the
fourth array begins at offset 7 as well (meaning that the third array is
empty). The fourth array is the last, so it can safely be assumed to take
up the rest of the entries. We could actually skip the first number, as it
will always be 0 anyway.

Now, of course, the values 0, 5 and 7 are not much use as pointer values:
they are offsets. This is easily fixed by adding them to the memory address
of the beginning of the array. That will turn them into valid pointers into
the array.
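
Applied to the example above, the fixup loop turns the stored offsets into:

--
// after the fixup loop, using the example table above:
// pHashTable[0] == pEntryArray + 0   (position 1: entries 0..4)
// pHashTable[1] == pEntryArray + 5   (position 2: entries 5..6)
// pHashTable[2] == pEntryArray + 7   (position 3: empty)
// pHashTable[3] == pEntryArray + 7   (position 4: entries 7..16)
--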

It's also possible to store the length of each array rather than the offset
of its start. I don't think it makes a lot of difference, though, as we
need to know both the length and the offset, so we're going to be doing
some computation either way.
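
A lookup then just walks the slice between two consecutive table pointers.
Roughly like this, using the variables from the snippet above (Hash() and
the Name member are placeholders for the example, not the actual ppf
names):

--
// sketch only: bucket i ends where bucket i+1 begins; the last
// bucket ends at the end of the entry array
Entry* Lookup(const char* name)
{
    ppSizeT i = Hash(name) % hashTableSize;
    Entry* pEnd = (i + 1 < (ppSizeT)hashTableSize)
                      ? pHashTable[i + 1]
                      : pEntryArray + entryCount;
    for (Entry* p = pHashTable[i]; p != pEnd; ++p)
        if (strcmp(p->Name, name) == 0)  // needs <string.h>
            return p;
    return 0; // not found
}
--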

Both simpler and faster :)

>>LoadFactor = NrOfEntries / HashTableSize
>>
>>So, with a load factor of 2, we have 2 times as many entries as we have
>>positions, so, on average, each position has 2 entries.
>
>Ah, ok. I thought it was the inverse of that. I'd rather use a value of 1
>or even a bit lower. The extra memory is almost negligible and the
>(average) speedup is really high (as the average chain length would be 1).
>
This is taken right out of the book "Data Structures and Other Objects
Using C++":

load factor - average number of entries examined (i.e. 1 + loadfactor/2)
0.5 - 1.25
0.6 - 1.30
0.7 - 1.35
0.8 - 1.40
0.9 - 1.45
1.0 - 1.50
2.0 - 2.00
4.0 - 3.00

With a load factor of 1, we only need to examine 1.5 entries on average.
With a load factor of 1, we also have an overhead of 8 bytes per entry
(each array position takes a pointer and a size, totalling 8 bytes). For
10,000 entries, that's 80,000 bytes. Actually, we're also storing hash
values, so that's an extra 4 bytes per entry, making it more like 120,000
bytes. That's 117 KB.

For a load factor of 0.5, we have two array positions per entry, adding
another 8 bytes per entry and making the total 200,000 bytes, or 195 KB.

The conclusion, of course, is that I've just shown the opposite of what I
wanted to show: we *do* want a load factor of 1, or perhaps a little less :)
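
The arithmetic above in code form (the helper name is mine, just for
illustration):

--
// overhead = table positions * (pointer + size) + per-entry hash value
ppSizeT HashOverheadBytes(ppSizeT entries, ppSizeT positions)
{
    return positions * 8   // pointer + size per table position
         + entries * 4;    // stored hash value per entry
}
// load factor 1:   HashOverheadBytes(10000, 10000) == 120000 (~117 KB)
// load factor 0.5: HashOverheadBytes(10000, 20000) == 200000 (~195 KB)
--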

>Cool, another misunderstanding ;)
>Perhaps we could optimize our conversation by simply assuming beforehand
>that we *will* misunderstand each other ;)
>
Yeah, like an auto-responder that answers every e-mail from the other with
a "what do you mean by that?" :)

>>>We'd of course have to do some benchmarking & tuning with it, but the
>>>basics should be fine.
>>>
>>Outside of assembler optimizations (which could take 4 to 8 bytes of the
>>string in one iteration, depending on the architecture), I don't see how
>>you could make it faster.
>
>I don't mean making it faster, but making it produce better (more evenly
>distributed) hash keys for our situation (e.g. most filenames will have a
>'.' character plus three alphabetic characters at the end, and these three
>letters are often the same for all files in a given directory)
>
Hmm... I don't think it's possible to make it better, or, if it is, I
suspect finding out will be more work than it's worth. You'd have to search
for the '.', as not all files have one. Some filenames will be less than 4
characters, so you can't just exclude the last 4 characters from the
algorithm.
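
For reference, the kind of byte-at-a-time string hash we're talking about
(an illustration only, not necessarily the actual ppf hash function):

--
// classic multiply-and-add string hash, one byte per iteration;
// just an example of the general shape, not the ppf code
ppSizeT HashString(const char* s)
{
    ppSizeT h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}
--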

>Perhaps we can add some config option to change that value, but growing
>that array shouldn't matter that much either (after all it's just a matter
>of allocating an additional array).
>
Exactly. Read my other mail on "config options".

>>There are three drawbacks, however: Each allocator can only allocate
>>memory in blocks of one size. This size is a constructor parameter, and
>>cannot be
>
>Usually ok. Is it possible to give it a hint about how many blocks will be
>needed?
>
It wouldn't make it faster to preallocate the blocks before they are
needed, as it takes N calls to new to allocate N blocks, no matter what.
The allocation method has to check whether there are free units in any
case, and if there are, it just returns a free unit. If not, it allocates
a new block and returns one of the new free units.

OK, you could move the time for allocating the blocks and adding the new
free units to the free list from allocation time to
allocator-construction time, but I'm quite sure the effect would be
minimal. On top of that, once the peak number of units needed has been
reached, and the allocator is not asked to return unused blocks to the
heap, the allocator will never again have to allocate new blocks.

The short answer is "no" :)

>>The third drawback is that the allocator cannot allocate stuff that is
>>less than 4 bytes.
>
>No problem. You won't find any objects that are smaller than that on a
>modern system.
>
Actually, now it can. It just allocates 4 bytes for each object, even if
it's smaller than 4 bytes. It's a simple check in the constructor that
sets the unit size to 4 if it's less than 4.
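
Roughly like this (my reconstruction of the idea, not the actual ppf code):

--
// fixed-size allocator sketch: one unit size per allocator, units
// handed out from a free list, a new block allocated only when the
// list runs dry (blocks are never returned to the heap here)
#include <cstddef>

class FixedAllocator
{
    struct FreeUnit { FreeUnit* pNext; };

    FreeUnit*   m_pFreeList;
    std::size_t m_UnitSize;
    std::size_t m_UnitsPerBlock;

    void GrowBlock()
    {
        // carve one raw block into units and push them on the free list
        char* pBlock = new char[m_UnitSize * m_UnitsPerBlock];
        for (std::size_t i = 0; i < m_UnitsPerBlock; ++i)
        {
            FreeUnit* pUnit = (FreeUnit*)(pBlock + i * m_UnitSize);
            pUnit->pNext = m_pFreeList;
            m_pFreeList = pUnit;
        }
    }

public:
    FixedAllocator(std::size_t unitSize, std::size_t unitsPerBlock = 256)
        : m_pFreeList(0),
          // a unit must be able to hold the free-list link, hence
          // the 4-byte minimum (on a 32-bit system)
          m_UnitSize(unitSize < sizeof(FreeUnit*) ? sizeof(FreeUnit*)
                                                  : unitSize),
          m_UnitsPerBlock(unitsPerBlock)
    { }

    void* Alloc()
    {
        if (!m_pFreeList)   // only happens when the free list is empty
            GrowBlock();
        FreeUnit* pUnit = m_pFreeList;
        m_pFreeList = pUnit->pNext;
        return pUnit;
    }

    void Free(void* p)
    {
        FreeUnit* pUnit = (FreeUnit*)p;
        pUnit->pNext = m_pFreeList;
        m_pFreeList = pUnit;
    }
};
--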

>>That string thing... I'll write an allocator that doesn't support
>>deallocation, except for "everything at once", and that requires its exact
>>size to be specified at construction. That could be used for strings, and
>>would be very fast.
>
>Keep the "exact size to be specified" out if possible. Making it growable
>isn't too difficult (e.g. allocating memory in, say 4k blocks and on each
>allocation checking ig the string completely fits in the remaining space of
>the last block, allocating another one if not) and slow.
>
It's certainly possible. However, right now the allocator does only 2
assignments and one addition, all of it with 32-bit values. If I had to add
a check for overrunning the current block, I'd have to add a comparison to
that. Not too big a deal, but I suspect you would be able to see it in the
performance, even if the impact would probably be slight.

If it's not necessary, I really don't think it's a good idea to add it. If
it is necessary, oh well, it's not a very big deal. Quite small, actually.
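
For concreteness, the current scheme is basically this (a minimal sketch
with made-up names, assuming the exact total size is known up front):

--
// bump allocator sketch: exact total size given at construction,
// Alloc() is just a pointer copy plus an addition, and the only
// deallocation is "everything at once" in the destructor
#include <cstddef>

class StringAllocator
{
    char* m_pBuffer;
    char* m_pNext;

public:
    StringAllocator(std::size_t totalSize)
        : m_pBuffer(new char[totalSize]),
          m_pNext(m_pBuffer)
    { }

    ~StringAllocator() { delete[] m_pBuffer; } // everything at once

    void* Alloc(std::size_t size)
    {
        void* p = m_pNext;  // assignment
        m_pNext += size;    // addition + assignment
        return p;           // a growable version would add the
                            // "does it still fit?" comparison here
    }
};
--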