
Yet another take on ppfDirTree

I bet everybody thought I had said everything I had to say about how to most
efficiently find filenames in a tree by now. Well, think again :)

I've implemented yet another feature on top of the tree (well, it's actually
a separate class that uses the tree class, but anyway) to speed up the
process of finding entries.

This time, the idea is hashing the top of the tree with a standard
hash-table. It's like chained hashing, just with trees instead of linked
lists (the specific ADT used is a template parameter, so you could use a
linked list without changing the class, as long as it had the right
interface).

The hash-table computes the offset into the array by taking the hash value
of the entry to find (which we are computing anyway) and using the % operator
to bring it within the valid range of the array. Each position in the table
is a separate tree, and once the key is mapped to one location, a search is
made in that tree for that key, exactly as if a hash-table had not been used.
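To make the idea concrete, here is a minimal sketch of such a top-hashed
tree. The names are hypothetical (this is not the actual ppfDirTree
interface), and std::map stands in for the tree ADT, which - as in the real
class - is a template parameter:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Sketch of "top-hashing": an array of S trees, indexed by hash(key) % S.
// Each bucket is an ordinary tree; the search inside a bucket works exactly
// as if no hash-table were involved, just on a tree ~S times smaller.
template <typename Key, typename Value,
          typename Tree = std::map<Key, Value>>
class TopHashedTree {
public:
    explicit TopHashedTree(std::size_t size = 16) : buckets_(size) {}

    void insert(const Key& key, const Value& value) {
        bucket(key).insert({key, value});
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const Value* find(const Key& key) const {
        const Tree& t = bucket(key);
        auto it = t.find(key);  // normal tree search within the bucket
        return it == t.end() ? nullptr : &it->second;
    }

private:
    Tree& bucket(const Key& key) {
        return buckets_[std::hash<Key>{}(key) % buckets_.size()];
    }
    const Tree& bucket(const Key& key) const {
        return buckets_[std::hash<Key>{}(key) % buckets_.size()];
    }

    std::vector<Tree> buckets_;  // the "top" hash-table: S separate trees
};
```

Note that in the real class the hash value is computed once and reused for
both the bucket selection and the in-tree search; the sketch hashes twice
for brevity.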

The benefit of this is that each tree is smaller by a factor of S, where S
is the size of the hash-table, assuming a hash with perfect distribution.

(Notice that the benchmarks here are pretty approximate. The 20% speedup
should be pretty accurate, though.)

The important thing is searching. We get a 20% speed increase over a normal
hashed tree in this regard with a hash-table size of 16. A hash-table size
of 16 means an array of 16 trees, which is an additional 16*16 bytes of
memory consumed. That's 256 bytes of memory. I think that's pretty good,
especially considering how easy it was to implement this.

The effect on the speed of adding elements one-by-one to the tree is much
more profound. The gain is in excess of 100% with a hash-table size of 16.
Adding elements one-by-one isn't that important, but still, 100% is quite a
substantial gain.

With small trees, the gain for searching drops below 20%, but top-hashing is
never slower than not doing it at all (except perhaps for trees with 1, 2
or 3 entries). By small trees I mean trees with 10-20 entries.

The speed gain is substantial (remember that those 20% are not to be added
to the speed gain of hashed trees over non-hashed trees, but multiplied: if
node hashing alone made searches, say, 4 times faster, top-hashing brings
that to roughly 4 * 1.2 = 4.8 times) and the extra memory consumption is
very small. I think we should use this idea.


On another note, I think it might be worthwhile to implement a
memory-handler for ppfFile objects (or something like that - the things you
get a pointer to when opening a file), as many games are likely to cause the
allocation of a lot of these. For some games, the number will be in the
thousands (especially if each level causes a reopening of files).
This will not only cause a lot of calls to new, which isn't that fast, but
also the creation of a lot of small objects, many of which may not have a
very long life-span. That's not very good for the state of the system, as it
can very easily cause a lot of memory fragmentation, which not only makes
everything slower because of worse locality of reference, but also
diminishes the effective amount of free memory, as you probably already
know.
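One common way to do such a memory-handler is a fixed-size pool: grab memory
in big chunks and hand out object-sized slots from a free list, so thousands
of small allocations neither hammer operator new nor fragment the heap. A
minimal sketch (the class name and interface are my own invention, not an
actual PFile API):

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Sketch of a fixed-size object pool for ppfFile-sized objects. Memory is
// allocated one large chunk at a time; freed slots are threaded onto a
// free list and reused, so allocate/deallocate are just pointer swaps.
class FilePool {
public:
    explicit FilePool(std::size_t slotSize, std::size_t slotsPerChunk = 256)
        : slotSize_(slotSize < sizeof(Slot*) ? sizeof(Slot*) : slotSize),
          slotsPerChunk_(slotsPerChunk) {}

    ~FilePool() {
        for (char* c : chunks_) ::operator delete(c);
    }

    void* allocate() {
        if (!freeList_) grow();           // out of slots: add a new chunk
        Slot* s = freeList_;
        freeList_ = s->next;
        return s;
    }

    void deallocate(void* p) {
        Slot* s = static_cast<Slot*>(p);  // push the slot back on the list
        s->next = freeList_;
        freeList_ = s;
    }

private:
    struct Slot { Slot* next; };          // freed slots store the list link

    void grow() {
        char* chunk = static_cast<char*>(
            ::operator new(slotSize_ * slotsPerChunk_));
        chunks_.push_back(chunk);
        // Thread every slot of the new chunk onto the free list.
        for (std::size_t i = 0; i < slotsPerChunk_; ++i)
            deallocate(chunk + i * slotSize_);
    }

    std::size_t slotSize_;
    std::size_t slotsPerChunk_;
    Slot* freeList_ = nullptr;
    std::vector<char*> chunks_;
};
```

The actual ppfFile object would then be constructed into a slot with
placement new, and its destructor called by hand before the slot is
returned to the pool.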

By the way, a hash-table size of 16 is of course not optimal for all values
of n, where n is the number of entries. For many trees the optimum is
probably somewhat smaller; for a few, larger. After a bit of benchmarking,
though, I've found that it's a good estimate for all trees.

I don't have the patience to find the precise mathematical function that
would compute the optimum table size. If somebody has too much free time,
perhaps that someone could do it? Fat chance... :)