[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Had a look at ppf_DirTree



I'm a bit rusty on my tree-theory, so if I happen to say something
nonsensical or downright wrong in this Email, please point it out to me...

ppf_DirTree is implemented as a binary search tree in which each node is
dynamicly allocated and contains a pointer to its entry data (either a
ppfDirectory or a ppfFile object) aswell as pointers to its children and
members to access the data of its entry data. The tree is kept balanced at
all times, except if the tree it is responsible for is not generated by
itself, in which case the tree does not even keep the property of being a
binary search tree.

I propose a somewhat different way of handling the need for ppf_Directory to
keep track of files and directories. First off, I propose that seperate
trees should be maintained for files and directories. This will eliminate
the need for each node to have two entry data pointers, shaving 4 bytes off
every node, while only making ppf_Directory grow in size by 8 bytes: IMHO a
very beneficial deal. Search performance will not be hampered by this, it
will in fact be increased, as it can often be known wheter what is searched
for is a file or a directory, thereby in many cases cutting search time in
half. If this is not known, search performance will in best, worst and
average cases be exactly the same as for the solution with only 1 tree (I
believe).

To make it easier to implement this, I propose templatising ppf_DirTree
(possibly changing its name) and thereby also make it possible to reuse this
class in other contexts. If template-bloat is too great a consearn,
implement the tree class using void* (ie, without templates) and create a
templatised wrapper class with only inline methods that will wrap all the
methods of the tree class that takes or returns void*. The template class
will have all its functions inlined, and there will be no template bloat
what-so-ever.

Also, I propose that the tree be implemented as a constantly complete tree.
This has the benefit of making it possible to remove the two child pointers
in the node class, shaving off 8 bytes of each node. This then means we
don't even need a node class, as all we have left of the node class is a
pointer to the data entry.

How is this possible? Well, the case is that complete trees can be quite
effeciently stored in an array. If we know how many nodes we will be
handling before hand, we can avoid all reallocation, thereby making a single
array the optimum solution for this. If the amount of contained nodes will
often change drastically in size, an arrays of pointers to arrays. There
would then never be any need for an array to grow or shrink, as each level
of the tree would be allocated and deallocated dynamicly. ok, the "master"
array would have to be reallocated each time the depth of the tree changed,
but that usually happends only rarely, and it would also be very small (a
tree with a depth of 10 can hold 1024 entries).

Maintaining a complete tree takes a little more work than maintaining a
standard binary search tree, but the memory gain is quite substantial,
especially for large trees. The current implementation is constantly kept
balanced, so the difference shouldn't be a major one (I believe).

The current implementation of ppf_DirTree can handle a preconstructed tree
by passing it the root node of this tree. ppf_DirTree becomes read-only in
this case, though (why?). This can easily be implemented with this
implementation too, thougth I don't see why this feature is nessecary.

To minimize the initial sorting time, I propose that mounting should happen
in sessions. A client program would then call StartMountSession() when he
wanted to begin mounting his pak files, and call EndMountSession() when he
had finished. This way, all data would be present before the sorting
algoritm began constructing the trees, thereby making it possible to
allocate the needed memory all at once aswell as make it faster to generate
the trees (I believe). Sorting would, btw, only be needed if writing pak
files, or if pak file overloading was utilised (depends a little on how this
is implemented, though; some implementations wouldn't need a sort). Just
mounting a pak file that didn't overlap with any other pak files would not
cause a sort as it would already be sorted before it was written to the pak
file (right?).

If the implementation of the overloaded pak files feature does not add
anything to already constructed trees (and therefore cause a resort of the
implicated trees), doing mounts in sessions would be pointless.

If all these changes are applied, I believe the result would be a faster
tree that would take up less memory, cause much less calls to new and
delete, and cause less fragmentation of memory, as the allocated blocks of
memory would be much larger. On top of that, we get a solution that conforms
to the ideology of OOP and C++ better than the current, and we get an
implementation of this special kind of tree (what's the name of this kind of
tree, anyway?) that can be reused to fulfill any porpuse, as it is
implemented completely independantly of the code that is to use it and is
templatised.