[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Sv: Strangest thing happened; it seems I'm perfectly normal. That doctor must be a fraud! I want my money back!



>>If its unbalanced or not even a search tree, and that is a problem (which
it
>>is), why not just MAKE it a balanced binary search tree (and if my idea is
>>used, also complete)? How would you end up with suchs a tree, anyway?
>>
>Well, the file format is an open spec, so anyone can implement her own
>writing code...
>But I'm more and more leaning towards just assuming the tree is correct
>(perhaps just doing some simple sanity checks in the debug version).
>
why not just make it part of the spec how this data *must* be stored?
(storing incomplete binary trees is somewhat problematic, or atlweast
ineffecient, btw).

>Most of all we want file opening to be fast. Mounting performance is
>secondary.
>
Exactly. However, taking *too* long for mounting is very undesireable: I
really don't think it should take more than 1-2 seconds or something like
that. I remember Ceaser III taking a couple of minutes to start up (ok, I
was using a slow computer, but it played the game without any problems at
"ultra fast" speed, so it wasn't out of the target range), I also remember
that being extremely irritating. On my new computers, it's a little better,
but it still takes ages compared to other (much larger) games.

>>I find this implementation effecient, elegant, easy to code and OOP'y
>>(things take care of themselves and does so with minimum effort). What
more
>>can you ask for? We also get rid of that annoying mount/unmont session
>>thingy.
>>
>Well, it's not as fast as it could be, but it's *much* simpler, right.
>

That depends on how many files you'll be opening. I don't know the treshold,
but if you open an amount of files below this threshold, this way is the
fastest. Any amount above the threshold will be a little slower. I haven't
tested this, but I suspect this threshold being reasonably high, as there
really is ALOT of mess that needs to be cleaned up every time a pak file is
(un)mounted.

>Hmmm, I'll have to think about that...
>Some speed benchmark results  comparing the speed of added hashing vs.
>plain trees for small/mediun/large directories would be helpful here.
>
I'll see what I can do.

>Yup. For prototyping some game I wouldn't even recommend using Format #1
>Paks, as they only allow *adding* of files, not removing or changing.
>
>Format #1 Paks are intended for data the game creates on the fly, e.g.
>savegames.
>
You should update the documentation with this.

>>If format 1 dir entries are all stored in an unsorted linked list, why
would
>>we need any sorting? An unsorted linked list is by definition not
sorted...
>>
>We need to sort them on reading the dir info (by inserting the entries into
>the tree). That's what I meant. In contrast we don't need to do any sorting
>when reading a type 0 Pak, because its dir structures are already sorted.
>
This strikes me as a bit strange way to do things. If its going to be sorted
anyway at some point, why not just sort them at write-time instead of
read-time? You just need to be ble to specify all the files that go in a pak
format 1 when writing it at once for this to be effecient. You would still
be able to add files, but then that would require adding the new entry to
the stored tree; not a major thing.

Btw, how would you make adding files to a format 1 pak work? I mean, the
data saying what is in the pak file is in the head of the file. This chunk
of data would have to grow to add a new file. You want to offset all data in
the file (except some of the head) by however much it is (a couple of bytes)
to get space for this? You're going to leave a bit of slack? Store this data
in the end of the file?

>>You haven't touched these issues which I raise in my Email(s):
>>
>>1. Implementing a complete binary tree instead of the current incomplete
>>one.
>
>Could be useful for format 0, together with 2. Classification: optimization
>to be explored deeper when the main functionality is in place and working.
>
Well, actually I haven't got anything to do right now, except for that
benchmark thing. I'd be happy to implement it. I've already figured out how
to add and remove files while keeping the tree complete at all times (ok, so
I probably could've found that info on the net, but didn't have anything to
do...).

>2. Storing the nodes in arrays rather than dynamicly allocating each and
>every one (becomes feasible if the tree is complete).
>
This also makes it MUCH easier to store the tree: just dump the array to the
stream.

>>3. Implementing the tree as a template.
>
>certainly not now. I still believe that making the thing generic would slow
>things down. But a generic version of it should be created for other uses,
>yes.
>
There is no special requirements of the tree. We basically just want to
store and retrieve strings, with possibly some hashing. Implementing a
generic tree and deriving a class from the char*/string specialisation of
that (if we do use hashing) would be IMHO ideal. I don't see how that would
make things slower, as what we need is just a completely standard complete
binary search tree.

>>4. Using seperate trees for files and dirs (become feasible if the tree is
>>templated).
>
>Useless. Have a look at how the files of a typical game are organized - in
>a directory are usually either (almost) only subdirectories or (almost)
>only files. Especially the large directories (several 100 entries) contain
>only files.
>
Usually, as you correctly have observed, data is stored in sub-directories
of the main game dir, sometimes data is stored 2-3-4 levels from the main
game dir. In other words, the directory data is going to get accessed ALOT
of times. Keeping directory data for itself therefore gives a performance
boost, since it can probably slice of somewhere between 1-2 lookups for each
directory a file is away from the main game dir.

Often, you will have a data dir with alot of files, and then a few
directories. In suchs a setup, the benefit becomes more profound.

Besides that, it also makes things simpler to code and understand (you get
rid of some casting and can name the two trees with more saying names). You
also save a little memory for each and every node in each and every tree
that is loaded, as you don't have to keep track of wheter a specific entry
is a dir or a file.