
Re: PFile notes



> > serialize and close the file.
> >
> > However, that leaves serialization of the header (or footer, as is the case
>
> Done in the second constructor of PakFile (calls CreateHeader ()).
>
> The PakFile writing code I wrote worked as follows:
>
> (1) The user calls ppfCreatePak (), which creates a new PakFile object
> using the second constructor. Here PakFile::CreateHeader () is called,
> which writes the Pak header, creating a completely empty PakFile. After
> that the PakFile object is immediately destroyed.
>
> (2) The user mounts the new PakFile with the ppfMF_write flag. The code
> for this creates a new PakFile object using the first constructor, which
> recognizes the Pak as a new one, allowing the adding of files.
>
Hmm... Is there a good reason for doing it this way?

If the pakfile construction process is interrupted for some reason, there'll
be a useless empty pak file lying around.

It would seem to me to be somewhat more efficient to delay creating any
actual file for the pakfile until the very end, so that all of the pakfile
can be written in one go. That way, nothing unneeded is written, nothing that
is already known is read from disk (the empty pak), caching is better
utilised and no split-second-lifetime PakFile objects are created.
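To make the deferred approach concrete, here is a minimal C++ sketch under some assumptions: `PendingPak`, `Add`, `Remove` and `Commit` are hypothetical names (not existing PFile API), file contents are buffered in memory for simplicity (a real builder would more likely just record source paths), and the per-entry layout is a placeholder. The point is only that no file is created until `Commit()`, so an interrupted build leaves nothing behind, and removing an entry again costs no disk activity.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical deferred pak builder: PendingPak, Add, Remove and Commit are
// illustrative names, not PFile API. Nothing touches the disk until Commit().
class PendingPak {
public:
    explicit PendingPak(std::string path) : path_(std::move(path)) {}

    // Record a file to include; only the in-memory table changes.
    void Add(const std::string& nameInPak, std::vector<char> data) {
        entries_[nameInPak] = std::move(data);
    }

    // Cancelling an addition only erases the entry; no disk activity.
    bool Remove(const std::string& nameInPak) {
        return entries_.erase(nameInPak) > 0;
    }

    std::size_t Count() const { return entries_.size(); }

    // Create the pak file and write everything in one pass.
    // Placeholder layout per entry: name length, name, data length, data,
    // with lengths as 32-bit integers in host byte order.
    bool Commit() const {
        std::ofstream out(path_, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        for (const auto& [name, data] : entries_) {
            const auto nameLen = static_cast<std::uint32_t>(name.size());
            const auto dataLen = static_cast<std::uint32_t>(data.size());
            out.write(reinterpret_cast<const char*>(&nameLen), sizeof nameLen);
            out.write(name.data(), nameLen);
            out.write(reinterpret_cast<const char*>(&dataLen), sizeof dataLen);
            out.write(data.data(), dataLen);
        }
        return static_cast<bool>(out);
    }

private:
    std::string path_;
    std::map<std::string, std::vector<char>> entries_;  // name -> contents
};
```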

> (3) The user uses ppfMkdir () and ppfCopy () to add directories and files
> to the Pak. The changes are only reflected in the Pak's dir structure in
> RAM (now composed of PakWDir and PakWFile objects) until ...
>
> (4) The user umounts the Pak. The destructor of the PakFile object is
> executed, which in turn makes the toplevel dir write itself (recursively).
> The Pak is closed. Done.
>
> The PakFile contains a header (always at the very start of the Pak, fixed
> size, standard layout independent of the Pak format) plus an "extended"
> header (located anywhere, with a format-dependent layout/size, possibly
> even variable size (I'm not sure about that yet)). The main header contains
> links to both the extended header and the root dir structure.
> These links (at least the one for the root dir struct) have to be updated
> for format 1 (and any later formats) after adding files, so a generic
> WriteHeader () instead of the CreateHeader () currently in PakFile is better.
> Trivial.
>
>
> I guess you had some different assumptions of that process?
>
Well, yes, somewhat.

I thought a pakfile was laid out more like this:

-always present header-
here goes stuff like the game version, PFile version, verification that
"yes, this is a PFile pak file", PFile format number etc. This would have to
stay the same if we want compatibility with previous versions of PFile. The
length of the extended header is also in here.

-extended header-
This is where version and format specific information goes.

Each version of PFile will only read as much of the extended header as it
knows about. If it's longer than expected, then every version of PFile will
know that the remaining parts of the extended header are meant for later
versions. This gives simple compatibility at very little cost.

-directory structure: format 0 only-
I.e., what you get by serializing the top Directory.

-file contents-

-directory structure: format 1 only-
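A minimal C++ sketch of the skipping rule for the extended header. It assumes the always-present header has already given us the extended header's length, and that this PFile version knows only two 32-bit fields; `ExtHeaderV1`, `fileCount`, `flags` and `ReadExtHeader` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <istream>
#include <sstream>  // std::stringstream works as an in-memory source for tests

// Hypothetical extended header as understood by one PFile version. A later
// version may append fields; this reader simply skips them.
struct ExtHeaderV1 {
    std::uint32_t fileCount = 0;
    std::uint32_t flags = 0;
};

// extLen is the extended-header length stored in the always-present header.
// Reads the fields this version knows about and skips any trailing bytes.
bool ReadExtHeader(std::istream& in, std::uint32_t extLen, ExtHeaderV1& out) {
    const std::uint32_t known = sizeof(ExtHeaderV1);
    const std::uint32_t toRead = std::min(known, extLen);
    out = ExtHeaderV1{};  // fields not present in the stream stay zero
    in.read(reinterpret_cast<char*>(&out), toRead);  // host byte order
    if (!in) return false;
    in.seekg(extLen - toRead, std::ios::cur);  // skip fields for later versions
    return static_cast<bool>(in);
}
```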


I'm not so fond of giving any of these parts "random" positions within the
PFile, as it complicates things without really giving any benefit (?).
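Even with fixed positions, the root dir link for format 1 is only known once the file contents have been written, so the header has to be written twice. A rough sketch of what a generic WriteHeader () could do, assuming a hypothetical 16-byte main header whose field names and order are invented here: write it once with placeholder links, append the rest, then seek back and write it again with the real offsets. It needs a seekable std::ostream and writes integers in host byte order, which a real format would have to pin down.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>  // std::stringstream works as an in-memory target for tests

// Hypothetical fixed-size main header (16 bytes); field names and order are
// illustrative assumptions, not PFile's actual on-disk layout.
struct MainHeader {
    char          magic[4];         // e.g. "PPAK", identifies a PFile pak
    std::uint32_t format;           // pak format number (0 or 1)
    std::uint32_t extHeaderOffset;  // link to the extended header
    std::uint32_t rootDirOffset;    // link to the root dir structure
};

// Write (or rewrite) the header at offset 0, then move the put position back
// to the end of the stream so further data is appended, not overwritten.
bool WriteHeader(std::ostream& out, const MainHeader& h) {
    out.seekp(0);
    out.write(reinterpret_cast<const char*>(&h), sizeof h);  // host byte order
    out.seekp(0, std::ios::end);
    return static_cast<bool>(out);
}
```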

>
> Ok, some notes about the code you added to PakFile::Umount () :
>
> 		Directory::SetContentOffset(0);
> 		Directory::SetSerializingPakFile(this);
>
Oops. Actually, the call to SetSerializingPakFile() is only necessary for
serializing a pak *from* storage, not to it.

The call to SetContentOffset() can be made unnecessary if it's possible to
get the current offset of a stream (i.e., the number of bytes that have been
written to the stream, plus the starting offset). Do you know how? There's a
function that will give you the stream position, but my docs specifically
state I should not try to interpret this value, only pass it back into the
stream to reset it to that position at a later time.
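One way around the opaque position token is not to ask the stream at all and count instead. A hedged sketch with a hypothetical `OffsetTrackingWriter` class: it wraps any std::ostream, is seeded with the starting offset, and adds the length of every successful write, so Directory could query `Offset()` rather than relying on SetContentOffset().

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>  // std::ostringstream works as an in-memory target for tests

// Hypothetical wrapper: instead of interpreting the stream's opaque position
// token, count the bytes written ourselves and add the start offset.
class OffsetTrackingWriter {
public:
    OffsetTrackingWriter(std::ostream& out, std::uint64_t startOffset)
        : out_(out), offset_(startOffset) {}

    void Write(const char* data, std::size_t n) {
        out_.write(data, static_cast<std::streamsize>(n));
        if (out_) offset_ += n;  // count the bytes only if the write succeeded
    }

    // Current logical offset: start offset plus bytes written so far.
    std::uint64_t Offset() const { return offset_; }

private:
    std::ostream& out_;
    std::uint64_t offset_;
};
```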

> You assume serializations to PakFiles only - at the level of the Directory
> class. Bad.
>
Serialization to storage is completely independent of the PakFile. Directory
will write the correct info to the stream, wherever it may point.

> This is better: PakFiles that may have to update themselves
> later (i.e. either ones being initially constructed or format 1 ones
> mounted as writeable) accept no "foreign" directories below them - only
> directories belonging to that PakFile.
>
Wouldn't that needlessly limit the use of format 1 pak files? I mean, a bit
of common sense on the part of the client should make it possible to steer
clear of any problems.

Also, I think it should be possible for as much of the client code as
possible to be oblivious of the actual format of the pak file it is
accessing and/or manipulating. In other words, I don't think a piece of
client code should have to check the format of a pakfile passed to it before
it can mount anything inside of it.

Code mounting things in pakfiles should unmount them at some time,
regardless of the format, so I don't see any major problem with this.

> That ensures that the root dir of
> the Pak and all dirs/files below it (1) are of proper type
> (PakFileDir/PakFileFile or PakFileWDir/PakFileWFile)
>
Currently, if there is a dir/file somewhere of a type that does not support
any action that they are asked to perform, they make sure to tell the
calling code "no way". This of course makes the whole process fail.

Is there a very good reason why it's nicer to get this "no way" message
sooner (when the dir/file is mounted) rather than later (when it's asked to
serialize), considering that giving the "no way" message sooner would remove
functionality from PFile, and that giving it later does not remove the
"safety net" that dirs/files that cannot serialize will say so?

Even if I don't see much reason for client code to temporarily mount things
in pak files that are to be serialized to storage, I don't think we should
remove that possibility from clients, as we cannot predict how people will
use PFile. Hackers are creative and want to experiment, and I imagine we'll
invariably get an e-mail in the future going something like this: "Hey! I
need to do that!".
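A small C++ sketch of the "fail late" behaviour argued for above, with hypothetical class names (`Node`, `MemFile`, `ForeignFile`, `Dir`): any node type may be mounted in a directory, the default SerializeTo() answers "no way", and a directory fails as soon as one child refuses. It also shows serialization writing to an arbitrary std::ostream rather than to a pak.

```cpp
#include <memory>
#include <ostream>
#include <sstream>  // std::ostringstream works as an in-memory target for tests
#include <string>
#include <utility>
#include <vector>

// Hypothetical node hierarchy; class names are illustrative, not PFile's.
class Node {
public:
    virtual ~Node() = default;
    // Default answer for types that cannot serialize: "no way".
    virtual bool SerializeTo(std::ostream&) const { return false; }
};

class MemFile : public Node {
public:
    explicit MemFile(std::string data) : data_(std::move(data)) {}
    bool SerializeTo(std::ostream& out) const override {
        out.write(data_.data(), static_cast<std::streamsize>(data_.size()));
        return static_cast<bool>(out);
    }
private:
    std::string data_;
};

// A "foreign" node type: it can be mounted, but it cannot serialize.
class ForeignFile : public Node {};

class Dir : public Node {
public:
    // Mounting never refuses; any Node type is accepted.
    void Add(std::unique_ptr<Node> child) { children_.push_back(std::move(child)); }

    // Recursively serialize; the first child that refuses makes the whole
    // operation fail, so the refusal surfaces at serialization time.
    bool SerializeTo(std::ostream& out) const override {
        for (const auto& child : children_)
            if (!child->SerializeTo(out)) return false;
        return true;
    }
private:
    std::vector<std::unique_ptr<Node>> children_;
};
```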

> (2) know what PakFile object they belong to,
>
Unless you want to store that as a pointer-member of each file and dir, I
don't see how that's possible. If you do want to do that, I don't see how
that's not possible in the current scheme of things.

In any case, I don't think adding 4 bytes to each File and Directory is
really worth it to avoid the very slight inconvenience of having to call
Directory::SetSerializingPakFile() when loading a pak.

> thus (3) know what format the Pak is of and with that (4) know exactly
> how to serialize themselves.
>
The layout of the data that Directory::SerializeTo() causes to be serialized
to the stream is completely independent of what format the pak is. All that
is serialized is the directory structure and the files contained in this
structure. This will always be required, no matter what you write to, and
even if you aren't writing to a pak.

> So the Directory/File classes only need to provide (pure?) virtual
> SerializeTo () methods. The details of that operation are then handled by
> the classes derived from them.
>
I'm not sure I understand you. That *is* how it works right now. Well, ok,
under the hood, the names are a bit different, but one should not care about
the implementation details of Directory.

> This also makes it simple to serialize to something different than
> PakFiles in the future.
>
That is already possible.

> Another thing - What about positioning the dir info of format 0
> paks at the end as well?
>
I've thought about this too. That would make format 1 and format 0 paks
exactly the same, wouldn't it?

It's certainly a good idea to make it possible to append to format 0 paks,
as long as (as is the case) there is no performance hit whatsoever.

>That would make it possible for ppfCopy () to instantly copy
> the file data instead of delaying that to umounting time. And that in turn
> makes it easier for the client to keep track of the progress (very useful
> for feedback to the gamer).
>
I think it'll actually make it harder. Well, not compared to the current
situation (no way of keeping track of progress), granted. But:

If the file content was written at once, you'd first of all have the problem
that it would be quite a pain to cancel putting something in a pak. I can
think of at least one situation in which it would be nice to be able to
cancel the inclusion of a file: a GUI pak builder where you drag and drop
files to the pak, and then finally build it. Users of such a program would
certainly expect to be able to remove files again. For such a program, there
is also the matter of it being quite annoying to set off hard disk activity
each time a file is dropped.

PFile can check that there is enough free space to write a pak in its
entirety before beginning to write it.
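A sketch of such a pre-flight check using C++17's std::filesystem::space(). `EnoughSpaceForPak` and the `overheadBytes` allowance for headers and directory data are assumptions; the real overhead depends on the pak format.

```cpp
#include <cstdint>
#include <filesystem>
#include <limits>
#include <system_error>
#include <vector>

// Hypothetical pre-flight check: sum the sizes of everything that will go
// into the pak, plus an assumed overhead, and compare with the space
// available to this process on the target volume.
bool EnoughSpaceForPak(const std::filesystem::path& targetDir,
                       const std::vector<std::uintmax_t>& fileSizes,
                       std::uintmax_t overheadBytes) {
    const std::uintmax_t maxValue = std::numeric_limits<std::uintmax_t>::max();
    std::uintmax_t needed = overheadBytes;
    for (std::uintmax_t size : fileSizes) {
        if (size > maxValue - needed) return false;  // sum would overflow
        needed += size;
    }

    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(targetDir, ec);
    if (ec) return false;  // cannot tell, so be conservative
    return needed <= info.available;
}
```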

If files are written immediately and something goes wrong during the
construction of the pak, then the pak file will already be as large as the
combined sizes of the files included up to that point. If writing is
delayed, the pak file will be small in this circumstance, and it actually
doesn't even have to exist before it's written (even though the current
implementation creates it anyway).

I said I thought it would be easier to follow the progress of the complete
process if the pak file was written all at once. The reason for this is that
it would then be possible to move the responsibility for keeping track of
progress from the client program to PFile itself, so that not every client
that needs this functionality has to implement it. PFile will not be able to
do this otherwise, as it won't know how many files there are to go, or how
large those files are. If it waits for the umount, all information will be
available, so it can accurately tell how far it has already gone, and how
much is left.

We could provide for some kind of call-back that would be called each time
1% of the job had been completed.
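A possible shape for that callback, sketched in C++ with a hypothetical `ProgressReporter`: since the total size is known at umount time, the writer can report each whole percent exactly once as chunks are written.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical progress reporter: the total is known up front, so the
// callback fires once for every whole percent that has been completed.
class ProgressReporter {
public:
    ProgressReporter(std::uint64_t totalBytes, std::function<void(int)> onPercent)
        : total_(totalBytes), onPercent_(std::move(onPercent)) {}

    // Call after each chunk has been written.
    void Advance(std::uint64_t bytes) {
        done_ += bytes;
        int percent = total_ == 0 ? 100 : static_cast<int>(done_ * 100 / total_);
        if (percent > 100) percent = 100;
        // Report every percent step passed since the last call, in order.
        while (lastReported_ < percent) onPercent_(++lastReported_);
    }

private:
    std::uint64_t total_;
    std::uint64_t done_ = 0;
    int lastReported_ = 0;
    std::function<void(int)> onPercent_;
};
```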

If files are written in the order they are added to the pak, we will have to
store that order, which will take up extra memory (ok, minor point, but anyway :)

Lastly, it is common to access several files in the same directory right
after each other. Imagine reading the whole of one file, then going on to
the next. If one is lucky, the first file is positioned in the pak right
before the second, so that the cache will already include some of the data
in the second file. The probability of this happening is somewhat larger if
files in the same directory are placed right after each other in the pak
file. This cannot be guaranteed if files are written in the order they are
added.

There's a quite good counterpoint here, though, and that is that people
using PFile will have the ability to exactly dictate the order of files in
the pak, and the people who wrote the program that uses PFile will probably
best know what file order will maximize the possibility of this happening. I
don't see people taking the time for this, though, and the effect really is
minor.
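If PFile did reorder entries at umount time, a stable sort on (directory, file name) would be enough to put each directory's files back to back. A sketch, assuming entry names are stored as '/'-separated paths; `SplitPath` and `OrderForLocality` are hypothetical helpers.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split "dir/sub/file" into {"dir/sub", "file"},
// assuming entry names use '/' as the separator.
static std::pair<std::string, std::string> SplitPath(const std::string& p) {
    const auto slash = p.rfind('/');
    if (slash == std::string::npos) return {"", p};
    return {p.substr(0, slash), p.substr(slash + 1)};
}

// Order entries so that all files of one directory are written back to back,
// regardless of the order in which they were added.
void OrderForLocality(std::vector<std::string>& entries) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const std::string& a, const std::string& b) {
                         return SplitPath(a) < SplitPath(b);
                     });
}
```

Sorting on the full path string alone would not do: a subdirectory's files can end up between two files of its parent directory.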

btw, are we going to implement our own caching routines for pakfile reading?
If we do, we can avoid reading past the end of a file contained in a
pakfile...
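If we do write our own caching, the clamp itself is simple. A sketch with a hypothetical `PakEntry` (offset and size of a contained file) and `ReadAheadLength`: the read-ahead length is the preferred block size, cut off at the end of the contained file.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical description of a file stored inside a pak.
struct PakEntry {
    std::uint64_t offset;  // where the file's data starts in the pak
    std::uint64_t size;    // length of the file's data
};

// How many bytes a read-ahead cache should fetch when the reader is at `pos`
// (relative to the start of the entry): the preferred block size, clamped so
// the read never runs past the end of the contained file. The absolute pak
// position of the read would be e.offset + pos.
std::uint64_t ReadAheadLength(const PakEntry& e, std::uint64_t pos,
                              std::uint64_t blockSize) {
    if (pos >= e.size) return 0;
    return std::min(blockSize, e.size - pos);
}
```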