[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: PFile notes

Bjarke Hammersholt Roune wrote:

>> The PakFile writing code I wrote worked as follows:
>> (1) The User calls ppfCreatePak (), which creates a new PakFile object
>> using the second constructor. Here PakFile::CreateHeader () is called,
>> which writes the Pak header, creating a completely empty PakFile. After
>> that the PakFile object is immediately destroyed.
>> (2) The user mounts the new PakFile with the ppfMF_write flag.
>> The code for
>> this creates a new PakFile object using the first constructor, which
>> recognizes the Pak as a new one, allowing the adding of files.
>Hmm... Is there a good reason for doing it in this way?

Yes. That way ppfCopy can be a generic function, just copying files from A
to B.

>If the pakfile construction process is interrupted for some reason, there'll
>be a useless empty pak file lying around.

If it is interrupted by the user, then, well, it's his own fault. Perhaps
it's even what he wanted.
And if it's interrupted by some external force (the game crashing /...)
then it doesn't matter whether it is interrupted in a ppfCopy () run
copying a single file or in the syncronization code. The result is exactly
the same.

>It would seem to me to be somewhat more effecient to delay creating any
>actual file for the pakfile untill the very end, so that all of the pakfile
>can be written in one go. That way, nothing unneded is written, nothing that

Writing file-per-file also doesn't need anything unneeded. It just writes
the file data immediately, and the dir data is written at serialization

>is already known is read form disk (the empty pak), caching is better

Er, the header is about 100 bytes.

>utilised and no split-second-life-time PakFile objects are created.

Ok, the PakFile object create-and-destroy thing isn't nice, but apart from
that it absolutely doesn't matter.

>> I guess you had some different assumptions of that process?
>Well, yes, somewhat.
>I thougth a pakfile was laid out more like this:
>-always present header-
>here goes stuff like the game version, PFile version, verification that
>"yes, this is a PFile pak file", PFile format number etc. This would have to
>stay the same if we want compatibility with previous versions of PFile. The
>length of the extended header is also in here.
>-extended header-
>This is where version and format specific information goes.

Eventually the XHeader can grow with the PakFile in later versions, so it
shouldn't be at a fixed position.

>-directory structure: format 0 only-
>Ie, what you get by serializing the top Directory.
>-file contents-
>-directory structure: format 1 only-
>I'm not so fond of giving any of these parts "random" positions within the
>PFile, as it complicates things without really giving any benefit (?).
>> Ok, some notes about the code you added to PakFile::Umount () :
>> 		Directory::SetContentOffset(0);
>> 		Directory::SetSerializingPakFile(this);
>Oops. Actually, the call to SetSerializingPakFile() is only nessecary for
>serializing a pak *from* storage, not to it.
>The call to SetContentOffset() can be made unnessecary if it's possible to
>get the current offset of a stream (ie, the amount of bytes that has been
>written to the stream, plus the starting offset). You know how? There's a
>function that will give you the stream position, but my docs specificly
>state I should not try to read this value, only pass it back into the stream
>to reset it to that position at a later time.

ftell () ?

>> You assume serializations to PakFiles only - at the level of the Directory
>> class. Bad.
>Serialization to storage is completely independent of the PakFile. Directory
>will write the correct info to the stream, whereever it may point.

What I meant is that the generic Directory class has a PakFile specific
function (SetSerializingPakFile ()).

>> This is better: PakFiles that may have to update themselves
>> later (i.e. either ones being initially constructed or format 1 ones
>> mounted as writeable) accept no "foreign" directories below them - only
>> directories belonging to that PakFile.
>Wouldn't that needlessly limit the use of format 1 pak files? I mean, a bit
>of common sense of the part of the client should make it possible to steer
>clear of any problems.

Ok, my mistake. Actually that restriction is unneccessary. Directory has a
method GetVFSInterface () which returns - in the case of PakFile(W)Dirs the
PakFile the dir belongs to. If make that a general method of DirEntry
(which is a good idea anyway) we can query whether the respective file/dir
has to be serialized. If not, it can be simply skipped.
No - even better. Umounting is anyway impossible if some other FS is
mounted below. That means at synchronization time we do not have any
foreign dirs/files to care about.

>Is there a very good reason that its nicer to get this "no way" message
>sooner (when the dir/file is mounted), rather than later (when its asked to
copied via ppfCopy () you mean?

>serialize), considering that giving the "no way" message sooner would remove
>functionality from PFile, and that giving it later does not remove the
>"safety net" that dirs/files that cannot serialize will say so?

>>, (2) know
>> what PakFile
>> object they belong to,
>Unless you want to store that as a pointer-member of each file and dir, I

only in each dir. Files can still access it via their parent dir.

>In any case, I don't think adding 4 bytes to each File and Directory is
>really worth it to avoid the very slight inconvinience of having to call
>Directory::SetSerializingPak() when loading a pak.

BTW - where is this method declared? I couldn't find it. And why is it
static (according to the call in PakFile::Umount)?

>> So the Directory/File classes only need to provide (pure?) virtual
>> SerializeTo () methods. The details of that operation are then handled by
>> the classes derived from them.
>I'm not sure I understand you. That *is* how it works right now. Well, ok,
>under the hood, the names are a bit different, but one should not care about
>the implementation details of Directory.

Ok, you're right. I was a bit confused because the PakFileDir etc classes
are so short, as the main part of the serialization is done by HashTable's
MISC (which is BTW outdated according to its comment) AFAIS.

>> Another thing - What about positioning the dir info of format 0
>> paks at the end as well?
>I've thought about this too. That would make format 1 and format 0 paks
>exactly the same, wouldn't it?

Hmm, it would make the distinction pointless. At least I think so. I'll
think over it some more... Would be good.

>Its certainly a good idea to give the possibility of appending to format 0
>paks, as long as, as is the case, there is no performance hit what-so-ever.

Right. But then adding to HashTable needs to be adjusted a bit, so that it
doesn't require the HashToDyn () ; Add () ; DynToHash () op at *every* Add
operation. Opening a Pak using HashTable, converting that to DynHashTable
for work and converting back to HashTable for serialization only would be
good IMHO.
That would mean all searches are done via DynHashTable (fine, as the
unsorted overflow list won't be long in most cases - and when it is long it
won't be searched usually) and HashTable only does serialization work.

>>That would make it possible for ppfCopy () to instantly copy
>> the file data instead of delaying that to umounting time. And that in turn
>> makes it easier for the client to keep track of the progress (very useful
>> for feedback to the gamer).
>I think it'll actually make it harder. Well, not compared to the current
>situation (no way of keeping track of progress), granted. But:
>If the file content was written at once, you'd first off have the problem
>that it would be quite a pain to cancel putting something in a pak. I can
>think of atleast one situation in which it would be nice to have the
>possibility of cancelling the inclusion of a file: a GUI pak builder where
>you drag and drop files to the pak, and then lastly build it. Users of suchs
>a program would certainly expect to be able to remove files again. For suchs
>a program, there is also the matter of it being quite annoying setting off
>harddisk-activity each time a drop of a file is made.

That app needs to keep track about what files will be stored anyway, i.e.
it has to have some internal representation of the target hierarchy. So it
would be stupid if it actually called ppfCopy () immediately at each drag

>PFile can check that there is enough free space to write a pak in its
>intirety before beginning to write it.

Hmm, ok, that's a point. 

>If something goes wrong during the construction of the pak, then the size of
>the pak file will be equal to the combined sizes of the files included up to
>the point of something going wrong. If writing is delayed, the size of the
>pak file in this cirsumstance will be small, and it actually really doesn't
>even have to exist before its written (even though the current
>implementation creates it anyway).

See my note concerning that somewhere at the top of the mail.

>I said I thougth it would be easier to follow the progress of the complete
>process if the writing of the pak file was done all at the same time. The
>reason for this is that it would then be possible to move the reponsibility
>of keeping track of progress from the client program, to PFile itself, so
>each and every client that needs this functionality doens't have to
>implement it. PFile will not be able to do this otherwise, as it won't know
>how many files there are to go, or how large those files are. If it waits
>for the umount, all information will be available, so it can accurately tell
>how far it has already gone, and how much is left.

ppfCopy () is designed to be able to copy entire directory hierarchies at
once so it also has that info even when it immediately writes the file
data. Ok, it propably doesn't know how much directory info has to be
written, but that's no issue. The point is that it's up to the user - if he
wants to copy file-by-file, fine. If he wants PFile to assemble the entire
thing at once, ok, it does that.
And it's much more intuitive if the copy operation actually happens when
you call, well, the copy command. And that this copy command only succeeds
if the copy operation succeeded.

>We could provide for some kind of call-back that would be called each time
>1% of the job had been completed.

Yup, that would be good. But that's for later.

>btw, are we going to implement our own caching rutines for pakfile reading?
>If we do, we can avoid reading past the end of a file contained in a

No. Caching is the job of the OS. And if the OS knows that it's better to
read the data in 4096 Byte chunks, even if the requested file is only 200
Bytes long, then fine. We can't bypass OS caching anyway.


Drive A: not responding...Formatting C: instead