
Re: PFile notes

> >> (2) The user mounts the new PakFile with the ppfMF_write flag.
> >> The code for
> >> this creates a new PakFile object using the first constructor, which
> >> recognizes the Pak as a new one, allowing the adding of files.
> >>
> >Hmm... Is there a good reason for doing it in this way?
> Yes. That way ppfCopy can be a generic function, just copying files from A
> to B.
What is the problem that makes it impossible for ppfCopy() to be generic?

> >If the pakfile construction process is interrupted for some
> reason, there'll
> >be a useless empty pak file lying around.
> If it is interrupted by the user, then, well, it's his own fault. Perhaps
> it's even what he wanted.
Actually, it's not empty, it's invalid (or becomes so the moment a file is
written), which cannot be useful to anybody.

Actually, the program itself should make sure to delete any unneeded file if
the user quits, so it's the program's fault, unless you meant something like
killing the process.

> And if it's interrupted by some external force (the game crashing /...)
> then it doesn't matter whether it is interrupted in a ppfCopy () run
> copying a single file or in the syncronization code. The result is exactly
> the same.
Ehh, serialization code... :)

Well, yes, if, for some reason, the serialization process is interrupted in
some "violent" way, then you'll have the same result.

However, when the pak file is written a little at a time, the window in
which the program can be hit by some unexpected, fatal internal/external
state/event is stretched over a timespan that can be as short as less than a
nanosecond, or as long as a couple of billion years.

If all information is written at the same time, the timespan where a fatal
situation will result in an invalid pak is limited to exactly as long as is
necessary, no longer.

> >It would seem to me to be somewhat more effecient to delay creating any
> >actual file for the pakfile untill the very end, so that all of
> the pakfile
> >can be written in one go. That way, nothing unneded is written,
> nothing that
> Writing file-per-file also doesn't need anything unneeded. It just writes
> the file data immediately, and the dir data is written at serialization
> time.
I was referring to the empty pak that gets written.

Actually, it can be much, much more than the relatively small header, if
some files have been added to the pak and then something goes wrong.
Minimum 100 bytes, maximum unbounded.

What's worse is that having an invalid pakfile might cause the program to
crash at some future point. That's a very nasty thing for program developers
to fix, as they'll very likely have a problem finding out what the problem
is. Of course, if you're a smart developer, you'll have foreseen this
problem and done something about it, but it *is* quite an easy thing to
overlook.

A program might also do something like this:

"Datafile corrupt - cannot continue"

Which is quite undesirable.

> >utilised and no split-second-life-time PakFile objects are created.
> Ok, the PakFile object create-and-destroy thing isn't nice, but apart from
> that it absolutely doesn't matter.
If we eventually do go ahead with this way of doing it, then I think we
might want to consider having a PakFile::CreateEmptyPak() function, which
doesn't create split-second-life-time PakFile objects.
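Such a helper could look roughly like this. This is a minimal sketch only:
CreateEmptyPak is the name suggested above, but the header layout, the magic
bytes, and WriteHeader-style details are pure assumptions on my part, not
PFile's actual format:

```cpp
// Hypothetical sketch: a static factory that writes a fresh, empty pak
// header directly, without constructing a throwaway PakFile object.
// The "PPAK" magic is illustrative, not PFile's real on-disk format.
#include <cstdio>

class PakFile {
public:
    // Creates an empty pak on disk; returns true on success.
    // No PakFile instance is constructed just to be destroyed again.
    static bool CreateEmptyPak(const char *path)
    {
        std::FILE *f = std::fopen(path, "wb");
        if (!f)
            return false;
        static const char magic[] = "PPAK";  // illustrative magic bytes
        bool ok = std::fwrite(magic, 1, sizeof magic - 1, f)
                  == sizeof magic - 1;
        // fclose can also fail (e.g. disk full on flush), so check it too.
        ok = (std::fclose(f) == 0) && ok;
        return ok;
    }
};
```

The point is that the caller gets a plain success/failure answer and no
short-lived object ever exists.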

> >> I guess you had some different assumptions of that process?
> >>
> >Well, yes, somewhat.
> >
> >I thougth a pakfile was laid out more like this:
> >
> >-always present header-
> >here goes stuff like the game version, PFile version, verification that
> >"yes, this is a PFile pak file", PFile format number etc. This
> would have to
> >stay the same if we want compatibility with previous versions of
> PFile. The
> >length of the extended header is also in here.
> >
> >-extended header-
> >This is where version and format specific information goes.
> Eventually the XHeader can grow with the PakFile in later versions, so it
> shouldn't be at a fixed position.
How do you get from the premise that "Eventually the XHeader can grow with
the PakFile in later versions" to the conclusion that "it shouldn't be at a
fixed position" ?

If PFile writes the length of the extended header to each pak, then what is
the problem in skipping past the part of this header that is longer than
expected (i.e., written by later versions)?
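A sketch of what that skipping could look like, assuming the fixed header
stores the XHeader's length. ReadXHeader and the single "format" field are
illustrative stand-ins, not PFile's actual code:

```cpp
// Forward-compatible XHeader reading: read the fields this version
// knows about, then seek past whatever a newer version may have
// appended, using the stored length.
#include <cstdio>
#include <cstdint>

// 'f' is positioned at the start of the extended header;
// 'xheader_len' was read from the always-present header.
bool ReadXHeader(std::FILE *f, std::uint32_t xheader_len)
{
    long start = std::ftell(f);
    if (start < 0)
        return false;

    std::uint32_t format = 0;  // example of a known field
    if (std::fread(&format, sizeof format, 1, f) != 1)
        return false;

    // Skip any fields added by later format versions.
    return std::fseek(f, start + (long)xheader_len, SEEK_SET) == 0;
}
```

An older reader thus lands exactly at the first byte after the XHeader, no
matter how much a newer writer put in there.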

> ftell () ?
*thanks* :)

I was looking at fgetpos().

> >> You assume serializations to PakFiles only - at the level of
> the Directory
> >> class. Bad.
> >>
> >Serialization to storage is completely independent of the
> PakFile. Directory
> >will write the correct info to the stream, whereever it may point.
> What I meant is that the generic Directory class has a PakFile specific
> function (SetSerializingPakFile ()).
Well, yes, but that's only when Directory is serializing *from* a *pak*. I
don't see how that is bad. Directory is responsible for loading a directory
hierarchy from a pak file. It needs to know which pakfile to tell each
loaded PakFileDir it belongs to.

There is no problem in moving the static SetSerializingPakFile() and
GetSerializingPakFile() methods to another class (like PakFile), though.

> >Directory::SetSerializingPak() when loading a pak.
> BTW - where is this method declared? I couldn't find it. And why is it
> static (according to the call in PakFile::Umount)?
Because the specific code that creates a PakFileDir and asks it to serialize
is static, namely pp::internal::Directory::MiscDir::SerializeFrom().

> >> So the Directory/File classes only need to provide (pure?) virtual
> >> SerializeTo () methods. The details of that operation are then
> handled by
> >> the classes derived from them.
> >>
> >I'm not sure I understand you. That *is* how it works right now.
> Well, ok,
> >under the hood, the names are a bit different, but one should
> not care about
> >the implementation details of Directory.
> Ok, you're right. I was a bit confused because the PakFileDir etc classes
> are so short, as the main part of the serialization is done by HashTable's
I think that's a bit of an overstatement. The longest method contained in
any of the classes used as the MISC parameter is 2 lines.

The framework for the serialization (and nothing else) is offered by
HashTable, aided a little bit by its MISC template parameter.

MISC functions as a no-cost (inlining) adapter layer between HashTable and
its contained elements.

> (which is BTW outdated according to its comment) AFAIS.
The MISC template parameter candidate in HashTable.h is strictly there for
documentation purposes, showing what interface a valid candidate for a MISC
template parameter must have. I believe its comment says so.

The "real" misc'es are Directory::MiscDir and Directory::MiscFile.
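The adapter idea can be sketched like this. Entry, MiscEntry, and the
forwarded methods are all made up for illustration (the real candidates are
Directory::MiscDir and Directory::MiscFile); the point is just that MISC is
a set of tiny inline static forwarders, so the adapter layer costs nothing:

```cpp
// HashTable only ever talks to its elements through the MISC
// parameter's static inline functions, which forward to the element
// type. With inlining, the adapter layer has zero runtime cost.
#include <cstring>

struct Entry {
    const char *name;
    Entry *next;
};

// The adapter: nothing but one-line inline forwarders.
struct MiscEntry {
    static const char *GetName(const Entry *e) { return e->name; }
    static Entry *GetNext(Entry *e)            { return e->next; }
};

template <class T, class MISC>
struct HashTable {
    // A lookup written purely against the MISC interface:
    static T *Find(T *head, const char *name)
    {
        for (T *e = head; e; e = MISC::GetNext(e))
            if (std::strcmp(MISC::GetName(e), name) == 0)
                return e;
        return nullptr;
    }
};
```

This is why the real MiscDir/MiscFile classes can stay so short: each
method is a one- or two-line forwarder.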

> >Its certainly a good idea to give the possibility of appending
> to format 0
> >paks, as long as, as is the case, there is no performance hit
> what-so-ever.
> Right. But then adding to HashTable needs to be adjusted a bit, so that it
> doesn't require the HashToDyn () ; Add () ; DynToHash () op at *every* Add
> operation.
I'm thinking about adding AddEntry() and RemoveEntry() methods to HashTable.
I can make them marginally more efficient than the hash->dyn->hash cycle if
we are only talking about a very small number of entries. That, and (which
is the main reason) it's more intuitive to look at when it's used.
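One way AddEntry() could avoid the full rebuild is the unsorted overflow
list mentioned below. This is only a sketch under that assumption; Entry,
AddEntry and FindInOverflow are illustrative names, not the real HashTable
interface:

```cpp
// Sketch: AddEntry() prepends to a small unsorted overflow list
// instead of doing HashToDyn(); Add(); DynToHash() for every add.
// The sorted hash part is rebuilt once, at serialization time.
#include <cstring>

struct Entry {
    const char *name;
    Entry *next;
};

class HashTable {
public:
    HashTable() : overflow(nullptr) {}

    // O(1): just link the new element onto the overflow list.
    void AddEntry(Entry *e)
    {
        e->next = overflow;
        overflow = e;
    }

    // Lookups also scan the overflow list, which stays short when
    // only a few entries were added since the last rebuild.
    Entry *FindInOverflow(const char *name) const
    {
        for (Entry *e = overflow; e; e = e->next)
            if (std::strcmp(e->name, name) == 0)
                return e;
        return nullptr;
    }

private:
    Entry *overflow;
};
```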

> Opening a Pak using HashTable, converting that to DynHashTable
> for work and converting back to HashTable for serialization only would be
> good IMHO. That would mean all searches are done via DynHashTable (fine,
as the
> unsorted overflow list won't be long in most cases - and when it
> is long it
> won't be searched usually) and HashTable only does serialization work.
The way to do this is to give DynHashTable and HashTable a pure virtual base
class, having only add/remove methods, as well as an IsDynamic() method
(more efficient than RTTI, and with no drawback).

Directory would then store a pointer typed as a pointer to this base class,
so that Directory::Add(), Directory::RemoveXxx(), Directory::FindXxx() and
Directory::GetXxx() need not care about this change.

That will give a slight performance hit for all pack accesses, regardless of
type, though.
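The proposed base class could be sketched like this (DirEntryContainer and
the method bodies are illustrative; only the overall shape - pure virtual
add/remove plus IsDynamic() - comes from the proposal above):

```cpp
// Common abstract base for the two container implementations.
class DirEntryContainer {
public:
    virtual ~DirEntryContainer() {}
    virtual void AddEntry(const char *name) = 0;
    virtual void RemoveEntry(const char *name) = 0;
    // Cheaper than RTTI when code must know the concrete type:
    virtual bool IsDynamic() const = 0;
};

// Compact, serialization-oriented container.
class HashTable : public DirEntryContainer {
public:
    void AddEntry(const char *) override { /* ... */ }
    void RemoveEntry(const char *) override { /* ... */ }
    bool IsDynamic() const override { return false; }
};

// Mutable container used while the pak is being worked on.
class DynHashTable : public DirEntryContainer {
public:
    void AddEntry(const char *) override { /* ... */ }
    void RemoveEntry(const char *) override { /* ... */ }
    bool IsDynamic() const override { return true; }
};
```

Directory holding a DirEntryContainer* is exactly what introduces the one
virtual dispatch per access - the "slight performance hit" mentioned above.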

> >If the file content was written at once, you'd first off have the problem
> >that it would be quite a pain to cancel putting something in a pak. I can
> >think of atleast one situation in which it would be nice to have the
> >possibility of cancelling the inclusion of a file: a GUI pak
> builder where
> >you drag and drop files to the pak, and then lastly build it.
> Users of suchs
> >a program would certainly expect to be able to remove files
> again. For suchs
> >a program, there is also the matter of it being quite annoying
> setting off
> >harddisk-activity each time a drop of a file is made.
> That app needs to keep track about what files will be stored anyway, i.e.
> it has to have some internal representation of the target hierarchy.
Why would it duplicate this functionality, when PFile would be able to do
this for it?

In any case, it comes down to offering clients the freedom of cancellation
or not.

> [progress tracking] PFile will not be able to do this otherwise, as it
> won't know
> >how many files there are to go, or how large those files are. If it waits
> >for the umount, all information will be available, so it can
> accurately tell
> >how far it has already gone, and how much is left.
> ppfCopy () is designed to be able to copy entire directory hierarchies at
> once so it also has that info even when it immediately writes the file
> data.
What exactly do you mean? Do you mean duplicating a directory hierarchy
from the native file system?

If you do, then if the files that go in the pakfile by chance happen not to
be in such a hierarchy from the beginning, it's completely impractical to
first build this hierarchy (including copying files around, which is really
unnecessary), and then tell PFile to duplicate it.

> Ok, it propably doesn't know how much directory info has to be
> written, but that's no issue. The point is that it's up to the
> user - if he
> wants to copy file-by-file, fine. If he wants PFile to assemble the entire
> thing at once, ok, it does that.
At an *insane* performance cost (if this is implemented the way I assume it
would be).

> And it's much more intuitive if the copy operation actually happens when
> you call, well, the copy command. And that this copy command only succeeds
> if the copy operation succeeded.
The copied file will be there as soon as it is copied, and it will also be
there the next time the pak is opened. The pak is invalid while it's being
built anyway, so there is absolutely no difference for the client in this
regard. I think that works pretty intuitively, at least it does for me.