
Re: PFile notes



Bjarke Hammersholt Roune wrote:

>> >> (2) The user mounts the new PakFile with the ppfMF_write flag. The code
>> >> for this creates a new PakFile object using the first constructor, which
>> >> recognizes the Pak as a new one, allowing the adding of files.
>> >>
>> >Hmm... Is there a good reason for doing it in this way?
>>
>> Yes. That way ppfCopy can be a generic function, just copying files from A
>> to B.
>>
>What is the problem that makes it impossible for ppfCopy() to be generic
>otherwise?

Not impossible, but harder.
If the target file system is not mounted at copying time but has to be
treated specially, ppfCopy would be much more complex.
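
To illustrate the point, here is a minimal sketch. All names are hypothetical (FileSystem, MemoryFS and GenericCopy are not the LibPP API). Once source and target are both mounted behind one common interface, a copy is just "read from A, write to B". There is no special case for a freshly created Pak. MemoryFS is a toy stand-in for any backend.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal file-system interface; not LibPP's actual API.
struct FileSystem {
    virtual ~FileSystem() = default;
    virtual bool Read(const std::string& path, std::string& data) const = 0;
    virtual bool Write(const std::string& path, const std::string& data) = 0;
};

// Toy in-memory backend standing in for "native FS", "PakFile", etc.
struct MemoryFS : FileSystem {
    std::map<std::string, std::string> files;
    bool Read(const std::string& path, std::string& data) const override {
        auto it = files.find(path);
        if (it == files.end()) return false;
        data = it->second;
        return true;
    }
    bool Write(const std::string& path, const std::string& data) override {
        files[path] = data;
        return true;
    }
};

// Generic copy: it only talks to the interface, so it works for any pair
// of mounted backends, including a Pak that was just mounted for writing.
bool GenericCopy(const FileSystem& src, const std::string& from,
                 FileSystem& dst, const std::string& to) {
    std::string data;
    if (!src.Read(from, data)) return false;  // source missing: copy fails
    return dst.Write(to, data);               // succeeds only if the write did
}
```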

But I guess we misunderstood each other again... ;)

>> >If the pakfile construction process is interrupted for some reason,
>> >there'll be a useless empty pak file lying around.
>>
>> If it is interrupted by the user, then, well, it's his own fault. Perhaps
>> it's even what he wanted.
>>
>Actually, it's not empty, it's invalid (or becomes so the moment a file is
>written), which cannot be useful for anybody.

Unless the person who initiated the copying process decides that it's
taking too long because he now really has to leave for his date and doesn't
want to have his computer running while he's away.
The copying process should be abortable at any point.
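
Here is a rough sketch of what "abortable at any point" could mean in code. It assumes a hypothetical progress callback (ProgressFn, AbortableCopy) that the caller uses to cancel. The copy runs in chunks and commits the result only after the last chunk is done. The target here is an in-memory string; with a real file, the abort path would delete the partial output instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical progress callback: return false to abort the copy.
using ProgressFn = std::function<bool(std::size_t done, std::size_t total)>;

// Copies `src` into `dst` in fixed-size chunks, asking `progress` after each
// chunk whether to continue. On abort, `dst` is left untouched, so the
// caller never sees a half-written target.
bool AbortableCopy(const std::string& src, std::string& dst,
                   std::size_t chunk, const ProgressFn& progress) {
    assert(chunk > 0);
    std::string out;
    for (std::size_t pos = 0; pos < src.size(); pos += chunk) {
        out.append(src, pos, chunk);  // last chunk may be shorter
        if (!progress(out.size(), src.size())) return false;  // aborted
    }
    dst = out;  // commit only after the whole copy succeeded
    return true;
}
```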

>Actually, the program itself should make sure to delete any unneeded file if
>the user quits, so it's the program's fault, unless you meant something like
>killing the process.

For me "user" == user of LibPP, i.e. programmer of a game. So I guess we
agree here.

>> And if it's interrupted by some external force (the game crashing /...)
>> then it doesn't matter whether it is interrupted in a ppfCopy () run
>> copying a single file or in the synchronization code. The result is exactly
>> the same.
>>
>Ehh, serialization code... :)

<searching for nearest little hole...>

>Well, yes, if, for some reason, the serialization process is interrupted in
>some "violent" way, then you'll have the same result.
>
>However, in writing the pak file a little at a time, the amount of time in
>which the program can be subjected to some unexpected, fatal
>internal/external state/event, is stretched over a timespan that can be as
>short as less than a nanosecond, or as long as a couple of billion years or
>longer.

If the user (game programmer) copies file by file and does some processing
between each copy, fine. If he wants to copy everything in one run, fine.
It's his choice.

>I was referring to the empty pak that gets written.
>
>Actually, it can be much, much more than the relatively small header, if
>some files have been added to the pak, and then something goes wrong.
>Minimum 100 bytes, maximum unbounded.

You sound as if PakFile creation utilities would encode some mp3s after
each file they copy.
If the user wants to create the entire Pak in one "quick" op he can do that
- the time spent outside of PFile during Pak construction is about 10-20 CPU
cycles in that case (the calling of ppfUmount ()). Don't tell me that
raises the likelihood of a corrupted Pak noticeably!

>What's worse is that having an invalid pakfile might cause the program to
>crash at a future point. That's a very nasty thing for program developers to
>fix, as they'll probably have a very hard time finding out what the
>problem is. Of course, if you're a smart developer, you'll have foreseen this
>problem and done something about it, but it *is* quite an easy thing to
>forget.
>
>A program might also do something like this:
>
>"Datafile corrupt - cannot continue"
>
>Which is quite undesirable.

As undesirable as a "file not found - cannot continue" and about as likely.

>> >utilised and no split-second-life-time PakFile objects are created.
>>
>> Ok, the PakFile object create-and-destroy thing isn't nice, but apart from
>> that it absolutely doesn't matter.
>>
>If we eventually do go ahead with this way of doing it, then I think we
>might want to consider having a PakFile::CreateEmptyPak() function, which
>doesn't create split-second-life-time PakFile objects.

Ok, if you care about the extra few cycles...

>> Eventually the XHeader can grow with the PakFile in later versions, so it
>> shouldn't be at a fixed position.
>>
>How do you get from the premise that "Eventually the XHeader can grow with
>the PakFile in later versions" to the conclusion that "it shouldn't be at a
>fixed position" ?

Well, take some Pak with n dirs. The XHeader uses, say, 16 bytes per dir
(position and length of the dir info for easier memory mapping). Now you
add a dir - and both the data in the Pak grows and the XHeader grows. If
it's at the start of the Pak then you lost. If it's at the end you lost
(ok, you can always make sure it's at the very end and always read the Pak
from its end backwards until you reach the XHeader ID to determine its
starting pos). If it's somewhere in the middle you lost.
In other words - it has to be relocatable.
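
Here is a minimal sketch of the backwards search mentioned above. It assumes a hypothetical 4-byte ID "XHDR" and a Pak image held entirely in memory. File data could contain the same bytes by accident, so this only works if the XHeader really is the last thing in the Pak, and a real reader should validate what it finds.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical XHeader ID; the real Pak format may use a different marker.
const char* const kXHeaderId = "XHDR";

// Searches the Pak image from its end backwards and returns the offset of
// the last occurrence of the ID (taken as the XHeader start), or
// std::nullopt if there is none.
std::optional<std::size_t> FindXHeader(const std::string& pak) {
    std::size_t pos = pak.rfind(kXHeaderId);  // rfind searches from the end
    if (pos == std::string::npos) return std::nullopt;
    return pos;
}
```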

>> What I meant is that the generic Directory class has a PakFile specific
>> function (SetSerializingPakFile ()).
>>
>Well, yes, but that's only when Directory is serializing *from* a *pak*. I
>don't see how that is bad. Directory is responsible for loading a directory
>hierarchy from a pak file. It needs to know what pakfile to tell each loaded
>PakFileDir it belongs to.

No. Directory is responsible for maintaining a collection of files and/or
subdirectories. A Directory is in no way related to a PakFile. Some of the
classes derived from it are, others are not.

>> Ok, you're right. I was a bit confused because the PakFileDir etc classes
>> are so short, as the main part of the serialization is done by HashTable's
>> MISC
>>
>I think that's a bit of an overstatement. The longest method contained in
>any of the classes used as the MISC parameter is 2 lines.
>
>The framework for the serialization (and nothing else) is offered by
>HashTable, aided a little bit by its MISC template parameter.
>
>MISC functions as a no-cost (inlining) adapter layer between HashTable and
>its contained elements.

Ok, right. The main writing code is in Directory and HashTable. That
explains why I didn't find it for so long: it shouldn't be there.

The writing code is hardwired into two rather generic classes, and thus
both are bound to PakFiles. Now say we wanted to serialize to a ZIP file,
or to a CD image. (The ZIP file thing at least isn't so far off - support
for at least reading these things is something I want for later versions).
Adding that would require rewriting a lot of code.

Writing to PakFiles should be *only* done by PakFile specific
classes/functions. For serialization that's PakFileDir / PakFileFile and
PakFile (The PakFileW* classes should be removed when all PakFiles support
arbitrary adding of files).
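
Here is a minimal sketch of that split. All names are hypothetical (Entry, DirWriter, Directory::Serialize, TextPakWriter). The generic Directory only enumerates its entries through a writer interface. Each format (Pak, or later perhaps ZIP) supplies its own writer. TextPakWriter just emits a readable string so the example stays self-contained; it is not the real Pak encoding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical format-neutral description of a directory entry.
struct Entry {
    std::string name;
    bool isDir;
};

// Format-specific writers implement this; Directory never sees a PakFile.
struct DirWriter {
    virtual ~DirWriter() = default;
    virtual void BeginDir(const std::string& name) = 0;
    virtual void WriteEntry(const Entry& e) = 0;
    virtual void EndDir() = 0;
};

// Generic directory: it only walks its entries and knows nothing about
// PakFiles, ZIP files or CD images.
struct Directory {
    std::string name;
    std::vector<Entry> entries;
    void Serialize(DirWriter& w) const {
        w.BeginDir(name);
        for (const Entry& e : entries) w.WriteEntry(e);
        w.EndDir();
    }
};

// Stand-in for a PakFile-specific writer; a hypothetical ZipWriter would be
// another subclass, with no change to Directory.
struct TextPakWriter : DirWriter {
    std::string out;
    void BeginDir(const std::string& n) override { out += "[" + n + "]"; }
    void WriteEntry(const Entry& e) override {
        out += e.name + (e.isDir ? "/" : "") + ";";
    }
    void EndDir() override { out += "[end]"; }
};
```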

>> (which is BTW outdated according to its comment) AFAIS.
>>
>The MISC template parameter candidate which is in HashTable.h is strictly
>there for documentation porpuses, showing what interface a valid candidate
>for a MISC template parameter must have. I believe it says this.

Yup, but it also says it's outdated. I don't know what you meant with that
comment though... ;)

>> Right. But then adding to HashTable needs to be adjusted a bit, so that it
>> doesn't require the HashToDyn () ; Add () ; DynToHash () op at *every* Add
>> operation.
>>
>I'm thinking about adding AddEntry() and RemoveEntry() methods to HashTable.
>I can make them marginally more efficient than the hash->dyn->hash cycle if
>we are only talking about a very small number of entries. That, and (this
>is the main reason) it's more intuitive to look at when it's used.

[more explanations]

>That will give a slight performance hit for all pack accesses, regardless of
>type, though.

I can't completely follow you, but ok... The system has to be able to
handle the following scenarios nicely:

(1) Creating a dir from a static source with optimizations (e.g. an existing
PakFile containing the hash keys of the entries, hashtable size etc) and
possibly adding 1-2 directories later (some FS mounted below etc)

(2) Creating a dir structure file by file (e.g. when creating a Pak or
reading an ftp dir) with only very rare read accesses to that struct, and
afterwards optimizing the hashtable for read accesses and possibly
serialization.
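
Here is one possible shape for a table that covers both scenarios. It is a sketch with hypothetical names (DirTable, AddEntry, Optimize), and std::hash stands in for whatever hash function PFile actually uses. While a hierarchy is being built, entries go into a flat vector. Optimize() moves them into buckets sized to the entry count. After that, AddEntry() keeps working without a full hash->dyn->hash conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical two-mode directory table:
//  - dynamic: cheap appends, linear lookups (scenario 2: building);
//  - hashed:  fast lookups, still accepts the odd AddEntry (scenario 1).
class DirTable {
public:
    // In hashed mode the entry goes straight into its bucket.
    void AddEntry(const std::string& name) {
        if (buckets_.empty()) dynamic_.push_back(name);
        else buckets_[Bucket(name)].push_back(name);
        ++count_;
    }
    bool Contains(const std::string& name) const {
        const std::vector<std::string>& pool =
            buckets_.empty() ? dynamic_ : buckets_[Bucket(name)];
        for (const std::string& n : pool)
            if (n == name) return true;
        return false;
    }
    // Switches to (or re-sizes) hashed mode based on the current entry count.
    void Optimize() {
        std::vector<std::string> all;
        all.reserve(count_);
        for (std::vector<std::string>& b : buckets_)
            for (std::string& n : b) all.push_back(std::move(n));
        for (std::string& n : dynamic_) all.push_back(std::move(n));
        dynamic_.clear();
        buckets_.assign(all.size() + 1, std::vector<std::string>());
        for (std::string& n : all) buckets_[Bucket(n)].push_back(std::move(n));
    }
    bool IsHashed() const { return !buckets_.empty(); }
    std::size_t Size() const { return count_; }

private:
    std::size_t Bucket(const std::string& n) const {
        return std::hash<std::string>{}(n) % buckets_.size();
    }
    std::vector<std::string> dynamic_;
    std::vector<std::vector<std::string>> buckets_;
    std::size_t count_ = 0;
};
```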

>> That app needs to keep track of what files will be stored anyway, i.e.
>> it has to have some internal representation of the target hierarchy.
>>
>Why would it duplicate this functionality, when PFile would be able to do
>this for it?

Because it somehow has to display what's stored in the Pak? At least it
*should* have this feature ... ;)
Do you think WinZip would be as common if it didn't offer the possibility
to see what is stored in the archive? ;)

>In any case, it comes down to offering clients the freedom of cancellation
>or not.

Sure. That's also my opinion.

>> ppfCopy () is designed to be able to copy entire directory hierarchies at
>> once so it also has that info even when it immediately writes the file
>> data.
>>
>What exactly do you mean? Do you mean like duplicating directory hierarchy
>from the native file system?

From anywhere to anywhere.
Example:

ppfCopy ("/usr", "/mnt/usrpak", ppfGF_recursive);

will copy the directory "/usr" and everything below it, recursively, to the
directory "/mnt/usrpak" and then return.
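
Here is a toy model of what the recursive flag changes. It assumes a mounted tree can be modelled as a map from absolute path to file contents. CopyTree is hypothetical, not PFile's code. Without the flag, only the direct children of the source directory are copied. With it, everything below the source directory is copied.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a mounted tree: absolute path -> file contents.
using Tree = std::map<std::string, std::string>;

// Hypothetical recursive copy; `recursive` plays the role of
// ppfGF_recursive.
void CopyTree(const Tree& src, const std::string& from,
              Tree& dst, const std::string& to, bool recursive) {
    const std::string prefix = from + "/";
    for (const auto& [path, data] : src) {
        if (path.compare(0, prefix.size(), prefix) != 0) continue;  // not below `from`
        const std::string rest = path.substr(prefix.size());
        if (!recursive && rest.find('/') != std::string::npos) continue;  // in a subdir
        dst[to + "/" + rest] = data;
    }
}
```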

>If you do, then if the files that go in the pakfile by chance happen to not
>be already in a hierarchy from the beginning, it's completely impractical to
>first build this hierarchy (including copying files around, which is really
>unnecessary), and then tell PFile to duplicate that hierarchy.

Huh??? What are you speaking about?

>> And it's much more intuitive if the copy operation actually happens when
>> you call, well, the copy command. And that this copy command only succeeds
>> if the copy operation succeeded.
>>
>The copied file will be there as soon as it is copied, and it will be there

ppfCopy ("some/weird/file", "/mnt/newpak", 0);
ppFILE *WFile = ppfOpen ("/mnt/newpak/file", "rb");

Sure?
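
The snippet relies on copy-into-directory semantics. The copy keeps the source file's base name inside the target directory, which is why it is opened as "/mnt/newpak/file". TargetPath below is a hypothetical helper expressing that mapping, not a PFile function.

```cpp
#include <cassert>
#include <string>

// Hypothetical: where a file lands when the destination is a directory.
// The copy keeps the base name of the source path.
std::string TargetPath(const std::string& src, const std::string& dstDir) {
    const std::string::size_type slash = src.rfind('/');
    const std::string base =
        (slash == std::string::npos) ? src : src.substr(slash + 1);
    if (!dstDir.empty() && dstDir.back() == '/') return dstDir + base;
    return dstDir + "/" + base;
}
```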

>the next time the pack is opened also. The pak is invalid while it's being
>built anyway, so there is absolutely no difference for the client in this

Does it have to be invalid while it is being built? If all dir info is
stored at the end (both in terms of location and time), there's no longer
any reason for this.
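
One way to picture the "dir info at the end" layout is the sketch below, which uses an invented format. File data is appended as each file arrives. The directory, plus a fixed 8-byte trailer holding the directory's offset, is written only in Finish(). Nothing written earlier is ever patched, and the trailer replaces a backwards scan for an ID. PakBuilder, DirOffset and the encoding are all assumptions, not the real Pak format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy layout: [file data...][directory][8-byte little-endian dir offset]
class PakBuilder {
public:
    // Appends the file's bytes and remembers where they went.
    void AddFile(const std::string& name, const std::string& data) {
        dir_.push_back({name, image_.size(), data.size()});
        image_ += data;
    }
    // Writes directory and trailer; call once, after the last AddFile.
    std::string Finish() {
        const std::uint64_t dirOffset = image_.size();
        for (const DirEntry& e : dir_)
            image_ += e.name + '\0' + std::to_string(e.offset) + '\0' +
                      std::to_string(e.size) + '\0';
        for (int i = 0; i < 8; ++i)  // little-endian directory offset
            image_ += static_cast<char>((dirOffset >> (8 * i)) & 0xFF);
        return image_;
    }

private:
    struct DirEntry {
        std::string name;
        std::size_t offset;
        std::size_t size;
    };
    std::vector<DirEntry> dir_;
    std::string image_;
};

// Reads the directory offset back from the fixed-size trailer.
std::uint64_t DirOffset(const std::string& pak) {
    assert(pak.size() >= 8);
    std::uint64_t off = 0;
    for (int i = 0; i < 8; ++i) {
        const unsigned char byte =
            static_cast<unsigned char>(pak[pak.size() - 8 + i]);
        off |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return off;
}
```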


	Christian

PS: I'll rename ppFILE to ppfFILE for consistency's sake (even though it
sounds ugly ;)
-- 

Drive A: not responding...Formatting C: instead