
Re: PFile notes

Bjarke Hammersholt Roune wrote:

>> >What is the problem that makes it impossible for ppfCopy() to be generic
>> >otherwise?
>> Not impossible, but harder.
>> If the target file system is not mounted at copying time but has to be
>> treated specially ppfCopy would be much more complex.
>Why wouldn't it be mounted?

Ah, here's the misconception. If ppfCreatePak () didn't create an empty
PakFile, then either (1) ppfMount () would have to take all the Pak header
parameters and construct the proper PakFile object itself (bad and most
likely impossible) or (2) the PakFile object created by ppfCreatePak ()
would have to be stored somewhere in FileGlobalData or the like, creating
another special case for mounting.

Writing the empty Pak to disk and then mounting it as usual is actually
the simplest way.

>> Well, take some Pak with n dirs. The XHeader uses, say, 16 bytes per dir
>> (position and length of the dir info for easier memory mapping). Now you
>> add a dir - and both the data in the Pak grows and the XHeader grows. If
>> it's at the start of the Pak then you lost. If it's at the end you lost
>> (ok, you can always make sure it's at the very end and always read the Pak
>> from its end backwards until you reach the XHeader ID to determine its
>> starting pos). If it's somewhere in the middle you lost.
>> In other words - it has to be relocateable.
>Well, if you want to use it like that, you can put it after the directory
>and file header data, and relocate it along with the header when the pak
>grows. That also guarantees that you don't have any unused gaps in the pak,
>which might otherwise be created.

Yup. See the end of this mail.

>> I can't completely follow you, but ok... The system has to be able to
>> handle the following scenarios nicely:
>> (1) Creating a dir from a static source with optimizations (e.g. an
>> existing PakFile containing the hash keys of the entries, hashtable size
>> etc) and eventually adding 1-2 directories later (some FS mounted below
>> etc)
>> (2) Creating a dir structure file by file (e.g. when creating a Pak or
>> reading an ftp dir) with only very rare read accesses to that struct, and
>> afterwards optimizing the hashtable for read accesses and eventually
>> serialization.
>That would be handled fine with the very simple to implement solution I
>outlined above.

Then it's ok.

>> Because it somehow has to display what's stored in the Pak? At least it
>> *should* have this feature ... ;)
>> Do you think WinZip would be as common if it wouldn't offer the
>> possibility to see what is stored in the archive? ;)
>Why wouldn't it just read the PFile structure when finding out what to
>display? Why duplicate all of this?

You mean it should do

UserDraggedSomethingToPak (src, dest)
   ppfCopy (src, dest, 0);

UpdateTargetDisplay ()
   DisplayComponent.Clear ();
   for i in (Iterate through Pak via ppfReadDir () etc)
      DisplayComponent.Add (i);

Finish ()
   ppfUmount (ThePak, 0);

*That* would be a waste of resources, compared to the usual way:

UserDraggedSomethingToPak (src, dest)
   Dest = MkTargetPath (src, dest);
   TargetList.Add (Dest);
   SrcList.Add (src);
   DisplayComponent.Add (Dest);

Finish ()
   for i in (Iterate through SrcList)
      ppfCopy (SrcList [i], TargetList [i], 0);

   ppfUmount (ThePak, 0);
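
To make the second variant concrete, here is a rough C++ sketch of the
queue-then-copy idea. The class PendingCopies, its methods and the stubbed
ppfCopy () signature are all assumptions made up for illustration, not the
real PFile API; the stub only counts calls so the sketch is self-contained.
The final ppfUmount () call is left out for the same reason.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real ppfCopy(); it only counts its calls.
static int g_copyCalls = 0;
int ppfCopy(const std::string& src, const std::string& dest, int flags) {
    (void)src; (void)dest; (void)flags;
    ++g_copyCalls;
    return 0;  // 0 means success in this sketch
}

// Collects drag-and-drop requests; the Pak is not touched until finish().
class PendingCopies {
public:
    // Queues one request and returns the target path for the display.
    std::string add(const std::string& src, const std::string& destDir) {
        assert(!src.empty());
        std::string::size_type slash = src.find_last_of('/');
        std::string name =
            (slash == std::string::npos) ? src : src.substr(slash + 1);
        std::string dest = destDir + "/" + name;
        items_.push_back(std::make_pair(src, dest));
        return dest;
    }

    // Performs all queued copies in order; returns the number of failures.
    int finish() {
        int failures = 0;
        for (std::size_t i = 0; i < items_.size(); ++i)
            if (ppfCopy(items_[i].first, items_[i].second, 0) != 0)
                ++failures;
        items_.clear();
        return failures;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<std::pair<std::string, std::string> > items_;
};
```

The display is fed from the return value of add (), so the Pak never has to
be re-read just to show what the user has dropped into it so far.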

>> ppfCopy ("/usr", "/mnt/usrpak", ppfGF_recursive);
>> will copy the directory "/usr" and everything below it, recursively, to
>> the directory "/mnt/usrpak" and then returns.
>Yes, but that requires that /usr contains the exact directory and file
>structure you want.

Ok, right. But even if the user (client) copies file-after-file, the time
spent in user code during this is still extremely small compared to the
time spent in PFile code. So the difference in terms of "corruption safety"
is minimal.

>> Does it have to be invalid while it is being built? When storing all dir
>> info at the end (both in terms of location and time) there's no more
>> reason for this.
>Well... it doesn't *have* to be invalid.
>You do realise you'll have to move (ie, serialize) the complete directory
>and file structure to the end of the pak each and every time you add a file
>to the pak? (I sense we aren't talking about the same thing here)

We are, I think. And actually it's perfectly fine. Just look at some usage
scenarios:

(1) Creating the (big?) Pak once and after that accessing it only readonly.
    Here behavior is exactly the same as for Format 0 Paks.
(2) Creating a (small) Pak file by file, and eventually adding single files.
    Behavior is almost the same as for Format 1, except that writing is a
    little slower (but not much, as the Pak is small anyway) and reading is
    a little faster.
(3) Doing some mix of the above.
    Now that couldn't be done previously - Format 0 Paks were always static,
    there was no chance of fixing one later on. Performance is bad for very
    large Paks, but not as bad as reconstructing the entire Pak to
    incorporate that small fix.

BTW - actually there also isn't a reason to disallow *modification* of
files in the Pak (no matter where they are and how many are opened at the
same time). Just an attempt to grow them will fail (except for the
physically last one).
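
A minimal sketch of that rule in C++, assuming each file in the Pak is
described by a start offset and a size inside the content area. The Entry
struct and both function names are hypothetical, made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one file stored in the Pak's content area.
struct Entry {
    std::uint64_t offset;  // start of the file data
    std::uint64_t size;    // current length of the file data
};

// True if no other entry starts behind the given one.
bool isPhysicallyLast(const std::vector<Entry>& entries, std::size_t index) {
    for (std::size_t i = 0; i < entries.size(); ++i)
        if (entries[i].offset > entries[index].offset)
            return false;
    return true;
}

// A write that stays inside the file's current bounds is always allowed.
// A write that would grow the file is allowed only for the last entry.
bool writeAllowed(const std::vector<Entry>& entries, std::size_t index,
                  std::uint64_t pos, std::uint64_t len) {
    if (pos + len <= entries[index].size)
        return true;
    return isPhysicallyLast(entries, index);
}
```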

I want to have PakFiles laid out like this:

|             Header
|           Content (File data)
|        DirInfo (directory data)
|             Extended Header

In other words, a Pak is divided into several parts, each more or less
independent of the others. Addressing is relative to these parts, i.e. you
access, say, Byte 0x1234 in the "Content" part or Position 0x040 in the
DirInfo one. That makes it simple for all parts to grow independently of
each other. The additional overhead is negligible (except, of course, when
reading files one byte at a time...).
It also lends itself to a better distribution of responsibilities between
the classes (PakFile doing all actual writing/reading etc).
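
As a rough illustration of the part-relative addressing, here is a C++
sketch. The names Section, PakLayout, toAbsolute () and grow () are
assumptions for this example only, and the part lengths used with it are
arbitrary; none of this is existing PFile code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical part identifiers, in on-disk order (Header comes first).
enum Section { kHeader = 0, kContent, kDirInfo, kXHeader, kSectionCount };

// Hypothetical table of part lengths; start positions are derived from it.
struct PakLayout {
    std::uint64_t length[kSectionCount];

    // Absolute start of a part = sum of the lengths of all parts before it.
    std::uint64_t start(Section s) const {
        std::uint64_t pos = 0;
        for (int i = 0; i < s; ++i)
            pos += length[i];
        return pos;
    }

    // Translates a part-relative offset into an absolute file position.
    std::uint64_t toAbsolute(Section s, std::uint64_t offset) const {
        assert(offset < length[s]);
        return start(s) + offset;
    }

    // Growing a part changes only its length entry. Relative offsets
    // stored elsewhere stay valid; only the derived start positions of
    // the parts behind it shift.
    void grow(Section s, std::uint64_t bytes) { length[s] += bytes; }
};
```

With a 64-byte Header, Byte 0x1234 of the Content part maps to absolute
position 64 + 0x1234. After the Content part grows, Position 0x040 in
DirInfo is still addressed as 0x040; whoever does the actual writing still
has to move the bytes of the following parts on disk.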

I'll implement that in the next few days.


Drive A: not responding...Formatting C: instead