SV: PFile notes

To: <penguinplay@sunsite.auc.dk>
Subject: SV: PFile notes
From: "Bjarke Hammersholt Roune" <asmild@post6.tele.dk>
Date: Thu, 28 Oct 1999 20:48:03 +0200
Reply-To: penguinplay@sunsite.auc.dk
> >> Not impossible, but harder.
> >> If the target file system is not mounted at copying time but has to be
> >> treated specially ppfCopy would be much more complex.
> >>
> >Why wouldn't it be mounted?
>
> Ah, here's the misconception. If ppfCreatePak () didn't create an empty
> PakFile, ppfMount () would either (1) have to take all the Pak header
> parameters and construct the proper PakFile object itself (bad and most
> likely impossible) or (2) the PakFile object created by ppfCreatePak ()
> would have to be stored somewhere in FileGlobalData or so, creating another
> special case for mounting.
>
> Writing the empty Pak to disk and then mounting that as usual is
> actually the simplest way.
>
This is a good point. Actually, it's so good that any arguments to the
contrary (that I can think of right now, anyway) aren't strong enough to
make up for it. Little at a time it is.
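
A rough sketch of the "write an empty pak, then mount it as usual" idea, in
Python just for brevity. The magic value, the header fields and the names
create_pak / mount are made up for illustration and are not the real PFile
ones; the point is only that mounting needs no special case for a freshly
created pak.

```python
import os
import struct
import tempfile

MAGIC = b"PPAK"                  # hypothetical magic value
HEADER = struct.Struct("<4sII")  # assumed layout: magic, dir offset, entry count


def create_pak(path):
    """Write a valid but empty pak: a header whose directory has no entries."""
    with open(path, "wb") as f:
        # The (empty) directory starts right after the fixed-size header.
        f.write(HEADER.pack(MAGIC, HEADER.size, 0))


def mount(path):
    """The single mount path: read the header back from disk."""
    with open(path, "rb") as f:
        magic, dir_offset, count = HEADER.unpack(f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a pak file")
    return {"path": path, "dir_offset": dir_offset, "entries": count}


def demo():
    """Create an empty pak in a temp dir and mount it like any other pak."""
    path = os.path.join(tempfile.mkdtemp(), "empty.pak")
    create_pak(path)
    return mount(path)
```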

> >> Does it have to be invalid while it is being built? When storing all dir
> >> info at the end (both in terms of location and time) there's no more
> >> reason for this.
> >>
In any case, there isn't much reason not to leave it invalid while building
it: it's already mounted, so the program will be fine.

If it's a pakfile that will be opened by several processes at different
times, we could have a method that writes the pak's filesystem structure
data (like umount does now).

> >Well... it doesn't *have* to be invalid.
> >
> >You do realise you'll have to move (ie, serialize) the complete directory
> >and file structure to the end of the pak each and every time you add a file
> >to the pak? (I sense we aren't talking about the same thing here)
>
> We are I think. And actually it's perfectly fine. Just look at some
> situations:
>
> (1) Creating the (big?) Pak once and after that accessing it only readonly.
>     Here behavior is exactly the same as for Format 0 Paks.
>
Except for the part where the pak is built. If it's a 1000-file pak, the
file structure serialization code will be run 1000 times... Is this really
what you mean? That would give O(n^2) performance, as there will be a larger
and larger dir-header to move/serialize each time.
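
To put a number on that, here is a small back-of-the-envelope sketch
(Python, and the function name is just something I made up). It assumes one
directory entry per file and counts how many entries get serialized in
total while building a pak:

```python
def entries_written(n_files, rewrite_on_every_add):
    """Count directory entries serialized while building an n-file pak.

    Assumes one directory entry per file. If the directory is rewritten
    after every add, the i-th add serializes i entries; otherwise the
    directory is serialized once, at the end.
    """
    total = 0
    if rewrite_on_every_add:
        for count in range(1, n_files + 1):
            total += count  # whole directory so far is written again
    else:
        total = n_files     # single pass when the pak is finished
    return total
```

For 1000 files that is 1000 * 1001 / 2 = 500500 entries written when
rewriting on every add, against 1000 when the directory is written once.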

> (2) Creating a (small) Pak file by file, and eventually adding single files
>     later.
>     Behavior almost the same as for Format1, except that writing is a
>     little slower (but not much as the Pak is small anyway) and reading is
>     a little faster.
> (3) Doing some mix of the above
>     Now that couldn't be done previously - F0 Paks were always static,
>     there was no chance of fixing one later on. Performance is bad for very
>     large Paks, but not as bad as reconstructing the entire Pak to
>     incorporate that small fix.
>
I agree with this, but I think it's best to let the client decide when it
thinks it's necessary for the pakfile to be valid, and when it doesn't
matter. IMHO we get the best of both worlds that way.
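
Something like the following is what I have in mind. It is only a sketch in
Python with an in-memory buffer; the class PakWriter, its sync () method and
the directory entry layout are hypothetical, not existing PFile code. Adding
files leaves the pak invalid, and the client calls sync () whenever it wants
a valid pak on disk:

```python
import io
import struct


class PakWriter:
    """Toy pak: file content followed by a directory written on demand."""

    def __init__(self):
        self.buf = io.BytesIO()  # stands in for the pak file on disk
        self.entries = []        # (name, content offset, size)
        self.content_end = 0     # where the next file's data goes
        self.valid = False       # True only when the stored dir is current
        self.dir_writes = 0      # how often the directory was serialized

    def add(self, name, data):
        """Append file data; any old directory is overwritten or stale."""
        self.buf.seek(self.content_end)
        self.buf.write(data)
        self.entries.append((name, self.content_end, len(data)))
        self.content_end += len(data)
        self.valid = False

    def sync(self):
        """Serialize the directory after the content, making the pak valid."""
        self.buf.seek(self.content_end)
        self.buf.truncate()      # drop leftovers of an older directory
        for name, offset, size in self.entries:
            raw = name.encode("utf-8")
            # Assumed entry layout: name length, offset, size, then the name.
            self.buf.write(struct.pack("<HII", len(raw), offset, size) + raw)
        self.valid = True
        self.dir_writes += 1
```

A client building a big pak would call add () 1000 times and sync () once; a
client that needs the pak valid after every change just calls sync () after
every add ().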

> I want to have PakFiles laid out like this:
>
>
> ----------------------------------------
> |             Header
> ----------------------------------------
> |
> |
> |
> |           Content (File data)
> |
> |
> |
> -----------------------------------------
> |
> |        DirInfo (directory data)
> |
> -----------------------------------------
> |             Extended Header
> -----------------------------------------
>
That looks good, except I think the x-part (the extended header) should come
before the dir-part, as we *know* that the dir-part will grow when adding to
the pak, while the x-part might not.

In situations where the x-part does not grow, but just contains static-size
information, we save rewriting the x-header.

> In other words, a Pak is divided into several parts, each more or less
> independent of the others. Addressing is relative to these parts, i.e. you
> access, say, Byte 0x1234 in the "Content" part or Position 0x040 in the
> DirInfo one. That makes it simple for all parts to grow independently of
> each other. The additional overhead is negligible (unless of course you are
> reading files one byte at a time...)
> It also lends itself to a better distribution of responsibilities between
> the classes (PakFile doing all actual writing/reading etc).
>
I agree the overhead would be quite infinitesimal.

I don't see how it helps, though. Offsets into the content-part are very
easy to compute (and impose no overhead), as the header-part has static
size. The dir- and x-parts only need a single link from the header-part,
which they would need anyway.

At least for the dir-part, as everything there is done by the relative
positions of the files and directories, and nothing is done by
absolute-position-links (ie, jump to offset ###).

The only position-based links that need to be updated when the pak grows are
in the dir-part, and possibly the x-part. These parts get rewritten when the
pak grows anyway, so this is no problem.
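
As a rough illustration of the offset arithmetic I mean (Python again; the
64-byte header size and the function names are assumptions picked for the
example, not the real values):

```python
HEADER_SIZE = 64  # hypothetical fixed size of the header-part, in bytes


def content_to_absolute(content_offset):
    """Content-relative offset -> file offset; the header never changes size."""
    return HEADER_SIZE + content_offset


def dir_to_absolute(dir_start, dir_offset):
    """Dir-relative offset -> file offset, via the single link in the header.

    dir_start is the absolute position of the dir-part, as stored in the
    header-part and updated whenever the dir-part is rewritten.
    """
    return dir_start + dir_offset
```

So Byte 0x1234 in the content-part is simply at 0x40 + 0x1234 = 0x1274 in
the file, and only dir_start has to change when the pak grows.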

Hmm... Wouldn't this also make it harder to go through the pakfile in a
hex-editor? ;)