
Re: PFile work



> >> I fixed the file/dir attribute stuff the basic write access to Paks. The
> >> serialization code is still untouched. Bjarke, can you have a look at it?
> >>
> >Sure. Simply a matter of having it do what you talk about below, and the
> >different section stuff (ie, offsets starting at 0 for each section) ?
> 
> Plus "unrolling" of the dir storage recursion if possible.
> 
?

> >> (1) Directory:
> >>
> >> "dir\0"
> >> <HashTable info (NrOfEntries, HashTableSize, ...)>
> >> Entry1
> >> Entry2
> >> Entry3
> >> ...
> >>
> >What about the additional data of Directory? (like mtime and ctime) Do
> >they go before or after? (I think before, but I really don't think it
> >matters)
> 
> In the corresponding dir entry. At least I thought that. But that leaves
> the toplevel dir without such info. Hmmmm, the attribs are not the problem
> (default values are fine), and ctime/mtime could be the ones in the Pak
> header. But that's bad with the current implementation.
>
To which implementation are you referring? And what is the problem?

> Perhaps some
> "freestanding dir entry"?
>
Actually, do we even need ctime/mtime for directories? I mean, I can't
imagine when I would use such information. And other attribs, well,
actually, I don't see any that are needed there either, except the
number of elements in the dir (which is handled by the container that
Directory holds).

That leaves the dir name.

> >> (2) Directory Entry:
> >>
> >> "dire"
> >> NameLength (1 Byte)
> >> Name (Max 255 chars, no trailing \0)
> >>
> >About that 255. What do we do about OSes that allow paths and names
> >longer than 255 characters? (are there any that does this? I don't know
> >the limit for Windows, but I do think it allows paths longer than 255)
> 
> Bad luck. Anyway, the data should be stored in the file, not in the file
> name ;)
>
But, well, what do we do? I mean, if the player of a game for some
reason wants to store his game in this path:

"c:\games\man, this is a COOL game. Yeah. Man, I'm happy I bought this
game. Ohh, this name is getting loooooooong. Oh well, good for me my OS
supports it. I really hope that the game will support it too, or I might
have to return it for something better\"

Do we really want to stop him? Ok, perhaps the above name isn't that
realistic, but it's not impossible to have paths longer than 255
characters. Anyway, it's really a very easy thing to fix. Just use
unsigned shorts instead of unsigned chars for the length. Not a priority
right now, of course.

> >We store path and filenames all over in one-byte-size variable. I think
> >we might want to consider using two bytes (ie, short) instead.
> 
> Not yet. And the stdio functions don't support wide chars anyway. Support
> for that is something for the "later" list.
>
Excuse my English... What I meant was storing the filename size in
unsigned shorts rather than unsigned chars.
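
To make that concrete, here is a minimal sketch of a "dire" entry written
with a 2-byte length. The helper name appendDirEntry, the in-memory byte
buffer and the little-endian byte order are my assumptions for
illustration only; none of them are taken from the current PFile code.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of PFile: appends a "dire" entry to a byte
// buffer, using a 2-byte name length instead of the current 1-byte one.
void appendDirEntry(std::vector<unsigned char>& out, const std::string& name)
{
    // A 2-byte length raises the limit from 255 to 65535 bytes.
    if (name.size() > 0xFFFF)
        throw std::length_error("name longer than 65535 bytes");

    // 4-byte tag, as in the format description ("dire", no trailing \0).
    const char tag[4] = {'d', 'i', 'r', 'e'};
    out.insert(out.end(), tag, tag + 4);

    // Name length, low byte first (the byte order is an assumption).
    const std::uint16_t len = static_cast<std::uint16_t>(name.size());
    out.push_back(static_cast<unsigned char>(len & 0xFF));
    out.push_back(static_cast<unsigned char>((len >> 8) & 0xFF));

    // The name itself, without a trailing \0.
    out.insert(out.end(), name.begin(), name.end());
}
```

The cost is one extra byte per entry, and the reading code changes in
exactly one place (the length field).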

> >> You might have to add an m_entryPos field to PakDirectory so that it can
> >> correctly write the dir position thing
> >>
> >Dir position? Directories are positioned in a recursive manner so that
> >their position relative to each other plus the number of subdirectories
> >of each directory make it clear what the position of each directory is.
> 
> See top of mail. You know I don't like that recursion <evilgrin>
> Eliminating it isn't difficult anyway.
> 
You do realise that not doing it this way will have several bad
side effects, while offering no benefits that any client of PFile will
ever know about?

Lemme see:

Do you know how slow hard disk seeks are? We'll be repositioning the
read head 1000 times if there are 1000 directories. Caching helps here,
but many hard disks don't have very large caches, and we might easily
exceed their capacity. If we want to be serious, PFile will have to
scale well to large paks, and large paks will also have large directory
sections, which might easily not fit in the cache. (You want to use 8
bytes for offsets, so I must assume you expect PFile to be handling paks
that have one or more parts larger than 4 gigabytes.)

We will have to make a lot of calls to the seek, read and write
functions that are completely unnecessary.

The pak gets larger by 12 bytes for each directory (8 for the link + 4
for "dir\0"). This is not a great deal, but it's completely unnecessary.
Taking up extra space when it is needed is one thing; unneeded bloat is
another thing entirely.

We *add* to the complexity of the pak file format, since we are taking
what was previously just one part, and making it two.

We add to the complexity of the code that writes and reads pak files,
thus increasing the chance of bugs. We have to remember things like
updating the current direntry offset each and every time we write a dir,
which is not otherwise needed.

I really don't like this, especially since I view the act of manually
going through a large pak to verify its conformance with the
specification as somewhat paranoid (if the reading code can read all the
saved data without problems, and the size of the pak is exactly what it
should be if the specification is followed). For small paks, you don't
get any benefit from this, as things are pretty easy to take in at a
glance anyway.

I also don't feel terribly much like discussing it (I'd rather do some
coding on PFile! :), and I don't have any other arguments than the
above, so if you still really feel that this is a good idea, well, then
ok. I guess it isn't that important anyway.
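
For reference, the recursive layout I am arguing for can be sketched
roughly like this. DirNode, DirRecord, writeTree and readTree are
hypothetical names made up for this mail, and the "section" is simplified
to a vector of records rather than real bytes; the actual Directory and
PakDirectory classes look different. The point is only that each directory
is followed directly by its subtrees, so the subdirectory counts alone
determine where everything is, and no offsets have to be stored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory directory node (not the real PFile Directory).
struct DirNode {
    std::string name;
    std::vector<DirNode> subdirs;
};

// One record of the directory section. Note: no offset/link field.
struct DirRecord {
    std::string name;
    std::uint32_t subdirCount;
};

// Depth-first write: a directory is followed immediately by its subtrees.
void writeTree(const DirNode& dir, std::vector<DirRecord>& out)
{
    out.push_back({dir.name, static_cast<std::uint32_t>(dir.subdirs.size())});
    for (const DirNode& sub : dir.subdirs)
        writeTree(sub, out);
}

// Single forward pass: 'pos' only ever advances, so the reader never seeks.
DirNode readTree(const std::vector<DirRecord>& in, std::size_t& pos)
{
    const DirRecord& rec = in.at(pos++);
    DirNode dir{rec.name, {}};
    dir.subdirs.reserve(rec.subdirCount);
    for (std::uint32_t i = 0; i < rec.subdirCount; ++i)
        dir.subdirs.push_back(readTree(in, pos));
    return dir;
}
```

Reading starts with pos = 0 on the toplevel dir, and pos ends up equal to
the number of records once the whole tree has been rebuilt.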