SV: Serialization et al



> Ok, now for my disclaimer/excuse: I got way too little sleep the past days
> and was, as a result, not "thinking normally". I overreacted. It sometimes
>
Don't worry about it.

> That plus adapting HashTable to be aware of what kind of file/stream it
> should write itself to. I guess moving the proper Misc classes to
> PakFileFile and giving Directory a protected virtual HashTable factory, so
> that PakFileDir can create the proper object is the best way.
>
That would work. See below for a few comments on this, though.
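
A minimal sketch of that factory arrangement, just to pin down what is meant.
PakHashTable, FormatName() and the lazy Table() accessor are made up for
illustration. The table is created on first use because a virtual call made
from Directory's constructor would not reach PakFileDir's override:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for the real HashTable; FormatName() is a hypothetical hook.
class HashTable {
public:
    virtual ~HashTable() {}
    virtual std::string FormatName() const { return "generic"; }
};

// Hypothetical format-aware table for pak files.
class PakHashTable : public HashTable {
public:
    std::string FormatName() const override { return "pak"; }
};

class Directory {
public:
    virtual ~Directory() {}
    std::string TableFormat() { return Table().FormatName(); }

protected:
    // The factory: derived classes override this to supply their own table.
    virtual std::unique_ptr<HashTable> CreateHashTable() const {
        return std::unique_ptr<HashTable>(new HashTable);
    }

    // Created lazily, after construction has finished, so that the
    // derived class's override is the one that gets called.
    HashTable& Table() {
        if (!table_) table_ = CreateHashTable();
        assert(table_);  // the factory must return a table
        return *table_;
    }

private:
    std::unique_ptr<HashTable> table_;
};

class PakFileDir : public Directory {
protected:
    std::unique_ptr<HashTable> CreateHashTable() const override {
        return std::unique_ptr<HashTable>(new PakHashTable);
    }
};
```

Clients of Directory still never see HashTable; only the derived directory
class decides which table type it gets.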

> >In other words, you have perfect information on what is in the HashTable.
> >I completely fail to see what makes you believe that you *have* to call
> >HashTable::SerializeTo(), which is what you imply above (if you didn't
> >mean to do that, I fail to see what exactly the problem is).
>
> Well, no. A core assumption behind that serialization scheme was "the data
> structures should be able to serialize themselves" - and that's a good
> thing.
>
Well, yes, when it makes sense. It makes sense for HashTable to know how to
serialize itself in a format that is ideally suited to itself, without
caring too much about anything else.

It doesn't make so much sense for it to know how to serialize itself so that
it meets the requirements of a multitude of different formats. That really
has nothing to do with any of its main responsibilities.

Actually, I originally planned that Directory should take care of
serialization of its HashTables, but then I realised that it would have to
have intimate knowledge of how HashTable is implemented in order to do
that efficiently. I considered that a very bad thing, so I encapsulated this
functionality within HashTable.

HashTable is completely hidden from any client of Directory or its
derivatives. It is thus an implementation detail of Directory. This means
that if Directory knows how to serialize itself, all classes that are
exposed as part of the virtual filesystem handling know how to serialize
themselves. That's what I aimed at doing. HashTable only got its own
serialization functions because serializing it efficiently requires
knowing how HashTable works on the inside.
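
As a rough sketch of that split (the member names and the byte layout here
are assumptions for illustration, not the actual code): Directory is the only
thing clients see, and it hands the table-specific part to HashTable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Stand-in for the real HashTable. Only this class knows its own layout.
class HashTable {
public:
    void Insert(const std::string& name, std::uint32_t offset) {
        entries_[name] = offset;
    }

    // Hypothetical layout: entry count, then for each entry the name
    // length, the name bytes and the offset.
    void SerializeTo(std::ostream& out) const {
        WriteU32(out, static_cast<std::uint32_t>(entries_.size()));
        for (const auto& entry : entries_) {
            // The name length must fit in the 32-bit length field.
            assert(entry.first.size() <= 0xFFFFFFFFu);
            WriteU32(out, static_cast<std::uint32_t>(entry.first.size()));
            out.write(entry.first.data(),
                      static_cast<std::streamsize>(entry.first.size()));
            WriteU32(out, entry.second);
        }
    }

private:
    // Little-endian, byte by byte, so the output does not depend on the
    // byte order of the machine doing the writing.
    static void WriteU32(std::ostream& out, std::uint32_t value) {
        for (int i = 0; i < 4; ++i)
            out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
    }

    std::map<std::string, std::uint32_t> entries_;
};

// The class clients actually use. HashTable never appears in its interface.
class Directory {
public:
    explicit Directory(std::string name) : name_(std::move(name)) {}

    void AddFile(const std::string& file, std::uint32_t offset) {
        files_.Insert(file, offset);
    }

    // Directory writes what it knows (its name) and delegates the rest.
    void SerializeTo(std::ostream& out) const {
        out << name_ << '\n';
        files_.SerializeTo(out);
    }

private:
    std::string name_;
    HashTable files_;  // implementation detail
};

// Small helper for inspecting the result in memory.
std::string ToBytes(const Directory& dir) {
    std::ostringstream out;
    dir.SerializeTo(out);
    return out.str();
}
```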

> So I don't believe that I *have* to call HashTable::SerializeTo (),
> but I believe that I *should* do it because that's the right way to
> organize things.
>
HashTable serialization code should be in HashTable. I agree on that.

However, I would also think that Zip-file serialization code should belong
in a Zip-file specific class, Tar-file serialization in a Tar-specific class
etc. That just makes more sense to me, but what you propose would work too.
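
Roughly what I have in mind, as a sketch only. ArchiveWriter, EntryMap and
the two writer classes are hypothetical names, and the "layouts" are
placeholders that have nothing to do with the real Zip or Tar formats:

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical read-only view of one directory: file name -> size in bytes.
typedef std::map<std::string, unsigned long> EntryMap;

// One writer class per archive format; the container itself stays
// ignorant of all of them.
class ArchiveWriter {
public:
    virtual ~ArchiveWriter() {}
    virtual void WriteDirectory(const EntryMap& entries,
                                std::ostream& out) const = 0;
};

// Placeholder layout, NOT real Zip: one "Z <name> <size>" line per entry.
class ZipLikeWriter : public ArchiveWriter {
public:
    void WriteDirectory(const EntryMap& entries,
                        std::ostream& out) const override {
        for (const auto& entry : entries) {
            // Line-based placeholder format: names must not contain '\n'.
            assert(entry.first.find('\n') == std::string::npos);
            out << "Z " << entry.first << ' ' << entry.second << '\n';
        }
    }
};

// Placeholder layout, NOT real Tar: one "T <size> <name>" line per entry.
class TarLikeWriter : public ArchiveWriter {
public:
    void WriteDirectory(const EntryMap& entries,
                        std::ostream& out) const override {
        for (const auto& entry : entries) {
            assert(entry.first.find('\n') == std::string::npos);
            out << "T " << entry.second << ' ' << entry.first << '\n';
        }
    }
};

// Helper that runs any writer against an in-memory stream.
std::string WriteToString(const ArchiveWriter& writer,
                          const EntryMap& entries) {
    std::ostringstream out;
    writer.WriteDirectory(entries, out);
    return out.str();
}
```

Adding another format then means adding another writer class, without
touching the container.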

> >> [4] Unnicety: This gives a data structure which can be correctly read
> >> again, but which is very complex (not complicated to read/write
> >> though) and thus hard to verify during debugging.
> >>
> >Ehh? What exactly do you mean? Like powering up your favorite editor and
> >watching the pak getting written?
>
> No. Creating a small pak, loading it into my favourite hex editor and
> looking whether it meets the spec. That's the only real way of ensuring
> correctness I am aware of.
> Of course you can let the writing and the reading code cross-check each
> other, but that only ensures that both have the same bugs...
>
Hmm... I hadn't actually thought of doing that. When I verify my
serialization code, I usually just make sure every kind of data my program
can handle is currently in memory, tell my program to serialize that to
storage and end the program. I then start it again, and if it can correctly
load everything, I consider that a strong enough indication of my program's
ability to do correct serialization that I assume my serialization code is
ok. Only if I have some problems I really cannot understand might I be
tempted to look in the actual serialized file.
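
Both kinds of check can be automated. A small sketch, assuming a made-up
record format (one length byte followed by that many characters) rather than
the real pak layout: RoundTrips() is the write-then-read check, and comparing
the written bytes against a hand-written expected string is the "does it meet
the spec" check without the hex editor.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical format: for each name, one length byte, then the characters.
void WriteNames(std::ostream& out, const std::vector<std::string>& names) {
    for (const std::string& name : names) {
        assert(name.size() <= 255);  // length must fit in one byte
        out.put(static_cast<char>(name.size()));
        out.write(name.data(), static_cast<std::streamsize>(name.size()));
    }
}

// Reads records until the stream runs out.
std::vector<std::string> ReadNames(std::istream& in) {
    std::vector<std::string> names;
    int length;
    while ((length = in.get()) != std::istream::traits_type::eof()) {
        std::string name(static_cast<std::size_t>(length), '\0');
        if (!in.read(&name[0], length)) break;  // truncated record: stop
        names.push_back(name);
    }
    return names;
}

// The round-trip check: whatever is written must load back unchanged.
bool RoundTrips(const std::vector<std::string>& names) {
    std::stringstream buffer;
    WriteNames(buffer, names);
    return ReadNames(buffer) == names;
}

// The raw bytes, for comparison against bytes derived from the spec by hand.
std::string SerializedBytes(const std::vector<std::string>& names) {
    std::ostringstream out;
    WriteNames(out, names);
    return out.str();
}
```

The round trip alone only shows that reader and writer agree with each
other; the byte comparison is what ties the writer to the spec.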

I do, however, think that what should count the most is how easy the format
is to handle programmatically, not manually. It is, after all, a binary
format.

If you don't have too deep directory structures, I don't think it's *that*
bad compared to the old scheme. You'd probably know that better than I,
though... ;)

> >And that's it. I believe I said that in 3 pretty sentences that weren't
> >too hard to comprehend. I don't find that a very good indication that
> >this format is "very complex".
>
> It's highly recursive (with first the dir headers being written
> recursively one after another and then dir entries being written in the
> same order etc).
> It reminds me of purely functional LISP code - and that makes my mind jump
> in loops when I try to read and understand it.
>
Well, figuring out exactly how to best do this made my mind jump in loops
too :)

It's really hard to imagine how something like this will look in its
serialized form, because we have two distinct datatypes where one gets
expanded recursively and contains both more of its own kind, as well as some
of the other kind. At least, it was kind of hard for me.

> >> Adding a
> >> SerializeContentTo () method to directories as well, which calls the
> >> same method for all of that dir's files & recursively for its subdirs
> >> is better.
> >>
> >It does the exact same thing, and you get a call stack that is a lot
> >deeper + you cause very many invocations of functions you simply don't
> >have to.
>
> If you think about it, it's the same effort doing these extra calls as
> looping through two iterators. But the "real" recursive way is easier to
> comprehend.
> And the call stack is at maximum 32 levels deep (the maximum depth of a
> directory hierarchy), which is nothing. And BTW you emulate the very same
> thing in your code.
>
It is my understanding that when a function calls itself recursively, it
gets its code loaded to the instruction-stack (what's the proper name?)
twice. Have I misunderstood this? Doesn't doing it this way save a lot of
pushing/popping on the stack?

Anyway, I think this works quite well (even if those 10 lines might look a
bit arcane the first few seconds), so instead of polishing off code that
already works, I'd recommend getting on with more important matters.
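
For comparison, the two traversal styles can be sketched side by side. Dir
and both Visit functions are made-up names, and this is not the actual pak
code. In both versions the code exists only once: the recursive one keeps its
pending work in call-stack frames (one per directory level), the iterative
one keeps it in an explicit container.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical directory node: a name plus any number of subdirectories.
struct Dir {
    std::string name;
    std::vector<Dir> subdirs;
};

// Maximum hierarchy depth mentioned earlier in this thread.
const int kMaxDepth = 32;

// Recursive version: each nested call adds one frame to the call stack.
void VisitRecursive(const Dir& dir, std::vector<std::string>& order,
                    int depth = 0) {
    assert(depth < kMaxDepth);
    order.push_back(dir.name);
    for (const Dir& sub : dir.subdirs)
        VisitRecursive(sub, order, depth + 1);
}

// Iterative version: the same pre-order walk with an explicit stack.
void VisitIterative(const Dir& root, std::vector<std::string>& order) {
    std::vector<const Dir*> pending;
    pending.push_back(&root);
    while (!pending.empty()) {
        const Dir* dir = pending.back();
        pending.pop_back();
        order.push_back(dir->name);
        // Push children in reverse so the first child is visited first.
        for (std::vector<Dir>::const_reverse_iterator it =
                 dir->subdirs.rbegin();
             it != dir->subdirs.rend(); ++it)
            pending.push_back(&*it);
    }
}
```

Both produce the same visiting order, so the choice between them is about
readability and bookkeeping rather than about what gets written.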

> In the meantime, can you finish Win32.cpp (implement all functions
> declared in System.h; Unix.cpp should be a usable template) so that we
> can dump Win32POSIXImpl.* ?
>
Sure. I'll have a look at that.