[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Serialization et al



Bjarke Hammersholt Roune wrote:
>> I'm looking through the serialization code, and well, it just
>> doesn't work. Conceptually that is.
>>
>I can't say I fully agree.
>
>Some of the below responses might be interpreted "oh, you're stupid, you got
>everything wrong!". I really don't mean it like that, and if you get that
>feeling, sorry, that wasn't the intention.

Ok, now for my disclaimer/excuse: I got way too little sleep the past days
and was, as a result, not "thinking normally". I overreacted. It sometimes
happens (more likely in such little-sleep conditions etc) that I see
something I don't like and then simply conclude from that that everything
around that bit must be bad as well.
Anyway - you're right in many points.

>> [1] First problem - The Directory class writes data to a stream whose
>> format it cannot know. Right now the format of PakFiles is hardwired into
>> that.
>>
>Well, since the only kind of serialization that can happen right now is to a
>PakFile, it *can* know the format. I found this most appropiate as I found
>it to be the simplest way to do it.
>
>If other kinds of format is needed in the future, the changes that need to
>be done are really very minor (or it can be done now, if you wish):

I'll do it now, even if it's only to please my sense of aesthetics ;)

>-Make all serialization-realted methods of Directory pure virtual. ~20
>seconds
>
>-Move the implementation of these to PakFileDir. 1-2 minutes, perhaps 5, if
>some unforseen problem presents itself.

That plus adapting HashTable to be aware of what kind of file/stream it
should write itself to. I guess moving the proper Misc classes to
PakFileFile and giving Directory a protected virtual HashTable factory, so
that PakFileDir can create the proper oject is the best way.

>> [2] Same problem as [1], but now for HashTable instead of Directory
>>
>The reason serialization is done through the HashTable class is that the
>specific way HashTable stores entries so as to be able to load them again as
>effeciently as possible, is a service offered by HashTable, and the way it
>does this is an implementation detail of HashTable.

[...]

>Now, HashTable *certainly* can know what it should write to the format:
>anything it feels like, and that it can load again. This is all a
>implementation detail of HashTable. The format *has* to be this way to
>enable HashTable to be as effecient as possible.

Right. More or less at least. The problem I have with it is that in this
case some algorithm / internal data structure defines how the data i a
PakFile is layed out. Which is IMO a bad thing - a file format should be
formally specified, and the code etc has to meet that specification (of
course the spec should be optimized for the data structures etc).
Currently we have this case:
  "Ok, let's examine how the algo writes the data and create a spec out of
   our findings"
But it should be:
  "Let's see what kind of data structure we use and what optimizations are
   possible and then we construct both a spec and a concrete data structure
   matching each others' needs"


Anyway, that's not too difficult to change.

>In other words, you have perfect information on what is in the HashTable. I
>completely fail to see what makes you believe that you *have* to call
>HashTable::SerializeTo(), which is what you imply above (if you didn't mean
>to do that, I fail to see what exactly the problem is).

Well, no. A core assumption behind that serialization scheme was "the data
structures should be able to serialize themselves" - and that's a good
thing. So I don't believe that I *have* to call HashTable::SerializeTo (),
but I beliebe that I *should* do it because that's the right way to
organize things.

>> [3] Unnicety: MISC::SerializeTo () only calls SerializeHeaderTo
>> () although
>> its name suggests otherwise
>>
>When HashTable calls MISC::SerializeTo(), what it expects that method to do
>is to serialize whatever should be serialized by HashTable. In this case, th

Ok, right.

>> [4] Unnicety: This gives a data structure which can be correctly read
>> again, but which is very complex (not complicated to read/write
>> though) and thus hard to verify during debugging.
>>
>Ehh? What exactly do you mean? Like powering up your favorite editor and
>watching the pak getting written?

No. Creating a small pak, loading it into my favourite hex editor and
looking whether it meets the spec. That's the only real way of ensuring
correctness I am aware of.
Of course you can let the writing and the reading code cross-check each
other, but that only ensures that both have the same bugs...

>And that's it. I believe I said that in 3 pretty sentences that weren't too
>hard to comprehend. I don't find that a very good indication that this
>format is "very complex".

It's highly recursive (with first the dir headers being written recursively
one after another an then dir entries being written in the same order etc).
It reminds me of purely functional LISP code - and that makes my mind jump
in loops when I try to read and understand it.

>> Adding a
>> SerializeContentTo () method to directories as well, which calls the same
>> method for all of that dir's files & recursively for its subdirs
>> is better.
>>
>It does the exact same thing, and you get call-stack that is alot deeper +
>you cause very many invocations of functions you simply don't have to.

If you think about it it's the same effort doing these extra calls as
looping throught two iterators. But the "real" recursive way is easier to
comprehend.
And the call stack is at maximum 32 levels deep (the maximum depth of a
directory hierarchy), which is nothing. And BTW you emulate the very same
thing in your code.

>> I haven't completely worked out a better system (preserving as many of the
>> benefits of the current one as possible) yet, but I'm sure I'll very soon
>> have.
>>
>I believe my solutions to the issues you raise should suffice.
>
>You probably are rigth that I could've done these things before sending you
>the code. But I guess I just figured "this works, so that should be ok for
>now", and cut a little on the seeming architectural soundness of the design.
>
>I can do the things I mention here if you want...

No, I'll do it. It fits in very well:
(1) I have a great desire doing PFile coding recently
(2) I want to get familiar with your code
(3) I'm looking through some other changes related to that (see the other
    mail)

In the meantime, can you finish Win32.cpp (implement all functions
declared in System.h . Unix.cpp should be a useable template) so that we
can dump Win32POSIXImpl.* ?

	Christian
-- 

Drive A: not responding...Formatting C: instead