[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

SV: Serialization et al



> I'm looking through the serialization code, and well, it just
> doesn't work. Conceptually that is.
>
I can't say I fully agree.

Some of the below responses might be interpreted "oh, you're stupid, you got
everything wrong!". I really don't mean it like that, and if you get that
feeling, sorry, that wasn't the intention.

> Serialization to somewhere right now works like that (the numbers
> in the []
> brackets are "links" to the problem descriptions below):
>
> PakFile::Umount calls SerializeTo (PakFileHandle) on its root dir.
> This calls first SerializeHeaderTo (stream),
> which writes the dir header to the stream [1]
> and calls SerializeTo (stream) on its contained dir and file HashTables
> These first write entry count & table size to the stream [2]
> and then calls MISC::SerializeTo (entry, stream) for all its entries,
> which calls entry->SerializeHeaderTo (stream) [3]
> That way all directory/file header data is written recursively [4] [5]
> Then the toplevel dir iterates through its entire hierarchy, [6]
> calling SerializeContentTo () on all files, which writes the file data.
>
>
> [1] First problem - The Directory class writes data to a stream whose
> format it cannot know. Right now the format of PakFiles is hardwired into
> that.
>
Well, since the only kind of serialization that can happen right now is to a
PakFile, it *can* know the format. I found this most appropiate as I found
it to be the simplest way to do it.

If other kinds of format is needed in the future, the changes that need to
be done are really very minor (or it can be done now, if you wish):

-Make all serialization-realted methods of Directory pure virtual. ~20
seconds

-Move the implementation of these to PakFileDir. 1-2 minutes, perhaps 5, if
some unforseen problem presents itself.

I believe that's it.

> [2] Same problem as [1], but now for HashTable instead of Directory
>
The reason serialization is done through the HashTable class is that the
specific way HashTable stores entries so as to be able to load them again as
effeciently as possible, is a service offered by HashTable, and the way it
does this is an implementation detail of HashTable.

Moving the internals of HashTable anywhere put in HashTable would be
inappropiate. Ie, the serialization of a HashTable has to be done in the
HashTable (you haven't actually said anything about this, but I feel its a
subpoint)

Now, HashTable *certainly* can know what it should write to the format:
anything it feels like, and that it can load again. This is all a
implementation detail of HashTable. The format *has* to be this way to
enable HashTable to be as effecient as possible.

Client: Doctor, it hurts when I do this
Doctor: Don't do it, then

If you don't want HashTable to dictate the format, don't call
HashTable::SerializeTo(). HashTable has iterators, so you can access all of
its entries. You can get the amount of entries and the loadfactor (if you
need that) from it via perfectly standrad accessor methods.

In other words, you have perfect information on what is in the HashTable. I
completely fail to see what makes you believe that you *have* to call
HashTable::SerializeTo(), which is what you imply above (if you didn't mean
to do that, I fail to see what exactly the problem is).

How HashTable wants to serialize itself is its buisness and should be
encapsulated into HashTable, which it correctly is. If you don't want a
format that is HashTable aware (in a certain situation, like serializing a
zip or tar file), and optimizes for HashTable, without too many other
consearns, well, then what you want needs no internal data of HashTable, and
can be placed in the code calling HashTable::SerializeXxx(), where it should
be (as its specific to something non-HashTable).

> [3] Unnicety: MISC::SerializeTo () only calls SerializeHeaderTo
> () although
> its name suggests otherwise
>
When HashTable calls MISC::SerializeTo(), what it expects that method to do
is to serialize whatever should be serialized by HashTable. In this case, th
e only thing HashTable should serialize is the headers of its entries, so
that's what MISC::SerializeTo() does.

I don't see how that is inappropiate.

> [4] Unnicety: This gives a data structure which can be correctly read
> again, but which is very complex (not complicated to read/write
> though) and thus hard to verify during debugging.
>
Ehh? What exactly do you mean? Like powering up your favorite editor and
watching the pak getting written?

In any case, if you trust the framwork works, then you can put breakpoints
in the serialization code that you are testing. If you don't trust the
framwork, then test that first, then test the other serialization code
afterwards.

I think its very, very simple. First, directories a read. If a parent
directory has two sub-dirs, then the two next directories read from the
serialization code of that parent directory will be those two childs.

Then, files are read. If the top-most directory has 20 files in it, well,
then the first 20 files will be the files of that directory. The same goes
iteratively for its sub-dirs.

And that's it. I believe I said that in 3 pretty sentences that weren't too
hard to comprehend. I don't find that a very good indication that this
format is "very complex".

> It is also *very* fault intolerant
> (doesn't have any hooks for checking "I'm at the right place", "I'm
> really reading directory data, not just some random file contents" etc).
>
That can be added in ~5 minutes. Write "dir\0" to
PakFileDir::SerializeHeaderTo(), and verify that this is read in
PakFileDir::SerializeHeaderFrom(). Do the same for PakFileFile.

I really only thougth of getting this to work in a correct way. Stuff like
this is really very unimportant to the serialization scheme concept working,
and so I didn't bother with it. I did make sure it would be easy to fix
this, though.

> [5] Again a problem with format hardwiring. This process as it stands is
> simply impossible to adapt for writing/reading different formats (remember
> the ZIP file example). So it might be ok for now, but surely will give us
> headaches later.
>
This is very easy to "fix". Make Directory::SerializeTo() and
Directory::SerializeFrom() virtual and implement Zip-file specific directory
and file classes.

The system works now, and it will in the future with very minor and few
modifications. That doesn't spell headache to me.

> [6] I consider that bad design. On the one hand forwarding
> responsibilities
> recursively (as for the headers) and on the other hand iterating over all
> entries and calling methods om them is confusing at least.
>
I recognised this, and added a comment that says that an array is used as a
stack to emulate recursion with iteration. I think that makes it pretty
clear what is going on.

> Adding a
> SerializeContentTo () method to directories as well, which calls the same
> method for all of that dir's files & recursively for its subdirs
> is better.
>
It does the exact same thing, and you get call-stack that is alot deeper +
you cause very many invocations of functions you simply don't have to.

>  Sorry for taking your work apart in this way, Bjarke. But I think it's
> neccessary.
>
Oh, that's completely ok. If you feel something is wrong, it'd be far worse
to just keep quit about it, and then deal with it later, when it migth be a
serious problem to fix.

If there are anything else wrong, or my solutions aren't good enough, please
don't hesitate to tell me.

> I haven't completely worked out a better system (preserving as many of the
> benefits of the current one as possible) yet, but I'm sure I'll very soon
> have.
>
I believe my solutions to the issues you raise should suffice.

You probably are rigth that I could've done these things before sending you
the code. But I guess I just figured "this works, so that should be ok for
now", and cut a little on the seeming architectural soundness of the design.

I can do the things I mention here if you want...