[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Funny situation (was: Re: Serialization et al)



Bjarke Hammersholt Roune wrote:

> Writing from my new Netscape E-mail reader. First impressions is that I
> like this *much* better than Microsoft Outlook. Thanks goes to the guy
> who told me to switch (sorry, can't remember the name).

Well, that's me! And it was threaded properly! Yay!

> Above, I state that HashTable should only know how to serialize itself
> in a format that makes sense relative to itself. To me, that is pretty
> irrefutable.
> 
> However, your idea is/was to make derivatives of HashTable. Thus,
> HashTable still only knows how to serialize itself in a way that makes
> sense relative to itself, but its derivative class ZipHashTable will
> also be able to serialize Zip files. The reason this is IMHO still
> appripiate is that the Zip format makes sense relative to ZipHashTable.
> 
> ZipHashTable might be implemented in a radically different way than
> HashTable (that way, it would probably be better to have a pure virtual
> HashTable base class, and make PakHashTable, ZipHashTable, TarHashTable
> etc.), if a different internal representation of the data migth make it
> more effecient when dealing with Zip files.

What the hell is this??? If a ZipFile (similar to PakFile, inherited
from some common class) is made, it has no need to even *use* a hash
table in its implementation!!! The way I see this, there is a
FooFile::SerializeTo method that just stores the content of the FooFile
to a stream. If the format of FooFile is simply recursively serializing
the objects that makes up FooFile, then so be it (calling
HashTable::SerializeTo if it uses that class in its implementation).
Maybe the format of FooFile is all done within a few methods of the
FooFile objects and just walking the directory tree nodes to gather the
information, then so be it.

If the internal implementation has to reflect one to one the internal
implementation of the archive itself, you're definitely gonna freak out
at some point. Just consider a zip file versus a tar.gz file. The first
has each file compressed by itself, and the second has the exact
reverse, compression is applied to the whole archive. If you force those
two to share the same internal implementation, you will only get
yourself in a lunatic asylum.

Isn't it obvious? Doesn't the words "encapsulation", "hidden
implementation" and "black box" ring a bell?

> > Well, yes. Actually it should be a compromise of both ;)
> > A format that's very simple to write is fine, but quite problematic if you
> > can't tell how it actually looks like. Imagine some image manipulation
> > software say: "Well, the image files are written by this code, and that
> > one can read them again. At least for the cases we tested. You want to
> > write a viewer for them? Uh, you could reverse-engineer it..." ;)
> 
> He he :)
> 
> What I meant was that IMHO its most important that it is both easy to
> program and explain how to program the serialization code. I believe
> that is the case for the current format.
> 
> Now, of course, you are correct that it shouldn't be so insanely
> difficult to verify the product of the format (ie, the actual pak files)
> manually that it basically becomes impossible.

There are actually where this is the case and everybody is perfectly
happy. Think of the NFS protocol. It isn't defined in term of packet
types and formats. It is defined in term of an IDL interface! The thing
assuring compatibility isn't NFS itself, but the Sun RPC layer
underneath it. Parallel the RPC layer to the serialization process, and
the NFS layer to the Pak classes.

> > Wait, <searching> Ok, here is an example of the LISP code I was thinking
> > of:
> >
> > (defun fak1 (n)
> >    (fak_iter 1 1 n)
> > )
> >
> > (defun fak_iter (product counter n)
> >    (if (> counter n) product
> >        (fak_iter (* counter product) (1+ counter) n)
> >    )
> > )

Isn't that also recursive? I could be more easily unrolled into an
iterative version, and I don't know if LISP compilers actually do that,
but I do know that a C/C++ won't go that far and it will stay recursive.

> > It calculates the faculty of n
> > Our Prof presented this as example of an "iterative" process (meaning that
> > a compiler can optimize it to an iterative process) ;)
> > The recursive version is:
> >
> > (defun fak (n)
> >    (if (= n 1) 1
> >        (* n (fak (1- n)))
> >    )
> > )
> >
> > Just as little anecdote...
> 
> I've just desided I don't like LISP :)
> 
> C++ is so superior to that.

Youngster, you're showing your age and experience! What you don't
understand, you are condemn to reinvent, poorly (paraphrasing somebody I
do not remember).

Lambda functions are like drug, when you start using them, you can't
stop and C++ will probably give you some withdrawal symptoms (nausea and
vomiting, usually). ;-)

The iterative way is more optimal (use a more clearly defined amount of
resources and can be unrolled/vectorized even by the dumb C/C++
compilers that plagues us) and is better in many cases, like in threaded
programs where the stack might be limited. But as you noticed, the
recursive way is easier to program and understand (but this is not
always the case). I would expect an optimized library to go out of its
way and code critical algorithms in the iterative style.

Mozilla for example, is using threads and had a problem where the HTML
parser had a *lot* of local variables and they were recursing a lot and
simply ate the stack alive, smashing it and crashing the program. They
had to rewrite some of the critical methods in an iterative style
instead so that it could handle arbitrarily complex parsing patterns.

> Ahh, thank you very much for explaining this out for me. I agree, the
> code I made would have been much better had it been done with recursion.
> We should change it sometime we haven't got anything better to do (if we
> choose to implement the solution I detail in the end of this E-mail,
> doing it along with that would make sense).

Keep it iterative.

About the rest of the e-mail, I just can't believe the amount of
thinking you are putting into a so well-researched problem as
serialization. For a very basic and totally sufficient example of
serialization applied to a kind of archive file, look at the StrList
class in Turbo Vision, which you can readily find on the metalab.unc.edu
site.

Your Pak file example is only a slightly more advanced form of that.
StrList only let you get a bunch of bytes by using an integer number,
where your system has directories and names. But the workings are the
same, its only a more sophisticated index/directory.

-- 
Pierre Phaneuf
http://ludusdesign.com/