
Re: Funny situation (was: Re: Serialization et al)



Bjarke Hammersholt Roune wrote:

> I agree here. The solution I detailed at the end of the E-mail you
> responded to abstracted out implementation details like the fact that
> Paks use a hashtable. A directory for a zip could transparently use any
> kind of algorithm and representation.

You had a good start in that implementation; I just didn't have enough
time to cover it completely this time. Some things were still off-base.
I don't really see why FileSystemEntityContainer is a template. The
entries should be abstracted as well, so there is no need to make this
class generic: it will take entries of the abstract class. The
requirement right after the "FileSystemEntityContainer is a template"
one, which says that the entries must have some specific methods, only
reinforces this. There should simply be an abstract class with those
methods as pure virtuals, simple and straightforward.
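A minimal sketch of that idea, assuming the "specific methods" are something like GetName() and IsDirectory(). The names FileSystemEntity, PlainFile and PlainDirectory are hypothetical; only FileSystemEntityContainer comes from the proposal being discussed. The container is an ordinary class that holds entries through the abstract interface, so no template parameter is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract entry: the required methods are pure virtual.
class FileSystemEntity {
public:
    virtual ~FileSystemEntity() = default;
    virtual std::string GetName() const = 0;
    virtual bool IsDirectory() const = 0;
};

// Not a template: any concrete entry is stored through the abstract base.
class FileSystemEntityContainer {
public:
    void Add(std::unique_ptr<FileSystemEntity> entry) {
        assert(entry != nullptr);
        std::string name = entry->GetName();
        entries_[name] = std::move(entry);
    }
    const FileSystemEntity* Find(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : it->second.get();
    }
    std::size_t Count() const { return entries_.size(); }

private:
    std::map<std::string, std::unique_ptr<FileSystemEntity>> entries_;
};

// Two illustrative concrete entries; a zip or Pak backend would add its own.
class PlainFile : public FileSystemEntity {
public:
    explicit PlainFile(std::string name) : name_(std::move(name)) {}
    std::string GetName() const override { return name_; }
    bool IsDirectory() const override { return false; }

private:
    std::string name_;
};

class PlainDirectory : public FileSystemEntity {
public:
    explicit PlainDirectory(std::string name) : name_(std::move(name)) {}
    std::string GetName() const override { return name_; }
    bool IsDirectory() const override { return true; }

private:
    std::string name_;
};
```

Adding a new kind of entry then means deriving from FileSystemEntity, with no change to the container.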

There is one point I didn't completely understand, but the general tree
of directories representing the filesystem itself and the directory data
in the archive are mostly unrelated. For example, an archive file format
might not have any tree structure at all, and only provide a string as a
key to get a file's information, and that would be sufficient. At archive
loading time, you would read all of the string keys and, simply by
interpreting them as pathnames, build up the general tree of directories
representing the filesystem.
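A sketch of that loading step, assuming the archive exposes nothing but a flat list of keys such as "maps/level1.map". DirNode, SplitPath and BuildTree are hypothetical names; each file leaf just records its index in the archive's key list.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory directory node, independent of the archive format.
struct DirNode {
    std::map<std::string, std::unique_ptr<DirNode>> subdirs;
    std::map<std::string, std::size_t> files;  // file name -> index of its key
};

// Split "a/b/c.txt" into {"a", "b", "c.txt"}, skipping empty components.
std::vector<std::string> SplitPath(const std::string& path) {
    std::vector<std::string> parts;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/'))
        if (!part.empty()) parts.push_back(part);
    return parts;
}

// Interpret each flat key as a pathname and grow the tree as needed.
std::unique_ptr<DirNode> BuildTree(const std::vector<std::string>& keys) {
    auto root = std::make_unique<DirNode>();
    for (std::size_t i = 0; i < keys.size(); ++i) {
        std::vector<std::string> parts = SplitPath(keys[i]);
        if (parts.empty()) continue;
        DirNode* node = root.get();
        // Every component except the last one names a directory.
        for (std::size_t j = 0; j + 1 < parts.size(); ++j) {
            std::unique_ptr<DirNode>& child = node->subdirs[parts[j]];
            if (!child) child = std::make_unique<DirNode>();
            node = child.get();
        }
        node->files[parts.back()] = i;
    }
    return root;
}
```

The archive itself never needs to know this tree exists; only the key strings are shared between the two.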

> > The way I see this, there is a FooFile::SerializeTo method that just
> > stores the content of the FooFile to a stream. If the format of FooFile is
> > simply the result of recursively serializing the objects that make up
> > FooFile, then so be it (calling HashTable::SerializeTo if it uses that
> > class in its implementation). If the format of FooFile is instead handled
> > within a few methods of the FooFile object that just walk the directory
> > tree nodes to gather the information, then so be it.
> 
> Well, besides the fact that serialization happens at unmount, so it's
> umount() and not SerializeTo() that initiates the serialization, the
> solution you propose here is perfectly possible to implement within the
> framework of the solution I have proposed.

Actually, that's how I saw it too. For me, "initiating" the
serialization means making the first SerializeTo() call, and umount()
would be the one making it.
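A minimal sketch of that arrangement, assuming FooFile keeps its directory in a HashTable and that the on-disk format is just a tag followed by the table. The class bodies, the "FOO1" tag and the little-endian length encoding are all assumptions for illustration: umount() makes the first SerializeTo() call, and FooFile::SerializeTo recursively delegates to HashTable::SerializeTo.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical hash table used inside FooFile.
class HashTable {
public:
    void Insert(const std::string& key, std::uint32_t value) { table_[key] = value; }

    // Entry count, then each key (length-prefixed) followed by its value.
    void SerializeTo(std::ostream& out) const {
        WriteU32(out, static_cast<std::uint32_t>(table_.size()));
        for (const auto& kv : table_) {
            WriteU32(out, static_cast<std::uint32_t>(kv.first.size()));
            out.write(kv.first.data(), static_cast<std::streamsize>(kv.first.size()));
            WriteU32(out, kv.second);
        }
    }

    static void WriteU32(std::ostream& out, std::uint32_t v) {
        for (int i = 0; i < 4; ++i)  // little-endian, one byte at a time
            out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
    }

private:
    std::unordered_map<std::string, std::uint32_t> table_;
};

class FooFile {
public:
    explicit FooFile(std::ostream& backing) : backing_(backing) {}
    HashTable& Directory() { return directory_; }

    // FooFile's own format: a tag, then whatever its members write.
    void SerializeTo(std::ostream& out) const {
        out.write("FOO1", 4);
        directory_.SerializeTo(out);  // recursive step into the member object
    }

    // umount() initiates serialization by making the first SerializeTo() call.
    void umount() {
        SerializeTo(backing_);
        backing_.flush();
    }

private:
    std::ostream& backing_;
    HashTable directory_;
};
```

A FooFile whose format is produced by walking directory nodes instead would only change the body of FooFile::SerializeTo; umount() stays the same.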

> > If the internal implementation has to reflect one to one the internal
> > implementation of the archive itself, you're definitely gonna freak out
> > at some point. Just consider a zip file versus a tar.gz file. The first
> > has each file compressed by itself, and the second has the exact
> > reverse: compression is applied to the whole archive. If you force those
> > two to share the same internal implementation, you will only land
> > yourself in a lunatic asylum.
> 
> I get the feeling that you have not read my proposed solution. I don't
> blame you, it's not a very interesting read if you don't really care
> about the implementation details too much.
> 
> The solution I propose completely abstracts out all and any
> implementation details of how a Directory derivative does anything,
> except for the fact that it stores pointers to files and
> sub-directories.

I had read your proposed solution, but a bit quickly, and I didn't talk
about it for fear of having misunderstood it and looking like a fool
afterward. :-)

My initial feeling was that your solution specified a bit too much of
the implementation, which isn't entirely wrong, but reading it again
today helped.
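To make the earlier zip versus tar.gz point concrete, here is a sketch of two archive classes behind one interface with opposite internals. Everything is hypothetical: Archive, ZipLikeArchive and TarGzLikeArchive are illustrative names, and a toy run-length encoding stands in for the real deflate/gzip compression. The zip-like class compresses each entry separately; the tar.gz-like class concatenates all entries and compresses the whole blob, decompressing it once on first access.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy run-length coding: (count, byte) pairs, runs capped at 255.
std::string RleEncode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

std::string RleDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        std::size_t count = static_cast<unsigned char>(in[i]);
        out.append(count, in[i + 1]);
    }
    return out;
}

// The shared interface: callers never see how an archive stores data.
class Archive {
public:
    virtual ~Archive() = default;
    virtual std::string Read(const std::string& name) = 0;
};

// Zip-like: every entry is compressed on its own.
class ZipLikeArchive : public Archive {
public:
    void Add(const std::string& name, const std::string& data) {
        entries_[name] = RleEncode(data);
    }
    std::string Read(const std::string& name) override {
        return RleDecode(entries_.at(name));  // decompress only this entry
    }

private:
    std::map<std::string, std::string> entries_;
};

// tar.gz-like: entries are concatenated, then the whole blob is compressed.
class TarGzLikeArchive : public Archive {
public:
    explicit TarGzLikeArchive(const std::vector<std::pair<std::string, std::string>>& files) {
        std::string blob;
        for (const auto& f : files) {
            index_[f.first] = {blob.size(), f.second.size()};  // offset, length
            blob += f.second;
        }
        compressed_ = RleEncode(blob);
    }
    std::string Read(const std::string& name) override {
        if (!decompressed_) {  // the whole archive is decompressed at once
            blob_ = RleDecode(compressed_);
            decompressed_ = true;
        }
        const auto& span = index_.at(name);
        return blob_.substr(span.first, span.second);
    }

private:
    std::map<std::string, std::pair<std::size_t, std::size_t>> index_;
    std::string compressed_;
    std::string blob_;
    bool decompressed_ = false;
};
```

Code written against Archive works with either class, even though their internal layouts have nothing in common.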

> > There are actually cases where this is so, and everybody is perfectly
> > happy. Think of the NFS protocol. It isn't defined in terms of packet
> > types and formats. It is defined in terms of an IDL interface! The thing
> > assuring compatibility isn't NFS itself, but the Sun RPC layer
> > underneath it. Parallel the RPC layer to the serialization process, and
> > the NFS layer to the Pak classes.
> 
> Excuse my ignorance, but I really don't have a clue as to what NFS
> (filesystem?), IDL or RPC is.

NFS is Sun's Network File System. IDL stands for Interface Definition
Language, but don't be fooled by the name, which seems to refer to one
specific language: there are a lot of very different IDLs, and IDL is
more of a "kind" of language than one actual language (for example,
there are CORBA IDL, RPC IDL, XPIDL, MIDL, and so on, all incompatible
and different to various degrees, but that is okay, since they have
different purposes). RPC is Sun's Remote Procedure Call.

The thing with RPC is that it is a system where you write an IDL file
that describes a set of function prototypes (similar to a .h file), and
the RPC IDL compiler reads this file and produces a few files from it.
It produces a .h and .c pair that seem to implement the functions, but
that actually contain the code to do the communication. There is also a
.c file containing stubs of the functions, which you are supposed to
implement properly and which will become the server.

The NFS protocol doesn't specify what packets are sent over the network;
it just specifies a set of RPC functions (for example, a function that
gets a number of bytes from a file). What makes it compatible is that
RPC is compatible across platforms. Actually working out the underlying
packet format of NFS would be a relatively complicated task, but no one
needs to do it. You implement the RPC layer first in a compatible way,
and NFS-layer compatibility just follows.

> > Youngster, you're showing your age and experience! What you don't
> > understand, you are condemned to reinvent, poorly (paraphrasing somebody I
> > do not remember).
> 
> OK, I was exaggerating. What I meant to say was: "C++ does this
> *way* better than LISP. If that is any indication of how it is to do
> stuff in LISP normally, then C++ must logically be better than LISP
> generally"



> I don't know the first thing about LISP, so perhaps I shouldn't criticize
> it. If that's what *you* meant, well, then you are correct.
> 
> What do you prefer:
> 
> (defun fak1 (n)
>   (fak_iter 1 1 n)
> )
> 
> (defun fak_iter (product counter n)
>    (if (> counter n) product
>        (fak_iter (* counter product) (1+ counter) n)
>    )
> )
> 
> Or:
> 
> unsigned int ComputeFaculty(unsigned int num)
> {
>         for (unsigned int = num - 1; i > 1; --i)
>                 num *= i;
> 
>         return num;
> }

(BTW, i isn't defined!) ;-)
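For reference, a corrected version of the quoted function, keeping its name. Besides the undeclared i, the original returns 0 for num == 0, and because num - 1 wraps around to the maximum unsigned value in that case, it also runs billions of useless iterations. Counting down into a separate result fixes both. With a 32-bit unsigned int the result is only correct up to 12!, since 13! no longer fits.

```cpp
#include <cassert>

// Computes num! iteratively; 0! and 1! are both 1.
// Unsigned overflow wraps silently, so larger inputs give wrong results.
unsigned int ComputeFaculty(unsigned int num)
{
    unsigned int result = 1;
    for (unsigned int i = num; i > 1; --i)  // terminates for every num
        result *= i;

    return result;
}
```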

The LISP version, if translated directly into C, would be recursive too,
but it is still called iterative because a LISP compiler is normally
smart enough to transform it into an iterative loop. C/C++ compilers
don't optimize over function/method boundaries (except for inline
functions, of course).
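Here is fak_iter translated literally into C++, next to the loop that tail-call elimination effectively turns it into. FakIter, FakLoop and Fak1 are illustrative names. The recursive call is the last thing FakIter does, so the call can be replaced by rebinding the parameters and jumping back to the top. The C++ standard does not require compilers to do this; an optimizer may do it for a function this simple, but portable code cannot rely on it.

```cpp
#include <cassert>

// Literal translation of fak_iter: the recursive call is a tail call.
unsigned long FakIter(unsigned long product, unsigned long counter, unsigned long n)
{
    if (counter > n)
        return product;
    return FakIter(counter * product, counter + 1, n);
}

// The same computation after tail-call elimination: parameters are
// updated in place and control jumps back to the test.
unsigned long FakLoop(unsigned long product, unsigned long counter, unsigned long n)
{
    while (!(counter > n)) {
        product = counter * product;  // uses the old counter
        counter = counter + 1;
    }
    return product;
}

// Counterpart of (defun fak1 (n) (fak_iter 1 1 n)).
unsigned long Fak1(unsigned long n) { return FakIter(1, 1, n); }
```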

> I'd say doing it the iterative way in C++ is as simple as doing it in
> the recursive way in LISP:
> 
> (defun fak (n)
>   (if (= n 1) 1
>       (* n (fak (1- n)))
>   )
> )

It could have been done iteratively in a way similar to the C++ one, but
the idea is that programming in a recursive style is often easier, which
is why the LISP version can have the elegance of the recursive style
with the efficiency of the iterative style. C++ forces you to write the
loop yourself; you cannot count on it turning a self-calling function
that carries its state in its parameters into a loop!
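For contrast, the plain recursive fak from the quoted message, translated the same way, is not a tail call: the multiplication by n is still pending when the recursive call returns, so each level keeps its stack frame. It is the extra accumulator parameter, as in fak_iter, that gives the recursion the shape a compiler can treat as a loop. Fak and FakAcc are illustrative names.

```cpp
#include <cassert>

// Translation of (defun fak ...): n * (result of the call) is computed
// after the call returns, so the call is not in tail position.
unsigned long Fak(unsigned long n)
{
    if (n <= 1)  // the Lisp version tests (= n 1); <= 1 also covers 0
        return 1;
    return n * Fak(n - 1);
}

// Accumulator-passing version: the pending multiplication moves into
// a parameter, and the recursive call becomes a tail call.
unsigned long FakAcc(unsigned long n, unsigned long acc = 1)
{
    if (n <= 1)
        return acc;
    return FakAcc(n - 1, n * acc);
}
```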

> It all comes down to skill, and how many programming problems you've
> solved in the past (ie, experience). For me, that list would be quite so
> extremely very very small. You obviously have a much larger experience
> than I, so even if the problem may seem almost non-existent to you, it
> takes me a little while to create a solution that both has a sound
> design and works the way it's supposed to.

Sorry, I am a very bad teacher; I tend to assume that everyone has
experience similar to mine... I may sound a bit rude at times, but I
don't mean it.

> Anyways, the issue wasn't really serialization, but proper design.
> That's certainly something worth spending some time on.

Yes. What you are trying is bordering on more advanced serialization.
In-place editing of streams containing serialized objects is not
trivial. Partial deserialization is also non-trivial. The StrList
example I gave you deals specifically with those two problems, since
they are the real ones you'll have to face. "Ordinary" serialization is
much simpler.

The StrList example simplifies its workings by dividing the task between
two classes, one that is mutable (StrListWriter) and one that is
read-only (StrList). The mutable one is much less efficient than the
read-only one, but that doesn't matter, since the string lists are used
read-only all the time except during their initial creation.
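A sketch of that split, assuming a simple layout: a string count, then count + 1 offsets into the character data, then the characters themselves. The class names come from the example mentioned above, but this layout and these method names (Add, Set, Serialize, Size, Get) are assumptions, not the original StrList code. StrListWriter is easy to edit; StrList works directly on the serialized bytes, so opening it costs nothing and Get(i) decodes only the i-th string, which is a simple form of partial deserialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

namespace {
void PutU32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)  // little-endian
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

std::uint32_t GetU32(const std::string& in, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
    return v;
}
}  // namespace

// Mutable builder: convenient to edit, not optimized for reading.
class StrListWriter {
public:
    void Add(const std::string& s) { strings_.push_back(s); }
    void Set(std::size_t i, const std::string& s) { strings_.at(i) = s; }

    // Layout: count, count + 1 offsets into the data, then the data.
    std::string Serialize() const {
        std::string header, data;
        PutU32(header, static_cast<std::uint32_t>(strings_.size()));
        for (const std::string& s : strings_) {
            PutU32(header, static_cast<std::uint32_t>(data.size()));
            data += s;
        }
        PutU32(header, static_cast<std::uint32_t>(data.size()));
        return header + data;
    }

private:
    std::vector<std::string> strings_;
};

// Read-only view over the serialized bytes; nothing is parsed up front.
class StrList {
public:
    explicit StrList(std::string bytes) : bytes_(std::move(bytes)) {
        assert(bytes_.size() >= 4);
        count_ = GetU32(bytes_, 0);
        data_start_ = 4 + 4 * (static_cast<std::size_t>(count_) + 1);
    }
    std::size_t Size() const { return count_; }

    // Decodes only the requested string using two offset lookups.
    std::string Get(std::size_t i) const {
        assert(i < count_);
        std::uint32_t begin = GetU32(bytes_, 4 + 4 * i);
        std::uint32_t end = GetU32(bytes_, 4 + 4 * (i + 1));
        return bytes_.substr(data_start_ + begin, end - begin);
    }

private:
    std::string bytes_;
    std::uint32_t count_ = 0;
    std::size_t data_start_ = 0;
};
```

All edits go through the writer; the serialized form is never modified in place, which sidesteps the in-place editing problem entirely.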

Note that Turbo Vision is old in terms of sophistication and level of
C++. For example, there is a lot of inheritance, which can be harmful
(to serialization simplicity, among other things). Also, it uses a
complex trick to do the equivalent of a simple dynamic_cast<> in the
serialization code (the serialization methods are in an "interface"
class that you mix into your classes using multiple inheritance, which
is a fine design, but without dynamic_cast<>, some parts of the code are
downright ugly).
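With dynamic_cast<>, the mix-in design stays simple. A sketch, with hypothetical names that are not Turbo Vision's actual classes: Serializable is the mixed-in interface, Widget is an ordinary polymorphic hierarchy, and SaveAll cross-casts each Widget* to Serializable*, getting a null pointer for objects that don't mix in the interface.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mix-in interface holding the serialization method.
class Serializable {
public:
    virtual ~Serializable() = default;
    virtual void SerializeTo(std::ostream& out) const = 0;
};

// An ordinary hierarchy; the virtual destructor makes it polymorphic,
// which dynamic_cast requires.
class Widget {
public:
    virtual ~Widget() = default;
};

// Opts in to serialization by also inheriting the interface.
class Label : public Widget, public Serializable {
public:
    explicit Label(std::string text) : text_(std::move(text)) {}
    void SerializeTo(std::ostream& out) const override {
        out << "Label:" << text_ << '\n';
    }

private:
    std::string text_;
};

class Spacer : public Widget {};  // does not mix in Serializable

// Cross-cast from Widget* to Serializable*; null means "not serializable".
void SaveAll(const std::vector<std::unique_ptr<Widget>>& widgets, std::ostream& out) {
    for (const auto& w : widgets)
        if (const Serializable* s = dynamic_cast<const Serializable*>(w.get()))
            s->SerializeTo(out);
}
```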

-- 
Pierre Phaneuf
http://ludusdesign.com/