
Re: Funny situation (was: Re: Serialization et al)



Bjarke Hammersholt Roune wrote:

> Ahh, well, perhaps. It wasn't like a "this is the way it's going to be",
> more like "something sorta like this". The specifics, like the actual
> methods to declare, were only listed to get an idea what objects of the
> contained class should be able to do.

Yes, exactly. I feel this is going in a general good way, but I was just
discussing some (small or not so small) details.

> > Why not a single container? This is a case of the composite container (I
> > think this is the name, not sure) pattern. This pattern is for cases
> > where you have a tree of items where nodes and leaves share a lot of
> > functionality (nodes would be directories and leaves would be files,
> > and they are almost the same). You simply make the node class inherit
> > the leaf class and have the pointer to the root of the tree be a
> > pointer to a leaf. The node class would add a few content manipulation methods
> > (addEntry, removeEntry) and whatever else is specific to the node.
> 
> That's actually quite clever. Still, I don't like Directory being a
> derivative of File. Directories are not files (at least, not on all
> OSes), and you don't do the same things to or with them.
> 
> Getting the size (or compressed size) of a file makes no sense for a
> Directory. You can't open a directory. Or, well, actually you can
> (getting a directory stream, which is very counterintuitive to me (the
> name)), but the return type is not the same.
> 
> Some of the attribute flags (just one?) don't mean the same for a file
> as for a directory.
> 
> File has 20 methods. The stuff I mention above makes up 5 of those.
> That's one quarter of the interface of File that does not make sense for
> Directory.

I guess I could be a little too POSIX-oriented; it all feels so natural
to me to think of directories, memory, serial ports, TCP/IP sockets,
audio devices and so on as files!

Directories are much closer to files than many of these other things
are! Ever tried to "seek" in a TCP/IP socket or a serial port? Hehehe!
But using the file abstraction gave us a very strong and reputedly
simple API that just about everybody loves (sometimes, first contact is
harsh, but you just can't help but fall in love with it!).

It just seems so simple and straightforward to me:

 - a directory contains files.
 - a directory is a file.

Compared to:

 - a directory contains files and directories.
 - a directory is similar to a file, but isn't one.

In Unix, directories have most everything a file has, except actual raw
data. They *have* data and a size, but this is a directory entry
structure that is better left private (as it is unportable). For the
sake of sanity and non-POSIX portability, PFile directories should just
give an error when open()ed, similar to open(2) giving an EISDIR when
opening a directory for writing. The "size" of a directory could easily
be made to be zero. Or maybe made to reflect the number of entries in
it.
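
The open(2) behaviour mentioned above is easy to check on a POSIX
system. This is only a small sketch; tryOpenForWriting() is a helper
name made up for the illustration, not part of any existing API.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: tries to open `path` for writing only.
// Returns 0 on success, otherwise the errno value set by open(2).
int tryOpenForWriting(const char* path) {
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling tryOpenForWriting(".") should hand back EISDIR, since "." is
always a directory.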

Off-hand, I do not remember what attribute is unique to a file on Unix.
*All* of them, as far as I know, map to something for a directory, some
similar to what they mean for a file, some different. The main
attributes mean the same to both.

> Also, whenever I'm passed a File pointer, I don't want to be wondering
> (and in some cases having to check) whether it's really a Directory.

If you don't have to check, you don't have to wonder either. It is a
File, as far as you are concerned. If you want to do a thing that is
specific to a directory (like recurse into it), you check.

void recursivelyDoSomethingWithFile(File* file) {
  // asDirectory() returns NULL when the File is not a directory.
  Directory* dir = file->asDirectory();
  int i;

  if(dir)
    for(i = 0; i < dir->getCount(); i++)
      recursivelyDoSomethingWithFile(dir->getEntry(i));
  else
    doSomethingWithFile(file);
}

Maybe the iterator isn't very nice, but this would work and looks very
clear and simple to me.
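
To make the idea concrete, here is a self-contained sketch of the same
recursion with just enough of File and Directory filled in to compile.
The member names (getName, addEntry taking a unique_ptr, and the
collectFileNames function) are assumptions for the illustration, not a
proposed final interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Directory;

// Leaf class: everything in the tree is at least a File.
class File {
public:
    explicit File(std::string name) : name_(std::move(name)) {}
    virtual ~File() = default;

    const std::string& getName() const { return name_; }

    // Returns this object as a Directory, or nullptr for a plain file.
    virtual Directory* asDirectory() { return nullptr; }

private:
    std::string name_;
};

// Node class: a File that also holds other Files.
class Directory : public File {
public:
    using File::File;

    Directory* asDirectory() override { return this; }

    // Takes ownership of the entry; returns a non-owning pointer to it.
    File* addEntry(std::unique_ptr<File> entry) {
        assert(entry != nullptr);
        entries_.push_back(std::move(entry));
        return entries_.back().get();
    }

    std::size_t getCount() const { return entries_.size(); }
    File* getEntry(std::size_t i) const { return entries_.at(i).get(); }

private:
    std::vector<std::unique_ptr<File>> entries_;
};

// Visits every plain file below `file`, depth first, collecting names.
void collectFileNames(File* file, std::vector<std::string>& out) {
    if (Directory* dir = file->asDirectory()) {
        for (std::size_t i = 0; i < dir->getCount(); ++i)
            collectFileNames(dir->getEntry(i), out);
    } else {
        out.push_back(file->getName());
    }
}
```

The caller only ever holds a File*; the single if() is the only place
that cares whether it is really a Directory.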

> > class File {
> > public:
> >   int open(...);
> >   [ other File methods ]
> > };
> >
> > class Directory: public File {
> > public:
> >   [ implementation of inherited methods from File ]
> >   int addEntry(File*);
> >   int removeEntry(File*);
> > };
> 
> Not everything that makes sense for File makes sense for Directory.

Yes, that's true! And that's also part of the 10% I'm not going to
cover. Just do nothing, return either an error or success, depending on
the call. Ahh, the ease of implementing NOTHING, can't be beat. ;-)
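
A rough sketch of what "implementing nothing" could look like, assuming
an errno-style convention (0 for success, a negative code for failure)
that is my own choice here and not something already agreed on:

```cpp
#include <cerrno>
#include <cstddef>

class File {
public:
    explicit File(std::size_t size = 0) : size_(size) {}
    virtual ~File() = default;

    // Returns 0 on success or a negative errno-style code on failure.
    virtual int open() { return 0; }
    virtual std::size_t getSize() const { return size_; }

private:
    std::size_t size_;
};

class Directory : public File {
public:
    // Opening a directory as a data stream makes no sense: report it.
    int open() override { return -EISDIR; }

    // A directory has no raw data of its own, so claim a size of zero.
    std::size_t getSize() const override { return 0; }
};
```

Whether getSize() should be zero or the number of entries is exactly the
kind of detail that can be decided later without touching callers.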

> > This makes for one simple system. There could be an asDirectory() method
> > to get a properly cast Directory* or a NULL if the File is really only
> > a File (similar to the S_ISDIR macro applied to the POSIX structure
> > equivalent to those classes, struct stat). Or dynamic_cast could be
> > used.
> 
> Dynamic cast is *evil*. At least, it is in my world. Sometimes, you have
> to use RTTI, but often enough, you don't. To me, it's like a goto, just
> with more proper utilizations.

Ah. I like to think of myself as evil sometimes. :-)

The specific asDirectory() call (returning a Directory* or NULL if it
isn't a directory) is a no-dynamic_cast way of doing it, very solid and
static, yet dynamic nonetheless. I just see dynamic_cast as the "generic
C++ way of doing this". I don't mind either, but some people find
dynamic_cast evil and some others don't like the asDirectory() "hidden
cast", saying that "C++ has dynamic_cast to do this"... Since I didn't
know what side you were on, I proposed both. :-)

I still can't believe that they didn't make "break" be able to break out
from multiple levels of loop constructs in C++ (like the break in Bourne
shell)! This is the last fair utilization of goto I see.
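
For the record, this is the kind of goto I mean, as a small sketch
(findCell is a made-up example function). Wrapping the loops in a
function and using return works too, but then you need the function.

```cpp
#include <cstddef>
#include <vector>

// Returns true and sets row/col to the first cell equal to target.
bool findCell(const std::vector<std::vector<int>>& grid, int target,
              std::size_t& row, std::size_t& col) {
    bool found = false;
    for (std::size_t r = 0; r < grid.size(); ++r) {
        for (std::size_t c = 0; c < grid[r].size(); ++c) {
            if (grid[r][c] == target) {
                row = r;
                col = c;
                found = true;
                goto done;  // leaves both loops at once
            }
        }
    }
done:
    return found;
}
```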

> "However, not all checking can be done at compile time. Trying to do so
> is a good design and implementation strategy, but if taken to excess it
> can lead to inflexibility and inconvenience."
> 
> [About RTTI] "Was this facility added to encourage the use of runtime
> type checking? No!" [Goes on to say it was for stopping vendor specific
> solutions, and it does have its uses]
> 
> "The basic rule is as ever: Use static (compile-time checked) mechanisms
> wherever possible - such mechanisms are feasible and superior to the
> runtime mechanisms in more cases than most programmers are aware of."

The asDirectory() method is an intermediate way of doing this: it is
static, in that you can't call asDirectory() on just any object to ask
whether it is a directory, but it lets you keep the dynamic advantages
of dynamic_cast. It is probably faster too (that depends on how
dynamic_cast is *really* implemented, which I am not sure of; what I
*am* sure of is that asDirectory() is just about optimal).
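
Side by side, the two options look roughly like this. It is a sketch
only: viaMethod and viaRtti are names invented for the comparison, and I
make no claim here about which one a given compiler makes faster.

```cpp
#include <cassert>

class Directory;

class File {
public:
    virtual ~File() = default;
    // Hand-written checked downcast: one virtual call, no RTTI lookup.
    virtual Directory* asDirectory() { return nullptr; }
};

class Directory : public File {
public:
    Directory* asDirectory() override { return this; }
};

// Checked downcast through the virtual method declared on File.
Directory* viaMethod(File* f) {
    assert(f != nullptr);
    return f->asDirectory();
}

// Checked downcast through RTTI; needs no cooperation from File.
Directory* viaRtti(File* f) {
    assert(f != nullptr);
    return dynamic_cast<Directory*>(f);
}
```

Both return the same pointer for a Directory and a null pointer for a
plain File; the difference is that asDirectory() only exists on File,
while dynamic_cast can be pointed at any polymorphic type.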

One application where RTTI comes in handy is in *very* general cases,
for example the streaming mechanism in Turbo Vision, which takes a
TObject* and streams it. It isn't sufficient to call the virtual void
write(TStream*) of the TObject; you also have to identify the object and
write its type ID into the stream first.

This could be done in a more general way, but it turns out the safest
way is the "unsafe" dynamic_cast!

> Sometimes, you will want to iterate over only the files or only the
> directories contained in a Directory. This is VERY inefficient if those
> two are put together. Of course, having them apart makes it a little
> less efficient to iterate over all entries, but not nearly as badly.

Yes, that's true. Another case of the 10% of missing features. Iterator
methods or classes (I don't do any STL, so you can be sure I'm not
talking STL iterators here) could get around this easily, by providing
ways of iterating over only directories or only files, which would do an
if(asDirectory()) in the worst case, but could profit from an internal
implementation using separate directory and file containers.
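
Something along these lines, as a sketch: the forEach* method names and
the use of std::function callbacks are assumptions made to keep the
example short, not a proposal for the real iterator interface.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Directory;

class File {
public:
    explicit File(std::string name) : name_(std::move(name)) {}
    virtual ~File() = default;
    const std::string& getName() const { return name_; }
    virtual Directory* asDirectory() { return nullptr; }

private:
    std::string name_;
};

class Directory : public File {
public:
    using File::File;
    Directory* asDirectory() override { return this; }

    // The only asDirectory() check happens here, once per insertion.
    void addEntry(std::unique_ptr<File> entry) {
        assert(entry != nullptr);
        if (Directory* dir = entry->asDirectory())
            dirs_.push_back(dir);
        else
            files_.push_back(entry.get());
        owned_.push_back(std::move(entry));
    }

    void forEachFile(const std::function<void(File&)>& fn) const {
        for (File* f : files_) fn(*f);
    }
    void forEachDirectory(const std::function<void(Directory&)>& fn) const {
        for (Directory* d : dirs_) fn(*d);
    }
    void forEachEntry(const std::function<void(File&)>& fn) const {
        for (const auto& e : owned_) fn(*e);
    }

private:
    std::vector<std::unique_ptr<File>> owned_;  // insertion order, owns all
    std::vector<File*> files_;                  // plain files only
    std::vector<Directory*> dirs_;              // subdirectories only
};
```

User code that wants everything still writes a single loop, and the
split containers stay a private detail of this Directory.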

> Are you also suggesting moving functionality from
> FileSystemEntityContainer to Directory? (I'm not sure) If you are, I
> don't see the disadvantage of abstracting out the specific algorithm
> used in a Directory derivative.

I am suggesting moving the iteration and container interface to
Directory. Internally, something like FileSystemEntityContainer could be
used, but they would be specific to the Directory derivative, I do not
care about how they actually do that.

> > Also, File and Directory themselves would only be abstract classes. File
> > would have UnixFile, PakFile, ZipFile, and whatever else as children.
> > Note that I blatantly reused the "PakFile" name for a different class,
> > this clash would have to be fixed. The same would be done for
> 
> Actually, the name that is used is PakFileFile and PakFileDir. This
> makes sense since a PakFile is a collection of something. UnixFile would
> be ok, since Unix here would mean the Unix filesystem, which makes it up
> for a collection of something. A file in a Zip-file would be a
> ZipFileFile, as a Zip is a collection of something.

Okay. I prefer to call things like a "PakFile" an "archive", to refer to
the fact that they have a bunch of files inside them, besides being a
file themselves (this is confusing, so avoiding the overloading of words
helps me when talking about those things).

I knew it was used and "blatantly reused" the name because deep inside,
I'd like PakFile to be called PakArchive. Just ignore me. :-)

> > Simple API, maps quite nicely to the underlying concepts. 10% of the
> > work, and easily fixable! ;-)
> 
> Here I have to disagree: The API is the same (Directory is not exposed).
> It's not much easier to implement, and it's less fixable, as there is less
> abstraction. It's also more error prone, as you have to remember to check
> the return type (or call AsDirectory()).

Oh hell, I'll have to just sit down and code it then! I guess we could
use a better resource file system for our libraries at Ludus Design! :-)

> > Yes, but can only contain one or the other of directories or file???
> > What, each directory would have two FileSystemEntityContainer, one for
> > the directories it contains and the other for the files it contains?
> 
> Yes. They are usually used apart, so having them apart makes sense. This
> particular implementation is not exposed, so to clients, it won't
> matter.

Ok, if this is hidden, that's fine.

> > You call that straightforward?
> 
> Why don't you find this to be so? It's perfectly intuitive to me...

I like a single loop and code of the minimum size. To me, a single loop
with a neat if() in it looks more elegant than two loops with no if().
For example, using File as the superclass of Directory (which would be a
container for File*) has its features.

If you add named pipes as a NamedPipe class to this system, would they
be files or another thing? They are not seekable, have no size and
plenty of other differences even worse than those that made you make
Directory a "different" thing, so if you chose to make this a subclass
of File, I would find the choice of not making Directory also a subclass
of File quite suspicious.

On the other hand, making this NamedPipe class a "different" class,
inheriting from DirEntry, like File and Directory are, would mean adding
another FileSystemEntityContainer to Directory and *three* loops in user
code. This would definitely freak me out.

So rather than freak out later in a big way, I decide to see a bit
further along and freak out in a small way right now. Here I go: What
the hell is going on here?!?

Okay, done! :-)

> > Yes, but there are some semantic requirements defined in the C++
> > standard that prevents compilers from making all kinds of assumptions
> > that could enable safe optimizations.
> 
> I've always wondered why compilers don't have options like "I promise
> not to do this and that", and then the compiler would assume that you
> haven't (giving you errors when you do, if it can check it).
> 
> Also, why has nobody thought of creating #pragmas that mean that this
> or that specific variable or method or whatever doesn't do some specific
> thing that it would otherwise be impossible for the compiler to know.
> 
> Those two things together should make C++ as optimizable as anything
> else.

There are too many of those. HPC (High Performance Computing) compilers
like Kai's actually have those, and that's why they're the fastest C++
compilers. But it doesn't suffice to get to Fortran level.

> > My day job is in high performance computing, if I may remind you, and
> > most amazingly, the reason a language that sucks like Fortran is so
> > popular, is, err, *because* it sucks! The language is so primitive and
> > so restrictive, while still mostly there and programmable (it *is* a
> > pain in the ass, but not nearly as much as assembler, and a portable
> > pain in the ass furthermore), that it is extremely optimizable!
> 
> I didn't know that. Worse is better ... :)

Indeed. Actually, for even more fun, Fortran 90 is to Fortran what C++
is to C: damn complicated and featureful. So damn complicated that it
compiles just as slowly as C++ and is turning out to be just a little bit
faster than C++, so this is getting strange... :-)

> > My favorite example are pointers. Say you have a function that takes two
> > int*. In C++, you simply cannot increment one of the int (not its
> > pointer, but the int it points to) and assume the other didn't change,
> > because they could both point to the same int! There is no such thing as
> > pointers in Fortran, so compilers are free to assume one of the
> > parameters is an invariant thru a loop and avoid doing any costly
> > refetchs.
> 
> Hmm... That's smart. Must be annoying not to be able to pass by
> reference, though.

The compiler passes by reference all the time, but there are
restrictions at the point of calling on what you can put as parameters
(not the same variable twice). Those were restrictions to accommodate
old compilers in the 70's and now they make the thing fast. Eek. :-)
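
To see why the two-int* case hurts in C++, here is a tiny sketch
(addTwice is a made-up function for the illustration). The compiler has
to reload *b after the first store, because nothing stops a caller from
passing the same address twice.

```cpp
#include <cassert>

// Adds *b to *a twice. Since a and b may point at the same int, the
// value of *b cannot be kept in a register across the first store.
void addTwice(int* a, const int* b) {
    assert(a != nullptr && b != nullptr);
    *a += *b;
    *a += *b;
}
```

With distinct ints x = 1 and y = 10, addTwice(&x, &y) leaves x at 21.
With a single int z = 1, addTwice(&z, &z) leaves z at 4, not 3, which is
exactly the case a Fortran compiler is allowed to ignore.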

> > Even the best C/C++ compilers, like Kai's or Compaq's (previously
> > Digital)
> 
> I believe Microsoft's is pretty good too.

VC++ isn't at all at the same level as Kai, Portland or Compaq. It's
closer to the GNU compilers (in the same league I might say). Those
compilers I listed are compilers made for HPC development, where
everything is tweaked to the bone. They take ages to compile and their
code runs faster than its shadow (for C/C++).

> > can't approach the level of performance many Fortran compilers
> > can do. By the way, GCC/EGCS is not there at all when it comes to high
> > performance computing (not enough support for things like vectors and
> > specialized architectures and instructions).
> 
> Sadly so :(

They are sufficient for games. Nobody has a vector machine handy to play
games anyway. ;-)

(hmm, the new NEC SX-5 we have has PCI slots... A Voodoo3 in there?
hehehe!!!)

-- 
Pierre Phaneuf
Ludus Design, http://ludusdesign.com/
"First they ignore you. Then they laugh at you.
Then they fight you. Then you win." -- Gandhi