[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: SV: SV: SV: Progress Update



Bjarke Hammersholt Roune wrote:

> > Obviously, there is some misunderstanding between each other. I could be
> > wrong, but just in the case I'm right...
> 
> I now know where the problem is. The reason is that we have some quite
> specific needs of the HashTable, which makes my use of it quite economic,
> even if it wouldn't be in so many other situations.

I see, and this is a good reason. Personally, I'd have named that class
something more like StaticHashTable, to reflect its immutability and
leave the name free for a "normal" hash table. But maybe I'm nitpicking
here.

> > Nice idea. I use the order of the hash table as the maximum size of
> > buckets, so that as the size of the hash table increases (and thus
> > becomes more expensive to grow), it takes larger buckets to trigger the
> > growing of the table.
> > 
> > For the very worst case, you'd have a 2^32 entry hash table, each with
> > 32 entries in their bucket, which makes for a huge amount of total
> > entries. Even the worst case of having to look thru all of the 32
> > entries in a bucket is completely insignificant compared to the worst
> > case of a simple array or a linked list, thus I feel I'm winning enough.
> > 
> > I think that my straight method isn't O(n) as you mentioned, but rather
> > O(n log n), which is nice enough for me.
> 
> Hmm... I'm not sure I understand you, but it would seem to me that if all
> entries in the hash table hashes to the same location in the table, then
> they would all have to be stored in that location, making you search a
> serial one, which is clearly both very ineffecient and O(n).
> 
> Are you doing some kind of overflow checking that makes entries "spill over"
> to the next position if the position gets full. That would be ineffecient
> for my porpuse.

They *cannot* all hash to the same location, since I there is a load
limit. When an entry has to be added to a full bucket, I grow the hash
table. Of course, pathogen hash patterns (many keys hashing to similar
values) could cause excessive growing of the table, but that life with
hash tables.

When the hash table is grown and the entries are reshuffled in their new
bucket, one more bit of the hash value is used for choosing the bucket,
which, according to probabilities, should split the entries in half
across the two buckets involved by the additionnal bit. And the buckets
are one entry larger than before also.

My algorithm, in a nutshell, is this:

 - 2^n buckets in the hash table
 - n entries maximum in each bucket
 - the n less significant bits of the hashed key (which are normally
those
   with the most variance) are used as the index into the array of
buckets
 - when you try to add an entry to a full bucket, you have to increment
n by
   one.

Very simple and quite efficient. Growing the table is the only
significantly costly operation, hence the number of entries per bucket
decided by the size (cost) of the hash table, to reduce their occurence
as the hash table gets larger.

> > The reasoning is simply that it wasn't worth the effort to switch from
> > one type of hash table to another, then back as we're finished adding
> > stuff to it. We have one Hashtable class, that's it.
> 
> Well, going HashTable->DynHashTable is free, going DynHashTable->HashTable
> isn't more expensive than adding one entry would normally be.

By "worth the effort", I meant in programmer-time. You had to program
two different hash tables and each time you add stuff to one, you have
to invoke a bunch of methods and juggle conversions around.

I have *one* hash table implementation and I do ht->set("thekey",
"thevalue") and that's it.

> The one and almost only point (so far) of PenguinFile is to provide for fast
> and effecient file handling (convinience is the other goal). The opening of
> files is *the* area in which really substantial gains can be made, while
> still keeping things quite simple. Thus, this is quite an important thing.

Opening of the files? Heck, in Quadra, there is about *10* such file
opens over all the course of a typical short game, which includes
reading the configuration (once), reading the highscore file (once),
loading the resources (once) and loading the highscoring demos (when you
click on the button).

Per frame, we have an average of zero file opening. Okay, so how would
you rate the investment in time now for a game such as Quadra?

> We also plan things like dynamic on-demand disk-caching of slower medias
> like CD-ROMs or DVDs and transparent memory mapping of files. This is still
> a bit far off right now though.

This is better stuff (the CD-ROM caching).

> > Unix got where it is now with the "90% of the work with 10% of the
> > effort" rule, guys...
> 
> Well, yes, but it wouldn't work if they simply said "Well, there's much less
> effort in doing nothing at all, so lets leave it at that"... ;)

You'd be surprised. :-)

For example, do you know what Unix does when an interrupt breaks a
system call in half and messes the internal state of the system call?
NOTHING! It just returns with EINTR (interrupted system call)!
Outrageous, isn't it? :-)

But the good news is that this is fixable. You just have the libc, which
repeats the system calls until they work. :-)

Did they "do it properly"? Nope! But it's fixable...

> > Otherwise, you get Lisp and Lisp Machines, and you
> > know where they are now (in a garbage dump and museums). Nobody cares
> > that they were the best machines and environments ever conceived. Don't
> > get pedantic with that last 10% of work, Unix is *FINE* without it, see?
> 
> I think people *would* have cared if they could've had the best possible
> solution, while still having the complete free choice of OS and everything
> else... Hmm? :)

Yep, but they did have at least a much better solution and people didn't
care. NeXT also had a much better system and look where they are now
(OpenStep programming is a dream come true by the way, simply
fantastic).

And they didn't choose "free choice of OS" or whatever over that, they
used proprietary Unixes instead. *Cheap* Unixes. On *cheap* boxes.

Look at Windows: Unix exist since far longer and is obviously better,
and yet Windows is prevailing.

-- 
Pierre Phaneuf
Ludus Design, http://ludusdesign.com/
"First they ignore you. Then they laugh at you.
Then they fight you. Then you win." -- Gandhi