[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

SV: Progress Update



>>Thus, to edit a HashTable, one needs to call ConvertToDynTable(), edit the
>>returned DynHashTable, and then call CreateFromDynTable(), passing the
>>DynHashTable as its parameter. The DynHashTable can then be deleted.
>>
>How costly is the HashTable->DynHashTable conversion?
>
Well, to get data from a HashTable into a DynHashTable is almost free. All
that is done in ConvertToDynTable() is the allocation of the DynHashTable,
and a few assignments to properly empty the HashTable and move ownersip over
to DynHashTable.

The other way around is a little more costly. DynHashTable maintains an
array of data, which contains the data it was originally given by the
creating HashTable, plus a linked list to handle new additions.

Now, to make a HashTable out of a DynHashTable, the array and the linked
list of the DynHashTable has to be copied into one, new array. With that
done, the job of the DynHashTable is finished. Then the HashTable needs to
sort the array (it now uses qsort()), and then index it (ie, create a hash
look-up table).

That's what happens. An allocation and copy to an array, a sort of this
array, an allocation of an additional array (for the hash table) and a pass
over the entry array to construct the hash table.

Combined, this is O(n).

>I'm asking because the current system supports mounting some fs at dir a/
>and then mounting another fs at a/b/ (i.e. in the "area" occupied by the
>first fs). For that a directory has to be added to the hash of die a/
>
>But I'm not sure anymore if this is really a useful feature....
>
Hmm... Wouldn't situations in which there are overlapping sub-directories
get nasty? I mean, if there is an a/b/c and then you mount a b at a, and
this b contains a c? (ie, there is already a/b/c, and you mount a directory
called b at a, and this b directory contains a c).

If we really want to do this, I think the only way to do it without going
overboard with massive resorting and generally a giant mess, is to have a
list (or array) for each directory, and then this list contains a number of
hash containers, each coming from different pak files. Doing the
implementation with a simple "last in get precedence" would make things very
easy to implement, aswell as not causing anything expensive. Look-ups get a
little slower, though.

>>This is easiest done by recursion:
>>
>>WritePakF0(FILE* pStream)
>
>I didn't see how the Pak writing is done - I assume the hash table calls
>some method its entries have to implement? And the directories recursively
>trigger the proper writing (first dirs / dir entries recursively, then
>file data recursively) ?
>
The MISC class (template parameter) takes care of serialization. Ie, it must
define static members SerializeIn(ENTRY_TYPE, FILE*) and
SerializeOut(ENTRY_TYPE, FILE*). I did it that way because it allowed me to
get by without altering anything else (compatibility with existing code). It
wouldn't take me 5 minutes to move functionality from MISC into entry
methods.

>It would be nice if the rest of the PPlay symbols could be changed to the
>new naming and namespace conventions. What's the estimated time frame for
>this?

Unknown. The problem is that there's quite much to do now:

>(1) New prefix/namespace convention
>
I think this is pretty critical. From my experience with the allocators, its
almost as easy as clicking the "replace" menu option (or whatever you use).
There's alot of code to do it to, though.

>(2) New debugging / error message macros (e.g. for API parameter checking)
>
I don't think that's that important. Its certainly not critical to get
something compiling, building or running.

>(3) Improving the headers, including defining the above macros plus some
>more
>
Well, I suspect that to be a somewhat minor job.

>(4) Adding the new URL system to PFile
>
That's a little bit worse. Stuff using the old APIs should still run, build
and compile, though.

>(5) Adding the new Hash container to PFile
>
The compiler will make it clear where this needs to be done, and I plan to
make the new interface of the Directory class correspond somewhat 1:1 to the
old, just with updated names. To use the new Directory class, though, client
code will have to use the new URL system.

>(6) Adding the custom allocators to PFile
>
That's REALLY easy. Everything is 1:1 with new and delete. Only hard
question is what allocator to choose. I've done the best job I can
explaining the advantages and drawbacks of each.

>(3) requires some more planning, but it's a prerequisite for (2)
>
I think getting the main headers rigth is pretty important. It gets
exponentially harder to alter them as implementation continues. What exactly
do you have in mind?

>(1) and (2) require changes throughout all source files, plus debugging of
>    the resulting typos
>
That's a little pain, but I don't think it'll take more than 1 or 2 hours at
most. Probably less. Its just a question of removing the "pp" from the code,
and we've got "search and replace"...

>(1) requires some testing/planning on how to best do the
>    export-from-internal-to-pp and export-to-global thing
>
First, I thougth having a header that included all external stuff form
::pp::internal to ::pp was the best way to go. Now, I'm more for simply, at
the bottom of each header file, including the external stuff that was
defined in that header.

perceps allows for a good deal of customisation. I think we should have a
way of telling wheter a symbol is internal or external, and make it possible
to generate documentation for only external stuff. Many poeple won't care
too much about the internals.

>(4) is almost completed (see below), but has to be combined with the new
>    hash stuff & tested
>
Yeah... The new URL handling is quite well though out (compliment), and
quite simple. I think its actually quite hard to create a bug with it.

>(5) has to be integrated properly with the rest of PFile. Completing (1)
>    beforehand makes this easier. An alternative would be to change the
>    hash/allocator naming/scoping to the old scheme for testing/integration
>
We could also simply use typedefs, so even though stuff hasn't really been
changed, it will look like that to new code. That's what I do to get
Directory to compile.

>(6) well, is this already done (HashTable?) or is the Directory class the
>    main user of these allocators? Anyway, use of them is pretty localized,
>   so this shouldn't be a too big mess. But it has the same problems with
>    (1) as the hash stuff.
>
The directory class is the reason we thougth of having them, as I recall.
I'm sure we'll find several other places to use them too, though. They are
quite good at linked lists (on a very fragmented memory system, you can get
several 1000 percent improvement as a direct cause of improved locality of
reference. Rare situation, though.), and DynHashTable uses an allocator for
that porpuse (DynMemAlloc I think).

They don't have the problems of (1) as the hash stuff, as they only provides
services: they don't rely on anything external. Ie, they don't care how
anything looks, they can still allocate it.

>Furthermore there are some other things I have to do:
>(1) My duties as coordinator of the LGDC
>(2) Planning for the library database
>(3) Write a good description of it for our homepage
>(4) Update the PFile docs (well, I guess I'll wait with that until the
>    thing stabilizes)
>
Do we plan on generating most documentation with perceps, or do we add
substantial stuff on top of that? Anyways, we don't have any users rigth
now, and the system isn't very usable, so I agree that this can easily wait.

Doing perceps documentation not-so-likely-to-be-changed stuff like the new
API handling migth be a good idea.

btw, can perceps write TeX?

>(5) Update the Coding Standard doc with the changes we discussed
>
I think that migth be a good idea to do pretty quickly. It'll make nice
newsitem to show people we haven't died ;)

>(6) Start the coding guide
>
What exactly is that? Stuff like this:

DON'T do this:

for (int i = 0; i < 100; ++i)
	ppStr[i] = "initialize all to this"

DO this:

char** ppTerm = ppStr;
for (char** ppPos = strs; ppPos < ppTerm; ++pPos)
	*ppPos = "initialize all to this"

(its more effecient, if someone's wondering)

Or more stuff like "try not to use templates and exceptions"

?

>(7) Do some preparation for my coming (mid-Oct) physics practical
>(8) Do some preparation for my coming (start-Nov) CS test
>
Well, good luck to you!

>(9) Do 40h/month of coding for money
>
Hmm... I didn't think firms actually hired programmers for just 10 hours a
week. I want a job like that! :)

>(10) live ;)
>
<Bjarke looks puzzeld>
<Bjarke looks VERY puzzeld>
<Steam is practically coming out of Bjarke's ears>
<Bjarke finally decides to find his dictionary>

>>How do I read .sgml files?
>
>ASCII editor ;)
>
Well, yes, that's what I'm doing, but that's what I DON'T want to do.

>SGML looks very similar to HTML (mainly because HTML originally was defined
>in SGML)
>
I noticed... ;)

>BTW - why do you want to read them directly (besides curiosity)?
>
I don't! I want to view them normally.

>don't match anymore. I'm uploading my code to ftpspace (same name as
>before) now so you can have a look at it.
>
Good. I'll see if I can get atleast an interface up as quikcly as possible,
so that we can get things atleast compiling.

>Ok, so how do we continue now? I'm not sure, but this looks good IMHO:
>
>(1) convert the PPlay sources to the namespace use
>(2) integrate the hash stuff & adapt the PakFile* classes
>(3) modify the HashTable thing to use the new URL code
>(4) debug. test.
>
That sounds good. I'll be finishing the Directory class first, though (right
now, its less than 50 lines).

The HashTable doesn't know about URL, or anything else. It shouldn't.
Directory will take care of this. HashTable just returns an entry that
matches a given key, if any.

>The bad thing here is that during the entire time we can't even compile the
>code, let alone run it. At least most of the code (HashTable and the main
>URL processing code are more or less exceptions).
>
Well, after (1) and (2) we should be able to compile it.

>As mentioned above we could also leave the namespace conversion for later
>and instead change the hash stuff to use the old convention (simple). That
>way we have a testable PFile sooner and we have to test less.
>
I'd prefer doing the namespace stuff first.




Well, now some comments about the code you sent me:

>* Why does for example HashTable::IsEmpty () return a ppInt8 instead of a
>bool?
>
Converting an integer to a bool requires processing. More specificly, it
requires the compiler to generate something like this:

if (val == 0)
	boolVal = false;
else
	boolVal = true;

The way the empty condition is checked for, is to just return the value of
m_pEntryArray. If I had made the return value a bool, I'd have the compiler
generate extra code that isn't nessecary.

As an eksample, take this:

if (hashTable.IsEmpty())
	// do something

converting to a bool is completely unnessecary here.

If a bool really is needed, the calling code can just aswell do the
conversion.

>* I'm no native english speaker, but I'm sure it's "purpose" instead of
>"porpuse" and "such" instead of "suchs" :)
>
Well, I didn't pay too much attention to the spelling (should be mostly ok,
though), as I was changing the wording alot all the time.

>* Many comments lack the short description part ( //: )
>
Many? As far as I can remember, ALL of them does, except for the class
description of the allocators. I simply have no idea what to write there.
I'll get around to doing it, though.

>* The template types are not documented and I especially was confused
>by the role of the MISC one (why the ALL_CAPS btw?).
>
Template types behave almost exactly like preprocessor symbols. That's the
reason for the ALL_CAPS.

The MISC handles miscellaneous task having to do with the entries, suchs as
compare, retrieving keys, computing hashes and serializing. Thus,
HashTable-specific code can be moved out of the table entries and into one
out-of-the-way place. That, and I didn't have to alter the class that I was
using as an entry, which was a boon. I can remove the MISC stuff and move
functionality into the entries themselves in less than 5 minutes, though.

The template parameters should be pretty self-explanatory. Look at the
documentation for the class HashTableMisc, it explains in detail what MISC
does.

>* Although C++ doesn't require it, methods that may throw std::bad_alloc
>should tell the user of that by using a proper exception declaration
>(void MyMethod (...) throw (std::bad_alloc))
>
MS VC++ doesn't support this (and throws warnings all the time about it.
They can be turned off with a #pragma, luckily), and I've never seen it
before. That's why I haven't used it. I'll update the stuff I've wrote for
this.

--

I've made some changes to the HashTable class to make it more effecient.
Code is cleaner and leaner, better algorithms is used and documentation have
been improved. I'll add the changes talked about in this Email to that, and
send it to you one of these days (I'll try to get it done tomorrow).