[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Some benchmarks



I actually intenteded to benchmark the tree currently used also, to see if
this new implementation is slower or faster. However, when I switched the
DLLs from debug to release versions, it htew protection errors at me at
every turn, not to mention that it also made protection errors from its
constructor, regardless of what set of DLLs were used. The currentl used
tree wes significantly slower than the new one, also without hashing, but
even though I had turned optimizations on, and the part of the tree didn't
use the debug DLLs, I still don't think it would be fair to compare release
versions with debug versions.

Also, these benchmarks are not gotten by profiling, but by the standard
library function clock(). This should be reasonably accurate, as the time
resolution (ie, the smallest interval that can be detected) is 10
milliseconds. That doens't sound very accurate, but as I've made sure that
each benchmark runs for longer than 2 seconds, being 10 milliseconds of
shouldn't matter at all. All these benchmarks are the average of atleast 3
runs, some of them much more (wasn't by intention, I just forgot to change
the parameters of the test program :).

The benchmarks for finding entries are made upon 3 different kinds of trees:
one without hashing, one with hashing that uses a 1-byte variable as its
hash value, one with hashing that uses a 2-byte value as its hash value and
one that uses a 4-byte value as its hash value. the benchmarks for
adding/removing is made only on an unhashed tree and a tree that uses a
1-byte variable as its hash value (I hadn't thought of doing larger hash
variables when I made that, and the patterns from the search benchmarks
should be appliccable to the other benchmarks). the benchmarks for the
adding/removal might be a little slower than is the reality, but the
idfference should be very minimal.

Here are the benchmarks:




>>>>>>><** building/adding **><<<<<<

** Average amount of time used to BUILD 10 seperate trees of 5000 entries
one entry at a time, from a set of 5000 entries that has been throughfully
randomised. **

Unhashed tree:           14s 660ms      3,41 entries per ms
1-byte hashed tree:    11s 250ms      4,44 entries per ms

** Average amount of time used to BUILD 10.000 seperate trees of 100 entries
one entry at a time, from a set of 100 entries that has been throughfully
randomised. **

Unhashed tree:           5s 500ms      181,81 entries per ms
1-byte hashed tree:    5s 680ms      176,05 entries per ms

(notice that at 150 entries, the hashed tree is a little bit faster. So the
treshold for 1-byte hashing paying of must be somewhere in the range
100-150)

** Average amount of time used to BUILD 10.000 seperate trees of 100 entries
one entry at a time, from a set of 100 entries that has been throughfully
randomised. **

Unhashed tree:           14s 660ms      , entries per ms
1-byte hashed tree:    11s 250ms      , entries per ms




>>>>>>><** search/finding **><<<<<<

** Average amount of time used to FIND each entry one at a time in a tree of
5.000 entries 1.000 times. **

Unhashed tree:           9s 830ms      508,64 entries per ms     100,00%
1-byte hashed tree:    7s 580ms      629,63 entries per ms     123,78%
2-byte hashed tree:    6s 770ms      738,55 entries per ms     145,20%
4-byte hashed tree:    5s 220ms      957,85 entries per ms     188,31%

** Average amount of time used to FIND each entry one at a time in a tree of
1.000 entries 10.000 times. **

Unhashed tree:           13s 400ms    746,26 entries per ms     100,00%
1-byte hashed tree:    10s 380ms    963,39 entries per ms     125,47%
2-byte hashed tree:    9s   420ms    1061,57 entries per ms   142,25%
4-byte hashed tree:    9s   100ms    1098,90 entries per ms   147,25%

** Average amount of time used to FIND each entry one at a time in a tree of
100 entries 100.000 times. **

Unhashed tree:           7s 830ms      1277,13 entries per ms     100,00%
1-byte hashed tree:    6s 100ms      1639,34 entries per ms     128,36%
2-byte hashed tree:    7s 300ms      1369,86 entries per ms     107,26%
4-byte hashed tree:    5s 500ms      1818,18 entries per ms     142,36%

** Average amount of time used to FIND each entry one at a time in a tree of
10 entries 1.000.000 times. **

Unhashed tree:           3s 600ms      2777,77 entries per ms     100,00%
1-byte hashed tree:    3s 730ms      2680,96 entries per ms       96,51%
2-byte hashed tree:    4s 760ms      2100,84 entries per ms       75,63%
4-byte hashed tree:    3s 570ms      2801,12 entries per ms     100,84%



>>>>>>><** removal **><<<<<<

well, I lost the piece of paper I wrote it down on, but it was a little bit
slower than adding. This isn't very important, though.





Ok. By the above benchmarks, it should IMHO be pretty clear that hashing is
the way to go. If anyone disagrees, look at the benchmarks one more time: in
*EVERY* benhmark, hashed trees clearly outperforms non-hashed trees. The
only exception is the trees with only 10 entries. Here, the difference isn't
that clear, and 1-byte hashing is a bit slower than no hashing, but 4-byte
hashing still is faster, albeit by very little. 10 entry trees should be
quite rare, though, and the performance for unhased, 1-byte hashed and
4-byte hashed trees is so close so as to be practically the same. So,
hashing IMHO is the way to go.

The only question is wheter to utilise 1-byte, 2-byte or 4-byte hashing. I
don't think 2-byte hashing is a very good idea, as it seems to be higly
unreliable (I don't see the reason, but after alot of benhmarking, I can
tell tou that this clearly is the case): sometimes its quite good, and other
times its very bad.

Then the chice is between 1-byte hashing and 4-byte hashing. With very large
trees (5000 entries), 4-byte hashing is dramaticly faster than 1-byte
hashing, gaining an additional 64,53% gain on top of 1-byte hashing's
already impressive gain of 23,78% over the non-hashed tree. This dramatic
difference gets a little more tame for trees that are still large, but not
that large (1000 entries), as 4-byte hashing "only" has an additional gain
of 24,78% over 1-byte hashing's gain of 25,47% compared to the non-hashed
trees. With 100 entry trees, 1-byte hashing catches up a little more, as
here the additional gain is only 14% over 1-byte hashing's 28,36% gain over
non-hashed trees. That's still *quite* something, though.

Last is the 10 entry trees, where unhashed trees finally gets faster than
1-byte hashed trees, getting the upper hand by 3,49%. 4-byte hashed trees
has about the performance of unhashed trees, with a gain of measealy 0,84%:
hardly worth talking about.

Now, my estimate is that there will almost be no 10-entry trees (and it
really doesn't matter, as they are so small performance is extremely fast
nomatter the tree used), but that most trees (atleast those that matter)
will be somewhere between 50-5000 entries. the 50-100 entry trees will most
probably be used for things like saved games, level updates or patches.
Performance isn't crusial here, though I guess it matters somewhat.

The 1000-5000 entry trees will most probably only be used for the data
dir(s) of very large games, though some games spread the data out in several
subdirs, which will most probably be something like 100 entry trees.
However, it is in the 1000-5000 range that performance really begins to
matter, as these are the trees that it takes the longest to find something
in (since they are large), and it is also the trees that will be accessed
the most times (since they contain the most entries).

Now, 1-byte or 4-byte hashing? My opinion is that it should be 4-byte
hashing, since I believe the 3-byte extra memory per entry is well-spent on
getting as much as a 64,53% additional gain over 1-byte hashing. Of course,
the actual additional performance gain will most likely be in the 23,78% -
14% range (for 100 to 1000 entry trees), but that is also plenty of reason
for me to shoose 4-byte hashing.

Putting things in a little perstective, adding 4 bytes for every file in a
pak will make a pak with 10.000 files larger by about 9,7 KBs (the same goes
for the memory image of the pak file structure)... Hardly worth talking
about when we consider the hefty performance gains with especially large
trees, don't you think?