Re: file access library



Dan Kegel wrote:

>BTW, I should mention that this mmap-centered approach is
>a research topic.  I *think* it'd be a big win, but I'm not
>sure yet.

Ok, let's start researching. (These are mostly raw thoughts):

1) mmapped files give the greatest speed gain if the files all fit in the
OS's disk cache - and stay there as long as they are used.

2) Loading another big file (mmapped or not) will most likely purge the
disk cache, forcing most of the mmapped files to be re-read on the
next access.

3) mmapped files have to be in the OS's disk cache as long as they are
being accessed to give a speed gain. Conversely, if they're not in
cache when the data is accessed, the system is slower than a streamed,
copying approach (stdio-like).

4) mmapping more data than can fit in the OS's disk cache at the same time
most likely (depending on the access patterns) won't give much speed gain,
or will even slow things down - mainly depending on the size of the disk
cache and the amount of data accessed in the same (short) period of time.

5) If the conditions are good (not much mmapped data accessed in the same
time frame, little or no other disk access, no other processes accessing
the disk heavily, no other processes with big memory use), mmapping files
can give a big speed benefit, as all copying is eliminated.

So if the game takes care of some factors (the "other processes" factors
usually can be neglected), mmapping can give a big speed gain. But perhaps
there is a better solution than raw mmapping...
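
For reference, the "raw mmapping" discussed above could look roughly like
the following minimal sketch. It assumes a POSIX system; the names MapFile
and UnmapFile are hypothetical and not part of any existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (an assumption, not from the thread): maps a whole
// file read-only and hands back a pointer to its data and its length.
// Returns true on success.
bool MapFile(const char *fileName, const void **pointer, std::size_t *length)
{
    assert(fileName && pointer && length);

    int fd = open(fileName, O_RDONLY);
    if (fd < 0)
        return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);  // also bail out on empty files: mmap rejects length 0
        return false;
    }

    void *p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED)
        return false;

    *pointer = p;
    *length = static_cast<std::size_t>(st.st_size);
    return true;
}

// Releases a mapping obtained from MapFile.
void UnmapFile(const void *pointer, std::size_t length)
{
    munmap(const_cast<void *>(pointer), length);
}
```

No data is copied here; pages are read from disk only when the returned
memory is first touched, which is why the state of the disk cache matters
so much.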

>> >IMHO the only interface that matters is the one that gives you
>> >a pointer to the file's data, and its length.
>> Hmm, let's talk a bit more on this point. What kind of data will be
>> retrieved on the fly? Textures? Meshes?
>> It's data that doesn't occupy much space compared to the average amount of
>> RAM in an average computer, right (otherwise caching it simply wouldn't work
>> reliably)?
>Right.

Ok, now what if we left the file access system simply stdio-like -
copying data on access, no mmapping?

Then we add a system that caches data. We can give this system a
pointer-length pair to cache, we can instruct it to purge the entire cache
(e.g. on entering a new level that uses a different set of textures/meshes)
and we can instruct it to cache no more than n bytes (to prevent accidental
memory hogging).
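
A rough sketch of such a caching system, just to make the three operations
concrete. Everything here is an assumption for illustration: the class name
DataCache, keying entries by a string, and the rule that the cache takes
ownership of malloc'ed buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical cache: stores pointer-length pairs under a name, can be
// purged completely, and never holds more than maxBytes in total.
class DataCache {
public:
    explicit DataCache(std::size_t maxBytes) : maxBytes_(maxBytes) {}
    ~DataCache() { Purge(); }
    DataCache(const DataCache &) = delete;
    DataCache &operator=(const DataCache &) = delete;

    // Hands a malloc'ed buffer to the cache. Returns false, leaving
    // ownership with the caller, if the key already exists or the byte
    // limit would be exceeded.
    bool Put(const std::string &key, void *data, std::size_t length)
    {
        if (length > maxBytes_ - usedBytes_ || entries_.count(key) != 0)
            return false;
        entries_[key] = Entry{data, length};
        usedBytes_ += length;
        return true;
    }

    // Looks up a cached pointer-length pair; returns false on a miss.
    bool Get(const std::string &key, void **data, std::size_t *length) const
    {
        auto it = entries_.find(key);
        if (it == entries_.end())
            return false;
        *data = it->second.data;
        *length = it->second.length;
        return true;
    }

    // Frees every cached buffer, e.g. when entering a new level.
    void Purge()
    {
        for (auto &kv : entries_)
            std::free(kv.second.data);
        entries_.clear();
        usedBytes_ = 0;
    }

    std::size_t UsedBytes() const { return usedBytes_; }

private:
    struct Entry { void *data; std::size_t length; };
    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
    std::unordered_map<std::string, Entry> entries_;
};
```

This sketch simply refuses new data once the limit is reached; evicting
old entries instead (e.g. least recently used) would be another option.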

On top of both of these systems we add some code that simulates mmapped
files by supplying a
ppFOpenMM (const char *FileName, void **Pointer, int *length);
function. This code reads the requested file via the file access system,
decodes it into a directly usable form (e.g. jpeg -> raw RGBA), hands
the resulting data to the caching system and returns pointer and length.
Each time a file is requested this way, the system checks whether the data
is already cached; if so it just returns the pointer-length pair, otherwise
it transparently re-reads the file.
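
Roughly sketched, the layer described above might look like this. Several
things are assumptions: the int return value (0 on success, -1 on failure),
the placeholder Decode step, the helper names, and the use of a simple
global table in place of a real caching system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumption: decoded file contents live in a process-wide table keyed by
// file name. A real version would use the byte-limited caching system.
static std::map<std::string, std::vector<unsigned char>> g_cache;

// Placeholder for the decode step (e.g. jpeg -> raw RGBA); here it just
// passes the bytes through unchanged.
static std::vector<unsigned char> Decode(std::vector<unsigned char> raw)
{
    return raw;
}

// Reads a whole file through stdio, copying as it goes.
static bool ReadWholeFile(const char *fileName,
                          std::vector<unsigned char> *out)
{
    std::FILE *f = std::fopen(fileName, "rb");
    if (!f)
        return false;
    unsigned char buf[4096];
    std::size_t got;
    while ((got = std::fread(buf, 1, sizeof buf, f)) > 0)
        out->insert(out->end(), buf, buf + got);
    bool ok = !std::ferror(f);
    std::fclose(f);
    return ok;
}

// Returns 0 on success and -1 on failure (the return type is an assumption).
// The returned pointer stays valid until ppPurgeCache is called.
int ppFOpenMM(const char *FileName, void **Pointer, int *length)
{
    auto it = g_cache.find(FileName);
    if (it == g_cache.end()) {
        // Cache miss: read the file, decode it, then cache the result.
        std::vector<unsigned char> raw;
        if (!ReadWholeFile(FileName, &raw))
            return -1;
        it = g_cache.emplace(FileName, Decode(std::move(raw))).first;
    }
    *Pointer = it->second.data();
    *length = static_cast<int>(it->second.size());
    return 0;
}

// Hypothetical purge hook, e.g. for level changes.
void ppPurgeCache()
{
    g_cache.clear();
}
```

A second request for the same file returns the same pointer without
touching the disk; only the first access pays for reading and decoding.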

Perhaps the caching system would be better integrated into the third
system. I don't know...

IMHO this system provides almost the same speed as mmapped files (it's
slightly slower on the first access of a file and the same speed
afterwards) while being much more reliable (it will only slow down if
swapping starts - and that's much less likely than a disk cache purge).
Oh - and it also allows for fast access to compressed data (jpeg textures
etc.) by letting us cache the *decompressed* version of the file. Raw
mmapping just caches the version as it is on the disk.

What do you think?

>> [doing our own caching] IMHO would be even faster than some caching file access
>> system because it has less overhead for access.
>I don't follow you here.  Where is the overhead in mmap?  Assuming

Yes. Please forget this comment of mine. I apparently wasn't thinking
clearly when I wrote it...

>Mmap access can be used for absolutely everything, even streaming 
>video, if you like.  It'd be interesting to see how the OS decides

But it won't be of any use for streaming video because all data is only
accessed *once*...

>> The problem with writing OO in C instead of C++ is that it's much more
>> error prone. In C++ 70-80% of the bugs are caught by the compiler because
>> of strict type checking, access control, constructors/destructors.
>> Programming object orientedly in C means having to use many, many pointers
>> while having very little access control and automatic deallocation (which
>> is done in destructors in C++).
>I'll concede that C++ makes OO style programs shorter than
>equivalent OO style C programs,
>but by making the constructors and destructors automatic,
>it takes control out of your hands, and by allowing inheritance,

I don't think you lose control by using constructors/destructors. They just
do what has to be done anyway - and you can't forget calling them.
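
A tiny illustration of what I mean. The ScopedFile class and the FirstByte
function are made up for this example:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical wrapper: the constructor opens the file and the destructor
// closes it, so the cleanup cannot be forgotten on any exit path.
class ScopedFile {
public:
    ScopedFile(const char *name, const char *mode)
        : f_(std::fopen(name, mode)) {}
    ~ScopedFile() { if (f_) std::fclose(f_); }
    ScopedFile(const ScopedFile &) = delete;
    ScopedFile &operator=(const ScopedFile &) = delete;

    bool ok() const { return f_ != nullptr; }
    std::FILE *get() const { return f_; }

private:
    std::FILE *f_;
};

// Returns the first byte of the file, or a negative value if the file
// cannot be opened or is empty.
int FirstByte(const char *name)
{
    ScopedFile file(name, "rb");
    if (!file.ok())
        return -1;                  // nothing was opened, nothing to close
    return std::fgetc(file.get());  // fclose runs automatically on return
}
```

The equivalent C code needs an explicit fclose before every return that
follows a successful fopen.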

>tempts programmers into writing severely bloated code.
>This is fine in a lot of areas, but not in writing core high-performance
>modules.

Hmm, yes, you're right...

>by the resulting classes.  IMHO avoiding inheritance
>completely is safer when defining a core component for use by games;
>you then have total control over CPU and memory use.
>
>> I'm not sure yet about how these things will be related. The only thing I'm
>> relatively sure of is that the directory structure of the PakFiles will be
>> represented with C++ classes. I tried once to do this in C - it was
>> absolutely ugly :(
>You must not have been writing OO-style C, then, if C++ helped.

You got me here :)
Actually, the C++ "features" that made the most difference in this case
were the STL "list" and "string" classes and constructors/destructors.

Yes, I know how to spell "bloat" ;)

Hmmm, it seems I'm not the right person for the main file access
implementation - I never cared much about bloat etc. I'm more the high
level guy...

Anyway, speed and bloat don't matter in the PakFile compiler, so I'll
continue implementing it (in C++, with tons of STL classes ;). Its code is
also quite different from the PakFile reading code, so it's not much
duplication of effort.

Cu
	Christian
--

Daddy what does "FORMATING DRIVE C" mean?