
Re: file access library



Christian Reiniger wrote:
> 1) mmapped files give the greatest speed gain if the files all fit in the
> OS's disk cache - and stay there as long as they are used.
Right, that's the situation one hopes for.
 
> 2) loading (mmapped or not) another big file will most likely purge the
> disk cache, forcing most of the mmapped files to be re-read on the
> next access
This might not be a common event.
Also, for a file which you *know* you won't need to read more
than once (e.g. movies), there might be a way to tell the OS
not to cache it.  This needs research: I have seen ways to hint
how much readahead to do, but I don't know what Linux in
particular offers here.  I could imagine using a mix of
traditional (but noncached) reads and mmap accesses (cached).
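
A minimal sketch of the "traditional reads, but noncached" idea, assuming
a system that provides posix_fadvise() (standardized in POSIX.1-2001, so
later than the kernels discussed in this thread).  The function name
stream_once and the 64 KiB chunk size are illustrative assumptions, and
the hints are advisory only: the kernel is free to ignore them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read a file once, front to back, hinting after each chunk that the
 * bytes just consumed will not be needed again.
 * Returns the number of bytes read, or -1 on error.
 * (stream_once is a hypothetical helper, not part of any library.) */
long stream_once(const char *path)
{
    char buf[64 * 1024];
    long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    /* Hint: access will be sequential (offset 0, length 0 = whole file). */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... hand buf[0..n) to the movie decoder here ... */

        /* Hint: bytes [total, total + n) will not be read again. */
        (void)posix_fadvise(fd, total, n, POSIX_FADV_DONTNEED);
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

Calling stream_once("intro.mpg") (a made-up file name) would stream the
file while asking the OS not to let it crowd out the mmapped data.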

> 3) mmapped files have to be in the OS's disk cache as long as they are
> being accessed to give a speed gain. On the contrary, if they're not in
> cache when the data is accessed the system is slower than a streamed,
> copying approach (stdio-like)
Right, if the files get dropped from the cache when there's
really enough room to keep them in memory, it would be awful.
 
> 4) mmapping more data than can fit in the OS's disk cache at the same time
> most likely (depending on the access patterns) won't give much speed gain or
> will even slow things down - mainly depending on the size of the disk
> cache and the amount of data accessed in the same (short) period of time.
It's ok to mmap a huge pakfile, as long as you don't access too much
of it.  So the problem isn't mmapping, it's how you access it.
Telling the OS not to cache certain very big accesses may be
a win regardless of whether mmapping is being used or not.
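
To illustrate the point about access patterns, here is a rough sketch
that maps an entire file but touches only one byte range of it; pages
outside that range are never faulted in.  The helper name pak_sum_range
is hypothetical, and the byte sum merely stands in for "using" the data.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file read-only, then read only bytes [off, off + len).
 * Returns the sum of those bytes, or -1 on error (including an empty
 * file or a range that lies outside the file).
 * (pak_sum_range is a hypothetical helper for illustration.) */
long pak_sum_range(const char *path, size_t off, size_t len)
{
    struct stat st;
    unsigned char *base;
    size_t size;
    long sum = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size = (size_t)st.st_size;
    if (off > size || len > size - off) {
        close(fd);
        return -1;
    }

    /* Mapping the whole file is cheap: no data is read yet. */
    base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid */
    if (base == MAP_FAILED)
        return -1;

    /* Only the pages covering this range are brought into memory. */
    for (size_t i = 0; i < len; i++)
        sum += base[off + i];

    munmap(base, size);
    return sum;
}
```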
 
> 5) If the conditions are good (not much mmapped data accessed in the same
> time frame, few or no other disk accesses, no other processes
> accessing the disk very much, no other processes with big memory use)
> mmapping files can give a big speed benefit as all copying is eliminated.
> 
> So if the game takes care of some factors (the "other processes" factors
> usually can be neglected), mmapping can give a big speed gain. But perhaps
> there is a better solution than raw mmapping...
It boils down to the details; hard to tell from this altitude :-)
 
> Ok, now what if we'd leave the file access system simply stdio-like -
> copying data on access, no mmapping.
> 
> Then we add a system that caches data. We can give this system a
> pointer-length pair to cache, we can instruct it to purge the entire cache
> (e.g. on entering a new level that uses a different set of textures/meshes)
> and we can instruct it to cache no more than n bytes (to prevent accidental
> memory hogging).
> 
> On top of both of these systems we add some code that simulates mmapped
> files by supplying a
> ppFOpenMM (const char *FileName, void **Pointer, int *length);
> function. This code reads the requested file via the file access system,
> decodes it into a directly usable form (e.g. jpeg -> raw RGBA), hands
> the resulting data to the caching system and returns pointer and length.
> Each time a file is requested this way the system looks if the data is
> already cached, just returns the pointer-length pair if yes and
> transparently re-reads it otherwise.
This is the traditional approach.  Nobody will fire you for going this
route.
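
For concreteness, a rough sketch of the cache-plus-ppFOpenMM scheme
described in the quoted text.  Only the ppFOpenMM parameter list comes
from that text; everything else is an assumption made for illustration:
the int return value (0 on success, -1 on failure), the names
ppCachePurge and ppCacheSetLimit, the fixed slot count, and the policy
of purging everything when the byte limit would be exceeded.  The decode
step is left out, so the "directly usable form" is just the raw bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PP_CACHE_SLOTS 64          /* assumed fixed table size */

struct pp_entry { char *name; void *data; int length; };

static struct pp_entry pp_cache[PP_CACHE_SLOTS];
static int  pp_used;                        /* occupied slots */
static long pp_bytes;                       /* bytes currently cached */
static long pp_limit = 8L * 1024 * 1024;    /* "no more than n bytes" */

void ppCacheSetLimit(long n) { pp_limit = n; }

/* Drop everything, e.g. on entering a new level.  Pointers handed out
 * earlier become invalid. */
void ppCachePurge(void)
{
    for (int i = 0; i < pp_used; i++) {
        free(pp_cache[i].name);
        free(pp_cache[i].data);
    }
    pp_used = 0;
    pp_bytes = 0;
}

/* stdio-style copying read of a whole file into a malloc'd buffer. */
static void *pp_read_file(const char *name, int *length)
{
    FILE *f = fopen(name, "rb");
    void *buf = NULL;
    long size;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
        size <= INT_MAX && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
        if (buf != NULL)
            *length = (int)size;
    }
    fclose(f);
    return buf;
}

int ppFOpenMM(const char *FileName, void **Pointer, int *length)
{
    void *data;
    char *name;
    int len = 0;

    assert(FileName != NULL && Pointer != NULL && length != NULL);

    /* Cache hit: just return the stored pointer-length pair. */
    for (int i = 0; i < pp_used; i++) {
        if (strcmp(pp_cache[i].name, FileName) == 0) {
            *Pointer = pp_cache[i].data;
            *length = pp_cache[i].length;
            return 0;
        }
    }

    /* Cache miss: (re-)read the file.  A real version would decode
     * here (e.g. jpeg -> raw RGBA) before caching the result. */
    data = pp_read_file(FileName, &len);
    name = malloc(strlen(FileName) + 1);
    if (data == NULL || name == NULL || len > pp_limit) {
        free(data);
        free(name);
        return -1;
    }
    strcpy(name, FileName);

    /* Assumed policy: purge everything when out of slots or bytes. */
    if (pp_used == PP_CACHE_SLOTS || pp_bytes + len > pp_limit)
        ppCachePurge();

    pp_cache[pp_used].name = name;
    pp_cache[pp_used].data = data;
    pp_cache[pp_used].length = len;
    pp_used++;
    pp_bytes += len;
    *Pointer = data;
    *length = len;
    return 0;
}
```

Requesting the same file twice without a purge in between returns the
same pointer; after ppCachePurge() the next request transparently
re-reads it.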

> >Mmap access can be used for absolutely everything, even streaming
> >video, if you like.  It'd be interesting to see how the OS decides
> But it won't be of any use for streaming video because all data is only
> accessed *once*...
I retract my statement about streaming video, unless there's a
way to mark certain pages of an mmapped file for no-caching,
which seems unlikely.
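
On systems that provide madvise(), something close to per-page hinting
can be sketched as below.  This is an illustration under assumptions,
not a claim about what the kernels discussed here support: the helper
name mmap_stream_once and the 256-page chunk size are made up, and the
hints are advisory.  MADV_DONTNEED releases the pages from this
process's mapping; whether the kernel also evicts them from its own
disk cache is left to the kernel.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for madvise() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Walk an mmapped file once, chunk by chunk, telling the kernel after
 * each chunk that its pages are no longer needed.
 * Returns the sum of all bytes (a stand-in for real processing),
 * or -1 on error or for an empty file.
 * (mmap_stream_once is a hypothetical helper for illustration.) */
long mmap_stream_once(const char *path)
{
    struct stat st;
    unsigned char *base;
    size_t size, chunk, off;
    long sum = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size = (size_t)st.st_size;
    base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid */
    if (base == MAP_FAILED)
        return -1;

    /* Hint: pages will be touched in order, each only once. */
    (void)madvise(base, size, MADV_SEQUENTIAL);

    /* Chunks are whole multiples of the page size, so every chunk
     * start is page-aligned as madvise() requires. */
    chunk = (size_t)sysconf(_SC_PAGESIZE) * 256;
    for (off = 0; off < size; off += chunk) {
        size_t n = size - off < chunk ? size - off : chunk;

        for (size_t i = 0; i < n; i++)
            sum += base[off + i];

        /* Hint: this chunk will not be accessed again. */
        (void)madvise(base + off, n, MADV_DONTNEED);
    }

    munmap(base, size);
    return sum;
}
```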

> [ debate about C++ and code bloat deleted ]
Let's stick to one big topic per thread!  Besides,
I think we understand each other on the bloat issue well
enough.

I'll try to read up on Linux memory management.  I recall that
this is a big topic for rewriting in the 2.3 kernel, so perhaps
we should pay attention to that discussion and mention our
requirements.  If 2.0 or 2.2 don't have good enough memory
management to make this mmap scheme practical, perhaps we can
influence 2.3.
- Dan