
Re: [f-cpu] Re: Navier-Stokes



mojn',

i'm now finishing this paper about the DCT.
A few more hours and i'll upload it to seul.org.
But in the meantime.......

Juergen Goeritz wrote:
> Hi again,
jop.

> On Thu, 18 Apr 2002, Yann Guidon wrote:
> >i don't know why you called this cache-related post "Navier-Stokes",
> >unless you have an idea ;-)
> 
> :-)
> just some control change for heavy data reuse on large scale
> where normal LRU strategy always results in miss.

this happens, but LRU (which creates a kind of "FIFO" before the "dirty"
data are written back to the main memory system) has a more predictable
behaviour than other strategies. With this method, for example, you don't
get the problem of 2-way caches that thrash all the time when your
"stride" is a multiple of the block granularity. Ouch.
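
To picture the worst case (just a generic C sketch, nothing F-CPU
specific): walk a matrix column by column with a power-of-two row
stride, and every access maps to the same set, so a 2-way cache runs
out of ways after two rows and misses on every following access.

#include <stddef.h>

#define ROWS 1024
#define COLS 1024            /* 1024 * sizeof(float) = 4 KiB row stride */

float m[ROWS][COLS];

float sum_column(size_t col)
{
    float s = 0.0f;
    for (size_t row = 0; row < ROWS; row++)
        s += m[row][col];    /* 4 KiB stride -> same set every time */
    return s;
}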

> >FC0 controls the L1 cache and uses 2 means to control the data locality :
> > - L1 works with LRU or whatever strategy the user implements.
> >   Personally i have better confidence in LRU because it's more
> >   predictive than others.
> 
> But if I want to use a switchable strategy I need some means
> to control this beast, don't I? Are there any registers where
> I could add those control bits or do I have to make a separate
> set?

At this time, nothing is prepared yet. A few SRs should at least be
present to expose the HW configuration (block size, line width,
replacement strategy) to the programmer, even if only in a hardwired way,
so the user can select the proper algorithm. But a configurable setup is
not excluded, as long as it doesn't hurt the cooperation of tasks (like
the MTRRs, it must be accessible only by the superuser).
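
Just to show the intent (everything below is invented: the SR numbers
and the read_sr() accessor are not defined in any manual yet), the
program would read the cache geometry from a few SRs and derive its
blocking factor from that:

#define SR_L1_LINE_SIZE   0x40             /* made-up SR numbers */
#define SR_L1_TOTAL_SIZE  0x41

extern unsigned long read_sr(unsigned sr); /* assumed accessor (intrinsic or asm stub) */

unsigned long pick_block_size(void)
{
    unsigned long line  = read_sr(SR_L1_LINE_SIZE);
    unsigned long total = read_sr(SR_L1_TOTAL_SIZE);
    /* keep some room for the stack and the hot globals */
    return (total - 4 * line) & ~(line - 1);
}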

> > - cache hinting flags : the load/store instructions have a flag
> >   that indicates whether the accessed line can be stored in L1
> >   or directly flushed outside of FC0. So if you know that you won't
> >   reuse this data, this bypasses the L1 "buffer" (because it acts
> >   as a huge FIFO for the write back).
> 
> From this I understand that the load/store opcodes each have
> a flag telling the cache what to do with the data, i.e. keep
> or forget immediately after use.

yup. that's been in the manual for... huh... a long time.

> >Together with some adaptive algorithms, this is enough AFAIK.
> >Adaptive strip-mining is an efficient way to process large data sets
> >at the speed of the L1. I think that multi-level strip-mining is also
> >possible, though a bit more complex, but if i understand what you mean
> >correctly, this will do the trick.
> 
> Yes, this would do the trick. It's still open though how the
> high level language developer (f,c,c++) could use/influence
> this option manually.

Adaptive strip-mining is a very high-level construct which requires
the user to read the system clock so he can dynamically adapt a set of
parameters (usually a buffer size, which will converge to the size of
the L1 minus the most-used global variables). It's not always
straightforward, but it is highly portable across platforms because the
code adapts to each of them automatically. For example, i once designed
a program on a PMMX and found different performance when it ran on a
PII, because the cache strategy is completely different.
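
A rough sketch in C of what i mean (process_strip() is only a
placeholder for the real inner loop, and clock() is the portable but
coarse way to read the system clock):

#include <time.h>
#include <stddef.h>

extern void process_strip(float *data, size_t len);  /* the real work */

void process_adaptive(float *data, size_t total)
{
    size_t strip = 1024;            /* starting guess, in elements */
    double best  = 1e30;            /* best per-element cost so far */

    for (size_t done = 0; done < total; ) {
        size_t len = (total - done < strip) ? total - done : strip;

        clock_t t0 = clock();
        process_strip(data + done, len);
        double cost = (double)(clock() - t0) / (double)len;

        if (cost < best) {          /* still improving: try a bigger strip */
            best = cost;
            strip *= 2;
        } else if (strip > 64) {    /* got worse: probably fell out of L1 */
            strip /= 2;
        }
        done += len;
    }
}

The strip size converges to whatever fits in L1 on the machine at hand,
which is exactly why the same source behaves differently on a PMMX and
a PII.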

However, the LSU hints are not "portable" and not accessible from
portable C code. Intrinsics or macros are probably necessary, but that's
as ugly as using MMX intrinsics in C code, so go figure... A "smart" (?)
compiler should be able to do the job, though. This goes along with the
same process that is used to allocate the registers globally, because
program-wide statistics (and even profiling) are necessary to set the
right flags at the right places.
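
For example, the hint could be hidden behind a macro so that the
portable source still compiles everywhere (the __FCPU__ symbol and the
builtin below are pure invention, nothing with these names exists
anywhere yet):

#ifdef __FCPU__   /* invented symbol */
#  define STORE_NO_REUSE(p, v)  __builtin_fcpu_store_bypass((p), (v))  /* invented builtin */
#else
#  define STORE_NO_REUSE(p, v)  (*(p) = (v))   /* plain store elsewhere */
#endif

void copy_once(float *dst, const float *src, unsigned n)
{
    /* dst won't be reused soon, so don't let it pollute L1 */
    for (unsigned i = 0; i < n; i++)
        STORE_NO_REUSE(&dst[i], src[i]);
}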


> readU2
> JG
> >readU,
> >WHYGEE
WHYGEE