
Re: [f-cpu] Re: Navier-Stokes



On Thu, 18 Apr 2002, Yann Guidon wrote:
>mojn',

moin,

>i'm now finishing this paper about DCT.
>In a few hours i'll upload it to seul.org.
>But in the meantime...
>
>Juergen Goeritz wrote:
>> just some control change for heavy data reuse on a large scale,
>> where the normal LRU strategy always results in misses.
>this happens, but LRU (which creates a kind of "FIFO" before the "dirty"
>data are written back to the main memory system) behaves more predictably
>than other strategies. With this method, for example, you avoid the
>problem of 2-way caches, which thrash all the time if your "stride" is
>a multiple of the block granularity. ouch.
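
A toy C model of the 2-way thrashing case. LINE_SIZE and NUM_SETS are arbitrary assumptions, not F-CPU parameters. In this model, addresses spaced by a multiple of NUM_SETS * LINE_SIZE all land in the same set, so cycling through three of them misses every time, while the same number of addresses one line apart hit after the first pass.

```c
#include <stdbool.h>
#include <string.h>

/* Toy 2-way set-associative cache with LRU inside each set.
   LINE_SIZE and NUM_SETS are illustrative assumptions, not F-CPU values. */
#define LINE_SIZE 32ul
#define NUM_SETS  64ul

typedef struct {
    unsigned long tag[NUM_SETS][2];
    bool valid[NUM_SETS][2];
    unsigned lru[NUM_SETS];          /* which way (0 or 1) to evict next */
} cache2w;

/* Returns true on a hit; on a miss the LRU way of the set is replaced. */
static bool cache2w_access(cache2w *c, unsigned long addr) {
    unsigned long line = addr / LINE_SIZE;
    unsigned long set = line % NUM_SETS;
    unsigned long tag = line / NUM_SETS;
    for (unsigned w = 0; w < 2; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = 1u - w;    /* the other way is now least recent */
            return true;
        }
    }
    unsigned victim = c->lru[set];
    c->tag[set][victim] = tag;
    c->valid[set][victim] = true;
    c->lru[set] = 1u - victim;
    return false;
}

/* Touch `count` addresses spaced `stride` bytes apart, `passes` times over,
   and return the number of misses. */
static unsigned long strided_misses(unsigned long stride, unsigned count,
                                    unsigned passes) {
    cache2w c;
    memset(&c, 0, sizeof c);         /* all ways invalid, way 0 evicted first */
    unsigned long misses = 0;
    for (unsigned p = 0; p < passes; p++)
        for (unsigned long i = 0; i < count; i++)
            if (!cache2w_access(&c, i * stride))
                misses++;
    return misses;
}
```

With these assumed sizes, strided_misses(LINE_SIZE * NUM_SETS, 3, 10) returns 30 (every access misses), while strided_misses(LINE_SIZE, 3, 10) returns 3.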

I didn't want to argue about LRU, did I? Its purpose and
benefit are proven. It is only less efficient when your data
set exceeds the cache size, you must access each value
multiple times, and you can't find a clever sequential
access order for your algorithm.
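
A fully associative LRU model in C shows this worst case. A cyclic sweep that fits in the cache misses only on the first pass, but one line too many turns every access into a miss, because LRU always evicts exactly the line that is needed next. Capacity and footprint are counted in cache lines; MAX_LINES is an arbitrary limit of this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINES 64u   /* arbitrary upper bound for this sketch */

/* Fully associative cache with true LRU, kept as a recency-ordered list:
   slots[0] is the most recently used line, slots[used - 1] the least. */
static unsigned long lru_cyclic_misses(unsigned capacity, unsigned footprint,
                                       unsigned passes) {
    unsigned long slots[MAX_LINES];
    unsigned used = 0;
    unsigned long misses = 0;
    assert(capacity >= 1 && capacity <= MAX_LINES);

    for (unsigned p = 0; p < passes; p++) {
        for (unsigned long line = 0; line < footprint; line++) {
            unsigned pos = used;                 /* "not found" marker */
            for (unsigned i = 0; i < used; i++)
                if (slots[i] == line) { pos = i; break; }
            if (pos == used) {                   /* miss */
                misses++;
                if (used < capacity)
                    used++;                      /* fill a free slot... */
                pos = used - 1;                  /* ...or evict the LRU line */
            }
            /* move the accessed line to the front (most recently used) */
            memmove(&slots[1], &slots[0], pos * sizeof slots[0]);
            slots[0] = line;
        }
    }
    return misses;
}
```

For example, lru_cyclic_misses(8, 8, 10) returns 8, but lru_cyclic_misses(8, 9, 10) returns 90.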

>> But if I want to use a switchable strategy I need some means
>> to control this beast, don't I? Are there any registers where
>> I could add those control bits or do I have to make a separate
>> set?
>Nothing is prepared yet. At the very least, a few SRs should tell the
>programmer the HW configuration (block size, line width, replacement
>strategy), even if it is hardwired, so the user can select the proper
>algorithm. But a configurable setup is not excluded, as long as it
>doesn't disturb the cooperation of tasks (like the MTRRs, it must be
>accessible only to the superuser).
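
Purely as an illustration of what such read-only SRs could expose: the bit layout below is invented for this sketch (nothing like it is specified for F-CPU yet). Sizes are stored as log2 values so that a few bits cover a wide range.

```c
#include <stdint.h>

/* HYPOTHETICAL read-only cache-configuration SR layout (invented for this
   sketch; F-CPU defines nothing like it yet):
     bits  3..0   log2(line size in bytes)
     bits  7..4   log2(number of ways)
     bits 12..8   log2(number of sets)
     bits 17..16  replacement policy                                    */
enum repl_policy { REPL_LRU = 0, REPL_FIFO = 1, REPL_RANDOM = 2, REPL_OTHER = 3 };

struct cache_cfg {
    unsigned line_bytes;
    unsigned ways;
    unsigned sets;
    enum repl_policy policy;
};

static struct cache_cfg decode_cache_sr(uint64_t sr) {
    struct cache_cfg c;
    c.line_bytes = 1u << (unsigned)(sr & 0xF);
    c.ways       = 1u << (unsigned)((sr >> 4) & 0xF);
    c.sets       = 1u << (unsigned)((sr >> 8) & 0x1F);
    c.policy     = (enum repl_policy)((sr >> 16) & 0x3);
    return c;
}

/* Total capacity, computed in 64 bits so large configurations don't overflow. */
static uint64_t cache_bytes(struct cache_cfg c) {
    return (uint64_t)c.line_bytes * c.ways * c.sets;
}
```

Under this made-up layout, decode_cache_sr(0x615) describes a 4 KiB, 2-way LRU cache with 32-byte lines and 64 sets.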

How about dedicated opcodes for special register access?
This way you define the entry point for accessing them
but don't need to fix everything (layout, number, etc.)
from the beginning. Just make them read/write for the
superuser only, with the SR# as one parameter.
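
A sketch of this proposal as an emulator might model it: one read opcode and one write opcode, both taking the SR number as a parameter and faulting outside superuser mode. The function names, status codes and NUM_SRS are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SRS 16u   /* hypothetical; can grow without changing the opcodes */

typedef struct {
    uint64_t sr[NUM_SRS];
    bool superuser;
} cpu_state;

typedef enum { SR_OK = 0, SR_FAULT_PRIV, SR_FAULT_RANGE } sr_status;

/* "read SR" opcode: dst <- SR[srnum], superuser only. */
static sr_status op_sr_read(const cpu_state *cpu, unsigned srnum, uint64_t *dst) {
    if (!cpu->superuser) return SR_FAULT_PRIV;
    if (srnum >= NUM_SRS) return SR_FAULT_RANGE;
    *dst = cpu->sr[srnum];
    return SR_OK;
}

/* "write SR" opcode: SR[srnum] <- val, superuser only. */
static sr_status op_sr_write(cpu_state *cpu, unsigned srnum, uint64_t val) {
    if (!cpu->superuser) return SR_FAULT_PRIV;
    if (srnum >= NUM_SRS) return SR_FAULT_RANGE;
    cpu->sr[srnum] = val;
    return SR_OK;
}
```

Adding SRs later only means growing NUM_SRS (or whatever decodes the SR number); the two opcodes stay the same, which is the point of not fixing the layout up front.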

>> From this I understand that the load/store opcodes each have
>> a flag telling the cache what to do with the data, i.e. keep
>> or forget immediately after use.
>yup. that's been in the manual for... huh... a long time.

... JG at his high desk, carefully blowing the dust off the
ancient manual, which is nearly falling apart. Gently turning
each page so it won't crumble to dust. Only with his
contrast-enhancing glasses can he read the fading letters
on the yellowed paper... :-D

>> >Together, and with some adaptive algorithms, this is enough AFAIK.
>> >Adaptive strip-mining is an efficient way to process large data sets
>> >at the speed of the L1. I think that multi-level strip-mining is also
>> >possible, though a bit more complex, but if i understand what you mean
>> >correctly, this will do the trick.
>> 
>> Yes, this would do the trick. It's still open, though, how a
>> high-level language developer (Fortran, C, C++) could use or
>> influence this option manually.
>
>Adaptive strip-mining is a very high-level construct which
>requires the user to read the system clock so he can dynamically
>adapt a set of parameters (usually a buffer size, which will converge
>to the size of the L1 minus the most used global variables).
>It's rarely straightforward, but it's highly portable across platforms,
>because it adapts to each one automatically. For example, i had
>designed a program on a PMMX and found different performance
>when running it on a PII, because the cache strategy is completely different.
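
A minimal C sketch of adaptive strip-mining, under these assumptions: the kernel (several passes of a[i] = a[i] * 0.5f + 1.0f) is a placeholder for real work, candidate chunk sizes are powers of two, and clock() from <time.h> is the timer. Each chunk receives all of its passes while it is hopefully still in L1, and pick_chunk() keeps whichever chunk size ran fastest. This is a one-shot search rather than the gradual convergence described above, and the calibration runs modify the buffer, so a real program would calibrate on scratch data or on its first real iterations.

```c
#include <stddef.h>
#include <time.h>

/* Strip-mined kernel: run all `passes` over one chunk before moving to the
   next, so each chunk is reused while it is (hopefully) still in L1.
   chunk must be at least 1. */
static void strip_mined_kernel(float *a, size_t n, size_t chunk, int passes) {
    for (size_t base = 0; base < n; base += chunk) {
        size_t end = (n - base > chunk) ? base + chunk : n;
        for (int p = 0; p < passes; p++)
            for (size_t i = base; i < end; i++)
                a[i] = a[i] * 0.5f + 1.0f;   /* placeholder work */
    }
}

/* Adaptive part: time each power-of-two chunk size in [min_chunk, max_chunk]
   with clock() and keep the fastest. min_chunk must be at least 1. */
static size_t pick_chunk(float *a, size_t n, int passes,
                         size_t min_chunk, size_t max_chunk) {
    size_t best = min_chunk;
    double best_time = -1.0;
    for (size_t c = min_chunk; c <= max_chunk; c *= 2) {
        clock_t t0 = clock();
        strip_mined_kernel(a, n, c, passes);
        double t = (double)(clock() - t0);
        if (best_time < 0.0 || t < best_time) {
            best_time = t;
            best = c;
        }
    }
    return best;
}
```

Because every element gets exactly `passes` updates whatever the chunk size, the result does not depend on the chunking; only the cache behaviour, and therefore the timing, does.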

Anyway, who wants to end up programming around hardware cache
strategies? :-/ Wouldn't it be easier to change the cache
strategy on the fly?

>However, the LSU hints are not "portable" and not accessible from
>portable C code. Intrinsics or macros are probably necessary,
>but that's as ugly as using MMX intrinsics in C code, so go figure...
>But if a compiler is "smart" (?) enough, it should do the job.
>This goes hand in hand with the process used for global register
>allocation, because program-wide statistics (and even profiling)
>are needed to set the right flags in the right places.
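
Portable C has no way to express the LSU keep/forget flag. The closest existing analogue we know of is the GCC/Clang builtin __builtin_prefetch(addr, rw, locality), whose locality argument (0 = no temporal locality, 3 = keep in all cache levels) is only a hint. The macro names and PREFETCH_DISTANCE below are hypothetical, and on other compilers the macros expand to nothing: exactly the kind of intrinsic wrapper described above.

```c
#include <stddef.h>

/* HYPOTHETICAL hint macros. On GCC/Clang they map to __builtin_prefetch
   (rw = 0 for a read; locality 0 = no reuse expected, 3 = high reuse).
   Elsewhere they evaluate the pointer and do nothing. */
#if defined(__GNUC__)
#define HINT_STREAM(p)  __builtin_prefetch((p), 0, 0)  /* read once, then forget */
#define HINT_KEEP(p)    __builtin_prefetch((p), 0, 3)  /* will be reused: keep */
#else
#define HINT_STREAM(p)  ((void)(p))
#define HINT_KEEP(p)    ((void)(p))
#endif

#define PREFETCH_DISTANCE 16  /* elements ahead; an assumed tuning value */

/* Sum a large array that is touched only once, hinting that it need not
   stay in the cache. The hint never changes the result, only (maybe) speed. */
static double sum_streaming(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            HINT_STREAM(&x[i + PREFETCH_DISTANCE]);
        s += x[i];
    }
    return s;
}
```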

Yeah, yeah. But there aren't any PD compilers around that could
do the job, are there? gcc's two-stage profiling optimization
features are not thaaat convenient to use for global...

JG
