
stream hint (was:Re: [f-cpu] loadconsx and stream hints)



On Thu, 09 Jan 2003 01:59:47 +0100
Yann Guidon <whygee@f-cpu.org> wrote:

> hi,
> 
> devik wrote:
<...>
> >Also, if someone can write a few words about what the stream
> >bits are good for (I can't find a definition). Will
> >there be separate caches for each stream, for example?
> >  
> >
> Stream hints are used to differentiate unrelated data streams,
> that is, flows to and from memory for separate, independent
> arrays.
> 

Like Cray.

> By default there is no hint (hint #0), but this can be useful in future
> architectures where multiple Load/Store Units are implemented
> or when there is a direct SDRAM interface (the stream hint bit
> can then serve as a bank number, in order to optimise prefetch
> time and bandwidth).
> 

:) Linking the SDRAM bank to the stream bits is a VERY bad idea! SDRAM has 4
banks in each chip, RDRAM has 32. How could you handle that simply?

Banks are a memory trick to enable pipelined accesses. The bank number
is part of the address bits, and we have a strong interest in
interleaving accesses across the banks. But the finest interleaving
granularity is limited by the burst size of the memory interface, so it
will depend on bus size, burst length, memory technology, etc.
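
A minimal sketch of low-order bank interleaving, to show why the bank
number depends on bus width and burst length. The 8-byte bus, burst
length of 4 and 4 banks are assumed values for illustration, not the
parameters of any particular SDRAM or of F-CPU; the bank index is taken
from the address bits just above one burst, so consecutive bursts land
in different banks.

```python
def bank_of(addr, bus_bytes=8, burst_len=4, num_banks=4):
    """Return the bank index of a byte address under low-order interleaving.

    All parameters are illustrative assumptions: bus width in bytes,
    burst length in bus transfers, and number of banks.
    """
    burst_bytes = bus_bytes * burst_len        # bytes moved by one burst
    return (addr // burst_bytes) % num_banks   # bank bits sit just above the burst offset


# Consecutive 32-byte bursts rotate through the four banks.
print([bank_of(a) for a in range(0, 160, 32)])  # [0, 1, 2, 3, 0]
```

With a 16-byte bus the same address bits shift up by one position,
which is the point above: the bank mapping is fixed by the memory
interface, not by a software-visible stream number.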



> It is not yet used and can remain zero, but I guess that the SDRAM
> bank trick will be used first because it simplifies the SDRAM
> interface logic. A "hint number" (or bank number, or transaction
> number) can be allocated to the stack; the others are used for
> continuous (streaming) access to main memory (for example, one is
> needed for memset and two are needed for strcmp).
> 
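
A toy model of the strcmp case above, assuming (purely for
illustration) a 32-byte line, a simple next-line prefetcher, arbitrary
base addresses and hypothetical hint numbers 1 and 2. When both arrays
share one prefetcher, the interleaved reads look random; when each
stream hint gets its own prefetcher, each one sees a purely sequential
walk. This is not how the F-CPU load/store unit is specified, only a
sketch of why separate streams can help.

```python
LINE = 32  # assumed cache-line size in bytes (illustrative)


class NextLinePrefetcher:
    """Remembers the last line touched; an access to that line or the
    next one counts as already fetched (a prefetch hit)."""

    def __init__(self):
        self.last_line = None

    def access(self, addr):
        line = addr // LINE
        hit = self.last_line is not None and line in (self.last_line, self.last_line + 1)
        self.last_line = line
        return hit


def strcmp_trace(a_base, b_base, n):
    """Interleaved byte reads of two arrays, as strcmp issues them.
    Each read is tagged with a hypothetical stream hint (1 for a, 2 for b)."""
    for i in range(n):
        yield 1, a_base + i
        yield 2, b_base + i


def count_misses(trace, per_stream):
    """Count prefetch misses with one prefetcher per hint, or one shared."""
    prefetchers = {}
    misses = 0
    for hint, addr in trace:
        key = hint if per_stream else 0  # ignore hints -> single shared prefetcher
        pf = prefetchers.setdefault(key, NextLinePrefetcher())
        if not pf.access(addr):
            misses += 1
    return misses


print(count_misses(strcmp_trace(0x1000, 0x8000, 64), per_stream=False),
      count_misses(strcmp_trace(0x1000, 0x8000, 64), per_stream=True))  # 128 2
```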

Mainly, in the supercomputer world, a stream is a hint to the memory
subsystem saying that the memory doesn't need to be kept coherent.
Otherwise each memory access *must* be done in order to avoid losing
memory consistency. For example, before issuing a read you must check
whether its address matches an unfinished write (when a write buffer is
used to enable burst transfers).
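
A minimal sketch of that read check, using a dict as memory and a list
of pending (address, value) writes as the write buffer; all names are
hypothetical. Every read searches the buffer, newest first, before
falling back to memory. A stream declared not to need coherence could,
in principle, skip this search, which is the cost the hint would save.

```python
class WriteBuffer:
    """Toy write buffer in front of memory (a dict of address -> value)."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = []  # unfinished writes, in program order

    def write(self, addr, value):
        self.pending.append((addr, value))

    def read(self, addr):
        # The newest unfinished write to this address must win.
        for pend_addr, value in reversed(self.pending):
            if pend_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self):
        # Send the pending writes to memory in order (e.g. as one burst).
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()


mem = {0x100: 1}
wb = WriteBuffer(mem)
wb.write(0x100, 2)
wb.write(0x100, 3)
print(wb.read(0x100), mem[0x100])  # 3 1
wb.drain()
print(mem[0x100])  # 3
```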

AMD's Opteron will use L1 caches with 2 access ports to increase speed.
Streams could be a very good hint for addressing 7 or 8 L1
"nano-caches"!

A stream could be an array or the stack, so its data are mostly
contiguous ("aligned"); bursts are then more effective and prefetching
is also easier.

By using 7-8 caches, we emulate a 7-8-port memory subsystem!
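
A rough way to see the multi-port claim, under an assumed model where
each stream hint selects its own single-ported nano-cache: requests with
different hints are served in the same cycle, while requests sharing a
hint queue on that cache's one port. The function and the model are
illustrative only.

```python
from collections import Counter


def cycles_for_batch(hints):
    """Cycles to serve simultaneous requests, one single-ported
    nano-cache per stream hint (assumed model)."""
    if not hints:
        return 0
    # The busiest nano-cache sets the pace; the others work in parallel.
    return max(Counter(hints).values())


print(cycles_for_batch([1, 2, 3, 4]))  # 1
print(cycles_for_batch([1, 1, 2]))     # 2
```

A single single-ported cache would need len(hints) cycles for the same
batch.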

The main problem is keeping consistency across function calls, for
example: we don't know which stream is used for a specific pointer
passed to a function.

One idea was to use a stream #0 (S0) that accesses the 7 nano-caches (of
the other streams) in parallel. If there is more than one hit, it must
fetch the data from the memory subsystem, because it doesn't know which
cache is the most up-to-date.
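
A sketch of the S0 lookup as described, with each nano-cache modeled as
a dict from address to value (names hypothetical). It assumes, as the
following paragraph requires, that memory is kept current by flushing or
write-through, so falling back to memory on multiple hits returns valid
data.

```python
def s0_load(addr, nano_caches, memory):
    """Hypothetical stream-0 load: probe every nano-cache in parallel."""
    hits = [cache[addr] for cache in nano_caches if addr in cache]
    if len(hits) == 1:
        return hits[0], "nano-cache"
    # No hit, or several copies and no way to tell which one is newest.
    return memory[addr], "memory"


caches = [{} for _ in range(7)]
memory = {0x40: 10}
caches[2][0x40] = 11
print(s0_load(0x40, caches, memory))  # (11, 'nano-cache')
caches[5][0x40] = 12
print(s0_load(0x40, caches, memory))  # (10, 'memory')
```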

So when we "leave" a stream, it must be flushed to the next level of
cache, or the last write must use a write-through cache policy, to keep
coherency for future accesses.
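
A minimal sketch of these two options, with a dict standing in for the
next cache level; the class and method names are hypothetical. With
write-back, stores stay dirty in the stream's nano-cache until
leave_stream() flushes them; with write-through, every store reaches the
next level immediately.

```python
class NanoCache:
    """Toy per-stream nano-cache over a shared next level (a dict)."""

    def __init__(self, next_level, write_through=False):
        self.next_level = next_level
        self.write_through = write_through
        self.lines = {}     # addr -> value held in this nano-cache
        self.dirty = set()  # addresses not yet written to the next level

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.next_level[addr] = value  # next level is always current
        else:
            self.dirty.add(addr)           # defer until the stream is left

    def leave_stream(self):
        # Flush dirty data so later accesses through other streams see it.
        for addr in self.dirty:
            self.next_level[addr] = self.lines[addr]
        self.dirty.clear()


l2 = {}
cache = NanoCache(l2)
cache.store(0x10, 5)
print(l2)  # {}
cache.leave_stream()
print(l2)  # {16: 5}
```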

Maybe the S0 trick is no longer useful if this can be handled by the
compiler.

This use of streams could be very useful for building superscalar
cores that access memory in parallel.

nicO

> >thx, devik
> >
> 
> YG
> 
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/