
Re: [f-cpu] Prefetch and Preload

-----Original Message-----
From: Thomas Lavergne <thomas.lavergne@laposte.net>
To: f-cpu@seul.org
Date: 13/06/02
Subject: [f-cpu] Prefetch and Preload

> 1) What distance should be used for prefetch (how many cycles between
> the prefetch and the actual load)?
> It depends. If the data is already in cache there is no need for it
> (you lose a cycle for nothing). If it is not in cache, the load could
> take 150-200 cycles. But if the distance is too small, depending on the
> implementation, you could also lose one cycle (because the underlying
> system did not find the address in cache and relaunches a read to the
> DRAM). If it is too large, the data will have been evicted by other
> data.
> Whether the data is evicted or not, and whether the cache hits or not,
> depends on the cache type (size and associativity), so it depends on
> the actual memory address of the manipulated data. That's why I don't
> like prefetch... but preload.

OK, so prefetch is very hardware specific. Very bad :-( but it seems we
don't have any other solution.
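
As a rough rule of thumb (an assumption, not something stated in this thread), the prefetch should be issued early enough that the miss latency is covered by the work done in between. The sketch below turns that into a distance in loop iterations. The function name and the per-iteration cost are hypothetical; the 150-200 cycle figure is the one quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many loop iterations ahead to prefetch.
 * miss_latency_cycles:  assumed cost of a cache miss (150-200 in the
 *                       mail above).
 * cycles_per_iteration: assumed cost of one loop iteration.
 * Returns ceil(miss_latency_cycles / cycles_per_iteration). */
size_t prefetch_distance_iters(size_t miss_latency_cycles,
                               size_t cycles_per_iteration)
{
    assert(cycles_per_iteration > 0);
    return (miss_latency_cycles + cycles_per_iteration - 1)
           / cycles_per_iteration;
}
```

With an assumed 8 cycles per iteration, a 200-cycle miss gives a distance of 25 iterations. This only gives a lower bound; the upper bound (before the line is evicted again) depends on cache size and associativity, as noted above.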

You talk about preload. I suppose you mean loading data into registers a
few cycles before we need it, like this:

   load data into r1, r2, r3
   do something without r1, r2, r3
   use the data to make some computation

But in my case I can't do that. I work on a lot of data, so I can't load
it all into registers: a 640x480 RGBA image takes 640*480*4 = 1228800 bytes
and I work on it sequentially, so the best approach is a pointer and
prefetch, so that we have a continuous flow of data.
A lot of things need this structure (image processing, file processing,
crypto...), so I think we need to do some experiments to find the minimum
and maximum prefetch distance. So Cedric, we need your VM :-)
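
The "pointer and prefetch" pattern can be sketched in C as follows. This is only an illustration on a conventional CPU, not F-CPU code: `__builtin_prefetch` is the GCC/Clang builtin, and the 32-byte line size and the 256-byte lookahead are assumed values that would have to be tuned by experiment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ASSUMED_LINE_BYTES   = 32,   /* assumed cache line size (256 bits) */
    PREFETCH_AHEAD_BYTES = 256   /* assumed prefetch distance, to be tuned */
};

/* Walk a buffer sequentially, hinting the cache about data that will be
 * needed PREFETCH_AHEAD_BYTES later. The "work" here is just a byte sum. */
uint32_t sum_bytes_with_prefetch(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        /* One hint per assumed cache line; never hint past the buffer. */
        if ((i % ASSUMED_LINE_BYTES) == 0 && i + PREFETCH_AHEAD_BYTES < len)
            __builtin_prefetch(data + i + PREFETCH_AHEAD_BYTES, 0, 0);
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same whatever distance is chosen; only the timing changes, which is why the distance has to be measured.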

>>> No, it's not exactly that.

load R1, R2, R3
work on R4, R5, R6
load R4, R5, R6
work on R1, R2, R3
loop to the beginning

>>> It looks a lot like prefetch, but the load is actually performed, so
there is no penalty if the distance is too small.
Look at loadm (but the instruction format should be changed).
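
The preload scheme above amounts to software pipelining: the loads for the next group are issued before the work on the current group. Here is a minimal C sketch of the idea, under the assumption that the "work" is a simple sum and that the element count is a multiple of three. The function name is hypothetical, and the sketch rotates one set of variables instead of alternating two register sets as in the pseudo-code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum an array three words at a time, loading the next group before
 * working on the current one (preload / software pipelining). */
uint32_t sum_preloaded(const uint32_t *src, size_t n)
{
    assert(n % 3 == 0);
    if (n == 0)
        return 0;

    uint32_t sum = 0;
    /* Prologue: load the first group ("load R1, R2, R3"). */
    uint32_t a0 = src[0], a1 = src[1], a2 = src[2];

    for (size_t i = 3; i < n; i += 3) {
        /* Load the next group first ("load R4, R5, R6")... */
        uint32_t b0 = src[i], b1 = src[i + 1], b2 = src[i + 2];
        /* ...then work on the group loaded one iteration earlier. */
        sum += a0 + a1 + a2;
        a0 = b0; a1 = b1; a2 = b2;
    }

    /* Epilogue: work on the last loaded group. */
    sum += a0 + a1 + a2;
    return sum;
}
```

Because the loads really happen, a distance that is too small only stalls the pipeline; it never wastes a memory access the way a mistimed prefetch can.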

> 2) Usually a prefetch is a "silent load", so it behaves the same as a
> load as far as the cache is concerned. A cache miss will generate a
> complete cache line load. Cedric's VM will be used to calibrate such
> things (size of the line...).

Good, but when you're at the end of the cache line, how can you tell
the CPU to preload the next data from memory because you will need it soon?
Is it done automatically?

>>> It could be; that is what is called a prefetch buffer. It could be used
to keep one cache line "in advance" by detecting the access pattern. But it
could be easier to use the stream number associated with a prefetch.
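
To make "detecting the access pattern" concrete, here is a small software model of a sequential detector. It is purely illustrative: the type and function names are hypothetical, and real hardware would do this with a few comparators, not with C code. It suggests fetching the next line whenever two successive accesses fall in consecutive cache lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a one-entry sequential-access detector. */
typedef struct {
    uint64_t last_line;  /* index of the cache line accessed last */
    int      have_last;  /* 0 until the first access has been seen */
} seq_detector;

/* Record an access to byte address addr. Returns 1 and stores the index
 * of the line to fetch "in advance" in *next_line when this access is in
 * the line immediately after the previous one; returns 0 otherwise. */
int seq_detector_access(seq_detector *d, uint64_t addr,
                        uint64_t line_bytes, uint64_t *next_line)
{
    assert(line_bytes > 0);
    uint64_t line = addr / line_bytes;
    int sequential = d->have_last && line == d->last_line + 1;

    d->last_line = line;
    d->have_last = 1;
    if (sequential)
        *next_line = line + 1;
    return sequential;
}
```

A detector like this only helps for one stream at a time, which is one reason an explicit stream number attached to the prefetch could be simpler.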

What is the length of a cache line? Is it implementation dependent?
>>> Yep!! Typically 128-256 bits. Could be more.
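
To put those line sizes in perspective, the small helper below counts how many cache lines a buffer spans. It is a sketch with a hypothetical function name, and it assumes the buffer starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines covered by a line-aligned buffer.
 * line_bits is the line size in bits, as quoted in the mail. */
size_t cache_lines_touched(size_t buffer_bytes, size_t line_bits)
{
    assert(line_bits >= 8 && line_bits % 8 == 0);
    size_t line_bytes = line_bits / 8;
    return (buffer_bytes + line_bytes - 1) / line_bytes;
}
```

For the 1228800-byte image mentioned earlier, this gives 76800 lines with 128-bit (16-byte) lines and 38400 lines with 256-bit (32-byte) lines, each of which is a potential miss when streaming through the image without prefetch.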

Thomas Lavergne                       "Le vrai rêveur est celui qui rêve
                                        de l'impossible."  (Elsa
d-12@laposte.net    ICQ:#137121910     http://assoc.wanadoo.fr/thallium/

To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/
