
Re: [f-cpu] Prefetch and Preload

Hi!

Thomas Lavergne wrote:
> > 1) What distance should be used for prefetch (how many cycles between a
> > prefetch and the actual load)?
> >
> > It depends. If the data is already in cache, there is no need for it
> > (you lose a cycle for nothing). If it is not in cache, the load could
> > take 150-200 cycles. But if the distance is too small, depending on the
> > implementation, you could also lose one cycle (because the underlying
> > system didn't find the address in cache and relaunches a read to the
> > DRAM). If it is too large, the data may already have been evicted by
> > other data.
> >
> > Whether the data gets evicted, and whether the cache hits, depends on
> > the cache type (size and associativity), so it depends on the actual
> > memory address of the manipulated data. That's why I don't like
> > prefetch, but prefer preload.
> OK, so prefetch is very hardware specific. Very bad :-( but it seems we
> don't have any other solution.
> You talk about preload; I suppose you mean loading data into registers a
> few cycles before we need it, like this:
>    load data into r1, r2, r3
>    do something without r1, r2, r3
>    use the data for some computation
> But in my case I can't do that: I work on a lot of data, so I can't load
> it all into registers. A 640x480 RGBA image takes 640*480*4 = 1228800
> bytes, and I work on it sequentially, so the best approach is a pointer
> plus prefetch, which gives a continuous flow of data.
> A lot of things need this structure (image processing, file processing,
> crypto...), so I think we need to do some experiments to find the minimum
> and maximum prefetch distance. So Cedric, we need your VM :-)
> > 2) Usually a prefetch is a "silent load", so it behaves the same as a
> > load as far as the cache is concerned. A cache miss will generate a
> > complete cache line load. Cedric's VM will be used to calibrate such
> > things (size of the line...).
> Good, but when you're at the end of the cache line, how can you tell
> the CPU to preload the next data from memory because you need it soon?
> Is it done automatically?
> What is the length of a cache line? Is it implementation dependent?
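
As a rough illustration of the distance question above (this is not F-CPU
code): if a miss costs about 150-200 cycles and the loop spends a known
number of cycles per cache line, the prefetch distance in lines is roughly
the latency divided by the cycles per line, rounded up. The sketch below
assumes a 32-byte line (LINE_BYTES is a made-up value, since the line size is
implementation dependent) and uses the GCC/Clang builtin
__builtin_prefetch as a stand-in for a prefetch instruction. The names
prefetch_distance and sum_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LINE_BYTES = 32 };  /* assumed line size, implementation dependent */

/* Hypothetical helper: how many lines ahead to prefetch so that a line
   arrives at about the time the loop reaches it. */
static size_t prefetch_distance(unsigned miss_latency_cycles,
                                unsigned cycles_per_line)
{
    assert(cycles_per_line > 0);
    /* Round up: arriving slightly early is cheaper than stalling. */
    return (miss_latency_cycles + cycles_per_line - 1) / cycles_per_line;
}

/* Walk a buffer sequentially, issuing one prefetch hint per line,
   dist_lines lines ahead of the line being summed. */
static uint64_t sum_bytes(const uint8_t *p, size_t n, size_t dist_lines)
{
    uint64_t sum = 0;
    for (size_t base = 0; base < n; base += LINE_BYTES) {
        size_t ahead = base + dist_lines * LINE_BYTES;
#if defined(__GNUC__)
        if (ahead < n)
            __builtin_prefetch(p + ahead, 0, 0);  /* read, streaming */
#else
        (void)ahead;  /* no prefetch hint available on this compiler */
#endif
        size_t end = (n - base < LINE_BYTES) ? n : base + LINE_BYTES;
        for (size_t i = base; i < end; i++)
            sum += p[i];
    }
    return sum;
}
```

With 200 cycles of latency and 40 cycles of work per line this gives a
distance of 5 lines. The right value still has to be measured on the real
cache, which is what the VM experiments are for.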

1) Use "stream hint bits" to indicate where the data comes from:
 one hint (i.e., #1) for input data and one hint (#2) for the data you store
back to memory (at a different location). Hint #0 is the default and is used
for other management purposes.

2) When consecutive cache misses are detected, the LSU (load/store unit) sets
up double buffering with the LSU lines: two lines are used alternately, so you
can work on small data sets in long rows/vectors. That's almost the same
principle as on a T3E...
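
The alternating use of two lines in point 2 can be pictured with a small
software analogy. This is only a sketch, not how the LSU is implemented:
memcpy stands in for a hardware line fill, LINE_BYTES = 32 is an assumed
line size, and fill_line and stream_sum are hypothetical names. In hardware
the fill of the idle line would overlap with the work on the active line;
here the two steps simply run one after the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LINE_BYTES = 32 };  /* assumed line size */

/* Stand-in for a line fill from memory. Returns the bytes copied
   (0 once the end of the source has been reached). */
static size_t fill_line(uint8_t *line, const uint8_t *src,
                        size_t off, size_t n)
{
    if (off >= n)
        return 0;
    size_t len = (n - off < LINE_BYTES) ? n - off : LINE_BYTES;
    memcpy(line, src + off, len);
    return len;
}

/* Sum a long vector using two line buffers alternately: while one line
   is being consumed, the other receives the next chunk. */
static uint64_t stream_sum(const uint8_t *src, size_t n)
{
    uint8_t line[2][LINE_BYTES];
    size_t len[2];
    int cur = 0;
    size_t off = 0;
    uint64_t sum = 0;

    len[cur] = fill_line(line[cur], src, off, n);
    while (len[cur] > 0) {
        off += len[cur];
        /* Start filling the idle line with the next chunk. */
        len[1 - cur] = fill_line(line[1 - cur], src, off, n);
        /* Work on the active line. */
        for (size_t i = 0; i < len[cur]; i++)
            sum += line[cur][i];
        cur = 1 - cur;  /* swap roles */
    }
    return sum;
}
```

The working set stays at two lines no matter how long the vector is, which
is why this suits long sequential streams such as the image example.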

> Thomas Lavergne