Re: [f-cpu] second order prefetch in FC0



> I don't like prefetch. Could gcc really calculate the very narrow
> window where the prefetch is useful? Prefetches are implementation
> dependent but also clock speed dependent!

I agree. No, gcc can't do it by itself. I wanted to try to go
as far back as I can in the DF tree (ideally to the nearest dominator
of the current BB), to the point where the address to prefetch is
already known, and place the prefetch there.
With second order (indirect) prefetch I can sometimes go
one indirection further (before the other load).
Of course I need to detect CFG cycles and not hoist the
prefetch out of a loop.
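
As a rough sketch of the placement idea in C, using gcc's
__builtin_prefetch: the prefetch is issued as soon as the address is
known. The node/payload layout and the name sum_payloads are made up
for illustration. FC0 has no indirect prefetch here, so the second
order case is only emulated with an ordinary load of next->data,
which is exactly the blocking load an indirect prefetch would avoid.

```c
#include <stddef.h>

/* Hypothetical layout: each node points to a separately allocated payload. */
struct payload { long value; };
struct node {
    struct node    *next;
    struct payload *data;
};

/* Sum all payload values, prefetching one node ahead of use. */
long sum_payloads(const struct node *n)
{
    long sum = 0;
    while (n != NULL) {
        /* Earliest point in this iteration where the next address is known. */
        const struct node *next = n->next;
        if (next != NULL) {
            /* First order: the address is already in a register. */
            __builtin_prefetch(next, 0, 3);
            /* Emulated second order: next->data must be loaded first, and
             * that load can itself miss. A hardware indirect prefetch
             * would follow the pointer without tying up a register. */
            __builtin_prefetch(next->data, 0, 3);
        }
        sum += n->data->value;
        n = next;
    }
    return sum;
}
```

The prefetches stay inside the loop because their addresses change
on every iteration; only the position within the iteration is moved
as early as possible.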

An OOOE core is better at hiding memory latency. At the very least,
non-blocking loads would help in FC0, but I have little idea
how to implement them efficiently.
On the other hand, a non-blocking read still ties up one register;
a prefetch doesn't (it uses a cache line instead).

IMHO the answer can only come from simulating real code.

> I much prefer multi-load/store (a complete cache line, for example,
> that fills 4 or 8 registers).

How does it help? It could be used in the prolog/epilog but not in
the middle. Maybe it could also be useful when unrolling loops, to
read a whole cache line. But for example with the OR-LSU I proposed
you can do that even without multi-load.
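
To illustrate the unrolling case, here is a minimal C sketch that
reads one cache line per iteration with eight ordinary loads. The
64-byte line (eight 64-bit words) and the name sum_by_line are
assumptions for the example, not FC0 parameters. A multi-load would
replace the eight loads with one instruction; the loop shape stays
the same.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed: a 64-byte cache line holding eight 64-bit words. */
#define WORDS_PER_LINE 8

/* Sum n words, consuming one (assumed) cache line per unrolled iteration. */
uint64_t sum_by_line(const uint64_t *a, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + WORDS_PER_LINE <= n; i += WORDS_PER_LINE) {
        /* Eight independent loads: the place a multi-load could be used. */
        uint64_t r0 = a[i],     r1 = a[i + 1], r2 = a[i + 2], r3 = a[i + 3];
        uint64_t r4 = a[i + 4], r5 = a[i + 5], r6 = a[i + 6], r7 = a[i + 7];
        sum += ((r0 + r1) + (r2 + r3)) + ((r4 + r5) + (r6 + r7));
    }
    /* Remainder that does not fill a whole line. */
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Whether the loads actually hit a single line also depends on the
array being line-aligned, which this sketch does not enforce.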

> So your proposal looks like a kind of double load (a = toto->titi)
> or load then store (toto->titi = a). This could be a feature of the
> "internal cpu buses" and a new instruction. As we control L1/L2 access
> and don't need to conform to the limited features of SDRAM, this
> kind of bus cycle could be added and optimised close to the cache
> controller.

Yeah, it sounds reasonable. Once I have time I'll look at gcc's
prefetches so we get some stats on the usage of this one.

devik
