
Re: [f-cpu] second order prefetch in FC0



> I like the idea of "adaptive" computers that record resource
> utilisation patterns.
> This ensures that the code is highly portable, and some code
> becomes efficient after a few loop runs. For example:
> [...]
>  - Alpha 21264 uses data and instruction cache chaining (I don't
>    remember the right term).
>    Each cache line contains 2 physical addresses of the last two memory
>    accesses after the cache line was used. This speeds up linked lists
>    because when the cache line is used, the cache mechanism will
>    prefetch the 2 cache lines referenced by the tag.
> But, as I presume, these methods are certainly a minefield of patents.

Shit. Whenever I "invent" something it turns out to be patented already.
Well, one can only hope that some of those methods were patented before
1990; then the patents would expire in the near future.
Or that the patent only covers the adaptive use of it (in the cache, as you
said). One needs to read these patents.
Using it via prefetch, as I proposed, has another advantage.
An in-cache hint helps you only if the reference has been used before.
But if you are going through a list/tree, the compiler knows in advance
that the node will be loaded and immediately used as the source of a pointer.
If the node is new, a cache with chained loads will not help, but
intelligent prefetch could.
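
To make the difference concrete, here is a rough counting model in Python. It is only a sketch under loud assumptions: one list node per cache line, an unbounded cache, no timing at all, and a hint table that survives a cache flush. The function name walk and both mode names are made up for this illustration; the "chained" mode follows the scheme only as it is described in the quoted text, not any real hardware.

```python
# Toy model: a linked list whose nodes each occupy one cache line.
# All names are illustrative; nothing here is taken from FC0 or a real CPU.

def walk(next_of, head, mode, cache=None, hints=None):
    """Walk the list once and return the number of demand misses.

    mode "chained":  each line remembers which line was touched after it
                     and prefetches that line when it is used again.
    mode "prefetch": loading a node also prefetches the line that its
                     pointer field refers to.
    """
    cache = set() if cache is None else cache
    hints = {} if hints is None else hints
    misses = 0
    prev = None
    node = head
    while node is not None:
        if node not in cache:
            misses += 1
            cache.add(node)
        if mode == "chained":
            if prev is not None:
                hints[prev] = node        # learn the successor of the previous line
            if node in hints:
                cache.add(hints[node])    # replay a hint learned on an earlier use
        elif mode == "prefetch":
            nxt = next_of[node]
            if nxt is not None:
                cache.add(nxt)            # pointer target is fetched early
        prev = node
        node = next_of[node]
    return misses


# An 8-node list: 0 -> 1 -> ... -> 7 -> None
next_of = {i: i + 1 for i in range(7)}
next_of[7] = None

hints = {}
first_chained = walk(next_of, 0, "chained", hints=hints)               # nothing learned yet
second_chained = walk(next_of, 0, "chained", cache=set(), hints=hints) # hints kept, data flushed
first_prefetch = walk(next_of, 0, "prefetch")
print(first_chained, second_chained, first_prefetch)
```

In this model the chained hints only pay off on the second walk, while the explicit prefetch already covers every node after the head on the first walk. A real core would of course see less benefit, since the prefetch has to arrive before the next load.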

> Concerning Devik's proposition of "delayed execution", the big problems are
>  - to generate the delays by a compiler (and force a recompile if another
>    arch is used)
Agreed. But for F-CPU (OOOC) it is the same. If you read the paper you
will see that if you fill in bad values it will simply stall - the same
as with the current F-CPU.
AND (!): if you use their proposed relative values (like delay of MULT+1
or delay of ADD+0) it would in fact ADD binary compatibility to the current
core. However, I'm not sure about it either - it is just interesting to
know about.
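
One possible reading of the relative values, sketched in Python. Everything here is an assumption made for illustration: the hint format (unit, offset), the two latency tables and their numbers, and the function names. The paper's actual encoding is not reproduced. The sketch only shows why a hint that is relative to a unit's latency could carry over between cores, and why a bad hint costs cycles rather than correctness.

```python
# Hypothetical latency tables for two cores; the numbers are made up.
LATENCY_CORE_A = {"ADD": 1, "MULT": 3}
LATENCY_CORE_B = {"ADD": 2, "MULT": 6}


def resolve_delay(hint, latency):
    """Turn a relative hint such as ("MULT", 1) into a cycle count."""
    unit, offset = hint
    return latency[unit] + offset


def issue_cycle(now, hint, operand_ready, latency):
    """Cycle at which the consumer actually issues.

    The consumer waits for the hinted delay; if the operand is still not
    ready at that point, the interlock stalls it until it is.
    """
    planned = now + resolve_delay(hint, latency)
    return max(planned, operand_ready)


# Same encoded hint, two cores: the hint tracks each core's own MULT latency.
print(issue_cycle(0, ("MULT", 1), operand_ready=4, latency=LATENCY_CORE_A))
print(issue_cycle(0, ("MULT", 1), operand_ready=7, latency=LATENCY_CORE_B))
# A bad hint (too short) just stalls until the operand is ready.
print(issue_cycle(0, ("ADD", 0), operand_ready=4, latency=LATENCY_CORE_A))
```
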
It is surely good to finish "something", but it would be a bit unfortunate
to find that it sucks in the end. I'm not sure (but I feel so) that nicO is
right about the imbalance of the 3r2w regset speed when compared with the
6gts restriction.
There is the already finished OpenRISC, which can be compiled as 64-bit
AFAIK. I can't find a description of its internals (only the ISA), but
if F-CPU would be slower or only as fast, then I see no reason
to do it (unless one does it for fun only).

>  - there is no room left for this in the opcodes
> so my idea was to check how these delays could be generated 'on the fly'
> and invisibly, using a first "parsing pass" like for pentium's instruction
> alignment method. But in the end, it consumes much more resources
> and pipeline stages than FC0's OOOC so i don't see the point yet.

Yes, I agree here. It could be done dynamically by remembering, for
each register, the type (delay) of the insn which waits for it. When an insn
then uses two registers, one could select the max of them.
But many hazards would need to be checked ... Probably if we wanted
to do OOOE we would need to encode this info into the insn.
But as you said, it doesn't seem to be so important yet.
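
The "select the max" idea can be sketched as a tiny in-order scoreboard in Python. This is an assumed model, not FC0's: the function schedule, the tuple format (op, dest, src1, src2) and the latency numbers are hypothetical, and only read-after-write dependencies are tracked, so the other hazards mentioned above are ignored.

```python
def schedule(program, latency):
    """Return the issue cycle of each instruction, in program order.

    program: list of (op, dest, src1, src2) tuples with register names.
    latency: cycles from issue until the result of each op is available.
    """
    ready_at = {}        # register -> cycle when its value becomes available
    issue_cycles = []
    cycle = 0
    for op, dest, src1, src2 in program:
        # An instruction with two sources waits for the later of the two.
        wait_until = max(ready_at.get(src1, 0), ready_at.get(src2, 0))
        cycle = max(cycle, wait_until)
        issue_cycles.append(cycle)
        ready_at[dest] = cycle + latency[op]
        cycle += 1       # in-order issue, at most one instruction per cycle
    return issue_cycles


program = [
    ("MULT", "r1", "r2", "r3"),
    ("ADD",  "r4", "r5", "r6"),
    ("ADD",  "r7", "r1", "r4"),   # needs both r1 (slow) and r4 (fast)
]
print(schedule(program, {"ADD": 1, "MULT": 3}))
```

With these made-up latencies the last ADD issues at cycle 3, when the MULT result is available, even though r4 was ready earlier.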

devik
