Re: [f-cpu] Prefetcher (instruction cache L0) our first draft
On Mon, Jan 12, 2004 at 04:35:06PM +0100, Pierre Tardy wrote:
> Attached the modifications of fctools0.3's emu, by tired.
It's not really meant for such things, but what the heck...
I'd just ask you to not re-indent my code. It's too hard to track
changes that way.
> We have added an instruction cache L0. As it is not well defined in the
> handbook, we are requesting comments about this code.
I noticed that you prefetch individual instructions, one per
clock cycle. The real `fetcher' unit should work more efficiently,
prefetching whole cache lines (256 bits = 8 instructions) at once.
On the other hand, it will hold fewer lines (4 to 8 should be a
reasonable number).
The algorithm isn't fully defined yet, but the basic outline is
something like this:
Let one fetcher line be the "current" line. This is the line
the next instruction will be fetched from. Every line shall
have an associated address which is the address of the first
instruction contained in the line (that is, the address is a
multiple of 32). If all instructions from the current line
have been fetched, the fetcher will switch to the "next"
line (which should have been prefetched while the current
line was executed). That is, the "next" line becomes the
new current line.
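The address arithmetic implied here can be sketched in C. This is only an illustration, assuming 32-byte lines and 4-byte instructions as stated above; the names `line_base`, `line_slot` and `is_last_slot` are made up for this example and do not come from fctools.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 256-bit lines holding eight 32-bit instructions. */
enum { LINE_BYTES = 32, INSN_BYTES = 4,
       INSNS_PER_LINE = LINE_BYTES / INSN_BYTES };

/* Address associated with the line that contains pc (a multiple of 32). */
static uint64_t line_base(uint64_t pc)
{
    return pc & ~(uint64_t)(LINE_BYTES - 1);
}

/* Index (0..7) of the instruction at pc within its line. */
static unsigned line_slot(uint64_t pc)
{
    assert(pc % INSN_BYTES == 0);   /* assumption: aligned instructions */
    return (unsigned)((pc & (LINE_BYTES - 1)) / INSN_BYTES);
}

/* Nonzero if pc is the last instruction of its line, i.e. the fetcher
 * has to switch to the "next" line for the following fetch. */
static int is_last_slot(uint64_t pc)
{
    return line_slot(pc) == (unsigned)(INSNS_PER_LINE - 1);
}
```

For example, `line_base(0x1234)` yields `0x1220`, and the instruction at `0x123c` occupies slot 7, so the following fetch comes from the next line.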
Any fetcher line can be in one of at least three different states:
1 - the line is invalid
2 - the line is being prefetched but not yet valid
3 - the line is valid
In case the current line is not valid, let the CPU stall until
it is. If the line is in `invalid' state, start prefetching
and proceed to state 2.
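A minimal sketch of the three states and the stall rule, as one possible emulator model. The `cycles_left` countdown and the `latency` parameter are assumptions for illustration only; the text above does not say how long a prefetch takes.

```c
#include <assert.h>
#include <stdint.h>

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;          /* address of the first instruction in the line */
    unsigned cycles_left;   /* hypothetical memory latency countdown */
};

/* Advance the current line by one clock cycle.
 * Returns 1 if the CPU may fetch from it, 0 if it has to stall. */
static int current_line_tick(struct fetch_line *cur, unsigned latency)
{
    switch (cur->state) {
    case LINE_VALID:
        return 1;
    case LINE_INVALID:              /* start prefetching, go to state 2 */
        assert(latency > 0);
        cur->state = LINE_PREFETCHING;
        cur->cycles_left = latency;
        return 0;
    case LINE_PREFETCHING:          /* wait for the data to arrive */
        assert(cur->cycles_left > 0);
        if (--cur->cycles_left == 0)
            cur->state = LINE_VALID;
        return 0;
    }
    return 0;
}
```

With a latency of 2, an invalid current line stalls the CPU for three calls (one to start the prefetch, two to wait) before the fourth call returns 1.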
Whenever the current line is "switched" as outlined above,
let the fetcher take the associated address of the new current
line, add 32 to it (since the low five bits are always zero, this
is not a full add but an increment of the upper address bits)
and start prefetching the (new) next line
at the calculated address. If there were only these two lines,
they would work just like double or "tandem" buffers -- read
from one of them while the other is filled in the background.
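The two-line ("tandem") case might look like this in C. It is a sketch under the assumption that exactly two lines exist and that starting a prefetch just means marking the line as being prefetched; `struct fetcher`, `next_line_addr` and `switch_lines` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;
};

struct fetcher {
    struct fetch_line line[2];
    unsigned cur;   /* index of the current line; the other one is "next" */
};

/* "Shifted" increment: only address bits 5 and up are incremented. */
static uint64_t next_line_addr(uint64_t addr)
{
    assert((addr & 31) == 0);   /* line addresses are multiples of 32 */
    return ((addr >> 5) + 1) << 5;
}

/* The "next" line becomes current; the old current line is reused to
 * prefetch the line that follows the new current one. */
static void switch_lines(struct fetcher *f)
{
    unsigned old = f->cur;

    f->cur ^= 1;
    f->line[old].addr = next_line_addr(f->line[f->cur].addr);
    f->line[old].state = LINE_PREFETCHING;
}
```

Starting with lines at `0x100` (current) and `0x120` (next), one call to `switch_lines` makes `0x120` current and begins refilling the other buffer from `0x140`.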
When "loadaddr[i]" is executed, take the target address, mask
off the least significant 5 bits, and start prefetching at the
resulting address (if the corresponding line isn't already
being prefetched or even valid). In either case, associate
the register number with the corresponding fetcher line.
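One way to model the loadaddr handling in an emulator is sketched below. The line count (4), the 64-entry register file, the per-line register bitmask `regs` and the function name `on_loadaddr` are all assumptions made for this example. The sketch also clears any older association of the destination register, since loadaddr overwrites that register.

```c
#include <assert.h>
#include <stdint.h>

enum { NLINES = 4, NREGS = 64 };   /* assumed sizes */
enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;
    uint64_t regs;   /* bit r set: register r is associated with this line */
};

/* Returns the index of the line now associated with reg, or -1 if all
 * lines are in use (choosing a victim is not shown here). */
static int on_loadaddr(struct fetch_line *lines, unsigned reg, uint64_t target)
{
    uint64_t base = target & ~(uint64_t)31;   /* mask off the low 5 bits */
    int i, free_line = -1;

    assert(reg < NREGS);
    for (i = 0; i < NLINES; i++)              /* reg gets a new value */
        lines[i].regs &= ~((uint64_t)1 << reg);

    for (i = 0; i < NLINES; i++) {
        if (lines[i].state != LINE_INVALID && lines[i].addr == base)
            break;                            /* already valid or on its way */
        if (lines[i].state == LINE_INVALID && free_line < 0)
            free_line = i;
    }
    if (i == NLINES) {
        if (free_line < 0)
            return -1;
        i = free_line;
        lines[i].addr = base;
        lines[i].state = LINE_PREFETCHING;    /* start the prefetch */
    }
    lines[i].regs |= (uint64_t)1 << reg;
    return i;
}
```

Two loadaddr instructions whose targets fall into the same 32-byte line share one fetcher line, and both register bits end up set in its `regs` mask.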
When a jump instruction is executed (and the jump is taken),
instructions from the target address may reside in the current
line or another (or none at all, which will cause a stall).
In the second and third case, switch to the new current line
and start prefetching the new "next" line as outlined above.
In the first case, simply continue.
If the return address is stored in a register (3-operand
form), associate the register number with the line the next
instruction would have been fetched from if the jump had not
been taken. This will be either the old current line (which
is already loaded) or the old next line (which should already
be prefetched), so there is no need to start another prefetch
operation if the CPU (or the emulator) is in a sane state.
If the return address is not stored, and the target address
does not reside in the current or next line, the fetcher may
(but need not) stop prefetching the old "next" line and/or
invalidate it.
If the jump is NOT taken, there's no need to do anything.
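The three cases for a taken jump, plus the choice of line for a stored return address, can be sketched as follows. This assumes four lines, 4-byte instructions, and that a line still being prefetched counts as a hit (the CPU then stalls until it becomes valid); `classify_jump` and `link_line_base` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { NLINES = 4 };   /* assumed line count */
enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };
enum jump_hit { HIT_CURRENT, HIT_OTHER, HIT_NONE };

struct fetch_line {
    enum line_state state;
    uint64_t addr;
};

/* Classify a taken jump. *hit receives the matching line index, or -1. */
static enum jump_hit classify_jump(const struct fetch_line *lines, int cur,
                                   uint64_t target, int *hit)
{
    uint64_t base = target & ~(uint64_t)31;
    int i;

    assert(cur >= 0 && cur < NLINES);
    *hit = -1;
    if (lines[cur].state != LINE_INVALID && lines[cur].addr == base) {
        *hit = cur;
        return HIT_CURRENT;     /* first case: simply continue */
    }
    for (i = 0; i < NLINES; i++) {
        if (i != cur && lines[i].state != LINE_INVALID
            && lines[i].addr == base) {
            *hit = i;
            return HIT_OTHER;   /* second case: switch lines */
        }
    }
    return HIT_NONE;            /* third case: stall and fetch */
}

/* Line address of the instruction following the jump at pc. This is the
 * line a stored return address gets associated with: the old current
 * line, or the old "next" line if the jump sat in the last slot. */
static uint64_t link_line_base(uint64_t pc)
{
    return (pc + 4) & ~(uint64_t)31;
}
```

A jump at `0x11c` is the last instruction of the line at `0x100`, so its return address belongs to the line at `0x120`, which should already be prefetched.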
Whenever a register is overwritten (note: this applies to ALL
instructions!), break the association between the register
number and the corresponding fetcher line. If the line is
no longer associated with any register afterwards, it may
(but need not) be invalidated. Note that an instruction
may modify more than one register, so it may be necessary to
invalidate several associations at once. On the other hand,
no register can be associated with more than one fetcher
line, because a register holds only one address at any
time.
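Breaking associations on a register write could be modelled with one bitmask per line, as in this sketch. The bitmask representation, the line count and the name `on_register_write` are assumptions. Invalidating a line that has lost its last association is optional according to the text; this version does it eagerly, but never touches the current or "next" line.

```c
#include <assert.h>
#include <stdint.h>

enum { NLINES = 4 };   /* assumed line count */
enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;
    uint64_t regs;   /* bit r set: register r is associated with this line */
};

/* written: bitmask of all registers the instruction modifies (there may
 * be more than one). Returns the number of lines that were invalidated. */
static unsigned on_register_write(struct fetch_line *lines, uint64_t written,
                                  int cur, int next)
{
    unsigned dropped = 0;
    int i;

    assert(cur != next);
    for (i = 0; i < NLINES; i++) {
        uint64_t before = lines[i].regs;

        lines[i].regs &= ~written;            /* break the associations */
        if (before != 0 && lines[i].regs == 0
            && i != cur && i != next
            && lines[i].state != LINE_INVALID) {
            lines[i].state = LINE_INVALID;    /* optional policy */
            dropped++;
        }
    }
    return dropped;
}
```

Passing a mask rather than a single register number lets an instruction that writes two registers clear both associations in one call.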
Conceptually, the current line is always
associated with the instruction pointer (PC), and the "next"
line is associated with some nameless register inside the
prefetcher. These lines must never be invalidated.
There are also special events to consider, e.g. instructions
like `jump r1,r1' must be correctly handled. Another question
is whether it makes sense to start another prefetch if a
constant is added to or subtracted from an "associated"
pointer. It may speed up "calculated jumps", but it seems
to be pretty useless in other cases.
If the fetcher is "full" (that is, all lines are in use),
invalidate and overwrite the least recently used (LRU) line
that is NOT associated with the instruction pointer (current
line) or the prefetcher ("next" line).
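Victim selection could be sketched like this. The `last_used` timestamp (for instance a cycle counter copied in on every access) is an assumed bookkeeping field, and `pick_victim` is a hypothetical name. An invalid line is taken first; otherwise the least recently used line that is neither current nor "next" is chosen.

```c
#include <assert.h>
#include <stdint.h>

enum { NLINES = 4 };   /* assumed line count */
enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;
    uint64_t last_used;   /* hypothetical timestamp of the last access */
};

/* Returns the index of the line to overwrite, or -1 if every line is
 * either the current or the "next" line. */
static int pick_victim(const struct fetch_line *lines, int cur, int next)
{
    int i, victim = -1;

    assert(cur != next);
    for (i = 0; i < NLINES; i++) {
        if (i == cur || i == next)
            continue;                 /* these must never be invalidated */
        if (lines[i].state == LINE_INVALID)
            return i;                 /* free line: nothing to evict */
        if (victim < 0 || lines[i].last_used < lines[victim].last_used)
            victim = i;
    }
    return victim;
}
```

With four lines, two of them are always protected, so the choice is effectively between the remaining two.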
Yann, did I miss anything?
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"