Re: [f-cpu] Prefetcher (instruction cache L0) our first draft
On Mon, 12 Jan 2004 21:17:05 +0100
Michael Riepe <firstname.lastname@example.org> wrote:
> On Mon, Jan 12, 2004 at 04:35:06PM +0100, Pierre Tardy wrote:
> > Attached the modifications of fctools0.3's emu, by tired.
> It's not really meant for such things, but what the heck...
> I'd just ask you to not re-indent my code. It's too hard to track
> changes that way.
Argh, sorry, it is a reflex: ESC C-q.
diff -E is your friend.
> > We have added an instruction cache L0. As it is not well defined in the
> > handbook, we are requesting comments about this code.
> I noticed that you prefetch individual instructions, one per
> clock cycle. The real `fetcher' unit should work more efficiently,
> prefetching whole cache lines (256 bits = 8 instructions) at once.
Yes, it is a good idea.
> On the other hand, it will hold less lines (4...8 should be a
> reasonable value).
64 is approximately the size we chose. It is of course #defined.
See the 'p' command of the debugger for an ASCII-art representation of the cache.
> The algorithm isn't fully defined yet, but the basic outline is
> something like this:
> Let one fetcher line be the "current" line. This is the line
In our code, the "current" line is PrefetchPC.
> the next instruction will be fetched from. Every line shall
> have an associated address which is the address of the first
> instruction contained in the line (that is, the address is a
> multiple of 32). If all instructions from the current line
> have been fetched, the fetcher will switch to the "next"
> line (which should have been prefetched while the current
> line was executed). That is, the "next" line becomes the
> new current line.
Ok, that is our behavior.
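As a rough sketch of the line addressing quoted above (not the fctools code): with 256-bit lines holding 8 instructions, each instruction is 4 bytes and a line base is a multiple of 32. The helper names and the use of uint64_t for addresses are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* 256 bits = 8 instructions per line */
#define INSN_BYTES 4u   /* 32-bit instructions */

/* Address of the first instruction in the line containing pc. */
static uint64_t line_base(uint64_t pc)
{
    return pc & ~(uint64_t)(LINE_BYTES - 1);
}

/* Slot (0..7) of the instruction at pc within its line. */
static unsigned line_slot(uint64_t pc)
{
    assert(pc % INSN_BYTES == 0);  /* assumption: aligned instructions */
    return (unsigned)((pc & (LINE_BYTES - 1)) / INSN_BYTES);
}

/* Base address of the line following the one that contains pc. */
static uint64_t next_line_base(uint64_t pc)
{
    return line_base(pc) + LINE_BYTES;
}

/* Nonzero when pc is the last slot, so the next fetch switches lines. */
static int is_last_slot(uint64_t pc)
{
    return line_slot(pc) == LINE_BYTES / INSN_BYTES - 1;
}
```

When is_last_slot() holds for the current PC, the "next" line becomes the current one and a prefetch of next_line_base() can start.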
> Any fetcher line can be in one of at least three different states:
> 1 - the line is invalid
> 2 - the line is being prefetched but not yet valid
> 3 - the line is valid
> In case the current line is not valid, let the CPU stall until
> it is. If the line is in `invalid' state, start prefetching
> and proceed to state 2.
I do not need state 2 in my implementation.
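For reference, the three states quoted above could be modelled as below. This is only a sketch: the struct layout, the function name and the fixed "latency" in clock cycles are hypothetical. An emulator that fills a line instantly can indeed skip state 2.

```c
#include <assert.h>
#include <stdint.h>

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    uint64_t        addr;        /* base address of the line */
    enum line_state state;
    unsigned        cycles_left; /* remaining prefetch cycles (hypothetical) */
};

/* Advance the current line by one clock.
 * Returns 1 while the CPU must stall on this line, 0 once it is valid. */
static int line_tick(struct fetch_line *l, unsigned latency)
{
    assert(latency > 0);
    switch (l->state) {
    case LINE_INVALID:           /* state 1: start prefetching */
        l->state = LINE_PREFETCHING;
        l->cycles_left = latency;
        return 1;
    case LINE_PREFETCHING:       /* state 2: wait for the data */
        if (--l->cycles_left == 0)
            l->state = LINE_VALID;
        return 1;
    case LINE_VALID:             /* state 3: instructions can be fetched */
        return 0;
    }
    return 0;
}
```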
> Whenever the current line is "switched" as outlined above,
> let the fetcher take the associated address of the new current
> line, add 32 to it (that's not really an add but a "shifted"
> increment operation) and start prefetching the (new) next line
> at the calculated address. If there were only these two lines,
> they would work just like double or "tandem" buffers -- read
> from one of them while the other is filled in the background.
> When "loadaddr[i]" is executed, take the target address, mask
> off the least significant 5 bits, and start prefetching at the
> resulting address (if the corresponding line isn't already
> being prefetched or even valid). In either case, associate
> the register number with the corresponding fetcher line.
OK, but in the case of the 3 nested loops, we have seen that the prefetcher stops prefetching the line immediately, as the current stream is not yet in the cache.
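A minimal sketch of the loadaddr handling quoted above: mask the target, reuse a line that already holds it, otherwise start a prefetch in a free line, and in either case record the register association. All names, the 8-line and 64-register sizes, and the 0xff "no line" marker are assumptions for illustration. What to do when the fetcher is full is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES 8
#define NUM_REGS  64
#define NO_LINE   0xff

struct fetcher {
    uint64_t addr[NUM_LINES];
    uint8_t  busy[NUM_LINES];     /* 0 = invalid, 1 = prefetching or valid */
    uint8_t  reg_line[NUM_REGS];  /* line index per register, or NO_LINE */
};

static void fetcher_init(struct fetcher *f)
{
    memset(f, 0, sizeof *f);
    memset(f->reg_line, NO_LINE, sizeof f->reg_line);
}

/* Returns the line now associated with reg, or NO_LINE if none was free. */
static unsigned on_loadaddr(struct fetcher *f, unsigned reg, uint64_t target)
{
    uint64_t base = target & ~(uint64_t)0x1f;  /* mask off the low 5 bits */
    unsigned i, line = NO_LINE;

    assert(reg < NUM_REGS);
    for (i = 0; i < NUM_LINES; i++) {
        if (f->busy[i] && f->addr[i] == base) {
            line = i;             /* already prefetching or valid: reuse */
            break;
        }
        if (!f->busy[i] && line == NO_LINE)
            line = i;             /* remember the first free line */
    }
    if (line != NO_LINE && !f->busy[line]) {
        f->addr[line] = base;     /* start a new prefetch in this line */
        f->busy[line] = 1;
    }
    f->reg_line[reg] = (uint8_t)line;
    return line;
}
```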
> When a jump instruction is executed (and the jump is taken),
> instructions from the target address may reside in the current
> line or another (or none at all, which will cause a stall).
> In the second and third case, switch to the new current line
> and start prefetching the new "next" line as outlined above.
> In the first case, simply continue.
> If the return address is stored in a register (3-operand
> form), associate the register number with the line the next
> instruction would have been fetched from if the jump had not
> been taken. This will be either the old current line (which
> is already loaded) or the old next line (which should already
> be prefetched), so there is no need to start another prefetch
> operation if the CPU (or the emulator) is in a sane state.
We have not taken the return register into account. [TODO]
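As a possible starting point for that TODO, here is a sketch of the two decisions quoted above: which of the three cases a taken jump falls into, and which line the return register would be associated with. The function names and array-based interface are hypothetical and do not come from the emulator.

```c
#include <assert.h>
#include <stdint.h>

enum jump_case { JUMP_IN_CURRENT, JUMP_IN_OTHER, JUMP_MISS };

/* Classify a taken jump. On a hit, *hit receives the line index. */
static enum jump_case classify_jump(const uint64_t *addr, const uint8_t *valid,
                                    unsigned nlines, unsigned current,
                                    uint64_t target, unsigned *hit)
{
    uint64_t base = target & ~(uint64_t)0x1f;
    unsigned i;

    assert(current < nlines);
    if (valid[current] && addr[current] == base) {
        *hit = current;           /* first case: simply continue */
        return JUMP_IN_CURRENT;
    }
    for (i = 0; i < nlines; i++) {
        if (valid[i] && addr[i] == base) {
            *hit = i;             /* second case: switch to this line */
            return JUMP_IN_OTHER;
        }
    }
    return JUMP_MISS;             /* third case: stall and prefetch */
}

/* Line holding the return address pc + 4: the current line, unless
 * the jump sits in the last slot, in which case it is the next line. */
static unsigned return_line(uint64_t pc, unsigned current, unsigned next)
{
    uint64_t ret = pc + 4;
    return ((ret ^ pc) & ~(uint64_t)0x1f) ? next : current;
}
```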
> If the return address is not stored, and the target address
> does not reside in the current or next line, the fetcher may
> (but need not) stop prefetching the old "next" line and/or
> invalidate it.
> If the jump is NOT taken, there's no need to do anything.
> Whenever a register is overwritten (note: this applies to ALL
> instructions!), break the association between the register
> number and the corresponding fetcher line. If the line is
> no longer associated with any register afterwards, it may
> (but need not) be invalidated. Note that an instruction
> may modify more than one register, so it may be necessary to
> invalidate several associations at once. On the other hand,
> it is impossible that any register is associated with more
> than a single fetcher line (because it can hold only one
> address at any time).
OK, this is done by putting 0xff in the register's line pointer.
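A sketch of that idea, assuming one line pointer per register and 0xff meaning "no line". The table name, the 64-register size and the helper names are illustrative, not taken from the emulator source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 64
#define NO_LINE  0xff

static uint8_t reg_line[NUM_REGS];  /* line pointer per register */

static void reg_assoc_reset(void)
{
    memset(reg_line, NO_LINE, sizeof reg_line);
}

/* Call once for every register an instruction writes.
 * Returns 1 if the line the register pointed to is no longer
 * associated with any register, so it MAY be invalidated. */
static int on_reg_write(unsigned reg)
{
    uint8_t old;
    unsigned i;

    assert(reg < NUM_REGS);
    old = reg_line[reg];
    reg_line[reg] = NO_LINE;        /* break the association */
    if (old == NO_LINE)
        return 0;
    for (i = 0; i < NUM_REGS; i++)
        if (reg_line[i] == old)
            return 0;               /* another register still uses it */
    return 1;
}
```

The caller would still have to keep the current and "next" lines alive, since those must never be invalidated.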
> From a virtual point of view, the current line is always
> associated with the instruction pointer (PC), and the "next"
> line is associated with some nameless register inside the
> prefetcher. These lines must never be invalidated.
> There are also special events to consider, e.g. instructions
> like `jump r1,r1' must be correctly handled.
What does `jump r1,r1' do? Jump, then invalidation?
> Another question
> is whether it makes sense to start another prefetch if a
> constant is added to or subtracted from an "associated"
> pointer. It may speed up "calculated jumps", but it seems
> to be pretty useless in other cases.
> If the fetcher is "full" -- that is, all lines are in use --,
> invalidate and overwrite the least recently used (LRU) line
> that is NOT associated with the instruction pointer (current
> line) or the prefetcher.
We do not do it like that. This implies a slow search, doesn't it?
You can see by tracing our example how the prefetcher will sometimes simply erase useful lines, but it's an L0, so it is kept simple.
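For comparison, the LRU choice quoted above could look like the sketch below in an emulator. The per-line timestamp array and the function name are assumptions. With the 4 to 8 lines suggested earlier in the thread, the search is a short linear scan.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LINES 8

/* Pick the least recently used line that is neither the current nor
 * the "next" line. last_used[i] is a hypothetical timestamp of the
 * last access to line i (smaller means older). */
static unsigned pick_victim(const uint32_t last_used[NUM_LINES],
                            unsigned current, unsigned next)
{
    unsigned i, victim = NUM_LINES;  /* NUM_LINES = nothing picked yet */

    assert(current < NUM_LINES && next < NUM_LINES);
    for (i = 0; i < NUM_LINES; i++) {
        if (i == current || i == next)
            continue;                /* these must never be invalidated */
        if (victim == NUM_LINES || last_used[i] < last_used[victim])
            victim = i;
    }
    return victim;
}
```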