
Re: [f-cpu] Prefetcher (instruction cache L0) our first draft

On Mon, Jan 12, 2004 at 04:35:06PM +0100, Pierre Tardy wrote:
> Attached the modifications of fctools0.3's emu, by tired.

It's not really meant for such things, but what the heck...

I'd just ask you to not re-indent my code.  It's too hard to track
changes that way.

> We have added an instruction cache L0. As it is not well defined in the
> handbook, we are requesting comments about this code.

I noticed that you prefetch individual instructions, one per
clock cycle.  The real `fetcher' unit should work more efficiently,
prefetching whole cache lines (256 bits = 8 instructions) at once.
On the other hand, it will hold fewer lines (4...8 should be a
reasonable value).

The algorithm isn't fully defined yet, but the basic outline is
something like this:

	Let one fetcher line be the "current" line.  This is the line
	the next instruction will be fetched from.  Every line shall
	have an associated address which is the address of the first
	instruction contained in the line (that is, the address is a
	multiple of 32).  If all instructions from the current line
	have been fetched, the fetcher will switch to the "next"
	line (which should have been prefetched while the current
	line was executed).  That is, the "next" line becomes the
	new current line.
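
A minimal sketch of the address arithmetic this implies, assuming
byte addresses and 4-byte instructions (256 bits / 8 instructions).
The helper names are made up for illustration and are not part of
fctools:

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 32u  /* one fetcher line: 256 bits */
#define INSN_BYTES 4u   /* 8 instructions per line */

/* Address associated with the line that contains pc (a multiple of 32). */
static uint64_t line_base(uint64_t pc)
{
    return pc & ~(uint64_t)(LINE_BYTES - 1);
}

/* Position (0..7) of the instruction at pc within its line. */
static unsigned line_slot(uint64_t pc)
{
    assert(pc % INSN_BYTES == 0);  /* assumption: instructions are aligned */
    return (unsigned)((pc & (LINE_BYTES - 1)) / INSN_BYTES);
}

/* Nonzero if pc is the last instruction of its line, i.e. the
 * fetcher has to switch to the "next" line after fetching it. */
static int is_last_slot(uint64_t pc)
{
    return line_slot(pc) == LINE_BYTES / INSN_BYTES - 1;
}
```

For example, an instruction at 0x1234 would sit in slot 5 of the
line whose associated address is 0x1220.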

	Any fetcher line can be in one of at least three different states:

		1 - the line is invalid
		2 - the line is being prefetched but not yet valid
		3 - the line is valid

	In case the current line is not valid, let the CPU stall until
	it is.  If the line is in `invalid' state, start prefetching
	and proceed to state 2.
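
The three states and the stall rule could be modelled roughly like
this.  It is only a sketch: the `wait' counter and the fixed
PREFETCH_LATENCY are assumptions standing in for the real memory
timing, and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PREFETCH_LATENCY 3u  /* assumed number of cycles to fill a line */

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;  /* address of the first instruction in the line */
    unsigned wait;  /* hypothetical: cycles left until the line is valid */
};

/* Advance the current line by one clock cycle.
 * Returns 1 if an instruction may be fetched, 0 if the CPU must stall. */
static int fetcher_tick(struct fetch_line *cur)
{
    assert(cur->addr % 32 == 0);
    switch (cur->state) {
    case LINE_INVALID:               /* state 1: start prefetching */
        cur->state = LINE_PREFETCHING;
        cur->wait = PREFETCH_LATENCY;
        return 0;
    case LINE_PREFETCHING:           /* state 2: wait for the data */
        if (--cur->wait == 0)
            cur->state = LINE_VALID;
        return 0;
    case LINE_VALID:                 /* state 3: ready */
        return 1;
    }
    return 0;
}
```

With these assumptions, an invalid current line costs one cycle to
start the prefetch plus PREFETCH_LATENCY cycles of waiting.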

	Whenever the current line is "switched" as outlined above,
	let the fetcher take the associated address of the new current
	line, add 32 to it (that's not really an add but a "shifted"
	increment operation) and start prefetching the (new) next line
	at the calculated address.  If there were only these two lines,
	they would work just like double or "tandem" buffers -- read
	from one of them while the other is filled in the background.
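
For the two-line ("tandem") case, the switch might look like the
sketch below.  The struct layout and function names are assumptions;
the "shifted" increment is written out as an increment of the line
number (the address without its low 5 bits):

```c
#include <assert.h>
#include <stdint.h>

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;  /* address of the first instruction in the line */
};

struct fetcher {
    struct fetch_line line[2];
    unsigned cur;   /* index of the current line; the other one is "next" */
};

/* "Shifted" increment: add 1 to the line number, i.e. 32 to the address. */
static uint64_t next_line_addr(uint64_t addr)
{
    assert((addr & 31) == 0);
    return ((addr >> 5) + 1) << 5;
}

/* All instructions of the current line have been fetched: the "next"
 * line becomes current, and the old current line is refilled with
 * the line that follows the new current one. */
static void switch_lines(struct fetcher *f)
{
    unsigned old = f->cur;

    f->cur ^= 1u;
    f->line[old].addr = next_line_addr(f->line[f->cur].addr);
    f->line[old].state = LINE_PREFETCHING;
}
```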

	When "loadaddr[i]" is executed, take the target address, mask
	off the least significant 5 bits, and start prefetching at the
	resulting address (if the corresponding line isn't already
	being prefetched or even valid).  In either case, associate
	the register number with the corresponding fetcher line.
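
A sketch of the loadaddr handling, under a few assumptions: register
numbers fit into a 64-bit mask, the association is kept as one such
mask per line, and a full fetcher is simply reported to the caller
(replacement is not handled here).  None of these names exist in the
emulator:

```c
#include <assert.h>
#include <stdint.h>

#define NLINES 4u   /* 4...8 fetcher lines */
#define NREGS  64u  /* assumption: register numbers fit a 64-bit mask */

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };

struct fetch_line {
    enum line_state state;
    uint64_t addr;  /* address of the first instruction in the line */
    uint64_t regs;  /* bit r set: register r is associated with this line */
};

/* loadaddr writes `target' to register `reg'.  Returns the index of the
 * line now associated with reg, or -1 if no line is free. */
static int on_loadaddr(struct fetch_line *l, unsigned reg, uint64_t target)
{
    uint64_t base = target & ~(uint64_t)31;  /* mask off the low 5 bits */
    int hit = -1, free_line = -1;
    unsigned i;

    assert(reg < NREGS);
    for (i = 0; i < NLINES; i++) {
        /* reg is being overwritten, so drop its old association */
        l[i].regs &= ~((uint64_t)1 << reg);
        if (l[i].state == LINE_INVALID) {
            if (free_line < 0)
                free_line = (int)i;
        } else if (l[i].addr == base) {
            hit = (int)i;  /* already valid or being prefetched */
        }
    }
    if (hit < 0) {
        if (free_line < 0)
            return -1;
        hit = free_line;
        l[hit].addr = base;
        l[hit].state = LINE_PREFETCHING;  /* start prefetching */
    }
    l[hit].regs |= (uint64_t)1 << reg;
    return hit;
}
```

Two loadaddr instructions that point into the same 32-byte line end
up sharing one fetcher line here, and only the first one triggers a
prefetch.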

	When a jump instruction is executed (and the jump is taken),
	instructions from the target address may reside in the current
	line or another (or none at all, which will cause a stall).
	In the second and third case, switch to the new current line
	and start prefetching the new "next" line as outlined above.
	In the first case, simply continue.
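
The three cases for a taken jump can be told apart with a lookup
like the following sketch.  The enum and function names are
hypothetical, and a line that is still being prefetched is counted
as present (the CPU would then stall until it becomes valid):

```c
#include <assert.h>
#include <stdint.h>

#define NLINES 4u  /* 4...8 fetcher lines */

enum line_state { LINE_INVALID, LINE_PREFETCHING, LINE_VALID };
enum jump_case { TARGET_IN_CURRENT, TARGET_IN_OTHER, TARGET_MISSING };

struct fetch_line {
    enum line_state state;
    uint64_t addr;  /* address of the first instruction in the line */
};

/* Classify a taken jump to `target'.  For the first two cases, *hit
 * receives the index of the line that holds the target. */
static enum jump_case classify_jump(const struct fetch_line *l, unsigned cur,
                                    uint64_t target, unsigned *hit)
{
    uint64_t base = target & ~(uint64_t)31;
    unsigned i;

    assert(cur < NLINES);
    if (l[cur].state != LINE_INVALID && l[cur].addr == base) {
        *hit = cur;
        return TARGET_IN_CURRENT;  /* first case: simply continue */
    }
    for (i = 0; i < NLINES; i++) {
        if (l[i].state != LINE_INVALID && l[i].addr == base) {
            *hit = i;
            return TARGET_IN_OTHER;  /* second case: switch lines */
        }
    }
    return TARGET_MISSING;  /* third case: allocate a line and stall */
}
```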

	If the return address is stored in a register (3-operand
	form), associate the register number with the line the next
	instruction would have been fetched from if the jump had not
	been taken.  This will be either the old current line (which
	is already loaded) or the old next line (which should already
	be prefetched), so there is no need to start another prefetch
	operation if the CPU (or the emulator) is in a sane state.

	If the return address is not stored, and the target address
	does not reside in the current or next line, the fetcher may
	(but need not) stop prefetching the old "next" line and/or
	invalidate it.

	If the jump is NOT taken, there's no need to do anything.

	Whenever a register is overwritten (note: this applies to ALL
	instructions!), break the association between the register
	number and the corresponding fetcher line.  If the line is
	no longer associated with any register afterwards, it may
	(but need not) be invalidated.  Note that an instruction
	may modify more than one register, so it may be necessary to
	invalidate several associations at once.  On the other hand,
	it is impossible that any register is associated with more
	than a single fetcher line (because it can hold only one
	address at any time).
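
If the associations are kept as one register bitmask per line (an
assumption of this sketch, as are the names), breaking them for all
registers an instruction writes is a single pass over the lines:

```c
#include <assert.h>
#include <stdint.h>

#define NLINES 4u  /* 4...8 fetcher lines */

struct fetch_line {
    uint64_t addr;  /* address of the first instruction in the line */
    uint64_t regs;  /* bit r set: register r is associated with this line */
};

/* `written' has one bit set for every register the instruction
 * modifies.  Returns the number of lines that lost their last
 * association and may (but need not) be invalidated. */
static unsigned on_register_write(struct fetch_line *l, uint64_t written)
{
    uint64_t seen = 0;
    unsigned i, freed = 0;

    for (i = 0; i < NLINES; i++) {
        uint64_t before = l[i].regs;

        /* a register is never associated with more than one line */
        assert((seen & before) == 0);
        seen |= before;

        l[i].regs = before & ~written;
        if (before != 0 && l[i].regs == 0)
            freed++;
    }
    return freed;
}
```

The current and "next" lines are not treated specially here; the
caller would have to keep them valid regardless of the result.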

	Conceptually, the current line is always
	associated with the instruction pointer (PC), and the "next"
	line is associated with some nameless register inside the
	prefetcher.  These lines must never be invalidated.

	There are also special events to consider, e.g. instructions
	like `jump r1,r1' must be correctly handled.  Another question
	is whether it makes sense to start another prefetch if a
	constant is added to or subtracted from an "associated"
	pointer.  It may speed up "calculated jumps", but it seems
	to be pretty useless in other cases.

	If the fetcher is "full" -- that is, all lines are in use --,
	invalidate and overwrite the least recently used (LRU) line
	that is NOT associated with the instruction pointer (current
	line) or the prefetcher.
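
A sketch of the victim selection when the fetcher is full, assuming
every line carries a `last_used' timestamp (e.g. a cycle counter
value recorded whenever the line is touched).  That field and the
function name are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NLINES 4u  /* 4...8 fetcher lines */

struct fetch_line {
    uint64_t addr;       /* address of the first instruction in the line */
    uint64_t last_used;  /* hypothetical: cycle of the most recent use */
};

/* Pick the least recently used line, skipping the current line
 * (instruction pointer) and the "next" line (prefetcher). */
static unsigned pick_victim(const struct fetch_line *l,
                            unsigned cur, unsigned next)
{
    unsigned i, victim = NLINES;  /* NLINES means "none chosen yet" */

    assert(cur < NLINES && next < NLINES && cur != next);
    for (i = 0; i < NLINES; i++) {
        if (i == cur || i == next)
            continue;  /* these lines must never be invalidated */
        if (victim == NLINES || l[i].last_used < l[victim].last_used)
            victim = i;
    }
    return victim;
}
```

With at least three lines there is always a candidate, so the
function cannot return NLINES.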

Yann, did I miss anything?
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"