Re: [f-cpu] Prefetcher (instruction cache L0) our first draft

To: f-cpu@seul.org
Subject: Re: [f-cpu] Prefetcher (instruction cache L0) our first draft
From: Pierre Tardy <tardyp@free.fr>
Date: Mon, 12 Jan 2004 22:46:10 +0100
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Mon, 12 Jan 2004 16:46:34 -0500
In-reply-to: <20040112211705.35416@thrai.stud.uni-hannover.de>
References: <1073921706.4002beaa32c0e@imp2-a.free.fr><20040112211705.35416@thrai.stud.uni-hannover.de>
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

On Mon, 12 Jan 2004 21:17:05 +0100
Michael Riepe <michael+fcpu@stud.uni-hannover.de> wrote:

> On Mon, Jan 12, 2004 at 04:35:06PM +0100, Pierre Tardy wrote:
> > Attached the modifications of fctools0.3's emu, by tired.
> 
> It's not really meant for such things, but what the heck...
> 
> I'd just ask you to not re-indent my code.  It's too hard to track
> changes that way.
Argh, sorry, it is a reflex... ESC-C-q.

diff -E is your friend..

> 
> > We have added an instruction cache L0. As it is not well defined in the
> > handbook, we are requesting comments about this code.
> 
> I noticed that you prefetch individual instructions, one per
> clock cycle.  The real `fetcher' unit should work more efficiently,
> prefetching whole cache lines (256 bits = 8 instructions) at once.
Yes, it is a good idea.
> On the other hand, it will hold less lines (4...8 should be a
> reasonable value).
64 is approximately the size we chose. It is of course #defined.

See the 'p' command of the debugger for an ASCII-art representation of the cache.

> 
> The algorithm isn't fully defined yet, but the basic outline is
> something like this:
> 
> 	Let one fetcher line be the "current" line.  This is the line
	In our code, the "current" line is PrefetchPC.

> 	the next instruction will be fetched from.  Every line shall
> 	have an associated address which is the address of the first
> 	instruction contained in the line (that is, the address is a
> 	multiple of 32).  If all instructions from the current line
> 	have been fetched, the fetcher will switch to the "next"
> 	line (which should have been prefetched while the current
> 	line was executed).  That is, the "next" line becomes the
> 	new current line.
Ok, that is our behavior.
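
For reference, a minimal sketch of the address arithmetic described above,
assuming byte addresses, 32-byte (256-bit) lines and 32-bit instructions.
The function names are hypothetical and are not taken from the emulator:

```python
LINE_BYTES = 32   # one fetcher line: 256 bits
INSN_BYTES = 4    # 32-bit instructions, so 8 per line

def line_address(pc):
    """Address of the first instruction of the line that holds pc."""
    return pc & ~(LINE_BYTES - 1)

def slot_in_line(pc):
    """Index (0..7) of the instruction at pc inside its line."""
    return (pc & (LINE_BYTES - 1)) // INSN_BYTES

def is_last_slot(pc):
    """True when the instruction after pc belongs to the following line."""
    return slot_in_line(pc) == LINE_BYTES // INSN_BYTES - 1
```

When is_last_slot(pc) is true, the following fetch is the point where the
"next" line becomes the new current line.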
> 
> 	Any fetcher line can be in one of at least three different states:
> 
> 		1 - the line is invalid
> 		2 - the line is being prefetched but not yet valid
> 		3 - the line is valid
> 
> 	In case the current line is not valid, let the CPU stall until
> 	it is.	If the line is in `invalid' state, start prefetching
> 	and proceed to state 2.
I do not need state 2 in my implementation.
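
To make the comparison concrete, here is a rough sketch of the three-state
line with a stall on the current line. The class names and the `latency`
parameter (fill time in cycles) are assumptions made for illustration;
neither the handbook nor the emulator defines them:

```python
from enum import Enum

class LineState(Enum):
    INVALID = 1      # state 1: nothing useful in the line
    PREFETCHING = 2  # state 2: fill in progress, not yet usable
    VALID = 3        # state 3: instructions can be fetched

class FetcherLine:
    def __init__(self):
        self.state = LineState.INVALID
        self.address = None
        self.cycles_left = 0

    def start_prefetch(self, address, latency):
        self.address = address
        self.cycles_left = latency
        self.state = LineState.PREFETCHING

    def tick(self):
        # Advance a pending fill by one clock cycle.
        if self.state is LineState.PREFETCHING:
            self.cycles_left -= 1
            if self.cycles_left <= 0:
                self.state = LineState.VALID

def fetch_cycle(line, address, latency):
    """One clock cycle on the current line; returns True if the CPU stalls."""
    if line.state is LineState.VALID:
        return False
    if line.state is LineState.INVALID:
        line.start_prefetch(address, latency)
    else:
        line.tick()
    return True
```

An emulator that fills a line within the cycle that requests it never
observes state 2, so two states are enough there.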

> 
> 	Whenever the current line is "switched" as outlined above,
> 	let the fetcher take the associated address of the new current
> 	line, add 32 to it (that's not really an add but a "shifted"
> 	increment operation) and start prefetching the (new) next line
> 	at the calculated address.  If there were only these two lines,
> 	they would work just like double or "tandem" buffers -- read
> 	from one of them while the other is filled in the background.
ok.
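
A small sketch of the tandem-buffer behaviour, with the "+32" written as
an increment of the line number (address bits above the low 5), which is
how I read the "shifted increment" remark. The class and its fields are
hypothetical:

```python
LINE_SHIFT = 5  # 32-byte lines: the low 5 address bits select a byte in the line

def next_line_address(addr):
    """Increment the line number instead of adding 32 to the full address."""
    return ((addr >> LINE_SHIFT) + 1) << LINE_SHIFT

class TandemFetcher:
    def __init__(self, start_pc):
        self.current = (start_pc >> LINE_SHIFT) << LINE_SHIFT
        self.next = next_line_address(self.current)
        self.prefetched = [self.current, self.next]  # fills issued so far

    def switch(self):
        """The "next" line becomes current; prefetch the line after it."""
        self.current = self.next
        self.next = next_line_address(self.current)
        self.prefetched.append(self.next)
```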
> 
> 	When "loadaddr[i]" is executed, take the target address, mask
> 	off the least significant 5 bits, and start prefetching at the
> 	resulting address (if the corresponding line isn't already
> 	being prefetched or even valid).  In either case, associate
> 	the register number with the corresponding fetcher line.
OK, but in the case of the 3 nested loops, we have seen that the prefetcher stops prefetching the line immediately, as the current stream is not yet in the cache.
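
A sketch of the loadaddr handling, under assumptions: 64 registers, 0xff
as the "no line" marker, and a plain round-robin choice of the line to
overwrite (a placeholder, not the policy from the handbook or from our
patch). All names are made up for the example:

```python
LINE_BYTES = 32
NO_LINE = 0xFF  # marker for "register is not associated with any line"

class Prefetcher:
    def __init__(self, num_lines, num_regs=64):
        self.line_addr = [None] * num_lines   # address held by each line
        self.reg_line = [NO_LINE] * num_regs  # line index per register
        self.victim = 0                       # round-robin pointer

    def find_line(self, addr):
        return self.line_addr.index(addr) if addr in self.line_addr else None

    def allocate(self, addr):
        idx = self.victim
        self.victim = (self.victim + 1) % len(self.line_addr)
        # Registers that pointed at the overwritten line lose their association.
        for reg, line in enumerate(self.reg_line):
            if line == idx:
                self.reg_line[reg] = NO_LINE
        self.line_addr[idx] = addr
        return idx

    def loadaddr(self, reg, target):
        addr = target & ~(LINE_BYTES - 1)     # mask off the low 5 bits
        idx = self.find_line(addr)
        if idx is None:                       # not cached yet: start a fill
            idx = self.allocate(addr)
        self.reg_line[reg] = idx              # associate in either case
        return idx
```

Two loadaddr targets that fall in the same 32-byte line share one fetcher
line, so only the first one triggers a fill.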

> 
> 	When a jump instruction is executed (and the jump is taken),
> 	instructions from the target address may reside in the current
> 	line or another (or none at all, which will cause a stall).
> 	In the second and third case, switch to the new current line
> 	and start prefetching the new "next" line as outlined above.
> 	In the first case, simply continue.
ok.
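
The three cases for a taken jump can be written down as a small decision
function. This is only a sketch; `cached_lines` stands for the set of
addresses of lines that are currently valid, and the function name is
hypothetical:

```python
LINE_MASK = ~31  # clears the low 5 bits of a byte address

def on_taken_jump(target, current_line, cached_lines):
    """Return (switch_line, stall) for a taken jump to target."""
    line = target & LINE_MASK
    if line == current_line:
        return (False, False)   # first case: simply continue
    if line in cached_lines:
        return (True, False)    # second case: switch, no stall
    return (True, True)         # third case: switch and stall until filled
```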
> 
> 	If the return address is stored in a register (3-operand
> 	form), associate the register number with the line the next
> 	instruction would have been fetched from if the jump had not
> 	been taken.  This will be either the old current line (which
> 	is already loaded) or the old next line (which should already
> 	be prefetched), so there is no need to start another prefetch
> 	operation if the CPU (or the emulator) is in a sane state.
We have not taken the return register into account yet. [TODO]

> 
> 	If the return address is not stored, and the target address
> 	does not reside in the current or next line, the fetcher may
> 	(but need not) stop prefetching the old "next" line and/or
> 	invalidate it.
ok.
> 
> 	If the jump is NOT taken, there's no need to do anything.
yes.

> 
> 	Whenever a register is overwritten (note: this applies to ALL
> 	instructions!), break the association between the register
> 	number and the corresponding fetcher line.  If the line is
> 	no longer associated with any register afterwards, it may
> 	(but need not) be invalidated.	Note that an instruction
> 	may modify more than one register, so it may be necessary to
> 	invalidate several associations at once.  On the other hand,
> 	it is impossible that any register is associated with more
> 	than a single fetcher line (because it can hold only one
> 	address at any time).
OK, this is done by putting 0xff in the register's line pointer.
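
Roughly, as a sketch: `reg_line` is a per-register table of line indexes
and 0xff means "no line". The function name and the returned list are
assumptions added for the example, not the emulator's interface:

```python
NO_LINE = 0xFF  # marker stored in a register's line pointer

def on_register_write(reg_line, written_regs):
    """Break the associations of all written registers.

    Returns the lines that no register refers to any more; those may
    (but need not) be invalidated.
    """
    touched = set()
    for reg in written_regs:          # an instruction may write several
        if reg_line[reg] != NO_LINE:
            touched.add(reg_line[reg])
            reg_line[reg] = NO_LINE
    still_used = set(reg_line)
    return sorted(line for line in touched if line not in still_used)
```

The current and "next" lines are not covered here; they are pinned by
the PC and the prefetcher, so the caller has to keep them out of any
invalidation.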

> 	From a virtual point of view, the current line is always
> 	associated with the instruction pointer (PC), and the "next"
> 	line is associated with some nameless register inside the
> 	prefetcher.  These lines must never be invalidated.
> 
> 	There are also special events to consider, e.g. instructions
> 	like `jump r1,r1' must be correctly handled.
What does `jump r1,r1' do? Jump, then invalidation?

>  Another question
> 	is whether it makes sense to start another prefetch if a
> 	constant is added to or subtracted from an "associated"
> 	pointer.  It may speed up "calculated jumps", but it seems
> 	to be pretty useless in other cases.
yes.

> 
> 	If the fetcher is "full" -- that is, all lines are in use --,
> 	invalidate and overwrite the least recently used (LRU) line
> 	that is NOT associated with the instruction pointer (current
> 	line) or the prefetcher.
We do not do it like that. This implies a slow search, doesn't it?
You can see by tracing our example how the prefetcher will sometimes simply erase useful lines, but it's an L0, so it is kept simple.
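
For comparison, a sketch of the LRU choice as described in the quoted
text: a linear scan over per-line timestamps that skips the pinned
lines. `last_used` and `pinned` are hypothetical names:

```python
def pick_victim(last_used, pinned):
    """Index of the least recently used line that is not pinned.

    last_used[i] is the cycle at which line i was last read; pinned holds
    the indexes of the current line and the "next" line.
    """
    candidates = [i for i in range(len(last_used)) if i not in pinned]
    return min(candidates, key=lambda i: last_used[i])
```

With last_used = [5, 1, 9, 3] and pinned = {1, 2}, this picks line 3.
In software the scan costs one pass over the lines on every miss, which
is cheap for 4 to 8 lines and more noticeable for 64.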

-- 
Pierre