
Re: [f-cpu] Smooth Register backup issues...



Michael Riepe wrote:

On Thu, Nov 13, 2003 at 10:47:53PM +0100, Beat Steiner wrote:

Some thoughts about SRB:

Expensive approach: On initiation of SRB, all registers are backed
up to a mirror set of registers. Every register is directly connected
to its mirror partner (makes 64x63 connections!). Mirroring can
take place in one clock cycle this way (much like a snapshot backup
of a journaling filesystem). Writing out the backup to the RAM is
performed as described in the SRB section of the manual.

Cheap approach: The compiler shuffles registers in a random way,
reducing the probability that r1 always has to be backed up and is the
first register used by the new context (programmers tend to use r1
first).

r1...r15 hold the arguments of a function (or system call).  That is,
at least some of them will be in use most of the time.  Well-optimized
code should use as many registers as possible, so most other registers
will be busy as well (and unused argument registers can be reclaimed
and used as temporaries).

I prefer a double-buffering approach: The "shadow" register is loaded
from the new ("incoming") CMB, while the "foreground" register is
saved to the old ("outgoing") CMB.  As soon as the shadow is loaded,
register and shadow can be swapped (by flipping an address or enable
bit).  That way, a direct connection between a register and its
shadow counterpart is not required since you never have to transfer
data between them.  In addition to that, it enables you to process
interrupts very quickly -- just swap the registers on entry and exit
of the interrupt service routine.
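
To make the double-buffering idea concrete, here is a minimal behavioural sketch in Python. It is not taken from the F-CPU manual: the class name, the method names and the representation of a CMB as a plain list are assumptions made for illustration, and details such as r0 being hardwired or the per-register ordering of the save/load are ignored.

```python
class DoubleBufferedRegs:
    """Two register banks; one select bit picks the foreground bank (hypothetical model)."""

    def __init__(self, nregs=64):
        self.banks = [[0] * nregs, [0] * nregs]
        self.fg = 0  # select bit: index of the bank the pipeline sees

    def read(self, r):
        return self.banks[self.fg][r]

    def write(self, r, value):
        self.banks[self.fg][r] = value

    def save_foreground(self):
        # Copy the foreground bank out to the outgoing CMB (modelled as a list).
        return list(self.banks[self.fg])

    def load_shadow(self, incoming_cmb):
        # Fill the shadow bank from the incoming CMB; no data moves between banks.
        self.banks[self.fg ^ 1][:] = incoming_cmb

    def swap(self):
        # Flipping the select bit exchanges the roles of register and shadow.
        self.fg ^= 1


def context_switch(regs, incoming_cmb):
    """Save the old context, load the new one into the shadow, then swap."""
    outgoing_cmb = regs.save_foreground()
    regs.load_shadow(incoming_cmb)
    regs.swap()
    return outgoing_cmb


regs = DoubleBufferedRegs()
regs.write(1, 11)                                 # old thread uses r1
new_ctx = [100 + i for i in range(64)]            # made-up incoming context
old_ctx = context_switch(regs, new_ctx)

regs.swap()          # interrupt entry: handler gets the other bank
regs.write(1, 999)   # handler clobbers "its" r1
regs.swap()          # interrupt exit: the thread's r1 is untouched
```

Note that the only cross-bank operation is the flip of `fg`, which is the point of the proposal: the banks never need a direct data path between them.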

Did anybody say "Zilog Z80"? ;)

If we implement this swap scheme, either the crossbar will definitely
explode or every register will have one more mux to pass through.
My idea was to shadow the registers on an interrupt/context change
(one-way data path)
and do an SRB from the shadow. The new thread won't have to wait
for the SRB to partially complete. The locking mechanism might be simplified,
since the SRB only accesses shadow registers. Restore could be done as before.

example SRB as before:
did not touch r5
r1 := bla bla
*** context change ***
* SRB starts and exec new thread starts
r5 := bla bla (immediately executed because r5 not locked by SRB -- great idea)
r1 := bla bla (having to wait for SRB of r1 to complete if no shadow copy is made)

example SRB with shadow:
did not touch r5
r1 := bla bla
*** context change ***
* wait for "r1 := bla bla" to complete while starting to fetch new instruction stream
* Shadow registers
* SRB starts while exec new thread starts
r5 := bla bla (might start later because we waited above, but only if instructions already queued)
r1 := bla bla (not waiting for SRB of r1 to complete because SRB runs from shadow)
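
The difference between the two examples above can be put into a toy timing model. This is only a sketch under loud assumptions that are not from the manual: the SRB is assumed to save the registers touched by the old thread one per cycle in ascending order starting at cycle 0, a register is assumed to stay locked until it is saved, and the shadow copy is assumed to cost nothing. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(write_reg, write_cycle, dirty_regs, use_shadow):
    """Cycles a write in the new thread must wait for the SRB (toy model).

    dirty_regs: registers the old thread touched, which the SRB must save.
    use_shadow: if True, the SRB reads a shadow copy and locks nothing.
    """
    if use_shadow or write_reg not in dirty_regs:
        return 0  # register is not locked by the SRB
    # Assumption: dirty registers are saved in ascending order, one per cycle.
    done_cycle = sorted(dirty_regs).index(write_reg) + 1
    return max(0, done_cycle - write_cycle)


dirty = {1, 2, 3}                       # old thread touched r1..r3, not r5
print(stall_cycles(5, 0, dirty, False))  # r5 was never locked
print(stall_cycles(3, 0, dirty, False))  # r3 waits for the SRB to reach it
print(stall_cycles(3, 0, dirty, True))   # with a shadow, no wait
```

The model leaves out the one-time wait for pending writes before the shadow copy is taken, which is the cost side of the trade-off.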

We lose little (or nothing, if fetching the new instruction stream takes as long
as "r1 := bla bla" takes to complete) and win much, but we pay with 50% of the
register transistors sitting idle 99.99% of the time, as stated by YG ;-).
This only makes sense if SRB is significantly simplified by shadowing.

BTW: How many context switches do we expect per second?

Sorry for stirring up the list with already-discussed topics. Maybe I had better shut up for a while and
read the many manuals.

To get the thing running first on a chicken/egg (=FPGA), it's better to save such ideas for later.


*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu in the body. http://f-cpu.seul.org/