
Re: [f-cpu] Smooth Register backup issues...


Beat Steiner wrote:

My idea was to shadow the registers on an interrupt/context change
(one-way data path)
and do a SRB from the shadow. The new thread won't have to wait
for the SRB to partially complete.
the "old" SRB just has to wait for the IRQ handling code to become

Locking mechanism might be simplified.
SRB only accesses shadow registers. Restore could be made as before.
You forgot something .....

FC0 is an "out of order completion" core !

this means that you'll have to wait for the WHOLE
pipeline to flush (and it means : "complete" because flushing is not an option)
before starting a "safe" mirroring of the register set.

OTOH, the "old" SRB does not have to wait for a loooong
non-blocking operation to complete (like : integer division),
but only for the handler's code.

example SRB as before:
did not touch r5
r1 := bla bla
*** context change ***
* SRB starts and exec new thread starts
r5 := bla bla (immediately executed because r5 not locked by SRB -- great idea)
r1 := bla bla (having to wait for SRB of r1 to complete if no shadow copy is made)
hey ! you forget another detail here :-P

do you remember the register set ?
it has 3 read ports, including the register number corresponding
to the "write" destination.

What does that mean ?

when you do the 2nd r1:=something in the
handler, you actually _save_ it at the same time
because the _old_ value of R1 is available on the Xbar
and the LSU can save it.
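
A toy model of that behaviour, as a rough sketch only. The class and method
names, the per-register "saved" flag and the dictionary standing in for the
memory written by the LSU are all assumptions made for illustration; none of
it is taken from the FC0 sources.

```python
class SmoothRegisterBackup:
    """Toy register file with a lazy ("smooth") backup after a context change."""

    def __init__(self, num_regs=64):
        self.regs = [0] * num_regs
        self.saved = [True] * num_regs  # True: nothing pending for this register
        self.backup = {}                # stands in for memory written by the LSU
        self.next_to_save = 0           # background scan pointer

    def context_switch(self):
        """Mark every register of the old thread as still to be saved."""
        self.saved = [False] * len(self.regs)
        self.backup = {}
        self.next_to_save = 0

    def background_step(self):
        """Save at most one pending register; return its number, or None when done."""
        while self.next_to_save < len(self.regs):
            r = self.next_to_save
            self.next_to_save += 1
            if not self.saved[r]:
                self.backup[r] = self.regs[r]
                self.saved[r] = True
                return r
        return None

    def write(self, r, value):
        """New thread writes r. The old value is read on the same access
        (the extra read port) and saved on the spot if it was still pending."""
        if not self.saved[r]:
            self.backup[r] = self.regs[r]
            self.saved[r] = True
        self.regs[r] = value
```

In this model a write from the handler never waits for the background scan:
the old value goes to the backup in the same step as the new value goes in.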

example SRB with shadow:
did not touch r5
r1 := bla bla
*** context change ***
* wait for "r1 := bla bla" to complete while starting to fetch new instruction stream
* Shadow registers
* SRB starts while exec new thread starts
r5 := bla bla (might start later because we waited above, but only if instructions already queued)
r1 := bla bla (not waiting for SRB of r1 to complete because SRB runs from shadow)
no, we're even on that one.

We lose little (or nothing if fetching the new instruction stream takes as long
as "r1 := bla bla" to complete) and win much, and we pay for 50% of the
register transistors sitting idle 99.99% of the time as stated by YG;-).
Makes only sense if SRB is significantly simplified by shadowing.
but you can't beat years of mailing-list flamewars ;-P

you have to take speed into account. Not only execution speed but
raw clock speed. A 2x larger register set can drop the frequency
more than 2x. 64 registers is the most we can reasonably do.
I have done much work to avoid useless data caching and mirroring
because managing the duplicated data increases the core's complexity
and hence reduces the speed.

so what's the point of switching processes/threads/IRQs/whatever
in 1 cycle if the cycle frequency is much reduced ?

the latest estimations say that FC0's fastest frequency is around 300MHz
using current silicon etching technologies, because of the size of the
register set. We can't add anything to the register set array or FC0 will
not be interesting at all. Competing CPUs of this class (like embedded
64-bit MIPS from IDT for example) already achieve a similar performance
at a ridiculous cost.
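
The trade-off can be put into a back-of-envelope calculation. This is only a
sketch: the 300 MHz figure is the estimate quoted above, but the workload
size, the number of switches, the 64-cycle switch cost and the halved clock
for a doubled register set are hypothetical numbers chosen for illustration.

```python
def run_time_us(work_cycles, switches, cycles_per_switch, freq_mhz):
    """Wall-clock time in microseconds: total cycles / cycles per microsecond."""
    total_cycles = work_cycles + switches * cycles_per_switch
    return total_cycles / freq_mhz


# Hypothetical workload, for illustration only.
WORK_CYCLES = 1_000_000
SWITCHES = 100

# No shadow registers: 300 MHz (estimate from the text), with an assumed
# pessimistic cost of 64 stall cycles per context switch.
plain = run_time_us(WORK_CYCLES, SWITCHES, cycles_per_switch=64, freq_mhz=300)

# Shadow registers: 1-cycle switch, but the clock is assumed to drop to
# 150 MHz because the register array doubled in size.
shadowed = run_time_us(WORK_CYCLES, SWITCHES, cycles_per_switch=1, freq_mhz=150)

print(f"plain: {plain:.0f} us, shadowed: {shadowed:.0f} us")
```

Under these assumptions the switch cost is a small fraction of the total
cycle count, so the slower clock dominates the result.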

Sorry for stirring up the list with already-discussed topics. Maybe I better shut up for a while and
read the many manuals.
well, we better update the manual and the website as well.
but i'm not a student anymore and i can't spend whole weeks
as before on that matter :-(

For getting the thing first running on a chicken/egg (=FPGA), it's better to save such ideas for later.
no, the first thing to solve is the code development process,
including 200% testability from the source code and 300% portability guarantees.
Then, even if coding will be 10 times more painful than a "quick and dirty hack",
the code will be stable and useful for everybody.
Hence the word "freedom" in the project's name.

