
[f-cpu] the end of the Smooth Register backup issue ?

hi again,

Christophe Avoinne wrote:

No no ! you people are mixing wrong things...

I don't think it's the same as for ARM with some special registers mapped.


The main trouble is that, without a dedicated stack register, we
cannot design interrupt and trap gates for F-CPU (that is, saving the caller's
return address on the stack). So we are all stuck in a struggle to find the best
way to fit CMB into fast switching. But is it really worth it? I mean,
is there any real situation where a handler really needs to use all the 64
registers? I'm quite doubtful...

i do not think that the absence of a stack is such a problem
because a stack brings its own issues.

May I also refresh the collective memory ?
Years ago, it was found that the best way to cope with the
lack of SRB in the early days was to create a set of "scratch SRs".
They are located outside the datapath and use the normal get/put
instructions, so there is no architectural impact.

These scratch registers are used at the entry of "fast" handlers
to save the first few vital registers : the absence of a stack
does not let us "push" registers, so we save them in fixed
locations in the SR space instead.

So when a trap/IRQ/exception/whatever occurs AND no SRB
is available/possible/anti-orthodox/whatever, the handling code
simply starts with
 put SR_SCRATCH_1, r1
 put SR_SCRATCH_2, r2
 put SR_SCRATCH_3, r3
 put SR_SCRATCH_4, r4
 loadcons pointer_constant, r1

there must also be an additional mechanism to store the
PC in another SR.
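
For readers who want to see the save/restore flow end to end, here is a minimal Python sketch of the idea. It is a toy model, not F-CPU code: the class ToyCPU, the methods trap/handler_entry/handler_exit and the name SR_SAVED_PC are all hypothetical, and the exit sequence (using get in the reverse direction, then restoring the PC) is my assumption of how a handler would undo the entry sequence shown above.

```python
NUM_REGS = 64  # the thread talks about 64 general registers


class ToyCPU:
    """Toy model of a register file plus a special-register (SR) space."""

    def __init__(self):
        self.r = [0] * NUM_REGS  # general registers r0..r63
        self.sr = {}             # SR space, keyed by name (located outside the datapath)
        self.pc = 0

    def put(self, sr_name, reg):
        # Model of "put": copy a general register into a special register.
        self.sr[sr_name] = self.r[reg]

    def get(self, sr_name, reg):
        # Model of "get": copy a special register back into a general register.
        self.r[reg] = self.sr[sr_name]

    def trap(self, handler_pc):
        # Assumed mechanism: the interrupted PC is stored in another SR.
        # SR_SAVED_PC is a made-up name for illustration only.
        self.sr["SR_SAVED_PC"] = self.pc
        self.pc = handler_pc

    def handler_entry(self):
        # Mirrors the four "put SR_SCRATCH_n, rn" lines: fixed locations, no stack.
        for i in range(1, 5):
            self.put(f"SR_SCRATCH_{i}", i)

    def handler_exit(self):
        # Assumed exit sequence: restore r1..r4, then resume at the saved PC.
        for i in range(1, 5):
            self.get(f"SR_SCRATCH_{i}", i)
        self.pc = self.sr["SR_SAVED_PC"]


cpu = ToyCPU()
cpu.r[1:5] = [11, 22, 33, 44]  # state of the interrupted program
cpu.pc = 0x1000
cpu.trap(0x40)                 # enter the handler
cpu.handler_entry()
cpu.r[1:5] = [0, 0, 0, 0]      # the handler freely clobbers r1..r4
cpu.handler_exit()
```

After handler_exit, r1..r4 and the PC hold the interrupted program's values again, with no stack and no register bank involved. Note that with one fixed set of locations a nested trap would overwrite the scratch SRs, so the handler has to move them elsewhere before traps are re-enabled.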

i repeat : there is *no* need for register banks and "fast" switches.
Overall, the first latency problem comes from the availability of the
handler's code and the OOOC-flushing. Hence, considering
L1 cache misses, and accounting for the fact that the handler is
running without address translation (hence avoiding any TLB miss),
the trap/exception/IRQ/... latency is roughly 10 cycles in a normal
system (and probably around 5 cycles in a dedicated system).
Latency is the well-known cost of performance and security
(caching and protection mechanisms have obvious side effects).

A 1-cycle register bank switch is thus not justified
because it doesn't help smash this fixed latency.

I hope this post will help calm the debate :-)


