
Re: [f-cpu] Smooth Register backup issues...


Martin Vahi wrote:

Or ARM. That's called a shadow register. The problem with SRB is:
- how do you handle nested interrupts?

this is exactly the same problem with shadow registers and hardware stacks.
same problem, same solution.

- how do you "allocate" a new CMB ?

through the OS. In a device driver,
when you install an IRQ handler,
you allocate a CMB if your IRQ routine is interruptible.
this case has already been discussed for a long while.

- If saving is automatic, then even for a very light interrupt handler you
must save all of the registers.

This case can be solved with the help of the hidden "dirty" flags
which indicate whether a register has been modified since SRB has
been started.
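
A minimal sketch of the dirty-flag idea, as a toy software model rather than the actual FC0 hardware. The class and method names are hypothetical, and the only facts taken from this thread are the 64-register bank and the definition of "dirty" (modified since SRB started). A register that is not dirty still holds the interrupted thread's value, so a light handler only costs the registers it really writes.

```python
NUM_REGS = 64  # size of the main register bank, as stated in this thread


class DirtyTracker:
    """Toy model: one hidden dirty bit per register (hypothetical names)."""

    def __init__(self):
        self.dirty = [False] * NUM_REGS

    def start_backup(self):
        # SRB starts: no register has been modified by the handler yet.
        self.dirty = [False] * NUM_REGS

    def write(self, reg):
        # The handler writes a register: mark it as modified.
        self.dirty[reg] = True

    def modified_registers(self):
        # Only these registers no longer hold the interrupted thread's values.
        return [r for r, d in enumerate(self.dirty) if d]


tracker = DirtyTracker()
tracker.start_backup()
for reg in (3, 7, 3):          # a very light handler touching two registers
    tracker.write(reg)
print(tracker.modified_registers())
```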

You also seem to imply that the backup does not let the handler's code run.
In fact, SRB is designed so that the IRQ handler can start servicing the hardware
request as soon as the instructions are available. In a "normal" system,
fetching the new code will already take 10 or 20 clock cycles (considering
cache misses etc). By that time, SRB will have saved 10 to 20 registers,
that is, roughly one third of the register set. Now, if the IRQ handler
has to "speak" with I/O, this takes several cycles each time the
peripheral is accessed so SRB will continue without problem.
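
The overlap argument can be checked with a back-of-the-envelope model. This is only a sketch: it assumes SRB saves one register per cycle (which is what "10 or 20 cycles" giving "10 to 20 registers" suggests), and the function name is made up for the illustration.

```python
TOTAL_REGS = 64  # main register bank size, from this thread


def saved_during(latency_cycles, regs_per_cycle=1, total=TOTAL_REGS):
    """Registers backed up while the handler's code is still being fetched.

    Assumption: the backup proceeds at regs_per_cycle in the background.
    """
    return min(total, latency_cycles * regs_per_cycle)


for latency in (10, 20):
    saved = saved_during(latency)
    print(latency, "cycles:", saved, "registers,",
          round(saved / TOTAL_REGS, 2), "of the bank")
```

With a 20-cycle fetch, 20 of 64 registers are already saved, which is the "roughly one third" quoted above.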

Now, in case of cached code for handling page miss, for example,
the hidden "dirty" flags will record which registers have been saved or not
and which register belongs to which "thread". Since TLB miss handlers
are not interruptible and "terminal", there is no worry about allocating an
additional CMB. So SRB can then be interrupted (the only case where
it is possible) to restore the faulty thread as soon as the TLB is updated.

 But would a "shadow stack" be a bad idea? Like, instead of
one shadow register, there would be, let's say, a 1000-level deep stack?
OK, it takes up some amount of silicon or some quantity of gates in the
FPGA case, but it should work somehow...

it "works" for such CPUs as SPARC or NIOS.
however, consider this fact : we have 64 registers in the main bank.
do the math : how many bits of data are needed to store 8 nested IRQs ?
64 registers * 64 bits * 8 levels = 32768 bits, or 4K bytes ! it's as large
as a cache memory, and indeed it IS a sort of cache memory without all the
benefits !
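
The same arithmetic as a small sketch, so that other stack depths can be tried. It assumes the second 64 in the product is the register width in bits; the function name is only illustrative.

```python
REGISTERS = 64          # registers in the main bank
BITS_PER_REGISTER = 64  # assumed register width in bits


def shadow_storage_bytes(levels, registers=REGISTERS, bits=BITS_PER_REGISTER):
    """Bytes of extra storage needed to hold `levels` full register sets."""
    return levels * registers * bits // 8


print(shadow_storage_bytes(8))     # 8 nested IRQs
print(shadow_storage_bytes(1000))  # the 1000-level stack suggested above
```

Eight levels come out at 4096 bytes, and the 1000-level stack at 512000 bytes, about half a megabyte of storage that the application cannot address.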

the main register bank is already a huge piece of HW and there was a lot
of concerns about this, so adding more registers that would not be directly
accessible by the user application is like a waste of silicon area.

speed ? you want thread switch speed ? What's the use of switching
a whole register set in 1 cycle when it takes a MINIMUM of 30 cycles
to overwrite the register set ? FC0 only issues 1 instruction per cycle
and can AT MOST write 2 data to the register set. 63/2, rounded up, is 32 cycles.

This time can be better used by flushing directly the register set to the
existing cache memory. This is what SRB does with little hardware cost
and almost no software interference.
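
The cycle count above, written out as a sketch. The 63 registers and the 2 writes per cycle are the figures used in this message; the function name is hypothetical.

```python
import math


def cycles_to_rewrite(registers=63, writes_per_cycle=2):
    """Minimum cycles to overwrite the register set at the given write rate."""
    return math.ceil(registers / writes_per_cycle)


print(cycles_to_rewrite())  # 63 registers, 2 writes per cycle
```

So even a 1-cycle bank switch is followed by about 32 cycles before the new thread's registers can all be in place, and SRB spends those same cycles flushing the old values to the cache.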

And after the stack gets filled,
it's automatically saved to (and later retrieved from) a
specially reserved region of the system's memory.

automatically ?
why wait ?
the stack underflow and overflow exceptions of the SPARC architecture
have been a concern for a long while. Now it's somehow modified but
the early implementations (AFAIK) did the stack flush "by hand".

All the housekeeping
would be the job for the memory management unit.
After all, the newest CPUs do have onboard
caches with sizes reaching at least 1MB.

that's not the kind of target that we can reach.
in FPGA or ASIC, we don't have "generous donors"
that can offer us such a football field of space. Of course that would
be cool but real implementations for FC0 are more in the
16K cache domain, because the chip would be much cheaper.
Of course, if you have a FPGA, you can always tweak the
cache size. But don't count on huge caches.

In the VHDL code,
the stack size can be stored in a variable, so, if anybody wants, he/she
sets it to 1, 10, 100, or whatever her/his hardware can afford.

that's what the NIOS does (AFAIK) but it's only a 16-bit or 32-bit CPU.

I'm a newbie from a chip design point of view, so I hope you don't mind my
huge technical mistakes and I hope that my mistakes are getting

well, it's ok, if at least the same questions didn't pop up every year again ;-)
and the same old "debate" goes on every year too.

Martin Vahi

