[f-cpu] New suggestion about call convention
- To: <f-cpu@seul.org>
- Subject: [f-cpu] New suggestion about call convention
- From: cedric <cedric.bail@free.fr>
- Date: Mon, 4 Nov 2002 23:34:36 +0000
Hi everybody,
On the French mailing list, Antoine (<Antoine@rezo.net>) suggested a new
idea for the calling convention. At first he only presented it as a
funny idea, but it could turn out to be very interesting.
He suggests adding a new register, MR (mask register). Each bit in this
register specifies whether the corresponding register needs to be saved before
it is used. In the prologue of a function you "and" MR with a local constant
that represents which registers the function uses, then you conditionally
store registers to the stack where a collision occurs. In the epilogue you
restore the registers the same way.
When you call a function, you update mr with something like this:
mr = mr | registers to preserve. Of course, this mask can evolve during the
function.
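To make the two mask operations concrete, here is a minimal sketch in Python. The helper names (reg_mask, prologue_save_set, mr_for_call) and the register numbers are only illustrative assumptions, not part of the proposal.

```python
# Hypothetical model of the MR convention; names and register numbers are illustrative.

def reg_mask(*regs):
    """Build a bit mask with one bit set per register number."""
    mask = 0
    for r in regs:
        mask |= 1 << r
    return mask

def prologue_save_set(mr, used):
    """Registers the callee must save: those it uses AND that MR marks as live."""
    return mr & used

def mr_for_call(mr, preserve):
    """MR handed to a callee: the caller adds the registers it still needs."""
    return mr | preserve

mr = reg_mask(20, 21, 40)                  # registers the callers want preserved
used = reg_mask(21, 22, 23)                # registers this function will write
to_save = prologue_save_set(mr, used)      # only r21 collides
callee_mr = mr_for_call(mr, reg_mask(22))  # r22 must now survive the next call
```

In this example only r21 has to be saved, even though the function writes three registers.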
If you "randomly" select which registers to use (when you don't know which
function calls you), there is some chance that no collision occurs (and in
most cases an even better chance that the collision is only partial). A second
possibility when allocating registers is to use feedback from run time, but
each time you compile and run, you can get different results...
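As a rough estimate of that chance, assume (my assumption, not something measured) that the function picks its registers uniformly at random among n mask-saved registers, of which "live" are marked in MR. The number of collisions is then hypergeometric, and a small sketch gives the figures:

```python
from math import comb

def p_no_collision(n, live, used):
    """Probability that none of the `used` randomly chosen registers is live."""
    return comb(n - live, used) / comb(n, used)

def expected_collisions(n, live, used):
    """Average number of registers that must actually be saved."""
    return used * live / n

# Example with the 42 mask-saved registers of the second proposal below:
p = p_no_collision(42, 10, 4)
avg = expected_collisions(42, 10, 4)
```

With 10 live registers out of 42 and 4 registers used, roughly one register on average needs saving, instead of 4 with a classic callee-saved convention.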
This idea comes with 2 different calling convention proposals:
- 15 parameter registers
- 16 temporary registers
- 26 mask-saved registers
- 6 "system" registers (mr, plt, got, fp, sp, ra)
Or:
- 7 parameter registers
- 8 temporary registers
- 42 mask-saved registers
- 6 "system" registers
I prefer the second solution, but that's only my point of view. Perhaps
someone else has a better one.
To use this mr, we need some instructions. Antoine first suggested a
maskload and a maskstore. These instructions would act like storem/loadm but
with the mask technique. They would probably look like this:
- maskload r3, [r2]
- maskstore r3, [r2]
I found that Michael RIEPE suggested a similar instruction in a
post ("Re: [f-cpu] Re: Floating-Point?" [15/08/2001]) but the discussion was
dropped. Perhaps Michael has other ideas on how to use it, or knows why
this instruction was abandoned (I can't find any reason in my archive).
With this instruction, the prologue/epilogue can look like this:
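; prologue
move r0, t0             ; clear t0 (r0 reads as zero)
loadcons.1 0xFFFF, t0   ; t0 = mask of the registers used by this function
loadcons.2 0xFFFF, t0
loadcons.3 0xFFFF, t0
and mr, t0, t1          ; t1 = used registers that mr marks as live
maskstore t1, [sp]      ; save only those registers
; If we call a function we need to save/restore mr
move mr, m1
; epilogue
move r0, t0
loadcons.1 0xFFFF, t0
loadcons.2 0xFFFF, t0
loadcons.3 0xFFFF, t0
and mr, t0, t1
maskload t1, [sp]       ; restore the registers saved in the prologue
jmp ra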
The value loaded in t0 corresponds to the registers that are used in this
function and that it would therefore trash.
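Here is a small Python model of what I understand maskstore/maskload to do. The details are assumptions on my side: registers are written in ascending order into consecutive stack slots starting at sp, sp itself is not changed, and the number of slots is the popcount of the mask.

```python
# Hypothetical model: regs is a list of 64 values, stack a list of slots.

def maskstore(regs, mask, stack, sp):
    """Store every register whose bit is set in mask; return the slots used."""
    slot = 0
    for r in range(len(regs)):
        if (mask >> r) & 1:
            stack[sp + slot] = regs[r]
            slot += 1
    return slot  # equals popcount(mask)

def maskload(regs, mask, stack, sp):
    """Reload the same registers from the same slots."""
    slot = 0
    for r in range(len(regs)):
        if (mask >> r) & 1:
            regs[r] = stack[sp + slot]
            slot += 1
    return slot

regs = list(range(100, 164))      # r0..r63 hold 100..163
stack = [0] * 16
mr = (1 << 17) | (1 << 33)        # callers want r17 and r33 preserved
used = 0xFFFFFFFFFFFF0000         # r16..r63, as in the listing above
t1 = mr & used
saved = maskstore(regs, t1, stack, 0)
regs[17] = regs[33] = regs[40] = -1   # the function body trashes registers
maskload(regs, t1, stack, 0)
```

After the maskload, r17 and r33 are back to their old values, while r40 stays trashed because nobody asked for it to be preserved.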
A second possibility, proposed by Christophe Avoinne
(<christophe.avoinne@laposte.net>), is to split maskstore/maskload into 4
chunks, like loadcons. You would have something like this for the
prologue/epilogue:
; prologue
loadcons.1 0xFFFF, t0
loadcons.2 0xFFFF, t0
loadcons.3 0xFFFF, t0
and mr, t0, t1
maskstore.1 t1, [sp] ; save registers from r16 to r31
maskstore.2 t1, [sp] ; from r32 to r47
maskstore.3 t1, [sp] ; I am sure that you understood the idea ;-)
; If we call a function we need to save/restore mr
move mr, m1
; epilogue
loadcons.1 0xFFFF, t0
loadcons.2 0xFFFF, t0
loadcons.3 0xFFFF, t0
and mr, t0, t1
maskload.1 t1, [sp]
maskload.2 t1, [sp]
maskload.3 t1, [sp]
jmp ra
The aim of these instructions is to be less complex and perhaps easier
to implement in FC0. (Of course maskload.0 and maskstore.0 exist too ;-).
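A sketch of the chunked variant in Python, under my own assumptions: maskstore.k only looks at bits 16*k to 16*k+15 of the mask, the caller advances the stack offset by the returned slot count, and a chunk in which the function uses no register needs no instruction at all. The function names are hypothetical.

```python
CHUNK_BITS = 16

def chunk_of(mask, k):
    """16-bit slice of mask covering registers 16*k .. 16*k+15."""
    return (mask >> (CHUNK_BITS * k)) & 0xFFFF

def needed_chunks(used, chunks=4):
    """Chunks for which a maskstore.k / maskload.k has to be emitted."""
    return [k for k in range(chunks) if chunk_of(used, k)]

def maskstore_chunk(regs, mask, k, stack, sp):
    """Model of maskstore.k: save the flagged registers of chunk k."""
    bits = chunk_of(mask, k)
    slot = 0
    for i in range(CHUNK_BITS):
        if (bits >> i) & 1:
            stack[sp + slot] = regs[CHUNK_BITS * k + i]
            slot += 1
    return slot

regs = list(range(100, 164))                 # r0..r63 hold 100..163
stack = [0] * 16
used = (1 << 17) | (1 << 18) | (1 << 50)     # function writes r17, r18, r50
mr = (1 << 18) | (1 << 20) | (1 << 50)       # callers need r18, r20, r50
t1 = mr & used
emitted = needed_chunks(used)                # chunks 0 and 2 are skipped
offset = 0
for k in emitted:
    offset += maskstore_chunk(regs, t1, k, stack, offset)
```

Because `used` is known at compile time, the choice of which maskstore.k to emit is static; only the registers saved inside each chunk depend on mr.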
Finally, a last proposal, which works on only one register at a time. It
would look like this for the prologue/epilogue:
; prologue
loadcons.1 0xFFFF, t0
loadcons.2 0xFFFF, t0
loadcons.3 0xFFFF, t0
and mr, t0, t1
rotr 16, t1, t1 ; skip the first 16 registers
maskstore t1, [sp] ; save r16 if needed and rotr t1
maskstore t1, [sp] ; save r17 if needed and rotr t1
...
maskstore t1, [sp] ; I am sure that you understood the idea ;-)
; If we call a function we need to save/restore mr
move mr, m1
; epilogue
loadcons.1 0xFFFF, t0
loadcons.2 0xFFFF, t0
loadcons.3 0xFFFF, t0
and mr, t0, t1
rotr 16, t1, t1 ; skip the first 16 registers
maskload t1, [sp]
...
maskload t1, [sp]
jmp ra
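The single-register variant can be modelled like this. This is only a sketch: how the instruction knows which register it is handling is not specified above, so the explicit `reg` argument is my assumption, as is the stack pointer advancing by one slot per saved register.

```python
MASK64 = (1 << 64) - 1

def rotr64(x, n):
    """Rotate a 64-bit value right by n bits."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64

def maskstore1(regs, reg, t1, stack, sp):
    """One step: save regs[reg] if the low bit of t1 is set, then rotate t1."""
    if t1 & 1:
        stack[sp] = regs[reg]
        sp += 1
    return rotr64(t1, 1), sp

regs = list(range(100, 164))          # r0..r63 hold 100..163
stack = [0] * 16
mr = (1 << 3) | (1 << 17) | (1 << 33)
used = 0xFFFFFFFFFFFF0000             # r16..r63
start = mr & used
t1 = rotr64(start, 16)                # skip the first 16 registers
sp = 0
for reg in range(16, 64):             # one maskstore per register
    t1, sp = maskstore1(regs, reg, t1, stack, sp)
```

After the 16-bit rotation and the 48 single steps, t1 has gone around a full 64 bits and holds its starting value again.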
The problems with the first solution are:
- complexity
- the popcount unit can no longer be optional
- it blocks the CPU for 3 or 4 cycles (until it is sure that no TLB trap will
happen)
For the second solution:
- complexity
- the popcount unit can no longer be optional
- it blocks the CPU for 3 or 4 cycles like the first solution, and you need
to issue the instruction more often; on the other hand, this solution lets
you skip a chunk that is not needed.
The last solution:
- stack problem (the same problem as storei/loadi, which need an extra
instruction for alignment when you change direction)
- in big functions you need to issue it many times
From a software point of view I prefer Antoine's first solution, but it
could be a mess to implement in hardware! What is your point of view on
this, and what do you think of the idea?
Not really linked to this discussion: it appears that when you only want to
load a constant that is bigger than 8 bits but smaller than 64 bits, you
always need a move from r0 first to clear the destination register. I think
it would be a good idea to add a loadconsz that sets all the other chunks to
zero before writing its immediate value.
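To show the difference, here is a small Python model. It assumes loadcons.k replaces only the 16-bit chunk k of the destination, as the listings above suggest; the chunk argument of loadconsz is my own guess at how the proposed instruction would be encoded.

```python
MASK64 = (1 << 64) - 1

def loadcons(reg, k, imm16):
    """Replace 16-bit chunk k of reg with imm16, keeping the other chunks."""
    shift = 16 * k
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | ((imm16 & 0xFFFF) << shift)

def loadconsz(k, imm16):
    """Proposed variant: chunk k gets imm16, every other chunk becomes zero."""
    return (imm16 & 0xFFFF) << (16 * k)

stale = 0xDEADBEEFCAFEF00D                 # old contents of the register
wrong = loadcons(stale, 0, 0x1234)         # upper chunks still hold garbage
with_move = loadcons(0, 0, 0x1234)         # move r0 first, then loadcons.0
with_z = loadconsz(0, 0x1234)              # one instruction instead of two
value32 = loadcons(loadconsz(0, 0x5678), 1, 0x1234)
```

With loadconsz, a 32-bit constant takes two instructions instead of three.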
Sorry for the long message, but I hope it is interesting,
Cedric