
[f-cpu] New suggestion about call convention



Hi everybody,

	On the French mailing list, Antoine (<Antoine@rezo.net>) has suggested a new 
idea for the calling convention. At first it was presented as just a funny 
idea, but it could turn out to be very interesting.
	He suggests adding a new register, MR (mask register). Each bit in this 
register specifies whether the corresponding register needs to be saved before 
it is used. In the prologue of a function you "and" MR with a local constant 
that represents which registers the function uses, then you conditionally 
store to the stack each register for which a collision occurs. Finally, in the 
epilogue, you restore those registers in the same way.
	When you call a function you update mr with something like this:
mr = mr | (registers to preserve). Of course this mask can evolve during the 
function.
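
To make the mechanism concrete, here is a minimal Python sketch of the two 
mask operations. It assumes (this is not stated in the proposal) that bit i of 
the mask stands for register i; the helper names bits, prologue_collisions and 
mask_for_call are hypothetical.

```python
def bits(*regs):
    """Build a mask with one bit set per register number (assumed encoding)."""
    m = 0
    for r in regs:
        m |= 1 << r
    return m

def prologue_collisions(mr, used):
    """Registers the callee must save: marked in mr AND used by the callee."""
    return mr & used

def mask_for_call(mr, preserve):
    """Mask handed to the next callee: mr plus the registers we want kept."""
    return mr | preserve

mr = bits(20, 21)                         # a caller wants r20 and r21 preserved
used = bits(21, 30)                       # this function writes r21 and r30
to_save = prologue_collisions(mr, used)   # only r21 collides
inner_mr = mask_for_call(mr, bits(30))    # we keep a live value in r30 across a call
```

Here only r21 has to be spilled, although the function uses two registers.
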
	If you "randomly" select which registers to use (when you don't know which 
function will call you), there is some chance that no collision occurs (and, 
in most cases, a better chance that the collision is only partial). A second 
possibility when allocating registers is to use run-time feedback, but then 
each compile-and-run cycle can give a different result...
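
As a rough way to quantify that chance, the sketch below assumes the callee 
picks its m registers uniformly at random among n mask-saved registers, k of 
which are marked in mr. This uniform model and the numbers 42, 4, 4 are 
illustrative assumptions only, not part of the proposal.

```python
from math import comb

def p_no_collision(n, k, m):
    """Chance that m registers drawn uniformly from n avoid all k marked ones."""
    return comb(n - k, m) / comb(n, m)

def expected_collisions(n, k, m):
    """Mean number of registers that end up being saved."""
    return k * m / n

print(p_no_collision(42, 4, 4))        # no register saved at all
print(expected_collisions(42, 4, 4))   # average number of saves
```
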
	Two different calling-convention proposals came with this idea:
		- 15 parameter registers
		- 16 temporary registers
		- 26 mask-saved registers
		- 6 "system" registers (mr, plt, got, fp, sp, ra)
	Or:
		- 7 parameter registers
		- 8 temporary registers
		- 42 mask-saved registers
		- 6 "system" registers
	I prefer the second solution, but that's only my point of view, and perhaps 
someone else has a better one.
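
A quick sanity check of both splits, as a Python sketch. It assumes a 
64-entry register file in which r0 is the constant zero and therefore sits 
outside the four classes; that assumption is mine, the figures are the ones 
listed above.

```python
SYSTEM = ("mr", "plt", "got", "fp", "sp", "ra")

proposals = {
    "first":  {"parameter": 15, "temporary": 16, "mask_saved": 26, "system": len(SYSTEM)},
    "second": {"parameter": 7,  "temporary": 8,  "mask_saved": 42, "system": len(SYSTEM)},
}

def total_registers(split):
    """Registers covered by a split, plus r0 (assumed to be the constant zero)."""
    return sum(split.values()) + 1

totals = {name: total_registers(split) for name, split in proposals.items()}
```

Under that assumption both proposals account for exactly 64 registers.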

	To use this mr, we need some instructions. Antoine first suggested a 
maskload and a maskstore. These instructions would act like storem/loadm but 
with the mask technique. They would probably look like this:
- maskload r3, [r2]
- maskstore r3, [r2]
	I found that Michael RIEPE suggested a similar instruction in a 
post ("Re:  [f-cpu] Re: Floating-Point?" [15/08/2001]), but the discussion was 
dropped. Perhaps Michael has other ideas on how to use it, or knows why 
this instruction was abandoned (I can't find any reason in my archive).
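	With this instruction the prologue/epilogue can look like this:

		; prologue: save the registers we are about to clobber
		move r0, t0           ; clear t0
		loadcons.1 0xFFFF, t0 ; constant mask of the registers used here
		loadcons.2 0xFFFF, t0
		loadcons.3 0xFFFF, t0
		and mr, t0, t1        ; t1 = registers marked in mr that we also use
		maskstore t1, [sp]
		; If we call a function we need to save/restore mr
		move mr, m1

		; epilogue: restore the same registers and return
		move r0, t0           ; clear t0
		loadcons.1 0xFFFF, t0 ; same constant mask as in the prologue
		loadcons.2 0xFFFF, t0
		loadcons.3 0xFFFF, t0
		and mr, t0, t1
		maskload t1, [sp]
		jmp ra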

The value loaded into t0 identifies the registers that this function uses 
and will therefore clobber.
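
What a full-width maskstore/maskload pair would have to do can be modelled in 
a few lines of Python. The storage order, the use of a Python list as the 
stack and the function names are assumptions for illustration; the proposal 
does not fix them. The number of words moved is the popcount of the mask, 
which is presumably why the popcount unit shows up in the list of drawbacks 
further down.

```python
def maskstore(mask, regs, stack):
    """Push every register whose mask bit is set, lowest number first."""
    for r in range(64):
        if (mask >> r) & 1:
            stack.append(regs[r])

def maskload(mask, regs, stack):
    """Pop the same registers back, in the reverse order."""
    for r in reversed(range(64)):
        if (mask >> r) & 1:
            regs[r] = stack.pop()

regs = list(range(100, 164))       # dummy contents: register i holds 100 + i
stack = []
t1 = (1 << 17) | (1 << 40)         # mr & used, as computed by the "and"
maskstore(t1, regs, stack)         # prologue
frame_size = bin(t1).count("1")    # popcount of the mask = words saved
regs[17], regs[40] = -1, -1        # the function body clobbers both
maskload(t1, regs, stack)          # epilogue
```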

	A second possibility, proposed by Christophe Avoinne 
(<christophe.avoinne@laposte.net>), is to split maskstore/maskload into 4 
chunks, like loadcons. You would have something like this for the 
prologue/epilogue:

		; prologue
		loadcons.1 0xFFFF, t0 ; constant mask of the registers used here
		loadcons.2 0xFFFF, t0
		loadcons.3 0xFFFF, t0
		and mr, t0, t1        ; t1 = registers marked in mr that we also use
		maskstore.1 t1, [sp] ; save registers from r16 to r31
		maskstore.2 t1, [sp] ; from r32 to r47
		maskstore.3 t1, [sp] ; I am sure that you understood the idea ;-)
		; If we call a function we need to save/restore mr
		move mr, m1

		; epilogue
		loadcons.1 0xFFFF, t0 ; same constant mask as in the prologue
		loadcons.2 0xFFFF, t0
		loadcons.3 0xFFFF, t0
		and mr, t0, t1
		maskload.1 t1, [sp]  ; restore the chunks saved in the prologue
		maskload.2 t1, [sp]
		maskload.3 t1, [sp]
		jmp ra

	The aim of these instructions is to be less complex and perhaps easier to 
implement in FC0. (Of course maskload.0 and maskstore.0 exist too ;-).
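
A Python sketch of the chunked variant, under the assumption that maskstore.n 
looks only at bits 16n to 16n+15 of the mask and handles registers r(16n) to 
r(16n+15). The function names and the list used as a stack are hypothetical.

```python
def chunk(mask, n):
    """16-bit chunk n of a 64-bit mask (chunk 0 = bits 0..15)."""
    return (mask >> (16 * n)) & 0xFFFF

def maskstore_chunk(n, mask, regs, stack):
    """Model of maskstore.n: save only registers r(16n)..r(16n+15)."""
    c = chunk(mask, n)
    for i in range(16):
        if (c >> i) & 1:
            stack.append(regs[16 * n + i])

regs = list(range(64))             # dummy contents: register i holds i
stack = []
used = (1 << 18) | (1 << 33)       # constant known when the function is compiled
mr = (1 << 64) - 1                 # worst case: every register is marked
t1 = mr & used
for n in (1, 2, 3):
    if chunk(used, n):             # a chunk the function never touches is skipped
        maskstore_chunk(n, t1, regs, stack)
```

Here chunk 3 is empty, so only two of the three instructions would be emitted.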

	Finally, a last proposal, which works on only one register at a time. It 
would look like this for the prologue/epilogue:

		; prologue
		loadcons.1 0xFFFF, t0 ; constant mask of the registers used here
		loadcons.2 0xFFFF, t0
		loadcons.3 0xFFFF, t0
		and mr, t0, t1        ; t1 = registers marked in mr that we also use
		rotr 16, t1, t1 ; skip the first 16 registers
		maskstore t1, [sp] ; save r16 if needed and rotr t1
		maskstore t1, [sp] ; save r17 if needed and rotr t1
		...
		maskstore t1, [sp] ; I am sure that you understood the idea ;-)
		; If we call a function we need to save/restore mr
		move mr, m1

		; epilogue
		loadcons.1 0xFFFF, t0 ; same constant mask as in the prologue
		loadcons.2 0xFFFF, t0
		loadcons.3 0xFFFF, t0
		and mr, t0, t1
		rotr 16, t1, t1 ; skip the first 16 registers
		maskload t1, [sp] 
		...
		maskload t1, [sp]
		jmp ra
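
The one-register variant can be sketched in Python as follows. How the 
instruction learns which register to save is not spelled out in the listing, 
so the model simply passes the value in; that, and the names rotr and 
maskstore1, are assumptions.

```python
MASK64 = (1 << 64) - 1

def rotr(x, n):
    """Rotate a 64-bit value right by n bits."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64

def maskstore1(t1, value, stack):
    """Save value if bit 0 of t1 is set, then rotate t1 right by one."""
    if t1 & 1:
        stack.append(value)
    return rotr(t1, 1)

regs = list(range(64))             # dummy contents: register i holds i
stack = []
t1 = (1 << 16) | (1 << 19)
t1 = rotr(t1, 16)                  # skip the first 16 registers
for r in range(16, 64):            # one maskstore per mask-saved register
    t1 = maskstore1(t1, regs[r], stack)
```

After the 16 + 48 rotations the mask is back to its initial value.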

The problems with the first solution are:
- complexity
- the popcount unit must not be optional
- it blocks the CPU for 3/4 cycles (until it is certain that no TLB trap will 
occur)

For the second solution:
- complexity
- the popcount unit must not be optional
- it blocks the CPU for 3/4 cycles like the first solution, and you need to 
issue the instruction more often; on the other hand, this solution lets you 
skip a chunk that is not needed.

The last solution:
- stack problem (the same problem as storei/loadi: when you change direction 
you need to add an instruction for alignment)
- in a big function you need to call it many times


From a software point of view I prefer the first solution from Antoine, but it 
could be a mess to implement in hardware! What is your point of view on this, 
and what do you think of the idea?

	Not really linked to this discussion: it appears that when you only want to 
load a constant that is bigger than 8 bits but smaller than 64 bits, you 
always need to do a move r0, your_constant first. I think it would be a good 
idea to add a loadconsz that sets all the chunks to zero before writing its 
immediate value.
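
To illustrate the difference, here is a small Python model. It assumes that 
loadcons.n overwrites only the 16-bit chunk n and keeps the other chunks, 
which is how it is used in the listings above; the loadconsz semantics shown 
are only my reading of the suggestion.

```python
MASK64 = (1 << 64) - 1

def loadcons(n, imm16, reg):
    """Replace 16-bit chunk n of reg with imm16, keeping the other chunks."""
    shift = 16 * n
    return (reg & ~(0xFFFF << shift) & MASK64) | ((imm16 & 0xFFFF) << shift)

def loadconsz(n, imm16):
    """Suggested variant: the result is zero everywhere except chunk n."""
    return (imm16 & 0xFFFF) << (16 * n)

stale = 0xDEADBEEFDEADBEEF                            # old register contents
dirty = loadcons(0, 0x5678, stale)                    # upper chunks survive
today = loadcons(1, 0x1234, loadcons(0, 0x5678, 0))   # after a move r0 clear
proposed = loadcons(1, 0x1234, loadconsz(0, 0x5678))  # no separate clear
```

With loadconsz, loading 0x12345678 takes two instructions instead of three.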

Sorry for the long post, but I hope it is interesting,
	Cedric

