
Re: [f-cpu] New suggestion about call convention



hi,

cedric wrote:

>Hi everybody,
>
>	On the French mailing list, Antoine (<Antoine@rezo.net>) has suggested a new 
>idea for the call convention. At first he just said that it was a 
>funny idea, but in the end it could be very interesting.
>
and in the end, it can be disappointing.

>	So he suggests specifying a new register, MR (mask register). Each bit in this 
>register specifies whether the corresponding register needs to be saved 
>before it is used. In the prologue of a function you do an "and" between the MR and a 
>local constant that represents which registers are used, then you conditionally 
>store registers to the stack if a collision occurs. Finally, in the epilogue you 
>restore the registers in the same way.
>	When you call a function you update mr with something like this:
>mr = mr | registers to preserve. Of course this mask can evolve during the 
>function.
>	If you "randomly" select which registers to use (when you don't know which 
>function calls you), you have some chance that no collision occurs (in most 
>cases you have an even better chance that only a partial collision occurs). A second 
>possibility when you allocate your registers is to use feedback from run time, but 
>each time you compile and run, you can get different results...
>	With this idea came 2 different call convention proposals:
>		- 15 parameter registers
>		- 16 temporary registers
>		- 26 mask-saved registers
>		- 6 "system" registers (mr, plt, got, fp, sp, ra)
>	Or:
>		- 7 parameter registers
>		- 8 temporary registers
>		- 42 mask-saved registers
>		- 6 "system" registers
>	I prefer the second solution, but that's only my point of view. And perhaps 
>some other split would be better.
>  
>
one could also pick the middle ground: 32 mask-saved registers?

>	To use this mr, we need some instructions. Antoine first suggested a 
>maskload and a maskstore. These instructions would act like storem/loadm but 
>with the mask technique.
>
in fact it is more complex, because for each masked/unmasked register it does
both the pointer increment and the memory access. The pointer handling
makes the whole thing even more complex than loadm/storem/whatever.
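
To make the mask idea concrete, here is a rough software model of the prologue/epilogue Cedric describes. It is only a sketch: the `Cpu` class, the 64-register file, the Python list used as a stack and the `call` helper are all made up for illustration, and nothing here says how real hardware would do it. Note that every saved register costs one memory access plus one pointer update, which is exactly the per-register work mentioned above.

```python
# Hypothetical model of the proposed mask register (MR) convention.
# Register count, class layout and helper names are assumptions.

class Cpu:
    def __init__(self, nregs=64):
        self.regs = [0] * nregs
        self.mr = 0        # bit i set => register i must be saved before use
        self.stack = []    # each append/pop stands for access + pointer update

def prologue(cpu, used_mask):
    """Save only registers that are both marked in MR and used by this function."""
    collide = cpu.mr & used_mask
    saved = []
    for i in range(len(cpu.regs)):
        if (collide >> i) & 1:
            cpu.stack.append(cpu.regs[i])
            saved.append(i)
    return saved

def epilogue(cpu, saved):
    """Restore the saved registers in reverse order."""
    for i in reversed(saved):
        cpu.regs[i] = cpu.stack.pop()

def call(cpu, preserve_mask, used_mask, body):
    """Caller side: mr = mr | registers to preserve, then run the callee."""
    old_mr = cpu.mr
    cpu.mr |= preserve_mask
    saved = prologue(cpu, used_mask)
    body(cpu)
    epilogue(cpu, saved)
    cpu.mr = old_mr
    return saved
```

If the caller preserves r10 and the callee uses r10 and r12, only r10 collides, so only r10 is pushed and popped; r12 is clobbered freely.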

<snip examples>

>The problems of the first solution are:
>- complexity
>- popcount unit must not be optional
>- blocks the CPU for 3/4 cycles (until it is sure that no TLB trap will happen)
>  
>
not only that, but:
 - instruction duration is not static ==> more difficult to decode and
   schedule
 - the instruction cannot be interrupted in the middle
   (IRQ/whatever) ==> IRQ response time is unpredictable :-(
 - it can't be pipelined (issued, and then another instruction can be
   decoded)
 - the read port is connected to the instruction buffer ==> it is not
   possible to generate the sequence of registers to be saved. And even
   a counter would not be ok (in order to generate the register
   numbers), because the mask can have holes!
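
The "holes" point is easy to see in software. The sketch below (function names are mine, and the 8 bytes per register is an assumption based on 64-bit registers) walks a mask by repeatedly isolating its lowest set bit, and uses a popcount to get the size of the save area. A plain incrementing counter would visit the cleared bits too, and the number of steps depends on the mask value, which is why the duration is not static.

```python
# Sketch only: helper names and the 8-byte register size are assumptions.

def popcount(mask):
    """Number of set bits, i.e. how many registers will be saved."""
    return bin(mask).count("1")

def registers_to_save(mask):
    """Yield the register numbers of the set bits, lowest first."""
    while mask:
        low = mask & -mask             # isolate the lowest set bit
        yield low.bit_length() - 1     # its position is the register number
        mask ^= low                    # clear it and continue

def frame_bytes(mask, reg_bytes=8):
    """Stack space needed for the masked registers."""
    return popcount(mask) * reg_bytes
```

For the mask 0b10010100 this yields registers 2, 4 and 7, skipping the holes at 3, 5 and 6, and needs 24 bytes of stack.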

>For the second solution :
>- complexity
>- popcount unit must not be optional
>- blocks the CPU for 3/4 cycles like the first solution, and you need to use 
>this instruction more frequently than in the first solution; on the other hand, 
>this solution lets you skip a chunk when it is not needed.
>  
>
same remarks as before.
it's a multicycle, CISC instruction with most of the problems listed above.

>The last solution:
>- stack problem (same problem as storei/loadi, which need an extra instruction 
>for alignment when you change direction)
>- In a big function you need to call it a lot
>  
>
there is also:
 - it is more complex and heavier than a classical store/load with
   post-increment, with the same result (except that it is conditional)
 - there are not enough register set ports to allow all the writes at
   the same time. Since this kind of instruction is used in bursts, it's
   a big problem, because the differing latencies can't be hidden by
   other instructions that don't use all the write ports.

>From a software point of view I prefer the first solution from Antoine, but it 
>can be a mess to implement it in hardware !
>
it is.

> What is your point of view on this, 
>and what do you think about this idea?
>  
>
I am completely against this idea in F-CPU up to v.1 and in FC0, where
the pipeline is not adapted at all for these kinds of CISC gymnastics.

BTW, using the SRB for doing the backup/restore was also mentioned, but
there are a lot of problems there as well, so you'd better not think of it.

>	Not really linked with this discussion: it appears that when you only want to 
>load a constant that is bigger than 8 bits, but smaller than 64 bits, you 
>always need to do a move r0, your_constant. I think it would be a good idea to 
>add a loadconsz that sets all the chunks to zero before putting in its 
>immediate value.
>  
>
Here is what I remember:
loadcons doesn't sign-extend the constant.
loadconsx does.
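
As a rough illustration of the difference, here is a small model of the three variants on a 64-bit register. Big caveat: the 16-bit immediate, the `chunk` position argument and the "other bits are left untouched" behaviour of loadcons are my assumptions for the sketch (check the manual before relying on them); loadconsz is Cedric's proposal and does not exist.

```python
# Sketch under assumptions: 64-bit registers, 16-bit immediates placed at
# a 16-bit chunk position. Only the sign-extension difference comes from
# the discussion itself.

MASK64 = (1 << 64) - 1

def loadcons(reg, imm16, chunk=0):
    """Replace one 16-bit chunk, leave all other bits alone (no sign extension)."""
    shift = 16 * chunk
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | ((imm16 & 0xFFFF) << shift)

def loadconsx(reg, imm16, chunk=0):
    """Place the chunk and sign-extend it into every higher bit."""
    shift = 16 * chunk
    low = reg & ((1 << shift) - 1)     # bits below the chunk are kept
    value = imm16 & 0xFFFF
    if value & 0x8000:
        value -= 0x10000               # interpret as a negative 16-bit value
    return ((value << shift) & MASK64) | low

def loadconsz(reg, imm16, chunk=0):
    """Proposed variant: zero the whole register, then place the immediate."""
    return (imm16 & 0xFFFF) << (16 * chunk)
```

With a register full of ones, loadcons of 0x1234 gives 0xFFFFFFFFFFFF1234 while loadconsz gives 0x1234, which is the saving Cedric is after.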

>Sorry for the length, but I hope it was interesting,
>	Cedric
>
well, at last it made it to this list.

YG


