
Re: [f-cpu] New suggestion about call convention

On Tue, 05 Nov 2002 00:50:27 +0100
Yann Guidon <whygee@f-cpu.org> wrote:

> hi,
> cedric wrote:
> >Hi everybody,
> >
> >	On the French mailing list, Antoine (<Antoine@rezo.net>) has suggested a new 
> >idea for the call convention. At first he said it was just a funny idea, 
> >but it could turn out to be very interesting.
> >
> and in the end, it can be disappointing.
> >	So he suggests specifying a new register, MR (mask register). Each bit in this 
> >register specifies whether the corresponding register needs to be saved before 
> >it is used. In the prologue of a function you "and" the MR with a 
> >local constant that represents which registers are used, then you conditionally 
> >store registers to the stack if a collision occurs. Finally, in the epilogue you 
> >restore the registers in the same way.
> >	When you call a function you update mr with something like this:
> >mr = mr | registers to preserve. Of course this mask can evolve during the 
> >function.
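
As a rough illustration, the prologue/epilogue logic described above can be modelled in a few lines of Python. This is only a sketch of the idea as stated in the proposal, not F-CPU code: the names `regs_in`, `prologue`, `epilogue` and `call_mask` are made up for the example, and the 64-register file is an assumption (it matches the register counts in the proposals below plus r0).

```python
NUM_REGS = 64  # assumed size of the register file; bit i of a mask stands for register i

def regs_in(mask):
    """Return the register numbers whose bits are set in mask, lowest first."""
    return [i for i in range(NUM_REGS) if (mask >> i) & 1]

def prologue(mr, used, regs, stack):
    """Save only the registers that are both protected by mr and used here."""
    collision = mr & used            # the "and" between MR and the local constant
    for r in regs_in(collision):
        stack.append(regs[r])        # conditional store to the stack
    return collision

def epilogue(collision, regs, stack):
    """Restore the saved registers, in the reverse order of the prologue."""
    for r in reversed(regs_in(collision)):
        regs[r] = stack.pop()

def call_mask(mr, live):
    """Mask handed to a callee: mr = mr | registers to preserve."""
    return mr | live

# Example: the caller keeps r3 and r5 live, the callee uses r5 and r7.
regs = list(range(100, 100 + NUM_REGS))   # dummy register contents
stack = []
mr = call_mask(0, (1 << 3) | (1 << 5))
used = (1 << 5) | (1 << 7)
saved = prologue(mr, used, regs, stack)   # only r5 collides, so only r5 is saved
regs[5] = regs[7] = 0                     # the callee clobbers its registers
epilogue(saved, regs, stack)              # r5 is restored, r7 was never protected
```

The number of stack slots used is the population count of `mr & used`, which is presumably why the popcount unit comes up in the objections further down.
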
> >	If you "randomly" select which registers to use (when you don't know which 
> >function calls you), there is some chance that no collision occurs (and in most 
> >cases an even better chance that the collision is only partial). A second 
> >possibility when you allocate your registers is to use feedback from run time, 
> >but each time you compile and run, you can get a different result...
> >	With this idea came 2 different call convention proposals:
> >		- 15 parameter registers
> >		- 16 temporary registers
> >		- 26 mask-saved registers
> >		- 6 "system" registers (mr, plt, got, fp, sp, ra)
> >	Or:
> >		- 7 parameter registers
> >		- 8 temporary registers
> >		- 42 mask-saved registers
> >		- 6 "system" registers
> >	I prefer the second solution, but that's only my point of view. And perhaps 
> >some other split would be better.
> >  
> >
> one could choose the middle: 32 mask-saved registers?....
> >	To use this mr, we need some instructions. Antoine first suggested a 
> >maskload and a maskstore. These instructions would act like storem/loadm but 
> >with the mask technique.
> >
> in fact it is more complex because for each un/masked register, it does
> both the pointer increment and the memory access. The pointer thing
> makes the whole thing even more complex than loadm/storem/whatever.
> <snip examples>
> >The problems with the first solution are:
> >- complexity
> >- the popcount unit must not be optional
> >- it blocks the CPU for 3/4 cycles (before being sure that no TLB trap happens)
> >  
> >
> not only that, but :
>  - instruction duration is not static ==> more difficult to decode and schedule

??? I need to see a proof of that.

>  - instruction cannot be interrupted in the middle
>      (IRQ/whatever) ==> IRQ response time is unpredictable :-(

Like our /0 trick, the pipeline should check for IRQs first. What follows then stays asynchronous.

>  - it can't be pipelined (issued and then another instruction can be decoded)

It could. Where is the problem? You have to deal with contention on the register bank.

>  - the read port is connected to the instruction buffer ==> it is not possible
>    to generate the sequence of registers to be saved. And even a counter would
>    not be ok (in order to generate the register numbers), because the mask can
>    have holes !

You could mask out the holes, but then you lose cycles. I'm pretty sure that a
"sequencer generator" could be used.

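One way to picture such a sequencer is a loop that repeatedly picks the lowest set bit of the mask and clears it, so that holes cost nothing. The Python below is only a software model of that behaviour; `register_sequence` is a hypothetical name, and nothing here claims to describe how FC0 hardware would actually do it.

```python
def register_sequence(mask):
    """Yield the register number of each set bit in mask, lowest first.

    Each iteration produces exactly one register number, so a mask with
    holes takes as many steps as it has set bits, not as many as its width.
    """
    while mask:
        lowest = mask & -mask           # isolate the lowest set bit
        yield lowest.bit_length() - 1   # its position is the register number
        mask ^= lowest                  # clear it and continue with the rest

# Mask with holes: registers 1, 2, 5 and 7 are to be saved.
sequence = list(register_sequence(0b10100110))
```

In hardware terms this corresponds to a find-first-set (priority encoder) feeding back into the mask, which is a different thing from the plain counter objected to above.
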
> >For the second solution:
> >- complexity
> >- the popcount unit must not be optional
> >- it blocks the CPU for 3/4 cycles like the first solution, and you need to use 
> >this instruction more frequently than with the previous solution, but this 
> >solution gives you the possibility of skipping a chunk if it is not needed.
> >  
> >
> same remarks as before.
> it's a multicycle, CISC instruction with most of the problems.

The biggest problem is the connection of the read/write port, which gets in
the way of the instruction buffer, but that is the case for SRB too.

> >The last solution:
> >- stack problem (same problem as storei/loadi, which need an extra instruction 
> >for alignment when you change direction)
> >- in a big function you need to call it a lot
> >  
> >
> there is also :
>  - it is more complex and heavy than a classical store/load with
>    post-incrementation with the same result (except that it is conditional)
>  - there are not enough register set ports to allow all the writes at the
>    same time. Since this kind of instruction is used in bursts, it's a big
>    problem because the differing latencies can't be hidden by other
>    instructions that don't use all the write ports.
> >From a software point of view I prefer the first solution from Antoine, but it 
> >can be a mess to implement in hardware!
> >
> it is.
> > What are your points of view on 
> >this, and what do you think about this idea?
> >  
> >
> I am completely against this idea in F-CPU up to v.1 and in FC0, where the
> pipeline is not adapted at all for these kinds of CISC gymnastics.
> BTW, there was also mention of using SRB for doing the backup/restore, but
> there are a lot of problems as well, so you'd better not think of it.
> >	Not really linked to this discussion: it appears that when you only want to 
> >load a constant that is bigger than 8 bits but smaller than 64 bits, you 
> >always need to do a move r0, your_constant. I think it would be a good idea to 
> >add a loadconsz that sets all the chunks to zero before putting in its 
> >immediate value.
> >  
> >
> Here is what i remember :
> loadcons doesn't sign extend the constant.
> loadconsx does it.
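
To make the difference between the three variants concrete, here is a small Python model. It rests on assumptions that are not stated in this thread: a 64-bit register, a 16-bit immediate placed in one 16-bit chunk, and `loadcons` leaving the other chunks untouched. `loadconsz` is the proposed instruction, not an existing one, and its behaviour here is just one reading of the suggestion.

```python
MASK64 = (1 << 64) - 1
CHUNK_BITS = 16                       # assumed immediate / chunk width
CHUNK_MASK = (1 << CHUNK_BITS) - 1

def loadcons(reg, imm, chunk=0):
    """Replace one chunk with imm; no sign extension, other chunks untouched (assumed)."""
    shift = chunk * CHUNK_BITS
    field = CHUNK_MASK << shift
    return (reg & ~field & MASK64) | ((imm << shift) & field)

def loadconsx(reg, imm, chunk=0):
    """Like loadcons, but sign-extend imm into all bits above the chunk."""
    shift = chunk * CHUNK_BITS
    low = reg & ((1 << shift) - 1)    # bits below the chunk are kept
    value = imm & CHUNK_MASK
    if value >> (CHUNK_BITS - 1):     # top bit set: treat as negative
        value -= 1 << CHUNK_BITS
    return ((value << shift) | low) & MASK64

def loadconsz(imm, chunk=0):
    """Proposed variant: every other chunk is zero, so no prior clearing is needed."""
    return (imm & CHUNK_MASK) << (chunk * CHUNK_BITS)

old = 0xAAAA_AAAA_AAAA_AAAA           # stale register contents
a = loadcons(old, 0x8001)             # upper chunks still hold stale data
b = loadconsx(old, 0x8001)            # upper chunks become copies of the sign bit
c = loadconsz(0x8001)                 # upper chunks are zero
```

Under these assumptions, only the `loadconsz` form yields a clean zero-extended constant in one step, which is the case the suggestion is aimed at.
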
> >Sorry for the length, but I hope it is interesting,
> >	Cedric
> >
> well, at last it made it to this list.
> YG 

Maybe Michael's idea is better (SW). That's fine if the linker can really do the job. Otherwise...