Re: [f-cpu] New suggestion about call convention
On Tue, 05 Nov 2002 00:50:27 +0100
Yann Guidon <whygee@f-cpu.org> wrote:
> hi,
>
> cedric wrote:
>
> >Hi everybody,
> >
> > On the French mailing list, Antoine (<Antoine@rezo.net>) suggested a new
> >idea for the calling convention. At first he presented it as just a
> >funny idea, but it could turn out to be very interesting.
> >
> and in the end, it can be disappointing.
>
> > So he suggests specifying a new register, MR (mask register). Each bit in this
> >register specifies whether the corresponding register needs to be saved before
> >it is used. In the prologue of a function you "and" the MR with a
> >local constant that represents which registers are used, then you conditionally
> >store registers to the stack where a collision occurs. Finally, in the epilogue,
> >you restore the registers in the same way.
> > When you call a function you update mr with something like this:
> >mr = mr | registers to preserve. Of course this mask can evolve during the
> >function.
> > If you "randomly" select which registers to use (when you don't know which
> >function calls you), there is some chance that no collision occurs (and in most
> >cases a better chance that the collision is only partial). A second possibility
> >when allocating your registers is to use run-time feedback, but each time you
> >compile and run, you may get a different result...
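
The prologue/epilogue scheme described above can be modelled in software. This is only a minimal sketch, assuming a 64-register file and treating MR as a plain integer bitmask; the helper names (mask_of, prologue, epilogue) are hypothetical and are not part of any F-CPU specification.

```python
# Hypothetical software model of the mask-register idea (not F-CPU code).
NUM_REGS = 64  # assumption: 64 registers, one MR bit per register

def mask_of(*regs):
    """Build a bitmask with one bit set per register number."""
    m = 0
    for r in regs:
        m |= 1 << r
    return m

def prologue(mr, used, regs, stack):
    """Save only the registers that are both marked in MR and used by the callee."""
    collision = mr & used
    for r in range(NUM_REGS):
        if (collision >> r) & 1:
            stack.append(regs[r])
    return collision

def epilogue(collision, regs, stack):
    """Restore the saved registers in reverse order."""
    for r in reversed(range(NUM_REGS)):
        if (collision >> r) & 1:
            regs[r] = stack.pop()

regs = list(range(100, 100 + NUM_REGS))  # dummy register contents
stack = []
mr = mask_of(10, 11, 12)   # caller side: mr = mr | registers to preserve
used = mask_of(12, 13)     # callee's local constant
saved = prologue(mr, used, regs, stack)
regs[12] = regs[13] = 0    # callee clobbers the registers it uses
epilogue(saved, regs, stack)
```

Only r12 collides here, so a single word goes to the stack; r13 is clobbered without being saved because the caller did not mark it in MR.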
> > With this idea came two different calling convention proposals:
> > - 15 parameter registers
> > - 16 temporary registers
> > - 26 mask-saved registers
> > - 6 "system" registers (mr, plt, got, fp, sp, ra)
> > Or:
> > - 7 parameter registers
> > - 8 temporary registers
> > - 42 mask-saved registers
> > - 6 "system" registers
> > I prefer the second solution, but that's only my point of view, and perhaps
> >some other split would be better.
> >
> >
> one could choose a middle ground: 32 mask-saved registers?....
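
As a quick check of the register budgets quoted above, the sketch below sums each proposal. Both come to 63, which presumably leaves one register of a 64-register file outside these classes (my assumption: a hard-wired zero register). The dictionary layout and names are purely illustrative.

```python
# Register classes per proposal, using the counts given in the message.
proposals = {
    "first":  {"parameter": 15, "temporary": 16, "mask_saved": 26, "system": 6},
    "second": {"parameter": 7,  "temporary": 8,  "mask_saved": 42, "system": 6},
}

# Total number of registers each proposal accounts for.
totals = {name: sum(classes.values()) for name, classes in proposals.items()}

# With the "middle" choice of 32 mask-saved registers and 6 system registers,
# this many registers remain for parameters and temporaries together.
middle_remaining = totals["first"] - 6 - 32
```

With 32 mask-saved registers, 25 registers would remain to split between parameters and temporaries.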
>
> > To use this mr, we need some instructions. Antoine first suggested using a
> >maskload and a maskstore. These instructions would act like storem/loadm but
> >with the mask technique.
> >
> in fact it is more complex because for each un/masked register, it does
> both the pointer increment and the memory access. The pointer thing
> makes the whole thing even more complex than loadm/storem/whatever.
>
> <snip examples>
>
> >The problems with the first solution are:
> >- complexity
> >- popcount unit must not be optional
> >- blocks the CPU for 3/4 cycles (until it is certain that no TLB trap will happen)
> >
> >
> not only that, but:
> - instruction duration is not static ==> more difficult to decode and
> schedule
??? I need to see a proof of that.
> - instruction cannot be interrupted in the middle
> (IRQ/whatever) ==> IRQ response time is unpredictable :-(
As with our /0 trick, the pipeline should check for IRQs first. What follows then stays asynchronous.
> - it can't be pipelined (issued and then another instruction can be
> decoded)
It could. Where is the problem? You have to deal with contention on the register bank.
> - the read port is connected to the instruction buffer ==> it is not
> possible to generate the sequence of registers to be saved. And even a
> counter would not be ok (in order to generate the register numbers),
> because the mask can have holes!
You could mask the holes, but then you lose cycles. I'm pretty sure that a
"sequencer generator" could be used.
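
To illustrate what such a generator has to produce, here is a software-only sketch that walks a mask with holes and emits the register numbers in order, plus the popcount that gives the number of words to reserve. It says nothing about the hardware cost being debated above; the function names are hypothetical.

```python
def registers_in_mask(mask):
    """Yield the indices of the set bits, lowest first, skipping holes."""
    while mask:
        lowest = mask & -mask           # isolate the lowest set bit
        yield lowest.bit_length() - 1   # convert that bit to a register number
        mask ^= lowest                  # clear it and continue

def popcount(mask):
    """Number of set bits, i.e. number of registers to save."""
    return bin(mask).count("1")

mask = 0b10110010                       # registers 1, 4, 5 and 7 need saving
sequence = list(registers_in_mask(mask))
frame_words = popcount(mask)
```

Each step costs one iteration per set bit rather than one per register, which is the property a counter alone cannot provide when the mask has holes.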
>
> >For the second solution:
> >- complexity
> >- popcount unit must not be optional
> >- blocks the CPU for 3/4 cycles like the first solution; you need to use
> >this instruction more frequently than in the previous solution, but it
> >gives you the possibility to skip a chunk if it is not needed.
> >
> >
> same remarks as before.
> it's a multicycle, CISC instruction with most of the problems.
The biggest problem is the connection of the read/write port, which
interferes with the instruction buffer, but that is the case for the SRB, too.
>
> >The last solution:
> >- stack problem (same problem as storei/loadi, which need an extra
> >instruction for alignment when you change direction)
> >- In big functions you need to call it a lot
> >
> >
> there is also :
> - it is more complex and heavy than a classical store/load with
> post-incrementation with the same result (except that it is conditional)
> - there are not enough register set ports to allow all the writes at
> the same time.
> Since this kind of instruction is used in bursts, it's a big problem
> because the differing latencies can't be hidden by other instructions
> that don't use all the write ports.
>
> >From a software point of view I prefer the first solution from Antoine, but it
> >can be a mess to implement it in hardware !
> >
> it is.
>
> > What is your point of view on
> >this, and what do you think about this idea?
> >
> >
> I am completely against this idea in F-CPU up to v.1 and in FC0, where
> the pipeline is not adapted at all for these kinds of CISC gymnastics.
>
> BTW, there was also a mention of using the SRB for doing the
> backup/restore, but there are a lot of problems as well, so you'd
> better not think of it.
>
> > Not really related to this discussion: it appears that when you only want to
> >load a constant that is bigger than 8 bits but smaller than 64 bits, you
> >always need to do a move r0, your_constant. I think it would be a good idea to
> >add a loadconsz that sets all the chunks to zero before writing its
> >immediate value.
> >
> >
> Here is what I remember:
> loadcons doesn't sign extend the constant.
> loadconsx does it.
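
The difference between the three variants can be sketched as follows. This is a rough model only, under assumptions that are mine and not confirmed in this thread: 64-bit registers, a 16-bit immediate, and a position argument selecting which 16-bit field is written. The loadconsz function models the proposal quoted above, not an existing instruction.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers

def loadcons(reg, imm16, pos):
    """Write a 16-bit field, leave all other bits unchanged (no sign extension)."""
    shift = 16 * pos
    cleared = reg & ~(0xFFFF << shift)
    return (cleared | ((imm16 & 0xFFFF) << shift)) & MASK64

def loadconsx(reg, imm16, pos):
    """Write a 16-bit field and sign-extend it into the bits above it."""
    shift = 16 * pos
    value = imm16 & 0xFFFF
    if value & 0x8000:
        value -= 0x10000                 # interpret the immediate as signed
    low = reg & ((1 << shift) - 1)       # bits below the field are kept
    return ((value << shift) & MASK64) | low

def loadconsz(imm16, pos):
    """Proposed variant: zero the whole register, then place the immediate."""
    return (imm16 & 0xFFFF) << (16 * pos)

a = loadcons(0xFFFF_FFFF_FFFF_FFFF, 0x1234, 0)
b = loadconsx(0, 0x8000, 0)
c = loadconsz(0x1234, 1)
```

In this model loadconsz needs no prior clearing of the destination, which is the saving the proposal is after.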
>
> >Sorry for the length, but I hope it was interesting,
> > Cedric
> >
> well, at last it made it to this list.
>
> YG
Maybe Michael's idea is better (SW). It's okay if the linker can really do the job. Otherwise...