
Re: [f-cpu] New suggestion about call convention



Hi,

> >	With this idea came 2 different call convention proposition :
> >		- 15 parameters registers
> >		- 16 temporary registers
> >		- 26 mask saved registers
> >		- 6 "system" registers (mr, plt, got, fp, sp, ra)
> >	Or :
> >		- 7 parameters registers
> >		- 8 temporary registers
> >		- 42 mask saved register
> >		- 6 "system" registers
> >	I prefer the second solution, but that's only my point of view. And
> > perhaps some other can be better.

> one could choose to use the middle : 32 mask saved registers ?....

The problem with 32 mask-saved registers is: where do you start, and where
do you end? Don't forget that you load 16-bit constants.
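
To illustrate the alignment point, here is a rough sketch in Python (only
for illustration). It assumes 64 registers with one mask bit each, a mask
loaded in four 16-bit chunks, and a saved block that ends just below the 6
"system" registers (r58-r63). The layout and the helper name partial_chunks
are my assumptions, not something fixed by the proposals.

```python
CHUNK_BITS = 16   # the mask is loaded 16 bits at a time
NUM_REGS = 64     # assumption: 64 registers, one mask bit each

def partial_chunks(first, count):
    """Return the 16-bit mask chunks only partly covered by the block
    of `count` registers starting at register `first`."""
    saved = set(range(first, first + count))
    partial = []
    for chunk in range(NUM_REGS // CHUNK_BITS):
        base = chunk * CHUNK_BITS
        covered = saved & set(range(base, base + CHUNK_BITS))
        if 0 < len(covered) < CHUNK_BITS:
            partial.append(chunk)
    return partial

# Assumed layouts: the saved block always ends at r57.
print(partial_chunks(32, 26))  # 26 saved registers: r32..r57
print(partial_chunks(16, 42))  # 42 saved registers: r16..r57
print(partial_chunks(26, 32))  # 32 saved registers: r26..r57
```

Under these assumptions, the 26 and 42 register blocks start on a chunk
boundary, while the 32 register block starts in the middle of a chunk.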
 
> >The problem of the first solution are :
> >- complexity
> >- popcount unit must not be optional
> >- block the CPU for 3/4 cycles (before being sure that no TLB trap
> > happens)

> not only that, but :
>  - instruction lifelength is not static ==> more difficult to decode and
> schedule

Same problem as SRB for scheduling. But for decode I don't understand the
problem.

>  - instruction cannot be interrupted in the middle
>      (IRQ/whatever) ==> IRQ response time is unpredictable :-(

Like SRB, and I don't see the problem. Whether the IRQ starts now or in 64
cycles, at 100 MHz you lose only 0.00000064 s (640 ns)...
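
The arithmetic, as a small Python sketch (the function name is only for
illustration; 64 cycles is the worst case assumed above):

```python
def irq_delay_seconds(cycles, clock_hz):
    """Extra IRQ latency when the IRQ must wait `cycles` clock cycles."""
    return cycles / clock_hz

delay = irq_delay_seconds(64, 100_000_000)   # 64 cycles at 100 MHz
print(f"{delay * 1e9:.0f} ns")               # prints "640 ns"
```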

>  - it can't be pipelined (issued and then another instruction can be 
> decoded)

If you mean that you must first finish a maskstore before starting a maskload
or another maskstore, that's correct. If you mean that no other instruction
can start before the end of the maskstore/maskload, that's false, because you
can check the TLB for each bound before the memory operation.
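
A minimal sketch of that bound check, in Python for illustration only. The
popcount of the mask gives the number of registers to transfer, hence the
size of the block, so only its first and last addresses need a TLB lookup.
The 8-byte register size, the downward-growing stack, and the name
maskstore_bounds are assumptions of mine.

```python
REG_BYTES = 8  # assumption: 64-bit registers

def maskstore_bounds(sp, mask):
    """Return (lowest, highest) byte address a maskstore would touch,
    or None if the mask selects no register."""
    count = bin(mask).count("1")      # popcount of the mask
    if count == 0:
        return None
    size = count * REG_BYTES
    low = sp - size                   # assumption: stack grows downward
    high = sp - 1
    return low, high

# Three registers selected: 24 bytes below sp = 0x1000.
print(maskstore_bounds(0x1000, 0b1011))
```

If both bounds pass the TLB check, the following instructions could be
issued while the transfer is still in progress.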

>  - the read port is connected to the instruction buffer ==> it is not 
> possible to generate the sequence of registers to be saved. And even a
> counter would not be ok (in order to generate the register numbers), because
> the mask can have holes !

I think that you want to implement it with a jump over the zeros or something
like that. Why isn't it possible to walk from the first register to the last
and check whether the corresponding bit is set or not? (You certainly need to
add a bit per register and "or" it with, I don't remember, the dirty bit.)
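
As a sketch of this linear walk (Python, for illustration only; 64
registers with one mask bit each is my assumption, and the "or" with the
dirty bit is left out):

```python
NUM_REGS = 64  # assumption: one mask bit per register

def registers_to_save(mask):
    """Walk from register 0 to the last one and collect every register
    whose mask bit is set. Holes in the mask are simply skipped."""
    selected = []
    for reg in range(NUM_REGS):
        if (mask >> reg) & 1:
            selected.append(reg)
    return selected

print(registers_to_save(0b10110100))  # prints [2, 4, 5, 7]
```

The walk takes one step per register instead of one step per set bit, so
it costs time on sparse masks, but it needs no logic to find the next
set bit.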

> >For the second solution :
> >- complexity
> >- popcount unit must not be optional
> >- block the CPU for 3/4 cycles like the first solution, but you need to
> > use this instruction more frequently than the previous solution, but this
> >solution give you the possibility to pass a chunk if not needed.

> same remarks as before.
> it's a multicycle, CISC instruction with most of the problems.
 
> >The last solution :
> >- stack problem (same problem as storei/loadi that need when you are
> > change direction to add an instruction for alignment)
> >- In big function you need to call it a lot

> there is also :
> - it is more complex and heavy than a classical store/load with 
> post-incrementation with the same result (except that it is conditional)

Yes, but not by much (it only adds a rotation in parallel).

>  - there are not enough register set ports to allow all the writes at 
> the same time.

I forgot to account for that in maskload, sorry.

> Since this kind of instructions is used in bursts, it's a big problem
> because the differing latencies can't be hidden by other instructions that 
> don't use all the write ports.

I was thinking that the idea behind this instruction was the possibility of
scheduling it with the rest of the function code.

> > What are your point of view about 
> >this and what did you think about this idea.

> I am completely against this idea in F-CPU up to v.1 and in FC0, where
> the pipeline is not adapted at all for these kinds of CISC gymnastics.
 
> BTW, there was also the evocation of using SRB for doing the backup/restore, 
> but there are a lot of problems as well, so you'd better not think of it.
 
> >	Not really linked with this discussion, it appear that when you only
> > want to load a constant that is bigger than 8 bits, but smaller than 64 
> > bits, you always need to do a move r0, your_constant. I think it will be a
> > good idea to add a loadconsz that will set all the chunk to zero before
> > putting his immediate value.

> Here is what i remember :
> loadcons doesn't sign extend the constant.
> loadconsx does it.

Yes, but it doesn't set the lower bits. Imagine you want to load 0xFFFF0000;
you need to do:
  move r0, t0
  loadcons.1 0xFFFF, t0

Or, from the discussion about bypassing, I understood that setting all the
other bits to zero will be possible. And loadconsx doesn't exist anymore
because we have widen. (I found another error in the loadcons entry of the
manual: of course its opcode is OP_LOADCONS.)
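
To make the difference concrete, here is a small Python model (for
illustration only, not the manual's definition). It assumes a 64-bit
register and that the ".1" suffix selects bits 16 to 31, which matches the
0xFFFF0000 example. loadconsz is the proposed zeroing variant, so it is
hypothetical.

```python
MASK64 = (1 << 64) - 1

def loadcons(reg_value, chunk, imm16):
    """Model of loadcons.<chunk>: replace one 16-bit chunk of the
    register and leave the other chunks untouched."""
    shift = 16 * chunk
    cleared = reg_value & ~(0xFFFF << shift) & MASK64
    return cleared | ((imm16 & 0xFFFF) << shift)

def loadconsz(chunk, imm16):
    """Hypothetical loadconsz.<chunk>: zero the whole register, then
    place the immediate in the selected chunk."""
    return (imm16 & 0xFFFF) << (16 * chunk)

t0 = 0x1234_5678_9ABC_DEF0       # stale value in t0
t0 = 0                           # move r0, t0  (r0 reads as zero)
t0 = loadcons(t0, 1, 0xFFFF)     # loadcons.1 0xFFFF, t0
print(hex(t0))                   # prints 0xffff0000
print(hex(loadconsz(1, 0xFFFF))) # prints 0xffff0000
```

Without the move, loadcons alone would leave the stale bits of the other
chunks in place, which is why the two-instruction sequence is needed today.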

> well, at last it made it to this list.
Long mails take a long time to write ;-)

Cedric
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/