
Re: [f-cpu] Register set revised



hi,

devik wrote:

Hi,

sorry for opening this again. While working on another
project we discovered another possibility for the register set.
First, it would assume a 2r2w set. I know some don't
like it, but I too think that 3r is unnecessary
(MAC with 4-cycle latency is questionable for pipelining,
store with postincrement usually works with immediates
only, MUX is rarely used and simple to do with andn).

I modified GCC to handle a split set made of two 1r1w
register sets.

why not two 2r1w ?

Each binary op can use one operand
from set A and one from set B.
Pointers live only in B. The calling convention places
pointers in B and everything else in A.

that's quite restrictive.
the goal of a "RISC" computer is to reduce the coding rules
and ease both SW and HW.
Restricting pointers to certain locations
makes coding SW more complex and less flexible,
for example.

I've done it in testing mode for binary ops and
stores and it seems that 70% of ops are ok.

the remaining 30% are what hurts most when you need them.
If this increases code size by 10% it's already a hit.
A more complex register set could give better results
and still avoid the problems of its size.

When we spend more time on it, I believe that
we can reach about 90%.

i hope that a 4-bank 2r1w register set can give better results.
90% efficiency is not enough. Don't forget that FC0
is a scalar CPU and an increase in code size has a direct impact
on performance. This is why the instruction set is so rich now.

A misplaced read will have a 1-cycle penalty, needed
for the second read. Writes can go to any bank; the compiler
should attempt to store results so that banks
are interleaved (whenever possible for commutative ops).
One could also disable cross-split reads and change
the 6-bit read register IDs to 5 bits (writes are still 6-bit).
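
The read penalty described above can be sketched as a small
cost model. This is only an illustration, not the GCC patch:
it assumes 64 registers with the upper bit of the 6-bit ID
selecting the bank (0-31 in A, 32-63 in B), and the names
bank(), read_penalty() and total_penalty() are hypothetical.

```python
BANK_SIZE = 32  # assumption: 64 registers, upper ID bit selects the bank


def bank(reg):
    """Return 0 for set A, 1 for set B (hypothetical mapping)."""
    if not 0 <= reg < 2 * BANK_SIZE:
        raise ValueError("register id out of range")
    return reg // BANK_SIZE


def read_penalty(src1, src2):
    """Extra cycles for a binary op on two 1r1w banks.

    Each bank has a single read port, so two operands in the
    same bank need a second read cycle.
    """
    return 1 if bank(src1) == bank(src2) else 0


def total_penalty(ops):
    """Sum the extra read cycles over a list of (src1, src2) pairs."""
    return sum(read_penalty(a, b) for a, b in ops)
```

For instance, total_penalty([(1, 33), (2, 3)]) charges one extra
cycle, for the second op only, since r2 and r3 both sit in A.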

Such a split set would be both simpler and faster. And
because there are many commutative ops ("add" is the most used),
no big penalty should be expected.

but it breaks a lot of expected behaviours.
For example, the register allocator has much more pressure.
The unified register set is an important feature from the SW point of
view, even though it can be implemented in more or less smart ways ...
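
To put the 70% and 90% figures quoted in this thread in
perspective, here is a rough estimate of the average cost per op.
It assumes that every op that does not fit the split pays exactly
one extra cycle and that a well-placed op costs one cycle; both
are simplifications, and average_cycles() is a hypothetical helper.

```python
def average_cycles(ok_fraction, penalty=1.0, base=1.0):
    """Average cycles per op when a fraction of ops avoids the penalty.

    ok_fraction: share of ops whose operands are well placed (0..1).
    penalty:     extra cycles for a misplaced read (assumed 1).
    base:        cycles for a well-placed op (assumed 1).
    """
    if not 0.0 <= ok_fraction <= 1.0:
        raise ValueError("ok_fraction must be between 0 and 1")
    return base + (1.0 - ok_fraction) * penalty


for ok in (0.70, 0.90):
    print(f"{ok:.0%} ok: {average_cycles(ok):.2f} cycles per op")
```

Under these assumptions the average goes from about 1.3 cycles
per op at 70% down to about 1.1 at 90%, that is, still roughly
10% slower than a set with no misplaced reads.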

Just an idea ....

-------------------------------
Martin Devera aka devik
http://luxik.cdi.cz/~devik/


*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/