
Re: [f-cpu] registers



hmmmmm maybe i hit "send" too quickly.

Yann Guidon wrote:

> hi !
>
> Christophe Avoinne wrote:
>
>>> Mohamed Ali Kilani wrote:
>>>
>>>> Hi Michael, All,
>>>>      
>>>
>>>> <snip> 
>>>
>> for 64-bit architecture :
>>
>> add.8 R1,R2,R3 -> bit SIMD = 0 => R3[7..0] = R1[7..0] + R2[7..0]
>
>> add.16 R1,R2,R3 -> bit SIMD = 0 => R3[15..0] = R1[15..0] + R2[15..0]
>> add.32 R1,R2,R3 -> bit SIMD = 0 => R3[31..0] = R1[31..0] + R2[31..0]
>> add.64 R1,R2,R3 -> bit SIMD = 0 => R3[63..0] = R1[63..0] + R2[63..0]
>

One of the latest modifications was to solve the problems related to the MSB
in the "partial write" cases. Maintaining the high part became a headache
so we decided to clear it.

Now we can write :
add.8 R1,R2,R3 -> bit SIMD = 0 => R3[7..0] = R1[7..0] + R2[7..0]
                                          and R3[MSB..8] = 0
add.16 R1,R2,R3 -> bit SIMD = 0 => R3[15..0] = R1[15..0] + R2[15..0]
                                          and R3[MSB..16] = 0
and so on.
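
A rough sketch of that rule in Python, for illustration only. The function
name add_partial and the sample values are made up here, and the 64-bit
width is simply taken from the "64-bit architecture" example above:

```python
WIDTH = 64  # assumed register width, as in the 64-bit example above


def add_partial(r1, r2, size):
    """Model of a non-SIMD add.<size>: add the low `size` bits of both
    operands and clear everything from bit `size` up to the MSB."""
    assert size in (8, 16, 32, 64) and size <= WIDTH
    mask = (1 << size) - 1
    return ((r1 & mask) + (r2 & mask)) & mask


r1 = 0x1122334455667788
r2 = 0x00000000000000FF

# The high part of the result is always zero, whatever R1 and R2 held there.
print(hex(add_partial(r1, r2, 8)))   # 0x88 + 0xFF wraps to 0x87
print(hex(add_partial(r1, r2, 16)))  # 0x7788 + 0x00FF = 0x7887
```

Nothing of the old destination register enters the computation, which is
the whole point: the result depends only on the two sources.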

Even though it might make some algorithms or programming habits
more difficult to use, it removes a lot of pressure on the architecture
and its scalability. And when the old high part has to be kept, shifts
and ORs work without needing to mask things out.
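
To illustrate the point about shifts and ORs, here is a small Python
sketch. It is only a model under assumptions: add8, pack16 and
add8_keep_high are hypothetical helper names, and registers are taken to
be 64 bits wide.

```python
def add8(r1, r2):
    """Model of add.8 with the high part of the result cleared."""
    return ((r1 & 0xFF) + (r2 & 0xFF)) & 0xFF


def pack16(hi, lo):
    """Combine two add.8 results. Because each one already has a zero
    high part, a shift and an OR are enough: no 'lo & 0xFF' is needed."""
    return (hi << 8) | lo


def add8_keep_high(old, r1, r2):
    """Emulate the former 'partial write' behaviour: keep bits MSB..8 of
    `old` and replace only the low byte. A right/left shift pair drops
    the old low byte, so no explicit mask constant is required."""
    return ((old >> 8) << 8) | add8(r1, r2)


print(hex(pack16(add8(0x01, 0x02), add8(0xF0, 0x20))))
print(hex(add8_keep_high(0xAABBCCDD00112233, 0x10, 0x25)))
```

The cost is a couple of extra instructions in the cases where the old
high part really matters, instead of extra hardware in every case.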

<technical detail>

Clearing the MSB can occur at several places :
the simplest one is on the register set's write ports. A simple row of
AND gates does the trick. However, this creates problems when bypass
occurs : the second execution unit gets data that is different from
what is written to the register set.
The second solution is to mask the EU's results out, but this
adds a lot of AND gates all over the critical datapath.
The third solution is to put the ANDs on the read ports
of the register set, trying to exploit some properties of the
operations : 0 + 0 = 0 etc. There are some exceptions
(0 nor 0 = 1) and they can be found and treated case by case.
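
The read-port idea can be checked with a small Python model. This is a
sketch under assumptions, not a description of the real datapath:
read_port, OPS and high_part_is_clear are hypothetical names, only
bitwise operations are modelled, and the width is assumed to be 64 bits.
An adder is left out on purpose, because whether a carry can leak above
the operand size depends on how the carry chain is cut, which is not
stated here.

```python
WIDTH = 64
FULL = (1 << WIDTH) - 1


def read_port(value, size):
    """Hypothetical read port: a row of AND gates zeroes bits MSB..size."""
    return value & ((1 << size) - 1)


# Bitwise operations as a full-width execution unit would compute them.
OPS = {
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nor": lambda a, b: ~(a | b) & FULL,
}


def high_part_is_clear(op, a, b, size):
    """True if masking only the operands leaves bits MSB..size at zero."""
    result = OPS[op](read_port(a, size), read_port(b, size))
    return (result >> size) == 0


a = 0xFFFFFFFFFFFFFF0F
b = 0xFFFFFFFFFFFFFF33
for name in OPS:
    # and/or/xor keep the high part clear; nor turns the zeros into ones.
    print(name, high_part_is_clear(name, a, b, 8))
```

Operations that map (0, 0) to 0 come for free with this scheme; the ones
that do not, like NOR, are the exceptions that need their own handling.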

Finally, i think that these solutions will be used partially,
in order to globally reduce the number of ANDs
in the critical datapath. Solution 3 looks best.
For the IDU, i don't know what solution is preferred.
It is very specific because division by zero is not wanted.
However, zero divided by something is zero.
Maybe there are some particular cases that can be
used in a smart way...

</technical detail>

I hope it is even clearer now :-)

YG
