Re: [f-cpu] registers
Yann,
I got the picture about the SIMD flag. It makes total sense.
Yann Guidon wrote:
> hmmmmm maybe i hit "send" too quickly.
>
> Yann Guidon wrote:
>
>
>>hi !
>>
>>Christophe Avoinne wrote:
>>
>>
>>>>Mohamed Ali Kilani wrote:
>>>>
>>>>
>>>>>Hi Michael, All,
>>>>>
>>>>
>>>>><snip>
>>>>
>>>for 64-bit architecture :
>>>
>>>add.8 R1,R2,R3 -> bit SIMD = 0 => R3[7..0] = R1[7..0] + R2[7..0]
>>
>>>add.16 R1,R2,R3 -> bit SIMD = 0 => R3[15..0] = R1[15..0] + R2[15..0]
>>>add.32 R1,R2,R3 -> bit SIMD = 0 => R3[31..0] = R1[31..0] + R2[31..0]
>>>add.64 R1,R2,R3 -> bit SIMD = 0 => R3[63..0] = R1[63..0] + R2[63..0]
>>
>
> One of the latest modifications was to solve the problems related to the MSB
> in the "partial write" cases. Maintaining the high part became a headache
> so we decided to clear it.
>
> Now we can write :
> add.8 R1,R2,R3 -> bit SIMD = 0 => R3[7..0] = R1[7..0] + R2[7..0]
> and R3[MSB..8] = 0
> add.16 R1,R2,R3 -> bit SIMD = 0 => R3[15..0] = R1[15..0] + R2[15..0]
> and R3[MSB..16] = 0
> and so on.
>
> Even though it might make some algorithms or programming habits
> more difficult to use, it removes a lot of pressure on the architecture
> and its scalability. And when these are necessary, shifts and ORs
> work without needing to mask things out.
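To check my understanding of the new rule, here is a minimal sketch in Python (not F-CPU code). The names add_sized and pack_bytes are mine, purely for illustration, and I model only the non-SIMD case described above: the low part receives the sum and everything from bit "size" up to the MSB is cleared.

```python
def add_sized(r1, r2, size):
    """Model of a non-SIMD add.<size>: R3[size-1..0] = R1 + R2, R3[MSB..size] = 0."""
    mask = (1 << size) - 1
    # Only the low `size` bits of each source take part; the carry out is dropped.
    return ((r1 & mask) + (r2 & mask)) & mask


def pack_bytes(lo, hi):
    """Combine two add.8 results with a shift and an OR.

    No AND mask is needed because the high parts are already zero.
    """
    return lo | (hi << 8)


if __name__ == "__main__":
    low = add_sized(0x10, 0x20, 8)
    high = add_sized(0xF0, 0x0F, 8)
    print(hex(pack_bytes(low, high)))
```

Since both partial results come back with a cleared high part, the shift and OR are enough to assemble a wider value, which is how I read the remark about not needing masks.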
>
> <technical detail>
>
> Clearing the MSB can occur at several places :
> the simplest one is on the register set's write ports. A simple row of
> AND gates does the trick. However, this creates problems when bypass
> occurs : the second execution unit gets data that is different from
> what is written to the register set.
> The second solution is to mask the EU's results out, but this
> adds a lot of AND gates all over the critical datapath.
> A third solution is to put the ANDs on the read ports
> of the register set, trying to exploit some properties of the
> operations : 0 + 0 = 0 etc. There are some exceptions
> (0 nor 0 = 1) and they can be found and treated case by case.
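A small Python sketch of the read-port idea, as I understand it. This is a toy model under my own assumptions: WIDTH = 64, operands masked on read, and the helper names mask_operand, high_part and high_part_after are hypothetical. I left the adder out because its high part depends on how the carry is handled at the size boundary, which is not spelled out here.

```python
WIDTH = 64  # assumed register width in bits


def mask_operand(value, size):
    """Read-port masking: keep only the low `size` bits of an operand."""
    return value & ((1 << size) - 1)


def high_part(value, size):
    """Return bits [WIDTH-1..size] of a WIDTH-bit value, shifted down to bit 0."""
    return (value & ((1 << WIDTH) - 1)) >> size


# Bitwise operations applied to the full (already masked) operands.
OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nor": lambda a, b: ~(a | b),
}


def high_part_after(op, a, b, size):
    """High part of the result when both operands were masked on read."""
    ra, rb = mask_operand(a, size), mask_operand(b, size)
    return high_part(OPS[op](ra, rb), size)


if __name__ == "__main__":
    for name in OPS:
        print(name, hex(high_part_after(name, 0xABCD, 0x1234, 8)))
```

With zeroed high inputs, and/or/xor leave the high part at zero for free, while nor turns it into all ones, so that one would need its own AND (or some other special handling).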
Just one question here. The output of every EU is registered (stored in
a FF), so the combinatorial path out of an EU is either the connection to
an Xbar port or to a bypass network, unless I am missing something...
If this is right, I don't think an extra AND gate would hurt the timing
of that path or make it critical at the design level.
>
> Finally, i think that these solutions will be used partially,
> in order to globally reduce the number of ANDs
> in the critical datapath. Solution 3 looks best.
> For the IDU, i don't know what solution is preferred.
> It is very specific because division by zero is not wanted.
> However, zero divided by something is zero.
> Maybe there are some particular cases that can be
> used in a smart way...
>
> </technical detail>
>
> I hope it is even clearer now :-)
>
> YG
>
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu in the body. http://f-cpu.seul.org/
>
Dali