
Re: [f-cpu] registers



Yann,

I got the picture about the SIMD flag. It makes total sense.

Yann Guidon wrote:
> hmmmmm maybe i hit "send" too quickly.
> 
> Yann Guidon wrote:
> 
> 
>>hi !
>>
>>Christophe Avoinne wrote:
>>
>>
>>>>Mohamed Ali Kilani wrote:
>>>>
>>>>
>>>>>Hi Michael, All,
>>>>>     
>>>>
>>>>><snip> 
>>>>
>>>for 64-bit architecture :
>>>
>>>add.8 R1,R2,R3 -> bit SIMD = 0 => R3[7..0] = R1[7..0] + R2[7..0]
>>
>>>add.16 R1,R2,R3 -> bit SIMD = 0 => R3[15..0] = R1[15..0] + R2[15..0]
>>>add.32 R1,R2,R3 -> bit SIMD = 0 => R3[31..0] = R1[31..0] + R2[31..0]
>>>add.64 R1,R2,R3 -> bit SIMD = 0 => R3[63..0] = R1[63..0] + R2[63..0]
>>
> 
> One of the latest modifications was to solve the problems related to the MSB
> in the "partial write" cases. Maintaining the high part became a headache
> so we decided to clear it.
> 
> Now we can write :
> add.8 R1,R2,R3 -> bit SIMD = 0 => R3[7..0] = R1[7..0] + R2[7..0]
>                                           and R3[MSB..8] = 0
> add.16 R1,R2,R3 -> bit SIMD = 0 => R3[15..0] = R1[15..0] + R2[15..0]
>                                           and R3[MSB..16] = 0
> and so on.
> 
> Even though it might make some algorithms or programming habits
> more difficult to use, it removes a lot of pressure on the architecture
> and its scalability. And when these are necessary, shifts and ORs
> work without needing to mask things out.
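
A minimal sketch of the rule above, in Python, for the scalar case
(SIMD bit = 0) only. The function names add_sized and pack_bytes and the
sample values are assumptions made for illustration; they do not come from
the F-CPU sources.

```python
def add_sized(r1, r2, size):
    """Model add.<size> with SIMD = 0: add the low chunks, clear the high part."""
    mask = (1 << size) - 1
    return ((r1 & mask) + (r2 & mask)) & mask

def pack_bytes(high_byte, low_byte):
    """Combine two add.8 results with a shift and an OR only.

    No AND is needed on either operand, because add_sized already
    guarantees that bits MSB..8 of each result are zero.
    """
    return (high_byte << 8) | low_byte

lo = add_sized(0x12FF, 0x0002, 8)   # 0xFF + 0x02 wraps to 0x01, high part cleared
hi = add_sized(0x007F, 0x0001, 8)   # 0x7F + 0x01 = 0x80
packed = pack_bytes(hi, lo)
```

If the destination kept its old high part instead, the OR in pack_bytes
would need a mask on low_byte first.
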
> 
> <technical detail>
> 
> Clearing the MSB can occur at several places :
> the simplest one is on the register set's write ports, where a simple row of
> AND gates does the trick. However, this creates problems when bypass
> occurs : the second execution unit gets data that is different from
> what is written to the register set.
> The second solution is to mask the EU's results out, but this
> adds a lot of AND gates all over the critical datapath.
> A last solution is to put the ANDs on the read ports
> of the register set, trying to exploit some properties of the
> operations : 0 + 0 = 0 etc.... There are some exceptions
> (0 nor 0 = 1) and they can be found and treated case by case.
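
The first and third options can be sketched in Python as below. This is an
illustration under assumptions, not the actual design: the 64-bit width, the
names read_port, write_port, bypass_differs and ops_needing_special_case,
and the choice of operations are all hypothetical. Only bitwise operations
are modelled; arithmetic is left out because its behaviour depends on how
carries are cut at the chunk boundary, which is not spelled out here.

```python
WIDTH = 64                      # assumed register width for this sketch
FULL = (1 << WIDTH) - 1

def low_mask(size):
    return (1 << size) - 1

def read_port(value, size):
    """Solution 3: AND gates on the read port clear an operand's high part."""
    return value & low_mask(size)

def write_port(value, size):
    """Solution 1: AND gates on the write port clear the stored high part."""
    return value & low_mask(size)

# Full-width bitwise execution units, with no masking of their own.
OPS = {
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nor": lambda a, b: ~(a | b) & FULL,
}

def bypass_differs(eu_result, size):
    """True when a bypassed EU result is not what the register set stores."""
    return eu_result != write_port(eu_result, size)

def ops_needing_special_case(a, b, size):
    """Operations whose high part is non-zero even with masked operands."""
    failing = []
    for name, op in OPS.items():
        result = op(read_port(a, size), read_port(b, size))
        if (result >> size) != 0:
            failing.append(name)
    return failing
```

With these definitions, ops_needing_special_case(FULL, FULL, 8) reports
only "nor", which matches the 0 nor 0 = 1 exception, and
bypass_differs(OPS["nor"](0x0F, 0xF0), 8) is true, which is the bypass
mismatch described for the write-port option.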

Just one question here. The output of every EU is registered (stored in
a FF) so the combinatorial path out of an EU is either the connection to 
a Xbar port or to a bypass network, unless I am missing something ...

If this is right, I don't think an extra AND gate would hurt the timing of
that path or make it critical at the design level.

> 
> Finally, i think that these solutions will be used partially,
> in order to globally reduce the number of ANDs
> in the critical datapath. Solution 3 looks best.
> For the IDU, i don't know what solution is preferred.
> It is very specific because division by zero is not wanted.
> However, zero divided by something is zero.
> Maybe there are some particular cases that can be
> used in a smart way...
> 
> </technical detail>
> 
> I hope it is even clearer now :-)
> 
> YG
> 
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/

Dali
