
[f-cpu] Re: your mail



On Thu, Oct 11, 2001 at 06:30:05PM +0200, Yann Guidon [systeme] wrote:
> yet another new version ...
[...]
> -- 1 : last fanout for the function bits + the ROP2 operator itself :
> -- (the older fanout loop was merged with this ROP2 (MUX) operation)
>   rop_loop : for i in F_RANGE generate
> --  begin
>     with std_ulogic_vector(1 downto 0)'(ROP2_in_A(i) & ROP2_in_B(i)) select 
>     partial_ROP(i) <=
>       ROP2_function_bit0(i/4) when "00",
>       ROP2_function_bit1(i/4) when "10",  -- the order must be verified !!!
>       ROP2_function_bit2(i/4) when "01",
>       ROP2_function_bit3(i/4) when others; -- "11"
> -- YG> i hope that this will be recognized as a MUX4 operator,
> -- instead of the decomposed version used before (boolean version)
> -- which is probably slower and heavier.

Yep, it should.  `with ... select' is the concurrent equivalent of the
sequential `case ... end case' statement, which will result in a MUX.
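
To make the equivalence concrete, here is a minimal sketch of the same
selection written sequentially for a single bit.  The entity name and the
ports a, b, f and y are made up for illustration; they are not taken from
the ROP2 sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit slice, for illustration only.
entity rop2_bit_case is
  port (
    a, b : in  std_ulogic;                     -- operand bits
    f    : in  std_ulogic_vector(3 downto 0);  -- function (truth table) bits
    y    : out std_ulogic
  );
end entity rop2_bit_case;

architecture seq of rop2_bit_case is
begin
  process (a, b, f)
    variable sel : std_ulogic_vector(1 downto 0);
  begin
    sel := a & b;
    case sel is
      when "00"   => y <= f(0);
      when "10"   => y <= f(1);  -- same (still unverified) order as quoted
      when "01"   => y <= f(2);
      when others => y <= f(3);  -- "11"
    end case;
  end process;
end architecture seq;
```

A synthesizer should treat this process and the concurrent `with ... select'
form the same way.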

On the other hand, a boolean expression can be transformed to a more
suitable, equivalent expression (DeMorgan), while a MUX probably can't.
This may make a difference in an ASIC or full-custom chip.
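
For comparison, a sketch of the decomposed boolean form of the same one-bit
function, as a plain sum of products.  Again, the entity and port names are
hypothetical, and the bit order simply follows the quoted code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit slice, for illustration only.
entity rop2_bit_bool is
  port (
    a, b : in  std_ulogic;                     -- operand bits
    f    : in  std_ulogic_vector(3 downto 0);  -- function (truth table) bits
    y    : out std_ulogic
  );
end entity rop2_bit_bool;

architecture sop of rop2_bit_bool is
begin
  -- One product term per truth-table entry; a tool is free to
  -- restructure this (e.g. into NAND-NAND form via DeMorgan).
  y <= (f(0) and not a and not b)
    or (f(1) and     a and not b)
    or (f(2) and not a and     b)
    or (f(3) and     a and     b);
end architecture sop;
```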

> -- 2 bis : the 'MUX' operation (in parallel with the ROP2 operation)
>     with ROP2_in_C(i) select
>      partial_MUX(i) <=
>        ROP2_in_A(i) when '1',
>        ROP2_in_B(i) when others; -- '0'
>   end generate rop_loop;
> 
> 
> -- 3 : partial ORs and ANDs on the byte chunks :
>   BYTE_COMBINE : for i in MAXSIZE-1 downto 0 generate
>     partial_OR(8*i+7 downto 8*i) <= "11111111" when
>       partial_ROP(8*i+7 downto 8*i) /= "00000000"
>       else "00000000";
>     partial_AND(8*i+7 downto 8*i) <= "11111111" when
>       partial_ROP(8*i+7 downto 8*i) = "11111111"
>       else "00000000";
>   end generate BYTE_COMBINE;
> -- YG> I'm still uncertain about the best way to write a multi-size version.
> -- YG> Plus, the latency might explode the ROP2 unit's performance.
> -- YG> So the multi-size version is dropped until it becomes necessary.
> -- YG> Let's stick to plain bytes...
> -- YG> Note : rop2.eps contains a trick to relieve the fanout (1->8) problem.

What about a second pipeline stage?  The first one could do the direct
and mux modes, while the second would perform the combine operations.
That would also leave some room in the first stage where we can add
a 6:64 decoder (or two 3:8, or three 2:4 decoders) for the `bitop'
instruction.  I know that bitop is supposed to be handled by the SHL
unit, but that won't work anyway (or will make the shifter even *more*
complex), and I really don't care if combines need one or two cycles
(IMHO we could also drop them completely -- it's still possible to
do combines with the cmpl[e] instruction, at *any* chunk size).
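
A rough sketch of what the second stage could look like, assuming a clock
named clk and a plain register between the two stages.  The entity name,
clk, rop_r and the MAXSIZE default of 8 are my assumptions; only
partial_ROP, partial_OR and partial_AND come from the quoted code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical second pipeline stage: byte-wise combine only.
entity rop2_combine_stage is
  generic (MAXSIZE : positive := 8);  -- assumed: 8 bytes = 64 bits
  port (
    clk         : in  std_ulogic;
    partial_ROP : in  std_ulogic_vector(8*MAXSIZE-1 downto 0);  -- from stage 1
    partial_OR  : out std_ulogic_vector(8*MAXSIZE-1 downto 0);
    partial_AND : out std_ulogic_vector(8*MAXSIZE-1 downto 0)
  );
end entity rop2_combine_stage;

architecture sketch of rop2_combine_stage is
  signal rop_r : std_ulogic_vector(8*MAXSIZE-1 downto 0);
begin
  -- Pipeline register between the direct/mux stage and the combine stage.
  stage_reg : process (clk)
  begin
    if rising_edge(clk) then
      rop_r <= partial_ROP;
    end if;
  end process stage_reg;

  -- Same byte-wise combine as in the quoted code, fed from the register.
  byte_combine : for i in MAXSIZE-1 downto 0 generate
    partial_OR(8*i+7 downto 8*i) <= "11111111" when
      rop_r(8*i+7 downto 8*i) /= "00000000"
      else "00000000";
    partial_AND(8*i+7 downto 8*i) <= "11111111" when
      rop_r(8*i+7 downto 8*i) = "11111111"
      else "00000000";
  end generate byte_combine;
end architecture sketch;
```

With this split the combine modes take two cycles, while the direct and mux
results would be available after the first stage.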

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"