
Re: [f-cpu] SIMD and exception



[...]

So here is a problem:
The first stage contains the exponent subtraction. For single, the exponent size is 8 bits, so the subtractor delay is 6 and it just fits into the stage.
But for double it takes longer, so I will have to split it between stage 1 and stage 2 (for the CSV and CLA). The two datapaths (single and double) will therefore not arrive at the main adder (the mantissa adder) at the same time, so I cannot use the same adder for both operations (unless I buffer the single datapath and "wait" for the double one).
Is there any restriction on the number of registers? This will add a lot of registers (for the double and single datapaths)...
[...]

The subtractor will cross the first pipeline register in either case. Don't forget that you'll have to select different slices from the operands:
Unfortunately it's true... I'll have to cut a lot of parts...

    -- assuming separate adders for chunk 0 and 1
    -- suffixes -0/-1 indicate the chunk number
    Ea1 := (others => '0'); -- at least 11 bits
    Eb1 := (others => '0');
    Ea0 := (others => '0'); -- at least 8 bits
    Eb0 := (others => '0');
    if mode = double then
        Ea1(10 downto 0) := A(62 downto 52);
        Eb1(10 downto 0) := B(62 downto 52);
        -- Ea0/Eb0 not used in double mode
    else
        Ea1(7 downto 0) := A(62 downto 55);
        Eb1(7 downto 0) := B(62 downto 55);
        Ea0(7 downto 0) := A(30 downto 23);
        Eb0(7 downto 0) := B(30 downto 23);
    end if;
    -- and then subtract:
    De1 := Ea1 - Eb1; -- for upper single or double
    De0 := Ea0 - Eb0; -- for lower single
OK, I understand...

But since the decoder works with a 2-bit control vector, it should take
d=2 delay (in the general case)...
That's why I cannot decode in every stage, so I will have 2 datapaths in
parallel. Except for the mantissa adders, everything else will be doubled...


or similar. That will add some latency before the subtractor. After that, you should have enough room for a row of 4-bit adders (d=4/t=6) and input inverters for one operand (d=1/t=1), maybe even for the final CLA (d=5/t=6 from the beginning of the adder).
mode decoder d=1, 2
inverter d=1
4bit adder d=4
it's already d=6 (or 7)...
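
The tally above can be written out as a small Python sketch. The per-stage budget of d=6 is an assumption, inferred from the remark that the 8-bit subtractor with delay 6 "just fits" into a stage; `STAGE_BUDGET` and `path_delay` are hypothetical names.

```python
STAGE_BUDGET = 6  # assumed per-stage limit, not stated explicitly in the thread

def path_delay(decoder_delay, inverter_delay=1, adder4_delay=4):
    """Sum the delays of the elements placed in series before the first register."""
    return decoder_delay + inverter_delay + adder4_delay

for d in (1, 2):
    total = path_delay(d)
    print(f"decoder d={d}: total d={total}, slack={STAGE_BUDGET - total}")
```

With a d=1 decoder the path uses the assumed budget exactly; with d=2 it exceeds it by one, which is the "d=6 (or 7)" above.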

But the CSV will have to reside in stage #2.
yes it's true

The same will be true for the "el cheapo" variant using a single 16-bit subtractor (with an 8+8 split for SIMD):

    Ea := (others => '0'); -- 16 bits
    Eb := (others => '0'); -- 16 bits
    -- left-align operands
    Ea(15 downto 8) := A(62 downto 55);
    Eb(15 downto 8) := B(62 downto 55);
    if mode = double then
        -- add least significant bits
        Ea(7 downto 5) := A(54 downto 52);
        Eb(7 downto 5) := B(54 downto 52);
        split_adder := '0';
    else
        -- add second chunk
        Ea(7 downto 0) := A(30 downto 23);
        Eb(7 downto 0) := B(30 downto 23);
        split_adder := '1';
    end if;
    -- and then subtract:
    De := SIMD_subtract(Ea, Eb, split_adder);
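
`SIMD_subtract` is not defined in this thread. A minimal Python model of what it presumably does is sketched below, assuming that the split simply cuts the borrow chain between bits 7 and 8. The names `pack_exponents` and `simd_subtract` are hypothetical, and results are 16-bit two's-complement values.

```python
def bits(value, hi, lo):
    """Return value(hi downto lo) as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def pack_exponents(x, double_mode):
    """Build the 16-bit, left-aligned subtractor operand from a 64-bit register."""
    e = bits(x, 62, 55) << 8
    if double_mode:
        e |= bits(x, 54, 52) << 5   # three least significant exponent bits
    else:
        e |= bits(x, 30, 23)        # exponent of the second (lower) single
    return e

def simd_subtract(ea, eb, split):
    """16-bit subtract; with split=True no borrow crosses from bit 7 to bit 8."""
    if split:
        hi = ((ea >> 8) - (eb >> 8)) & 0xFF
        lo = ((ea & 0xFF) - (eb & 0xFF)) & 0xFF
        return (hi << 8) | lo
    return (ea - eb) & 0xFFFF

a, b = 0x408000003F800000, 0x3F80000041800000  # singles (4.0, 1.0) and (1.0, 16.0)
ea, eb = pack_exponents(a, False), pack_exponents(b, False)
print(hex(simd_subtract(ea, eb, True)))   # split: two independent 8-bit results
print(hex(simd_subtract(ea, eb, False)))  # unsplit: lower borrow leaks into upper byte
```

In double mode the 11-bit difference sits in bits 15 downto 5 of the result, so it has to be shifted right by 5 before use.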

Yet another solution is this one:

    De_1 := A(62 downto 55) - B(62 downto 55); -- 8 bits
    De_2 := A(54 downto 52) - B(54 downto 52); -- 3 bits
    De_3 := A(30 downto 23) - B(30 downto 23); -- 8 bits

Then, in stage #2, select De := De_1 & De_2 for double (requires another CLA/CSV step), and one of (De_1, De_3) for single.
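
The extra CLA/CSV step is needed because a borrow out of the 3-bit difference has to be subtracted from the 8-bit one before the two are concatenated. A small Python sketch of that combination follows. The function names are hypothetical, the borrow is modelled as the sign of a Python integer, and the single case returns both chunk results rather than selecting one.

```python
def bits(value, hi, lo):
    """Return value(hi downto lo) as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def stage1(a, b):
    """Three independent partial differences; a negative value means a borrow out."""
    de_1 = bits(a, 62, 55) - bits(b, 62, 55)
    de_2 = bits(a, 54, 52) - bits(b, 54, 52)
    de_3 = bits(a, 30, 23) - bits(b, 30, 23)
    return de_1, de_2, de_3

def stage2_double(de_1, de_2):
    """De := De_1 & De_2, with the borrow of De_2 applied to De_1 (11-bit result)."""
    borrow = 1 if de_2 < 0 else 0
    hi = (de_1 - borrow) & 0xFF
    return (hi << 3) | (de_2 & 0x7)

def stage2_single(de_1, de_3):
    """Upper and lower single differences as 8-bit values."""
    return de_1 & 0xFF, de_3 & 0xFF

a, b = 0x4020000000000000, 0x3FF0000000000000  # doubles 8.0 and 1.0
de_1, de_2, de_3 = stage1(a, b)
print(stage2_double(de_1, de_2))               # 11-bit exponent difference
```

In hardware, a carry-select arrangement would compute both `De_1` and `De_1 - 1` and pick one using the borrow from `De_2`; that is one possible reading of the "CSV step" and is an assumption here.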

But whatever you do, you'll have to sacrifice part of stage #2 if you implement variable operand sizes.

Will I have to make a carry-select tree for the mantissa adder, like in the
CSAdd, if I split the mantissa into 8x8-bit adders?

OK, thank you very much!
If I count correctly, I'm far from 4 stages for the fadder... it will take at
least 6 cycles... :'(


Michael.

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/

--

~~ Gaetan ~~
http://www.xeberon.net


