
Re: [f-cpu] fadder forever



Hi!

Michael Riepe wrote:

Here we go again.

On Fri, Feb 20, 2004 at 06:14:16PM +0100, gaetan@xeberon.net wrote:
[...]

I don't understand why

Ea1 := (others => '0');

should work, but not

Ea1(DBL_SIZE-1 downto SGL_SIZE) := (others => '0');

???

As far as I remember, it's because the first statement is silently
converted to

Ea1 := (Ea1'range => '0');

but the second one is left as-is. This kind of conversion only happens
when the left hand side of the assignment is a full vector, not when
it is an aggregate or slice (as in the second case). And since the
index range of the second "others" expression remains undefined,
some tools treat it as an error.
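
A portable workaround is to name the index range in the aggregate explicitly, so that nothing depends on how a tool resolves "others" for a slice target. The following is only a minimal sketch: the entity name and the widths chosen for DBL_SIZE and SGL_SIZE are assumptions made for illustration, not values taken from the F-CPU sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity slice_clear_demo is
end entity slice_clear_demo;

architecture sim of slice_clear_demo is
  -- Assumed widths, for illustration only.
  constant DBL_SIZE : natural := 64;
  constant SGL_SIZE : natural := 32;
  -- "others" is fine here: the target is a complete, constrained object.
  constant ZERO_HI : std_logic_vector(DBL_SIZE-1 downto SGL_SIZE)
                   := (others => '0');
begin
  process
    variable Ea1 : std_logic_vector(DBL_SIZE-1 downto 0) := (others => '1');
  begin
    -- Slice target: spell out the range instead of using "others".
    Ea1(DBL_SIZE-1 downto SGL_SIZE) := (DBL_SIZE-1 downto SGL_SIZE => '0');

    assert Ea1(DBL_SIZE-1 downto SGL_SIZE) = ZERO_HI
      report "upper part was not cleared" severity failure;
    assert Ea1(SGL_SIZE-1 downto 0) = (SGL_SIZE-1 downto 0 => '1')
      report "lower part was modified" severity failure;
    wait;
  end process;
end architecture sim;
```

Assigning a pre-declared constant such as ZERO_HI to the slice is an equally portable alternative.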


Isn't it in the VHDL standard?

It is. But as you can see, VHDL semantics aren't always obvious.


annoying indeed....

[...]

else -- SIMD mode = double
-- (d=1)
Ea0(SGL_E_SIZE-1 downto 0) := (others => '0'); -- d=0
Eb0(SGL_E_SIZE-1 downto 0) := (others => '0'); -- d=0


Could be set to (others => 'X') since we don't care about it.




Won't the 'X' state be rejected by the synthesizer?

It will. In fact, it's our way to tell the synthesizer that we don't
care about the value of a variable or signal. And it's a good thing
to do so, because it gives the synthesizer an opportunity to simplify
the circuit. If I write e.g.

case X(1 downto 0) is
    when "01"   => Y := A;
    when "10"   => Y := B;
    when others => Y := 'X';
end case;

then the synthesizer may substitute any value for the 'X'. And
usually, it makes a clever choice. Let's see what happens if it
[...]

One of the select lines has been dropped, and if it's not used
elsewhere, the circuit providing it may be dropped as well.

Neither optimization would have been possible if I had written

when others => Y := '0';

In your code, the synthesizer might have removed the MUXes for Ea0
and Eb0 completely if you had used 'X' instead of '0'.
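
For anyone who wants to try this with their own tool, the case statement above can be wrapped in a small stand-alone design. This is just a sketch: the entity and port names are made up for the experiment, and how far a particular synthesizer actually exploits the 'X' is something to check in its report rather than assume.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test design; names are not from the F-CPU sources.
entity dontcare_mux is
  port (
    sel  : in  std_logic_vector(1 downto 0);
    a, b : in  std_logic;
    y    : out std_logic
  );
end entity dontcare_mux;

architecture rtl of dontcare_mux is
begin
  process (sel, a, b)
  begin
    case sel is
      when "01"   => y <= a;
      when "10"   => y <= b;
      -- Don't care: the tool may pick whatever value simplifies the logic.
      when others => y <= 'X';
    end case;
  end process;
end architecture rtl;
```

In simulation the 'X' shows up on y whenever sel is neither "01" nor "10", which also helps to catch cases where the "don't care" value is used by mistake.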


OK, I see. I thought the synthesizer only wanted to see '0' or '1'...

[...]

-- First part 11-bits adders ('DOUBLE or highest single' datapath)
-- (d=3)
-- computing Ea-Eb
fasu_ExpAdder_PartOne(Ea1, Eb1, gm10, pm10, sv10, tv10); -- d=3
-- computing Eb-Ea
fasu_ExpAdder_PartOne(Eb1, Ea1, gm11, pm11, sv11, tv11); -- d=3
-- (d=6)


Not necessary to calculate the difference both ways. Remember that
CSAdd is a compound adder. You get `Ea - Eb' at the incremented output
and `not (Eb - Ea)' at the normal output. Thus, calculating `Eb - Ea'
requires only a single row of inverters, not a complete second adder.
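
The relation between the two outputs can be checked with a few lines of simulation code. The sketch below assumes that the subtractor feeds the compound adder with Ea and the bitwise complement of Eb, so that the normal output is Ea + not Eb and the incremented output is that value plus one. The entity name and the 4-bit width are arbitrary choices that keep the exhaustive test short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity compound_sub_check is
end entity compound_sub_check;

architecture sim of compound_sub_check is
  constant N : natural := 4;  -- assumed width, small enough to test all pairs
begin
  process
    variable ea, eb      : unsigned(N-1 downto 0);
    variable sum_normal  : unsigned(N-1 downto 0);  -- normal output
    variable sum_incr    : unsigned(N-1 downto 0);  -- incremented output
  begin
    for i in 0 to 2**N-1 loop
      for j in 0 to 2**N-1 loop
        ea := to_unsigned(i, N);
        eb := to_unsigned(j, N);

        sum_normal := ea + (not eb);     -- Ea - Eb - 1 (mod 2**N)
        sum_incr   := sum_normal + 1;    -- Ea - Eb     (mod 2**N)

        assert sum_incr = ea - eb
          report "incremented output is not Ea - Eb" severity failure;
        -- One row of inverters on the normal output yields Eb - Ea.
        assert (not sum_normal) = eb - ea
          report "inverted normal output is not Eb - Ea" severity failure;
      end loop;
    end loop;
    report "compound adder identities hold for all operand pairs";
    wait;
  end process;
end architecture sim;
```

All arithmetic here is modulo 2**N, which is what the hardware adder computes as well.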




Yes, that's true... it adds a 1-gate delay.

Do you know how much delay the second adder adds? Don't forget that
the circuit grows, and the wires become longer. A row of inverters
is probably cheaper.

ok


It causes a little problem in stage 3:
I had a 4-bit shifter (a shifter that shifts a vector under the control of another 4-bit 'driver' vector,
so it can shift by 0 to 2^4-1 positions). So d=4, t=4.
I also had a conditional bit inverter (d=2), and it fitted into the stage.
But now I have a 5-bit adder, so there is not enough room for the inverter...
Do I have the right to violate the 6-gate delay?

If you don't violate the 10-transistor delay, I vote yes.


I have 2 solutions that I don't know how to choose between:
- I can put the whole conditional inverter in the 4th stage, or
- I can precompute the inverted vector in the 3rd stage (so I need an additional register vector).

At the beginning of the design, I would not precompute anything I
don't have to precompute.

[...]

If you have the next level's gm/pm, only pm, sv00/10 and tv00/10
are needed in stage 2. If you manage to squeeze another row of XORs
into stage 1 (which is likely, considering the delay calculation),
you can also calculate

ym00 := pm00 xor sv00;
zm00 := pm00 xor tv00;
ym10 := pm10 xor sv10;
zm10 := pm10 xor tv10;

in stage 1 and pass those instead of pm/sv/tv (saves some more
registers).

OK, here I don't understand anymore...
How can I build the results without sv and tv?
Isn't it as if I only had 4-bit adders?

sv/tv are "local" vectors that aren't passed to the next level of
CLA/CSV. You only need to pass ym/zm (the partial results) and gm/pm
(the carry/propagate bits) along. The basic outline is:

-- half adders (delivering 1-bit results)
gm0 := a and b; -- d=1 t=1
pm0 := a xor b; -- d=1 t=2

-- select vectors for this level
CSV(gm0, pm0, sv0, tv0); -- d=3 t=4
-- carry look-ahead for next level (if there is one)
CLA(gm0, pm0, gm1, pm1); -- d=3 t=4

-- 4-bit results
ym1 := pm0 xor sv0; -- d=4 t=6
zm1 := pm0 xor tv0; -- d=4 t=6

At this point, you can drop gm0, pm0, sv0 and tv0. gm1, pm1, ym1 and
zm1 are passed along to the next level:

-- select vectors for this level
CSV(gm1, pm1, sv1, tv1); -- d=5 t=6
-- carry look-ahead for next level (if there is one)
CLA(gm1, pm1, gm2, pm2); -- d=5 t=6

-- 16-bit results
-- d=6 t=7
for i in WIDTH/4-1 downto 0 loop
    if sv1(i) = '1' then
        ym2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
    else
        ym2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
    end if;
    if tv1(i) = '1' then
        zm2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
    else
        zm2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
    end if;
end loop;

For another level, increment all suffixes by 1, substitute 16 for
4 and 15 for 3, and repeat. Again, the old gm1/pm1, ym1/zm1 and
sv1/tv1 vectors are dropped, only gm2/pm2 and ym2/zm2 propagate.

Pipeline registers can be placed directly before or after every CSV/CLA
block (in this example: in all places where I left an empty line),
and of course at the beginning and at the end of the adder.

I admit that it looks a little more difficult in CSAdd. That's because
I had to use the same variables again and again in a loop. Otherwise,
I wouldn't have been able to write a generic one-size-fits-all
procedure...
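
To make the two-level outline above concrete, here is a self-checking 16-bit sketch. Please treat it with care: the CSV and CLA procedures below are simple behavioural stand-ins written only for this example (ripple loops over groups of four), not the F-CPU implementations, and the entity name, the fixed WIDTH of 16 and the test vectors are arbitrary assumptions. It only demonstrates that ym2 and zm2 can be built from ym1/zm1 and sv1/tv1 without keeping sv0/tv0 around.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity csadd16_sketch is
end entity csadd16_sketch;

architecture sim of csadd16_sketch is
  constant WIDTH : natural := 16;
  subtype word_t is std_logic_vector(WIDTH-1 downto 0);
  subtype grp_t  is std_logic_vector(WIDTH/4-1 downto 0);

  -- Stand-in for CSV: per group of 4, the carry into each position,
  -- assuming group carry-in '0' (s) and group carry-in '1' (t).
  procedure CSV (g, p : in std_logic_vector; s, t : out std_logic_vector) is
    variable c0, c1 : std_logic;
  begin
    for k in 0 to g'length/4-1 loop
      c0 := '0';  c1 := '1';
      for j in 0 to 3 loop
        s(4*k+j) := c0;
        t(4*k+j) := c1;
        c0 := g(4*k+j) or (p(4*k+j) and c0);
        c1 := g(4*k+j) or (p(4*k+j) and c1);
      end loop;
    end loop;
  end procedure;

  -- Stand-in for CLA: group generate/propagate for each group of 4.
  procedure CLA (g, p : in std_logic_vector; gn, pn : out std_logic_vector) is
    variable c, pa : std_logic;
  begin
    for k in 0 to g'length/4-1 loop
      c := '0';  pa := '1';
      for j in 0 to 3 loop
        c  := g(4*k+j) or (p(4*k+j) and c);
        pa := pa and p(4*k+j);
      end loop;
      gn(k) := c;
      pn(k) := pa;
    end loop;
  end procedure;
begin
  process
    variable a, b, gm0, pm0, sv0, tv0 : word_t;
    variable ym1, zm1, ym2, zm2       : word_t;
    variable gm1, pm1, sv1, tv1       : grp_t;
  begin
    for i in 0 to 63 loop
      for j in 0 to 63 loop
        -- Arbitrary test operands that stay below 2**16.
        a := std_logic_vector(to_unsigned(i*1021, WIDTH));
        b := std_logic_vector(to_unsigned(j*1039 + 1, WIDTH));

        -- Level 0: half adders, select vectors, look-ahead, 4-bit results.
        gm0 := a and b;
        pm0 := a xor b;
        CSV(gm0, pm0, sv0, tv0);
        CLA(gm0, pm0, gm1, pm1);
        ym1 := pm0 xor sv0;
        zm1 := pm0 xor tv0;

        -- Level 1: only gm1/pm1 and ym1/zm1 are used from here on.
        -- No CLA call: 16 bits is the last level in this sketch.
        CSV(gm1, pm1, sv1, tv1);
        for k in WIDTH/4-1 downto 0 loop
          if sv1(k) = '1' then
            ym2(4*k+3 downto 4*k) := zm1(4*k+3 downto 4*k);
          else
            ym2(4*k+3 downto 4*k) := ym1(4*k+3 downto 4*k);
          end if;
          if tv1(k) = '1' then
            zm2(4*k+3 downto 4*k) := zm1(4*k+3 downto 4*k);
          else
            zm2(4*k+3 downto 4*k) := ym1(4*k+3 downto 4*k);
          end if;
        end loop;

        assert unsigned(ym2) = unsigned(a) + unsigned(b)
          report "ym2 is not a + b" severity failure;
        assert unsigned(zm2) = unsigned(a) + unsigned(b) + 1
          report "zm2 is not a + b + 1" severity failure;
      end loop;
    end loop;
    report "two-level sketch matches a + b and a + b + 1";
    wait;
  end process;
end architecture sim;
```

The ripple loops make no attempt to meet the d/t figures quoted in the outline. They only reproduce the logical function, so the sketch is useful for checking the data flow, not the timing.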

A final note. You were worried about the extra row of inverters for
the exponent subtractor. You may try to "fold" them into the last
level of the adder to improve the total delay:

-- 16-bit results
-- d=6 t=8
for i in WIDTH/4-1 downto 0 loop
    if sv1(i) = '1' then
        ym2(4*i+3 downto 4*i) := not zm1(4*i+3 downto 4*i);
        --                       ^^^ invert here...
    else
        ym2(4*i+3 downto 4*i) := not ym1(4*i+3 downto 4*i);
        --                       ^^^ and here.
    end if;
    if tv1(i) = '1' then
        zm2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
    else
        zm2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
    end if;
end loop;


Oh!
Thank you, I think I will have an interesting weekend...

--

~~ Gaetan ~~
http://www.xeberon.net

