
Re: [f-cpu] fadder forever



Here we go again.

On Fri, Feb 20, 2004 at 06:14:16PM +0100, gaetan@xeberon.net wrote:
[...]
> i don't understand why
> 
> Ea1 := (others => '0');
> should work and not 
> Ea1(DBL_SIZE-1 downto SGL_SIZE) := (others => '0')
> ???

As far as I remember, it's because the first statement is silently
converted to

	Ea1 := (Ea1'range => '0');

but the second one is left as-is.  This kind of conversion only happens
when the left-hand side of the assignment is a complete declared object
(the whole vector), not when it is a slice (as in the second case) or an
aggregate.  And since the index range of the "others" aggregate in the
second statement is left undefined, some tools treat it as an error.
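A portable way to clear only the slice is to state the aggregate's bounds
explicitly instead of relying on "others".  The sketch below assumes
hypothetical sizes (DBL_SIZE = 11, SGL_SIZE = 8) and a made-up entity name,
zero_upper_sketch; only the assignment inside the process matters.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity: clears the upper part of a vector.
entity zero_upper_sketch is
  generic (DBL_SIZE : positive := 11;   -- assumed sizes, not taken from
           SGL_SIZE : positive := 8);   -- the original f-cpu sources
  port (din  : in  std_logic_vector(DBL_SIZE-1 downto 0);
        dout : out std_logic_vector(DBL_SIZE-1 downto 0));
end entity;

architecture sketch of zero_upper_sketch is
begin
  process (din)
    variable Ea1 : std_logic_vector(DBL_SIZE-1 downto 0);
  begin
    Ea1 := din;
    -- Named range choice instead of "others": the aggregate carries its
    -- own bounds, so no tool has to infer them from the slice.
    Ea1(DBL_SIZE-1 downto SGL_SIZE) := (DBL_SIZE-1 downto SGL_SIZE => '0');
    dout <= Ea1;
  end process;
end architecture;
```

A plain for loop over DBL_SIZE-1 downto SGL_SIZE that assigns '0' bit by
bit does the same job and should be accepted by any tool.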

> isn't it in the vhdl standard?

It is.  But as you can see, VHDL semantics aren't always obvious.

[...]
> >>            else -- SIMD mode = double
> >>              -- (d=1)
> >>              Ea0(SGL_E_SIZE-1 downto 0) := (others => '0'); -- d=0
> >>              Eb0(SGL_E_SIZE-1 downto 0) := (others => '0'); -- d=0
> >>    
> >>
> >
> >Could be set to (others => 'X') since we don't care about it.
> >
> >  
> >
> The 'X' state will not be accepted by the synthesizer?

It will.  In fact, it's our way to tell the synthesizer that we don't
care about the value of a variable or signal.  And it's a good thing
to do so, because it gives the synthesizer an opportunity to simplify
the circuit.  If I write e.g.

	case X(1 downto 0) is
		when "01" => Y := A;
		when "10" => Y := B;
		when others => Y := 'X';
	end case;

then the synthesizer may substitute any value for the 'X'.  And
usually, it makes a clever choice.  Let's see what happens if it
substitutes B for 'X':

	case X(1 downto 0) is
		when "01" => Y := A;
		when "10" => Y := B;
		when others => Y := B;
	end case;

The "10" case can now be dropped since it's redundant:

	case X(1 downto 0) is
		when "01" => Y := A;
		when others => Y := B;
	end case;

That is, the 3:1 MUX becomes a 2:1 MUX.

The synthesizer may also substitute different values for different
select combinations, e.g. A when X is "11" and B when X is "00".  Then
Y is A whenever X(0) is '1' and B otherwise, which results in the even
simpler circuit

	if X(0) = '1' then
		Y := A;
	else
		Y := B;
	end if;

One of the select lines has been dropped, and if it's not used
elsewhere, the circuit providing it may be dropped as well.

Neither optimization would have been possible if I had written

	when others => Y := '0';

In your code, the synthesizer might have removed the MUXes for Ea0
and Eb0 completely if you had used 'X' instead of '0'.
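To check that the 2:1 MUX derived above is a legitimate substitute, it is
enough to compare it with the original specification on the select values
that are not don't-cares ("01" and "10").  A small sketch with two
hypothetical entities: mux3_dc (the specification with 'X') and
mux2_reduced (the "if X(0) = '1'" version).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Specification: 3:1 MUX, don't care for "00" and "11".
entity mux3_dc is
  port (X    : in  std_logic_vector(1 downto 0);
        A, B : in  std_logic;
        Y    : out std_logic);
end entity;

architecture rtl of mux3_dc is
begin
  process (X, A, B)
    variable v : std_logic;
  begin
    case X is
      when "01"   => v := A;
      when "10"   => v := B;
      when others => v := 'X';  -- don't care: the synthesizer may pick
    end case;
    Y <= v;
  end process;
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- One legal refinement: A when X(0) = '1', B otherwise (a 2:1 MUX).
entity mux2_reduced is
  port (X    : in  std_logic_vector(1 downto 0);
        A, B : in  std_logic;
        Y    : out std_logic);
end entity;

architecture rtl of mux2_reduced is
begin
  Y <= A when X(0) = '1' else B;
end architecture;
```

In simulation the 'X' simply shows up on Y for "00" and "11"; only the
synthesizer treats it as a free choice.  How much a given tool exploits it
varies, and some tools prefer '-', std_logic's dedicated don't-care value.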

[...]
> >>              -- First part 11-bit adders ('DOUBLE or highest single' datapath)
> >>              -- (d=3)
> >>              -- computing Ea-Eb
> >>              fasu_ExpAdder_PartOne(Ea1, Eb1, gm10, pm10, sv10, tv10); -- d=3
> >>              -- computing Eb-Ea
> >>              fasu_ExpAdder_PartOne(Eb1, Ea1, gm11, pm11, sv11, tv11); -- d=3
> >>              -- (d=6)
> >>    
> >>
> >
> >Not necessary to calculate the difference both ways.  Remember that
> >CSAdd is a compound adder.  You get `Ea - Eb' at the incremented output
> >and `not (Eb - Ea)' at the normal output.  Thus, calculating `Eb - Ea'
> >requires only a single row of inverters, not a complete second adder.
> >
> >  
> >
> yes, that's true... it adds a 1-gate delay

Do you know how much delay the second adder adds?  Don't forget that
the circuit grows, and the wires become longer.  A row of inverters
is probably cheaper.
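The compound-adder trick in numbers: with Ea and not Eb at the inputs, the
normal output is Ea + not Eb = Ea - Eb - 1 = not (Eb - Ea) (two's
complement, modulo 2^WIDTH), and the incremented output is Ea - Eb.  A
behavioral sketch, with a hypothetical entity compound_sub_sketch and an
assumed WIDTH of 11, that uses numeric_std arithmetic in place of CSAdd:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral stand-in for a compound adder fed with (Ea, not Eb).
entity compound_sub_sketch is
  generic (WIDTH : positive := 11);                        -- assumed width
  port (Ea, Eb : in  std_logic_vector(WIDTH-1 downto 0);
        a_m_b  : out std_logic_vector(WIDTH-1 downto 0);   -- Ea - Eb
        b_m_a  : out std_logic_vector(WIDTH-1 downto 0));  -- Eb - Ea
end entity;

architecture sketch of compound_sub_sketch is
  signal sum0, sum1 : unsigned(WIDTH-1 downto 0);
begin
  -- normal output:      Ea + not Eb     = Ea - Eb - 1 = not (Eb - Ea)
  sum0 <= unsigned(Ea) + unsigned(not Eb);
  -- incremented output: Ea + not Eb + 1 = Ea - Eb
  sum1 <= unsigned(Ea) + unsigned(not Eb) + 1;
  a_m_b <= std_logic_vector(sum1);
  b_m_a <= std_logic_vector(not sum0);  -- one row of inverters, no 2nd adder
end architecture;
```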

> it causes a little problem in stage 3:
> i had a 4-bit shifter (a shifter that shifts a vector by the amount
> given by another 4-bit 'driver' vector, so it can shift by 0 to
> 2^4-1 positions). So d=4, t=4
> So i had a conditional bit inverter (d=2) and it fitted into the stage.
> But now i have a 5-bit adder, so there is not enough space to put the
> inverter...
> do i have the right to violate the 6-gate delay?

If you don't violate the 10-transistor delay, I vote yes.

> i have 2 solutions and i don't know how to weigh them:
> - I can put the whole conditional inverter in the 4th stage,
> - or i can precompute the inverted vector in the 3rd stage (so i need an
> additional register vector).

At the beginning of the design, I would not precompute anything I
don't have to precompute.
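For reference, a conditional bit inverter is one XOR per bit with the
control bit fanned out to all of them, so moving it to stage 4 only means
putting the pipeline register in front of the XOR row instead of behind
it.  A minimal sketch, with a hypothetical entity cond_inv_sketch and an
assumed width of 8:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- y = not x when inv = '1', y = x otherwise: one XOR per bit.
entity cond_inv_sketch is
  generic (WIDTH : positive := 8);   -- assumed width
  port (x   : in  std_logic_vector(WIDTH-1 downto 0);
        inv : in  std_logic;
        y   : out std_logic_vector(WIDTH-1 downto 0));
end entity;

architecture sketch of cond_inv_sketch is
begin
  gen : for i in x'range generate
    y(i) <= x(i) xor inv;
  end generate;
end architecture;
```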

[...]
> >If you have the next level's gm/pm, only pm, sv00/10 and tv00/10
> >are needed in stage 2.  If you manage to squeeze another row of XORs
> >into stage 1 (which is likely, considering the delay calculation),
> >you can also calculate
> >
> >	ym00 := pm00 xor sv00;
> >	zm00 := pm00 xor tv00;
> >	ym10 := pm10 xor sv10;
> >	zm10 := pm10 xor tv10;
> >
> >in stage 1 and pass those instead of pm/sv/tv (saves some more
> >registers). 
> >
> ok here i do not understand anymore...
> how can i build the results without sv and tv ?
> isn't it like if i only have 4-bits adders?

sv/tv are "local" vectors that aren't passed to the next level of
CLA/CSV.  You only need to pass ym/zm (the partial results) and gm/pm
(the generate/propagate bits) along.  The basic outline is:

	-- half adders (delivering 1-bit results)
	gm0 := a and b;	-- d=1 t=1
	pm0 := a xor b;	-- d=1 t=2

	-- select vectors for this level
	CSV(gm0, pm0, sv0, tv0);	-- d=3 t=4
	-- carry look-ahead for next level (if there is one)
	CLA(gm0, pm0, gm1, pm1);	-- d=3 t=4

	-- 4-bit results
	ym1 := pm0 xor sv0;	-- d=4 t=6
	zm1 := pm0 xor tv0;	-- d=4 t=6

At this point, you can drop gm0, pm0, sv0 and tv0.  gm1, pm1, ym1 and
zm1 are passed along to the next level:

	-- select vectors for this level
	CSV(gm1, pm1, sv1, tv1);	-- d=5 t=6
	-- carry look-ahead for next level (if there is one)
	CLA(gm1, pm1, gm2, pm2);	-- d=5 t=6

	-- 16-bit results
	-- d=6 t=7
	for i in WIDTH/4-1 downto 0 loop
		if sv1(i) = '1' then
			ym2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
		else
			ym2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
		end if;
		if tv1(i) = '1' then
			zm2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
		else
			zm2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
		end if;
	end loop;

For another level, increment all suffixes by 1, substitute 16 for
4 and 15 for 3, and repeat.  Again, the old gm1/pm1, ym1/zm1 and
sv1/tv1 vectors are dropped; only gm2/pm2 and ym2/zm2 propagate.
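The CSV and CLA procedures themselves live in CSAdd and are not shown
here.  The sketch below gives one possible behavioral version of both
(package csv_cla_sketch and entity add16_sketch are hypothetical names),
assuming groups of four, vectors indexed (N-1 downto 0) with N a multiple
of 4, and pm0 = a xor b as above.  The serial loops only model the
values; real hardware computes every group in parallel.  The entity
chains two levels as outlined, giving ym2 = a + b and zm2 = a + b + 1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package csv_cla_sketch is
  procedure CSV (g, p : in std_logic_vector; sv, tv : out std_logic_vector);
  procedure CLA (g, p : in std_logic_vector; gn, pn : out std_logic_vector);
end package;

package body csv_cla_sketch is
  -- Select vectors: for each position j inside a group of four,
  -- sv(j) = carry into j if the group carry-in is '0', tv(j) = same for '1'.
  procedure CSV (g, p : in std_logic_vector; sv, tv : out std_logic_vector) is
    variable gg, pp : std_logic;  -- generate/propagate of the bits below j
  begin
    for i in 0 to g'length/4 - 1 loop
      gg := '0'; pp := '1';
      for j in 4*i to 4*i+3 loop
        sv(j) := gg;
        tv(j) := gg or pp;
        gg := g(j) or (p(j) and gg);
        pp := p(j) and pp;
      end loop;
    end loop;
  end procedure;

  -- Carry look-ahead: group generate/propagate for the next level.
  procedure CLA (g, p : in std_logic_vector; gn, pn : out std_logic_vector) is
    variable gg, pp : std_logic;
  begin
    for i in 0 to g'length/4 - 1 loop
      gg := '0'; pp := '1';
      for j in 4*i to 4*i+3 loop
        gg := g(j) or (p(j) and gg);
        pp := p(j) and pp;
      end loop;
      gn(i) := gg; pn(i) := pp;
    end loop;
  end procedure;
end package body;

library ieee;
use ieee.std_logic_1164.all;
use work.csv_cla_sketch.all;

entity add16_sketch is
  port (a, b     : in  std_logic_vector(15 downto 0);
        sum, inc : out std_logic_vector(15 downto 0));  -- a+b, a+b+1
end entity;

architecture sketch of add16_sketch is
begin
  process (a, b)
    variable gm0, pm0, sv0, tv0, ym1, zm1, ym2, zm2 : std_logic_vector(15 downto 0);
    variable gm1, pm1, sv1, tv1 : std_logic_vector(3 downto 0);
  begin
    gm0 := a and b;                 -- half adders
    pm0 := a xor b;
    CSV(gm0, pm0, sv0, tv0);
    CLA(gm0, pm0, gm1, pm1);
    ym1 := pm0 xor sv0;             -- 4-bit results, carry-in 0
    zm1 := pm0 xor tv0;             -- 4-bit results, carry-in 1
    CSV(gm1, pm1, sv1, tv1);        -- top level: no further CLA needed
    for i in 3 downto 0 loop        -- 16-bit results
      if sv1(i) = '1' then ym2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
      else                 ym2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
      end if;
      if tv1(i) = '1' then zm2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
      else                 zm2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
      end if;
    end loop;
    sum <= ym2; inc <= zm2;
  end process;
end architecture;
```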

Pipeline registers can be placed directly before or after every CSV/CLA
block (in this example: in all places where I left an empty line),
and of course at the beginning and at the end of the adder.

I admit that it looks a little more difficult in CSAdd.  That's because
I had to use the same variables again and again in a loop.  Otherwise,
I wouldn't have been able to write a generic one-size-fits-all
procedure...

A final note.  You were worried about the extra row of inverters for
the exponent subtractor.  You may try to "fold" them into the last
level of the adder to improve the total delay:

	-- 16-bit results
	-- d=6 t=8
	for i in WIDTH/4-1 downto 0 loop
		if sv1(i) = '1' then
			ym2(4*i+3 downto 4*i) := not zm1(4*i+3 downto 4*i);
			--                       ^^^ invert here...
		else
			ym2(4*i+3 downto 4*i) := not ym1(4*i+3 downto 4*i);
			--                       ^^^ and here.
		end if;
		if tv1(i) = '1' then
			zm2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
		else
			zm2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
		end if;
	end loop;
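A sketch of the folded version as a 16-bit exponent subtractor fed with
Ea and not Eb: the normal output becomes not (Ea + not Eb) = Eb - Ea and
the incremented output stays Ea - Eb.  To keep it short, the first level
(ym1/zm1, sv1/tv1) is computed behaviorally with numeric_std rather than
with CSV/CLA, the 16-bit width is only an assumption, and the entity name
expsub16_sketch is hypothetical; only the last loop follows the code above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity expsub16_sketch is
  port (Ea, Eb       : in  std_logic_vector(15 downto 0);
        b_m_a, a_m_b : out std_logic_vector(15 downto 0));  -- ym2, zm2
end entity;

architecture sketch of expsub16_sketch is
begin
  process (Ea, Eb)
    variable a, b, ym1, zm1, ym2, zm2 : std_logic_vector(15 downto 0);
    variable s0, s1   : unsigned(16 downto 0);  -- a+b and a+b+1, with carry
    variable sv1, tv1 : std_logic_vector(3 downto 0);
  begin
    a := Ea;
    b := not Eb;
    s0 := unsigned('0' & a) + unsigned('0' & b);
    s1 := s0 + 1;
    for i in 3 downto 0 loop
      -- behavioral first level: 4-bit group sums with carry-in 0 and 1
      ym1(4*i+3 downto 4*i) := std_logic_vector(
        unsigned(a(4*i+3 downto 4*i)) + unsigned(b(4*i+3 downto 4*i)));
      zm1(4*i+3 downto 4*i) := std_logic_vector(
        unsigned(a(4*i+3 downto 4*i)) + unsigned(b(4*i+3 downto 4*i)) + 1);
      -- carry into group i (sum bit xor a xor b), block carry-in 0 / 1
      sv1(i) := s0(4*i) xor a(4*i) xor b(4*i);
      tv1(i) := s1(4*i) xor a(4*i) xor b(4*i);
    end loop;
    for i in 3 downto 0 loop  -- last level with the inverters folded in
      if sv1(i) = '1' then ym2(4*i+3 downto 4*i) := not zm1(4*i+3 downto 4*i);
      else                 ym2(4*i+3 downto 4*i) := not ym1(4*i+3 downto 4*i);
      end if;
      if tv1(i) = '1' then zm2(4*i+3 downto 4*i) := zm1(4*i+3 downto 4*i);
      else                 zm2(4*i+3 downto 4*i) := ym1(4*i+3 downto 4*i);
      end if;
    end loop;
    b_m_a <= ym2;  -- not (Ea + not Eb) = Eb - Ea
    a_m_b <= zm2;  -- Ea + not Eb + 1   = Ea - Eb
  end process;
end architecture;
```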

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/