[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [f-cpu] magnetude comparison

To: f-cpu@seul.org
Subject: Re: [f-cpu] magnetude comparison
From: Michael Riepe <michael+fcpu@stud.uni-hannover.de>
Date: Tue, 2 Mar 2004 05:54:21 +0100
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Tue, 02 Mar 2004 07:38:51 -0500
In-reply-to: <4043E1FD.1060105@xeberon.net>; from gaetan@xeberon.net on Tue, Mar 02, 2004 at 02:23:09AM +0100
References: <40422843.2090605@xeberon.net> <40422BBC.8040607@f-cpu.org> <4042428F.50602@xeberon.net> <20040301002528.14378@thrai.stud.uni-hannover.de> <40428896.6070802@xeberon.net> <20040301234546.26608@thrai.stud.uni-hannover.de> <4043D1C4.1080104@xeberon.net> <20040302015331.05383@thrai.stud.uni-hannover.de> <4043E1FD.1060105@xeberon.net>
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

Hi once more,

On Tue, Mar 02, 2004 at 02:23:09AM +0100, gaetan@xeberon.net wrote:
[...]
> i have something like:
> sel =  C and (not(g) or (MSB and L) )
> d=3/t=3

That could at least be written as

	sel := (C and not g) or (C and MSB and L);

Another version would be

	case C & MSB is
		when "00" | "01" => sel := '0';
		when "10" => sel := not g;
		when "11" => sel := not g or L;
		when others => sel := 'X';
	end case;

> it's the output selector of compound adder in close datapath (rounding 
> to nearest)
> since MSB, L and C comes from the A+B output of the adder, is a full 
> length delay.
> + mux4 to select beween the 4 rounding mode selectors (d=1/t=1)
> + mux2 to select the output
> => i have the rounded result ~d=5/t=5 after the adder.

You're going the wrong way.

You have two signals that arrive rather late (C and MSB).  You combine
them with other (earlier) signals (g and L), and the result is even
later.  Then you combine them with another set of even earlier signals
(the rounding mode), producing an even later result and so on.  See
what is wrong with that?  There's a huge pile of logic "behind" the
late signals, which delays the result even further.

I'll give you an example how to do better.  It's based on a
transformation called "decomposition":  Given a boolean function
F with n inputs, y := F(a<1>, a<2>, ... a<i>, ... a<n-1>, a<n>),
you can also write

	if a<i> = '1' then
		y := F(a<1>, a<2>, ... '1', ... a<n-1>, a<n>);
	else
		y := F(a<1>, a<2>, ... '0', ... a<n-1>, a<n>);
	end if;

a<i> is "factored out", leaving two simpler subexpressions with n-1
inputs each.  The transformation can be repeated if necessary; if
you repeat it recursively until all function values are constants,
you will obtain a so-called "binary decision diagram", BDD.

There are also other kinds of decompositions, using different formulas.
Another popular one is

	y := u xor (a<i> and v);

with the subexpressions

	u := F(a<1>, a<2>, ... '0', ... a<n-1>, a<n>);
	v := F(a<1>, a<2>, ... '1', ... a<n-1>, a<n>) xor u;

but it's less suitable in our case.  It's a good choice when you
already have u and v, and u arrives later than both v and a<i>,
as in a carry-increment adder, for example.  If a particular input
signal arrives late, the first decomposition variant is better.

> one of  the most complex signal is in the far datapath:
> seladd = (not (c) and g and (LSB or r or s) ) or
>          ( c and L and ( [LSB-1] or g or r or s) )

Ok, let's go with that.  I assume L and LSB are the same thing...

LSB and [LSB-1] can be calculated rather fast; in fact, they're part
of the first 4-bit intermediate results the adder delivers.  g, r
and s are already available at the inputs of the adder.  The latest
signal is C (I suppose it's the carry-out).  So let's decompose the
circuit by factoring out c:

	if c = '1' then
		seladd := LSB and ([LSB-1] or g or r or s);
	else
		seladd := g and (LSB or r or s);
	end if;

Both subexpressions can be calculated while the adder ist running.
Only the 2:1 MUX will contribute to the total latency.

> selsub = (c and (   (not(g) and not(r) and not(s)) or
>                           ( g and r) or
>                           ( MSB and g and (LSB or s)
>                      )
>           )

This one requires a slightly different strategy...

	if MSB = '1' then
		selsub := c and ((not(g) and not(r) and not(s))
		              or (g and r)
		              or (g and (LSB or s)));
	else
		selsub := c and ((not(g) and not(r) and not(s))
		              or (g and r));
	end if;

which can be simplified to

	if MSB = '1' then
		selsub := c and ((not(g) and not(r) and not(s))
		              or (g and (LSB or r or s)));
	else
		selsub := c and ((not(g) and not(r) and not(s))
		              or (g and r));
	end if;

Again, LSB, g, r and s arrive rather early.  MSB and c, on the other
hand, arrive at about the same time (c is a little faster if you
use CSAdd).  Thus, you'll first calculate

	t1 := (not(g) and not(r) and not(s)) or (g and (LSB or r or s));
	t2 := (not(g) and not(r) and not(s)) or (g and r);

and then

	if MSB = '1' then
		selsub := c and t1;
	else
		selsub := c and t2;
	end if;

If MSB and c would arrive simultaneously, a 4:1 MUX might be more
appropriate:

	case MSB & c is
		when "11" => selsub := t1;
		when "01" => selsub := t2;
		when "00" | "10" => selsub := '0';
		when others => selsub := 'X';
	end case;

> delay: d=3/t=3 (if i could have cound an additionnal d/t for not(g) and 
> not(r) and not(s), but it can be computed before)
> and then
> if effective addition then
>     sel = seladd
> else -- effective substraction
>    sel = selsub
> end;
> d=4/t=4

That can be improved in the same fashion.  Put all the pieces together,
then decompose the result, factoring out the `late' signals (in this
case, MSB and c).  The remaining subexpressions will be less complex,
and what's more important, can be calculated earlier.  You should
also include the rounding mode which is known from the beginning.
Then, use MSB and c to control a 4:1 MUX that selects the correct
subexpression, and you're done (d=2/t=2).

> it's the compound adder selector in the far datapath (for rounding to 
> nearest)
> => so in this datapath i will have the rounded result d=6/t=6 after the 
> end of the added... :(

Probably closer to d=3/t=3 if you follow my advice.

Finally, before nico objects to my coding style again:

Synthesizers may perform decomposition as well, and may obtain similar
results even from simple, "delay-ignorant" models.  I doubt that many
of them will do it, however.

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/

Follow-Ups:
- Re: [f-cpu] magnetude comparison
  - From: "gaetan@xeberon.net" <gaetan@xeberon.net>

References:
- Re: [f-cpu] magnetude comparison
  - From: Michael Riepe <michael+fcpu@stud.uni-hannover.de>
- Re: [f-cpu] magnetude comparison
  - From: "gaetan@xeberon.net" <gaetan@xeberon.net>
- Re: [f-cpu] magnetude comparison
  - From: Michael Riepe <michael+fcpu@stud.uni-hannover.de>
- Re: [f-cpu] magnetude comparison
  - From: "gaetan@xeberon.net" <gaetan@xeberon.net>

Prev by Author: Re: [f-cpu] magnetude comparison
Next by Author: Re: [f-cpu] magnetude comparison
Previous by thread: Re: [f-cpu] magnetude comparison
Next by thread: Re: [f-cpu] magnetude comparison
Index(es):
- Author
- Thread