Re: [f-cpu] magnetude comparison
- To: f-cpu@seul.org
- Subject: Re: [f-cpu] magnetude comparison
- From: Michael Riepe <michael+fcpu@stud.uni-hannover.de>
- Date: Mon, 1 Mar 2004 14:51:16 +0100
- Delivery-date: Mon, 01 Mar 2004 11:49:50 -0500
- In-reply-to: <40428896.6070802@xeberon.net>; from gaetan@xeberon.net on Mon, Mar 01, 2004 at 01:49:26AM +0100
- References: <40422843.2090605@xeberon.net> <40422BBC.8040607@f-cpu.org> <4042428F.50602@xeberon.net> <20040301002528.14378@thrai.stud.uni-hannover.de> <40428896.6070802@xeberon.net>
- Reply-to: f-cpu@seul.org
- Sender: owner-f-cpu@seul.org
Hi F-gang,
On Mon, Mar 01, 2004 at 01:49:26AM +0100, gaetan@xeberon.net wrote:
[...]
> what do you think about this new version:
>
> -- estimated delay
> -- 2 < L <= 4 : d=4/t=5
> -- 4 < L <= 8 : d=5/t=6
> -- 8 < L <= 16 : d=6/t=7
> -- 16 < L <= 32 : d=7/t=8
> -- 32 < L <= 64 : d=8/t=9
> function compare_vector(a, b : std_ulogic_vector) return std_ulogic is
>   constant L : natural := a'length;
>   constant aa : std_ulogic_vector(L-1 downto 0) := a;
>   constant bb : std_ulogic_vector(L-1 downto 0) := b;
>   variable pp, vv: std_ulogic_vector(L-1 downto 0);
>   variable p, v, swap : std_ulogic;
>   variable step, level, left : natural;
> begin
>
>   -- (d=0/t=0)
>   for i in L-1 downto 0 loop
>     pp(i) := b(i);
>     vv(i) := a(i) xor b(i);
>   end loop;
>
>   -- (d=1/t=2)
>
>   for level in 1 to 15 loop
>     step := 2**level;
>     left := L/step;
>     for i in 0 to left-1 loop
>       if (vv(2*i)='0') then
>         pp(i) := pp(2*i+1);
>       else
>         pp(i) := pp(2*i);
>       end if; -- d=1/t=1
Shouldn't this be the other way round?
      if vv(2*i+1) = '1' then
        pp(i) := pp(2*i+1);
      else
        pp(i) := pp(2*i);
      end if;
The most significant bit is on the left, i.e. at index 2*i+1, so a
difference there must take priority over one at 2*i.
>       vv(i) := vv(2*i+1) or vv(2*i); -- d=1/t=1
>     end loop;
>     exit when step >= L;
>     -- cost for each loop : d=1/t=1
>   end loop;
>
>   swap := pp(0) and vv(0); -- d=1/t=1
>
>   -- print_vector("a", a);
>   -- print_vector("b", b);
>   -- print_stdval("swap?", swap);
>   return swap;
> end;
>
> I estimate the delay at around d=7 or d=8 for 32 or 64 bits. It's easy
> to cut between the first and second stage...
Yep. I once considered a similar circuit for the EU_CMP unit, too.
I finally dropped it, but I don't remember why :( But I still have
the source file :)
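For reference, here is roughly how the quoted function might look with
the selection order corrected (upper element first). This is only a
sketch, not the EU_CMP source: it assumes that L is a power of two and
that a and b have the same length, and the package name cmp_sketch is
made up for illustration. It also reads the normalised copies aa/bb in
the first stage, so operands with arbitrary index ranges work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package name, used only to make the sketch self-contained.
package cmp_sketch is
  -- Returns '1' when b > a (unsigned), i.e. when the operands should be swapped.
  function compare_vector (a, b : std_ulogic_vector) return std_ulogic;
end package cmp_sketch;

package body cmp_sketch is
  function compare_vector (a, b : std_ulogic_vector) return std_ulogic is
    constant L  : natural := a'length;  -- assumed: power of two, = b'length
    constant aa : std_ulogic_vector(L-1 downto 0) := a;
    constant bb : std_ulogic_vector(L-1 downto 0) := b;
    variable pp, vv : std_ulogic_vector(L-1 downto 0);
    variable step, left : natural;
  begin
    -- First stage: vv marks the differing bits, pp carries b's bit.
    for i in L-1 downto 0 loop
      pp(i) := bb(i);
      vv(i) := aa(i) xor bb(i);
    end loop;

    -- 2:1 reduction tree; the upper element (2*i+1) has priority.
    for level in 1 to 15 loop
      step := 2**level;
      left := L / step;
      for i in 0 to left-1 loop
        if vv(2*i+1) = '1' then
          pp(i) := pp(2*i+1);
        else
          pp(i) := pp(2*i);
        end if;
        vv(i) := vv(2*i+1) or vv(2*i);
      end loop;
      exit when step >= L;
    end loop;

    -- pp(0) is b's bit at the most significant differing position,
    -- vv(0) says whether the operands differ at all.
    return pp(0) and vv(0);
  end function compare_vector;
end package body cmp_sketch;
```

With a = "0101" and b = "0110" the operands first differ at bit 1,
where b has a '1', so the function returns '1'; for equal operands
vv(0) stays '0' and the result is '0'.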
I still see a minor delay problem. It's true that I count a MUX
as d=1/t=1 but only for the datapath -- that is, the control signal
should arrive early. In this circuit, it's always late. You may get
better latency using 4:1 MUXes (at least in the first two stages):
  for level in 1 to 15 loop
    step := 4**level;
    left := L / step;
    for i in 0 to left-1 loop
      if vv(4*i+3) = '1' then
        pp(i) := pp(4*i+3);
      elsif vv(4*i+2) = '1' then
        pp(i) := pp(4*i+2);
      elsif vv(4*i+1) = '1' then
        pp(i) := pp(4*i+1);
      else
        pp(i) := pp(4*i+0);
      end if;
      vv(i) := vv(4*i+3) or vv(4*i+2) or vv(4*i+1) or vv(4*i+0);
    end loop;
    exit when step >= L;
  end loop;
Each loop counts as d=2 t=2 now (which is realistic), but you'll
need only half the number of loops. I used this kind of stage in
my alternate EU_CMP version. In fact, my stage was a little more
complex because it did not only extract the "leading" bit but also
calculated its index and bit mask (for the msb instruction).
The drawback is that a 4:1 stage can't be realized in most FPGAs'
cells because the core element has too many inputs (four pp bits
plus three vv select bits). On the other
hand, one may extract the 4:1 core and put it in a function:
  function compare4 (pp, vv : in std_ulogic_vector)
    return std_ulogic;
  for level in 1 to 15 loop
    step := 4**level;
    left := L / step;
    for i in 0 to left-1 loop
      pp(i) := compare4(pp(4*i+3 downto 4*i), vv(4*i+3 downto 4*i));
      vv(i) := vv(4*i+3) or vv(4*i+2) or vv(4*i+1) or vv(4*i+0);
    end loop;
    exit when step >= L;
  end loop;
and the function could use 2:1 stages internally if necessary.
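A possible body for compare4, built from three 2:1 stages, might look
like the sketch below. It is an illustration only, not the code from
my alternate EU_CMP: it assumes both arguments are exactly four bits
wide, it returns a single std_ulogic because the call site assigns the
result to pp(i), and the package name compare4_sketch is made up.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package name, used only to make the sketch self-contained.
package compare4_sketch is
  function compare4 (pp, vv : in std_ulogic_vector) return std_ulogic;
end package compare4_sketch;

package body compare4_sketch is
  function compare4 (pp, vv : in std_ulogic_vector) return std_ulogic is
    -- Normalise the slices to (3 downto 0); assumes 4-bit arguments.
    constant p : std_ulogic_vector(3 downto 0) := pp;
    constant v : std_ulogic_vector(3 downto 0) := vv;
    variable p_hi, p_lo : std_ulogic;
  begin
    -- Two 2:1 stages in parallel: select within each pair, upper bit first.
    if v(3) = '1' then
      p_hi := p(3);
    else
      p_hi := p(2);
    end if;
    if v(1) = '1' then
      p_lo := p(1);
    else
      p_lo := p(0);
    end if;
    -- Final 2:1 stage: the upper pair wins if it contains any difference.
    if (v(3) or v(2)) = '1' then
      return p_hi;
    else
      return p_lo;
    end if;
  end function compare4;
end package body compare4_sketch;
```

This selects the same bit as the if/elsif chain of the 4:1 stage. Note
that with step := 4**level the loop reduces the vectors completely only
when L is a power of 4 (16, 64); for L = 32 two elements are left after
the second pass, so one more 2:1 stage over pp(1 downto 0) and
vv(1 downto 0) would be needed.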
> It's not that I don't want to use your approach with the compound adder,
> but I have a method using a Leading One Predictor
> (which assumes that mantissa A is greater than mantissa B, so
> the problem arises when the exponents
> are equal; the document I have uses a comparator. I propose to put it in
> the first stage).
The LOP needs its operands in a particular order?
Maybe that can be changed.
> I use a compound adder
> anyway (for rounding, not for normalization).
> With a leading one predictor, it's really fast to get the shift count
> to apply to the mantissa in the normalisation
> process.
Yep. Without prediction of the shift count, you would have to
calculate it from the result, which takes at least half a stage.
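To illustrate what "calculating it from the result" means, here is a
behavioural sketch of a leading-zero count over the finished sum. The
names lzc_sketch and shift_count are made up for this example, and the
serial loop only describes the function; in hardware it would become a
priority encoder that cannot start before the adder result is
available.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package and function names, for illustration only.
package lzc_sketch is
  -- Number of left shifts needed to bring the leading '1' to the top bit.
  function shift_count (m : std_ulogic_vector) return natural;
end package lzc_sketch;

package body lzc_sketch is
  function shift_count (m : std_ulogic_vector) return natural is
    constant L  : natural := m'length;
    constant mm : std_ulogic_vector(L-1 downto 0) := m;
  begin
    -- Scan from the most significant bit down to the first '1'.
    for i in L-1 downto 0 loop
      if mm(i) = '1' then
        return L-1 - i;
      end if;
    end loop;
    return L;  -- all bits are zero: nothing to normalise
  end function shift_count;
end package body lzc_sketch;
```

For m = "00010110" the leading '1' sits at bit 4 of 8, so the count is
3. A predictor delivers (nearly) the same number in parallel with the
addition instead of after it.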
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"