
Re: [f-cpu] IDU News; synthesis report



On Thu, Apr 10, 2003 at 02:46:11PM +0200, devik wrote:
[...]
> > Can you try a simple carry-select adder?  I'll attach a copy.
> 
> Ok, 310 slices, 65 MHz on Spartan2E. Seems ripple one
> is superior on fpga because they have dedicated circuits
> for it. Maybe one could create 8 8bit ripple adders
> and add carries in next stage - these +1 adders could
> use fast carry chain again ...

Design criteria for FPGAs are very different.  In most FPGAs, for example,
every logic function with 3 or 4 inputs has the same delay, because it
fits into a single LUT.
In some chips, a 1-bit full adder is as fast as a 2-input nand gate,
which is definitely not true for custom chips (where the nand gate is
approximately 3-4 times as fast).

The delay of a ripple-carry adder is O(n), while a carry-select adder,
applied recursively, has a delay of O(log(n)) (a single level of selection
only gets you to O(sqrt(n))).  Ripple-carry adders work fine in some FPGAs
because they have fast carry chains *and* cell delays that are much
higher than the delay of a single gate in a custom chip -- remember that
a cell usually contains input and output muxes and a LUT (at least).
Their trick is to bypass the normal data path and use pre-routed lines to
speed up the carry chain.  The whole secret is that the constant factor
hidden in the O(x) part of the delay formula *varies* with the kind of
adder, or with the propagation direction of your data: forward = slow,
sideways = fast.  In a custom chip, that factor is mostly immutable; as a
consequence, a 64-bit parallel-prefix adder will run a lot faster than a
64-bit ripple-carry adder.
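
For readers who haven't met the structure before, here is a small
behavioural sketch of a recursive carry-select adder next to a plain
ripple-carry adder.  It is written in Python for illustration only; it is
not the F-CPU IAdd source, and the function names are made up for this
example.  Each half of the operand is added for both possible carry-ins,
and the carry out of the low half merely selects one of the two
precomputed high-half results.

```python
def ripple_add(a, b, cin, width):
    """Reference adder: the carry walks through every bit position."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select(a, b, width):
    """Return ((sum, cout) for cin=0, (sum, cout) for cin=1).

    width must be a power of two; a and b must fit into width bits.
    """
    if width == 1:
        x, y = a & 1, b & 1
        # 1-bit base case, computed for both carry-in values.
        return ((x ^ y, x & y), (x ^ y ^ 1, x | y))
    half = width // 2
    mask = (1 << half) - 1
    lo = carry_select(a & mask, b & mask, half)
    hi = carry_select(a >> half, b >> half, half)
    result = []
    for cin in (0, 1):
        lo_sum, lo_cout = lo[cin]
        # The "select" step: the low half's carry picks a high-half result.
        hi_sum, hi_cout = hi[lo_cout]
        result.append((lo_sum | (hi_sum << half), hi_cout))
    return tuple(result)


def add(a, b, width, cin=0):
    """Add two width-bit numbers; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    return carry_select(a & mask, b & mask, width)[cin]


print(add(200, 100, 8))  # (44, 1): 300 does not fit into 8 bits
```

The recursion depth is log2(width), which is where the O(log(n)) comes
from: every level adds one select stage, while the ripple loop needs one
carry step per bit.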

Earlier synthesis runs with Synopsys reported 389 MHz for IAdd and 458 MHz
for IMul64 (those were the previous versions; the current ones are probably
even faster, but I haven't got any numbers yet).  You'll never beat that
with a ripple-carry adder, not even in 8-bit mode: The carry chain will
be 16 gates long (16 transistor levels).  My carry-select adder is only
5 gates (7 transistor levels) deep for an 8-bit result.  The F-CPU
pipeline limit is 6 gates / 10 transistor levels per stage (usually
referred to as the `6G Rule' or `6G/10T Rule').  In an FPGA, things are
turned upside down
and inside out:  The 8-bit carry-select adder will be ~4 cells deep,
while an equivalent ripple-carry adder with fast carry gets away with
a depth of only 1-2 cells.  Without fast carry, on the other hand,
its delay will increase dramatically (best case: 8 cells).
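
The gate counts above can be put into a rough depth model.  This is only
a sketch in Python under assumptions that are not taken from the F-CPU
sources: two gate levels per bit in the ripple carry chain, and, for the
carry-select adder, two gate levels for the 1-bit base case plus one gate
level per select stage.  The parameter names are invented for this
example.  These assumptions were picked so that the 8-bit case agrees
with the numbers quoted above; a real netlist may well differ.

```python
def ripple_depth(width, gates_per_bit=2):
    """Gate levels in the carry chain of a ripple-carry adder (assumed model)."""
    return gates_per_bit * width


def carry_select_depth(width, base_gates=2, mux_gates=1):
    """Gate levels of a recursive carry-select adder (assumed model).

    width must be a power of two; bit_length() - 1 is then exactly log2(width).
    """
    stages = width.bit_length() - 1
    return base_gates + mux_gates * stages


for width in (8, 16, 32, 64):
    print(width, ripple_depth(width), carry_select_depth(width))
```

With these assumed parameters the model gives 16 versus 5 gate levels at
8 bits and 128 versus 8 at 64 bits.  It says nothing about FPGA cell
depth, where the cost of one step depends on the chip's carry logic.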

To cut a long story short:  For an *efficient* FPGA implementation, we
would have to re-design most units.  In fact, we would also have to take
the type of FPGA into account if we want maximum performance.  That would
be too much work -- remember that there are only a few active developers,
and even fewer VHDL coders.  On the other hand, we distribute the source
code, and any user who wants to tune it for a specific FPGA may do so.

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/