Re: [f-cpu] registers
On Sat, Oct 05, 2002 at 12:43:38PM +0000, Mohamed Ali Kilani wrote:
[...]
> I am looking at SRT division with higher radix, like radix-4 division,
> which produces 2 bits at each iteration, or even radix-16, which produces
> 4 bits at each iteration. If the latter could produce its 4 bits in
> 3 cycles, for example, given adequate hardware, there is a gain
> (6 cycles of processing versus 8 cycles for 8-bit division). I am
> going to spend some time working through the implementation and try to
> get an idea of the critical path.
Radix-16 is probably the smallest radix that can produce more bits than
it uses cycles. But the code will be rather complicated, compared to a
radix-2 SRT, and will take a lot of die space.
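To make the cycle arithmetic above concrete, here is a minimal sketch in
Python. The function name and the cycles-per-iteration figures are
assumptions for illustration only (the 3-cycle radix-16 step is the
hypothetical figure from the quoted text, not a measured one).

```python
from math import ceil


def division_cycles(quotient_bits, bits_per_iteration, cycles_per_iteration):
    """Cycles needed to produce quotient_bits, ignoring setup and cleanup.

    bits_per_iteration is log2(radix): 1 for radix-2, 2 for radix-4,
    4 for radix-16. cycles_per_iteration is a hypothetical pipeline cost.
    """
    iterations = ceil(quotient_bits / bits_per_iteration)
    return iterations * cycles_per_iteration


# Radix-2 at 1 cycle per bit versus radix-16 at an assumed 3 cycles per digit.
radix2_8bit = division_cycles(8, 1, 1)    # 8 iterations * 1 cycle
radix16_8bit = division_cycles(8, 4, 3)   # 2 iterations * 3 cycles
radix16_64bit = division_cycles(64, 4, 3) # 16 iterations * 3 cycles
```

A radix only wins if its cycles per iteration stay below its bits per
iteration, which is the break-even condition discussed above.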
> > Compare-and-subtract, subtract-and-restore or non-restore algorithms will
> > work with chunk sizes > 8 bit, but they will take two cycles/bit because
> > the add/subtract subunit doesn't fit into a single pipeline stage.
> > They're fine for 8-bit operands but too expensive for larger chunks.
> > Cedric has written an 8-bit divider that is included in the latest
> > snapshot.
>
> I have seen it. The problem with these dividers is that the critical
> path is an add/sub, which is impossible to handle in 6 gate delays for
> large chunks. I am still not clear what speed we want to achieve
> with this unit and what exactly the available hardware budget is. Also,
> do we need, for example, to handle 64-bit division as well as 2x32-bit,
> 4x16-bit and 8x8-bit division, SIMD style?
That's the goal, yes. Speed should be reasonable (that is,
approximately 1 bit/cycle).
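As a behavioural reference for that goal, the following is a rough Python
sketch of unsigned shift-and-subtract (restoring) division that yields one
quotient bit per loop step, applied lane by lane to a packed word. It is a
software model only, not the divider in the snapshot; the function names,
the `word_bits` parameter and the choice to raise on a zero divisor are
assumptions made for illustration.

```python
def restoring_divide(dividend, divisor, width):
    """Unsigned shift-and-subtract division; one quotient bit per loop step."""
    if divisor == 0:
        # Assumption: the hardware behaviour for division by zero is not
        # specified here, so this model simply refuses.
        raise ZeroDivisionError("divisor is zero")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Compare-and-subtract: the quotient bit is 1 if the divisor fits.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


def simd_divide(a, b, lane_bits, word_bits=64):
    """Divide each lane of a by the matching lane of b (quotients only)."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, word_bits, lane_bits):
        q, _ = restoring_divide((a >> shift) & mask, (b >> shift) & mask, lane_bits)
        result |= q << shift
    return result
```

For example, `simd_divide(0x64C8, 0x0A07, 8, word_bits=16)` divides the two
8-bit lanes independently (200 by 7 and 100 by 10). With `lane_bits` set to
8, 16, 32 or 64 the same loop models the 8x8, 4x16, 2x32 and 64-bit cases.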
> > Finally, iterative solutions (like Newton-Raphson) will not only be rather
> > slow (each step requires at least two successive arithmetic operations,
> > at least one of them being a multiplication), but may also produce
> > incorrect results due to limited precision (rounding errors). IMHO,
> > they're suited for FP division only.
>
> If we can afford a multiplier and an add/sub unit that are quick enough
> (to be specified), we can go with this solution since it doubles the
> precision at each iteration: getting 64 bits of precision takes just 3
> iterations (one iteration = how many cycles? I will get to this later)
> once we have 8 bits of precision.
You can count 6 cycles for a 64x64->128 bit multiplication, and 2 cycles
for 64-bit add/sub. Plus the Xbar cycles to transfer data to/from the
IDU and ASU units, since we're not going to duplicate them for division
(the multiplier is huge!).
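Plugging those numbers in gives a rough lower bound. The sketch below
assumes the usual Newton-Raphson reciprocal step x' = x * (2 - d * x),
i.e. two dependent multiplications and one subtraction per iteration; that
formula, the function names and the `xbar_cycles` parameter are assumptions
for illustration, and the final multiplication by the dividend and any
correction step are not counted.

```python
MUL_CYCLES = 6      # 64x64->128 bit multiplication, figure from above
ADDSUB_CYCLES = 2   # 64-bit add/sub, figure from above


def nr_iterations(seed_bits, target_bits):
    """Iterations needed when each step doubles the number of correct bits."""
    iterations = 0
    bits = seed_bits
    while bits < target_bits:
        bits *= 2
        iterations += 1
    return iterations


def nr_cycles(seed_bits, target_bits, xbar_cycles=0):
    """Lower bound on cycles, assuming fully serial dependent operations.

    xbar_cycles is a hypothetical per-iteration transfer overhead.
    """
    per_step = 2 * MUL_CYCLES + ADDSUB_CYCLES + xbar_cycles
    return nr_iterations(seed_bits, target_bits) * per_step


iterations = nr_iterations(8, 64)   # 8 -> 16 -> 32 -> 64 bits
cycles = nr_cycles(8, 64)           # 3 * (6 + 2 + 6), Xbar not included
```

Under these assumptions the three iterations alone cost 42 cycles before
any Xbar transfers, which is the figure to weigh against roughly 64 cycles
for a 1 bit/cycle divider.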
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"