
Re: [f-cpu] registers



On Sat, Oct 05, 2002 at 12:43:38PM +0000, Mohamed Ali Kilani wrote:
[...]
> I am looking at SRT division with a higher radix, such as radix-4 division,
> which produces 2 bits per iteration, or even radix-16, which produces
> 4 bits per iteration. If the latter could produce its 4 bits in 3 cycles,
> for example, given adequate hardware, there is a gain (6 cycles of
> processing versus 8 cycles for an 8-bit division). I am going to spend
> some time working through the implementation and try to get an idea of
> the critical path.

Radix-16 is probably the smallest radix that can produce more bits than
it uses cycles. But the code will be rather complicated compared to a
radix-2 SRT, and will take a lot of die space.
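
To put numbers on that trade-off, here is a back-of-the-envelope sketch.
The function name and parameters are made up for illustration, and the
3 cycles per radix-16 iteration is only the hypothetical figure from the
quoted message, not a measured value:

```python
def division_cycles(operand_bits, bits_per_iteration, cycles_per_iteration):
    """Estimate total cycles for an iterative divider.

    bits_per_iteration is log2(radix): 1 for radix-2, 2 for radix-4,
    4 for radix-16. cycles_per_iteration is an assumption, not a
    measured F-CPU figure.
    """
    # Ceiling division: a partial last digit still costs a full iteration.
    iterations = -(-operand_bits // bits_per_iteration)
    return iterations * cycles_per_iteration


# 8-bit operands: radix-2 at 1 cycle/bit versus radix-16 at 3 cycles/digit.
print(division_cycles(8, 1, 1))   # 8
print(division_cycles(8, 4, 3))   # 6
# The same ratio for 64-bit operands.
print(division_cycles(64, 1, 1))  # 64
print(division_cycles(64, 4, 3))  # 48
```

A radix-4 stage would have to finish in a single cycle to beat radix-2,
since 2 bits in 2 cycles is no gain at all.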

> > Compare-and-subtract, subtract-and-restore or non-restore algorithms will
> > work with chunk sizes > 8 bit, but they will take two cycles/bit because
> > the add/subtract subunit doesn't fit into a single pipeline stage.
> > They're fine for 8-bit operands but too expensive for larger chunks.
> > Cedric has written an 8-bit divider that is included in the latest
> > snapshot.
> 
> I have seen it. The problem with these dividers is that the critical
> path is an add/sub, which is impossible to handle in 6 gate delays for
> large chunks. I am still not clear what speed we want to achieve
> with this unit and what exactly the available hardware budget is. Also,
> do we need to handle, for example, 64-bit division as well as 2x32-bit /
> 4x16-bit / 8x8-bit division, SIMD style?

That's the goal, yes. Speed should be reasonable (that is,
approximately 1 bit/cycle).
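
For reference, the compare-and-subtract scheme discussed above can be
sketched in software as follows. This is a plain behavioural model for
unsigned operands, not Cedric's divider or any F-CPU code, and the
function name is made up. Each loop iteration contains one full-width
subtract, which is exactly the critical path in question:

```python
def compare_and_subtract_divide(dividend, divisor, bits=8):
    """Unsigned bit-serial division, one quotient bit per iteration.

    Assumes 0 <= dividend < 2**bits and divisor > 0.
    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: this full-width subtract is the slow step.
        trial = remainder - divisor
        if trial >= 0:
            # Subtraction succeeded: keep it and set the quotient bit.
            remainder = trial
            quotient |= 1 << i
        # Otherwise keep the old remainder (the "restore" is implicit).
    return quotient, remainder


print(compare_and_subtract_divide(200, 7))  # (28, 4)
```

If the subtract needs two pipeline stages, this loop runs at two
cycles/bit, which is why it only looks attractive for 8-bit chunks.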

> > Finally, iterative solutions (like Newton-Raphson) will not only be rather
> > slow (each step requires at least two successive arithmetic operations,
> > at least one of them being a multiplication), but may also produce
> > incorrect results due to limited precision (rounding errors). IMHO,
> > they're suited for FP division only.
> 
> If we can afford a multiplier and an add/sub unit that are quick enough
> (to be specified), we can go with this solution since it doubles the
> precision at each iteration: getting 64 bits of precision takes just
> 3 iterations (one iteration = how many cycles? I will get to this later)
> once we have 8 bits of precision.

You can count 6 cycles for a 64x64->128 bit multiplication, and 2 cycles
for 64-bit add/sub. Plus the Xbar cycles to transfer data to/from the
IDU and ASU units, since we're not going to duplicate them for division
(the multiplier is huge!).
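
A rough estimate with those numbers, as a sketch only. It assumes the
usual reciprocal iteration x' = x * (2 - d * x), that is, two dependent
multiplications and one subtraction per step. The Xbar transfer cost is
left as a parameter because no figure is given here; the function and
parameter names are made up:

```python
def newton_raphson_cycles(target_bits=64, seed_bits=8,
                          mul_cycles=6, addsub_cycles=2,
                          xbar_cycles_per_iteration=0):
    """Estimate (iterations, cycles) to refine a reciprocal.

    Assumes precision doubles each iteration and that each iteration
    costs two multiplications plus one add/sub, executed back to back.
    xbar_cycles_per_iteration is an unknown; 0 gives a lower bound.
    """
    iterations = 0
    precision = seed_bits
    while precision < target_bits:
        precision *= 2  # quadratic convergence: bits double per step
        iterations += 1
    per_iteration = (2 * mul_cycles + addsub_cycles
                     + xbar_cycles_per_iteration)
    return iterations, iterations * per_iteration


print(newton_raphson_cycles())  # (3, 42)
```

That is 42 cycles before any Xbar transfers, and the final quotient
still needs one more multiplication (dividend times reciprocal) plus
whatever correction step is needed to get an exact integer result.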

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"