Re: [f-cpu] registers
Hi Michael, All,
Here is the weekend summary: division, division and even more division ;)
SRT with radix 2 is definitely the way to go in our case.
I found these two references that explain the trade-offs in detail:
http://citeseer.nj.nec.com/rd/83262034%2C370636%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/3067/ftp:zSzzSzumunhum.stanford.eduzSztrzSzsrtcomplexity_TVLSI.pdf/oberman98minimizing.pdf
http://citeseer.nj.nec.com/rd/83262034%2C477278%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/23488/ftp:zSzzSzumunhum.stanford.eduzSztrzSz12_liddicoat_a.pdf/high-performance-floating-point.pdf
I guess you are planning to fit one iteration into one cycle so that we
maintain 1 bit/iteration throughput. The only way I see to achieve this for
chunks larger than 8 bits is to use CSAs (carry-save adders) for the partial
remainder (PR) computation and to keep the PR in redundant form.
On the other hand, this architecture would probably work pretty well for
the SIMD datapath.
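To make the CSA idea concrete, here is a rough software model of a radix-2 SRT divider that keeps the partial remainder as a (sum, carry) pair. It is only a sketch under my own assumptions: the names `csa` and `srt2_divide` are made up, the digit selection is the textbook rule that looks at the top 4 bits of the redundant remainder, and nothing here is taken from the divider in the repository. The point is that no carry propagates inside the loop; one real addition is needed only at the very end.

```python
def csa(a, b, c, mask):
    """3:2 carry-save adder: sum + carry == a + b + c (mod 2**width)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def srt2_divide(n, d, bits=8):
    """Unsigned division of two `bits`-wide integers, radix-2 SRT style.

    Returns (quotient, remainder). Hypothetical model, not F-CPU code.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= n < (1 << bits) and 0 < d < (1 << bits)):
        raise ValueError("operands must fit in `bits` bits")

    p = d.bit_length() + bits       # aligned divisor lies in [2**(p-1), 2**p)
    dd = d << bits                  # divisor aligned above the dividend
    width = p + 3                   # room for 2*w plus estimate error and sign
    mask = (1 << width) - 1

    ws, wc = n, 0                   # partial remainder in carry-save form
    q = 0
    for _ in range(bits):
        # Shift both words left: this is the "2 * w" of the recurrence.
        ws, wc = (ws << 1) & mask, (wc << 1) & mask
        # Estimate 2*w from the top 4 bits of each word (units of 2**(p-1)).
        t = ((ws >> (p - 1)) + (wc >> (p - 1))) & 0xF
        if t >= 8:
            t -= 16                 # interpret as signed 4-bit value
        if t >= 0:
            digit = 1
        elif t == -1:
            digit = 0
        else:
            digit = -1
        # w = 2*w - digit*dd, done with a CSA (no carry propagation).
        addend = (-digit * dd) & mask
        ws, wc = csa(ws, wc, addend, mask)
        q = 2 * q + digit           # hardware would convert digits on the fly

    # Single carry-propagate addition to leave the redundant form.
    w = (ws + wc) & mask
    if w >> (width - 1):
        w -= 1 << width             # two's complement sign fix
    r = w >> bits                   # exact: w is a multiple of 2**bits
    if r < 0:                       # final correction step
        q -= 1
        r += d
    elif r >= d:
        q += 1
        r -= d
    return q, r


print(srt2_divide(13, 3, bits=4))   # (4, 1)
```

The loop body only contains shifts, a 4-bit addition for the estimate, and one CSA level, which is why the iteration delay does not grow with the chunk size.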
I think you said that you have a block diagram of the divider you're
designing. I would be glad to see it.
It would also help if you could commit the code you have to the repository
on seul.org, so that I can be really up to date.
By the way, I have a general question about SIMD registers. I have read
in the manual that any 64-bit general purpose register would have a flag
indicating whether it is a SIMD register or not. What about the SIMD mode,
i.e. 8x8 bits vs. 4x16 bits vs. 2x32 bits? Are we assuming a default mode,
or is any of the three allowed?
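To show what I mean by "mode", here is a small sketch of lane-wise unsigned division on a 64-bit value. It assumes the lane width is passed in explicitly with each operation, which is exactly the point I am unsure about; the name `simd_udiv` and the all-ones result for a zero divisor are my own placeholders, not anything from the manual.

```python
def simd_udiv(a, b, lane_bits):
    """Divide each lane of 64-bit `a` by the matching lane of 64-bit `b`.

    lane_bits selects the mode: 8 (8x8), 16 (4x16), 32 (2x32) or 64 (1x64).
    Hypothetical helper for illustration only.
    """
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported lane width")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        x = (a >> shift) & lane_mask
        y = (b >> shift) & lane_mask
        # Assumption: a zero divisor yields all ones in that lane.
        q = x // y if y else lane_mask
        result |= q << shift
    return result


a = 0x0000006400000009
b = 0x0000000A00000002
print(hex(simd_udiv(a, b, 32)))     # 0xa00000004
```

The same two operands give a different result in 8x8 mode, because most byte lanes of `b` are zero there, so the mode has to come from somewhere.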
Dali
Michael Riepe wrote:
> On Sat, Oct 05, 2002 at 12:43:38PM +0000, Mohamed Ali Kilani wrote:
> [...]
>
>>I am looking at SRT division with higher radix, like Radix-4 division
>>which produces 2 bits at each iteration, or even radix-16 which produces
>> 4 bits at each iteration. If the last solution could produce the 4
>>bits in 3 cycles for example given the adequate hardware, there is a
>>gain (6 cycles of processing versus 8 cycles for 8 bits division). I am
>>going to spend some time working through the implementation and try to
>>get an idea of the critical path.
>
>
> Radix-16 is probably the smallest size that can produce more bits than
> it uses cycles. But the code will be rather complicated, compared to a
> radix-2 SRT, and will take a lot of die space.
>
>
>>>Compare-and-subtract, subtract-and-restore or non-restore algorithms will
>>>work with chunk sizes > 8 bit, but they will take two cycles/bit because
>>>the add/subtract subunit doesn't fit into a single pipeline stage.
>>>They're fine for 8-bit operands but too expensive for larger chunks.
>>>Cedric has written an 8-bit divider that is included in the latest
>>>snapshot.
>>
>>I have seen it. The problem with these dividers is that the critical
>>path is an add/sub, which is impossible to fit into 6 gate delays for
>>large chunks. I am still not clear on what speed we want to achieve
>>with this unit and what exactly the available hardware budget is. Also,
>>do we need, for example, to handle 64-bit division as well as 2x32-bit,
>>4x16-bit and 8x8-bit division, SIMD style?
>
>
> That's the goal, yes. Speed should be reasonable (that is,
> approximately 1 bit/cycle).
>
>
>>>Finally, iterative solutions (like Newton-Raphson) will not only be rather
>>>slow (each step requires at least two successive arithmetic operations,
>>>at least one of them being a multiplication), but may also produce
>>>incorrect results due to limited precision (rounding errors). IMHO,
>>>they're suited for FP division only.
>>
>>If we can afford a multiplier and an add/sub unit that are quick enough
>>(to be specified), we can go with this solution since it doubles the
>>precision at each iteration: getting 64 bits of precision takes just 3
>>iterations (one iteration = how many cycles? I will get to this later)
>>once we have 8 bits of precision.
>
>
> You can count 6 cycles for a 64x64->128 bit multiplication, and 2 cycles
> for 64-bit add/sub. Plus the Xbar cycles to transfer data to/from the
> IDU and ASU units, since we're not going to duplicate them for division
> (the multiplier is huge!).
>
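For reference, here is a rough check of the Newton-Raphson numbers quoted above. It assumes the usual reciprocal recurrence x' = x * (2 - d * x), which needs two multiplications and one subtraction per step, and it uses exact fractions, so it ignores the rounding problem Michael mentions. The cycle estimate plugs in his figures (6 cycles per multiplication, 2 per add/sub) and assumes the three operations run strictly one after another; Xbar transfers, the final multiplication by the dividend and any correction step are not counted. The names `newton_reciprocal` and `nr_cycles` are made up for this sketch.

```python
from fractions import Fraction


def newton_reciprocal(d, x0, steps):
    """Refine x0 ~ 1/d with x' = x * (2 - d * x), in exact arithmetic.

    Returns the final estimate and the list of relative errors |1 - d*x|.
    """
    x = x0
    errors = [abs(1 - d * x)]
    for _ in range(steps):
        x = x * (2 - d * x)
        errors.append(abs(1 - d * x))   # equals the previous error squared
    return x, errors


def nr_cycles(iterations, mul_cycles=6, addsub_cycles=2):
    """Serial cycle count: two multiplications and one subtraction per step."""
    return iterations * (2 * mul_cycles + addsub_cycles)


d = Fraction(3, 4)
x0 = Fraction(int(256 / d), 256)        # seed truncated to 8 fractional bits
x, errors = newton_reciprocal(d, x0, 3)
for k, e in enumerate(errors):
    print(k, float(e))
print(nr_cycles(3))                     # 42
```

Since the error is squared at each step, a seed good to 8 bits reaches 16, 32 and then 64 bits, which matches the "3 iterations" figure. Under the assumptions above that is 42 cycles before overhead, to be compared with roughly 64 cycles for a 64-bit radix-2 SRT at 1 bit/cycle.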