
Re: [f-cpu] registers



Hi Michael, All,

Here is the weekend summary: division, division, and even more division ;)

SRT with radix 2 is definitely the way to go in our case.
I found these two references, which explain the trade-offs in detail:

http://citeseer.nj.nec.com/rd/83262034%2C370636%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/3067/ftp:zSzzSzumunhum.stanford.eduzSztrzSzsrtcomplexity_TVLSI.pdf/oberman98minimizing.pdf
http://citeseer.nj.nec.com/rd/83262034%2C477278%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/23488/ftp:zSzzSzumunhum.stanford.eduzSztrzSz12_liddicoat_a.pdf/high-performance-floating-point.pdf

I guess you are planning to fit one iteration into one cycle so that we 
maintain 1 bit/iteration (and thus 1 bit/cycle) throughput. The only way 
I see to achieve this for chunks larger than 8 bits is to use CSAs 
(carry-save adders) for the partial remainder (PR) computation and to 
keep the PR in redundant form.
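
A minimal Python sketch of a radix-2 SRT divider with a carry-save
partial remainder, to make the idea concrete. It is not the F-CPU
divider: the names csa and srt2_divide are hypothetical, Python's
unbounded integers stand in for a fixed-width two's-complement datapath,
and the iteration count varies with the divisor's normalisation shift,
which a real pipeline would probably not do.

```python
def csa(a, b, c):
    """3:2 carry-save adder: returns (s, cy) with s + cy == a + b + c."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy


def srt2_divide(x, d, n=8):
    """Radix-2 SRT division of unsigned n-bit x by d; returns (quot, rem)."""
    if not (0 <= x < (1 << n) and 0 < d < (1 << n)):
        raise ValueError("operands must be unsigned n-bit, divisor non-zero")
    m = n - d.bit_length() + 1      # iterations = normalisation shift + 1
    dn = d << m                     # normalised divisor: 2**n <= dn < 2**(n+1)
    s, c = x, 0                     # partial remainder W = s + c, |W| <= dn
    q = 0
    for _ in range(m):
        s, c = s << 1, c << 1       # 2*W, still in redundant form
        # Estimate 2*W in units of 2**n from the truncated upper parts only;
        # the estimate is at most two units below the true value.
        est = (s >> n) + (c >> n)
        if est >= 0:
            digit = 1
        elif est == -1:
            digit = 0
        else:
            digit = -1
        s, c = csa(s, c, -digit * dn)   # no carry propagation in the loop
        q = 2 * q + digit               # hardware would convert on the fly
    w = s + c                       # the only carry-propagate addition
    r = w >> m                      # exact: w is a multiple of 2**m
    if r < 0:                       # final correction of a negative remainder
        q, r = q - 1, r + d
    return q, r
```

For example, srt2_divide(100, 7) returns (14, 2). The point of the
sketch is that the digit selection only looks at a few upper bits of the
sum and carry words, so the per-iteration delay does not grow with the
operand width.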

On the other hand, this architecture would probably work pretty well for 
the SIMD datapath.

I think you said that you have a block diagram of the divider you're 
designing. I would be glad to see it.
It would also help if you could add the code you have to the repository 
on seul.org, so that I can be fully up to date.

By the way, I have a general question about SIMD registers. I have read 
in the manual that any 64-bit general-purpose register would have a flag 
indicating whether it is a SIMD register or not. What about the SIMD 
mode, i.e. 8x8 bits vs. 4x16 bits vs. 2x32 bits? Are we assuming a 
default mode, or is any of the three allowed?
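
To illustrate why the mode matters, here is a small Python sketch that
treats one 64-bit value as 8x8, 4x16 or 2x32 lanes. It says nothing about
how F-CPU actually selects the lane width; the helper names and the
choice of lane 0 as the least significant lane are assumptions made for
the example.

```python
LANE_WIDTHS = (8, 16, 32, 64)


def split_lanes(value, width):
    """Split a 64-bit value into 64 // width lanes, lane 0 least significant."""
    if width not in LANE_WIDTHS:
        raise ValueError("unsupported lane width")
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(64 // width)]


def join_lanes(lanes, width):
    """Pack lanes (lane 0 least significant) back into one 64-bit value."""
    mask = (1 << width) - 1
    return sum((lane & mask) << (i * width) for i, lane in enumerate(lanes))


def simd_add(a, b, width):
    """Lane-wise wrapping addition; carries never cross a lane boundary."""
    mask = (1 << width) - 1
    pairs = zip(split_lanes(a, width), split_lanes(b, width))
    return join_lanes([(x + y) & mask for x, y in pairs], width)
```

With a = 0x00FF00FF00FF00FF and b = 0x0001000100010001, simd_add(a, b, 8)
gives 0 because each 0xFF lane wraps around, while simd_add(a, b, 16)
gives 0x0100010001000100. The same register contents produce different
results, so the lane width has to come from somewhere.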

Dali

Michael Riepe wrote:
> On Sat, Oct 05, 2002 at 12:43:38PM +0000, Mohamed Ali Kilani wrote:
> [...]
> 
>>I am looking at SRT division with a higher radix, such as radix-4 
>>division, which produces 2 bits per iteration, or even radix-16, which 
>>produces 4 bits per iteration. If the latter could produce its 4 bits 
>>in 3 cycles, for example, given adequate hardware, there would be a 
>>gain (6 cycles of processing versus 8 cycles for an 8-bit division). I 
>>am going to spend some time working through the implementation and try 
>>to get an idea of the critical path.
> 
> 
> Radix-16 is probably the smallest size that can produce more bits than
> it uses cycles. But the code will be rather complicated, compared to a
> radix-2 SRT, and will take a lot of die space.
> 
> 
>>>Compare-and-subtract, subtract-and-restore or non-restore algorithms will
>>>work with chunk sizes > 8 bit, but they will take two cycles/bit because
>>>the add/subtract subunit doesn't fit into a single pipeline stage.
>>>They're fine for 8-bit operands but too expensive for larger chunks.
>>>Cedric has written an 8-bit divider that is included in the latest
>>>snapshot.
>>
>>I have seen it. The problem with these dividers is that the critical 
>>path is an add/sub, which is impossible to handle in 6 gate delays for 
>>large chunks. I am still not clear on what speed we want to achieve 
>>with this unit and exactly what hardware budget is available. Also, 
>>do we need to handle, for example, 64-bit division as well as 2x32-bit, 
>>4x16-bit and 8x8-bit division, SIMD style?
> 
> 
> That's the goal, yes. Speed should be reasonable (that is,
> approximately 1 bit/cycle).
> 
> 
>>>Finally, iterative solutions (like Newton-Raphson) will not only be rather
>>>slow (each step requires at least two successive arithmetic operations,
>>>at least one of them being a multiplication), but may also produce
>>>incorrect results due to limited precision (rounding errors). IMHO,
>>>they're suited for FP division only.
>>
>>If we can afford a multiplier and an add/sub unit that are quick enough 
>>(to be specified), we can go with this solution, since it doubles the 
>>precision at each iteration: getting 64-bit precision takes just 3 
>>iterations (one iteration = how many cycles? I will get to this later) 
>>once we have 8-bit precision.
> 
> 
> You can count 6 cycles for a 64x64->128 bit multiplication, and 2 cycles
> for 64-bit add/sub. Plus the Xbar cycles to transfer data to/from the
> IDU and ASU units, since we're not going to duplicate them for division
> (the multiplier is huge!).
> 
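
The Newton-Raphson figures quoted above can be checked with a short
Python sketch. It uses the usual reciprocal iteration y' = y * (2 - d*y)
with exact fractions, so the error squares at every step; the cycle
estimate simply assumes two multiplications and one subtraction per
iteration, run back to back at the 6- and 2-cycle latencies mentioned
above. All names here are hypothetical.

```python
from fractions import Fraction

MUL_CYCLES = 6   # 64x64->128 bit multiplication, as quoted above
ADD_CYCLES = 2   # 64-bit add/sub, as quoted above


def reciprocal_errors(d, y0, iterations=3):
    """Relative error |1 - d*y| after each Newton-Raphson step towards 1/d."""
    y = y0
    errors = []
    for _ in range(iterations):
        y = y * (2 - d * y)          # two multiplications, one subtraction
        errors.append(abs(1 - d * y))
    return errors


def nr_cycles(iterations):
    """Cycle estimate assuming strictly serial operations (an assumption)."""
    return iterations * (2 * MUL_CYCLES + ADD_CYCLES)


d = Fraction(3, 4)
y0 = Fraction(85, 64)                # d * y0 = 255/256, i.e. 8-bit precision
errors = reciprocal_errors(d, y0)    # 2**-16, 2**-32, 2**-64
```

Under those assumptions nr_cycles(3) comes to 42 cycles, before any Xbar
transfers and before the cost of obtaining the initial 8-bit estimate.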


