
Re: [f-cpu] registers



Michael, All,

Michael Riepe wrote:
> Hi Mohamed,
> hi rest-of-the-f-gang,
> 
> 
>>So I was saying in the e-mail that I just joined the team. I am trying 
>>to get some information on the EU_IDU in order to look and see what I 
>>can do.
> 
> 
> Everything you like to. But first let me summarize the status quo:

It is always a good starting point.

> 
> I'm still playing with the SRT divider, and I guess I found a way to do
> SIMD integer divisions with few extra code (that is, gates). The unit
> will deliver one bit/cycle but needs some extra cycles for preparation
> and post-processing. This approach works best for large chunks.

I am looking at higher-radix SRT division, such as radix-4, which
produces 2 bits per iteration, or even radix-16, which produces 4 bits
per iteration. If the latter could produce its 4 bits in 3 cycles, for
example, given adequate hardware, there is a gain (6 cycles of
processing versus 8 cycles for an 8-bit division). I am going to spend
some time working through the implementation to get an idea of the
critical path.
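
To make the "more bits per iteration" idea concrete, here is a rough
behavioural sketch in Python. It is only an assumption-laden model, not
the F-CPU unit: it uses a plain non-redundant digit set with exact
comparisons instead of a real SRT quotient-selection table, and the
name digit_recurrence_divide and its parameters are made up for
illustration.

```python
def digit_recurrence_divide(dividend, divisor, width=8, bits_per_step=2):
    """Unsigned division that retires `bits_per_step` quotient bits per iteration.

    Simplified model: non-redundant digits and exact comparison, NOT true SRT.
    Returns (quotient, remainder, iterations).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width bits")

    radix = 1 << bits_per_step          # 2 bits -> radix 4, 4 bits -> radix 16
    remainder = 0
    quotient = 0
    iterations = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Shift in the next group of dividend bits.
        chunk = (dividend >> shift) & (radix - 1)
        remainder = (remainder << bits_per_step) | chunk
        # Digit selection: largest digit d with d * divisor <= remainder.
        digit = 0
        while digit + 1 < radix and (digit + 1) * divisor <= remainder:
            digit += 1
        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
        iterations += 1
    return quotient, remainder, iterations
```

For an 8-bit operand, bits_per_step=4 needs 2 iterations; at the
hypothetical 3 cycles per iteration that gives the 6 cycles quoted
above, against 8 iterations for one bit per cycle.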

> 
> Compare-and-subtract, subtract-and-restore or non-restore algorithms will
> work with chunk sizes > 8 bit, but they will take two cycles/bit because
> the add/subtract subunit doesn't fit into a single pipeline stage.
> They're fine for 8-bit operands but too expensive for larger chunks.
> Cedric has written an 8-bit divider that is included in the latest
> snapshot.

I have seen it. The problem with these dividers is that the critical
path is an add/sub, which is impossible to fit in 6 gate delays for
large chunks. I am still not clear on what speed we want to achieve
with this unit and exactly what hardware budget is available. Also,
do we need to handle, for example, 64-bit division as well as
SIMD-style 2x32-bit, 4x16-bit and 8x8-bit division?
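
To pin down what I mean by the SIMD modes, here is a small functional
model in Python. It is a sketch only: the name simd_udiv is invented,
the operands are treated as unsigned, and returning all ones for a
lane whose divisor is zero is purely my assumption, since we have not
defined that behaviour.

```python
def simd_udiv(a, b, lane_bits):
    """Lane-wise unsigned division of two 64-bit words.

    lane_bits = 64, 32, 16 or 8 gives 1x64, 2x32, 4x16 or 8x8 operation.
    """
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane_bits must be 8, 16, 32 or 64")
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(64 // lane_bits):
        shift = lane * lane_bits
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        # ASSUMPTION: a zero divisor yields an all-ones lane (not specified anywhere).
        q = mask if y == 0 else x // y
        result |= q << shift
    return result
```

For instance, simd_udiv(0x0000006400000009, 0x0000000A00000002, 32)
divides 100 by 10 in the upper lane and 9 by 2 in the lower lane.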

> 
> Finally, iterative solutions (like Newton-Raphson) will not only be rather
> slow (each step requires at least two successive arithmetic operations,
> at least one of them being a multiplication), but may also produce
> incorrect results due to limited precision (rounding errors). IMHO,
> they're suited for FP division only.

If we can afford a multiplier and an add/sub unit that are fast enough
(to be determined), we can go with this solution, since it doubles the
precision at each iteration: getting 64 bits of precision takes just
3 iterations once we have 8 bits of precision (8 -> 16 -> 32 -> 64).
How many cycles one iteration takes I will work out later.

If the error is less than 2^(-64), it cannot show up in our 64-bit
representation.
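
The precision doubling can be checked with exact arithmetic. The sketch
below is in Python and assumes a divisor normalized to [1/2, 1) and a
seed equal to 1/d truncated to 8 fractional bits (in hardware that
would presumably come from a lookup table). The function name is
hypothetical, and rounding inside a real multiplier is not modelled.

```python
from fractions import Fraction
import math


def newton_reciprocal_errors(d, seed_bits=8, iterations=3):
    """Return the relative errors 1 - d*x after the seed and each iteration.

    Uses the Newton-Raphson reciprocal step x <- x * (2 - d * x).
    ASSUMPTION: d is normalized so that 1/2 <= d < 1.
    """
    d = Fraction(d)
    if not Fraction(1, 2) <= d < 1:
        raise ValueError("d must be normalized to [1/2, 1)")
    scale = 1 << seed_bits
    # Seed: 1/d truncated to seed_bits fractional bits.
    x = Fraction(math.floor(scale / d), scale)
    errors = [1 - d * x]
    for _ in range(iterations):
        x = x * (2 - d * x)        # one multiply, one subtract, one multiply
        errors.append(1 - d * x)   # the relative error squares at every step
    return errors
```

With d = 3/4 the seed error is 2^(-10), and the three iterations give
2^(-20), 2^(-40) and 2^(-80), so the 2^(-64) bound is met after the
third step in this idealized model.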

> 
> Ideas/suggestions welcome.
> 

I will work on the details of the different ideas I am talking about and 
try to come up with the Pros/Cons of each one.

Dali
