[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [f-cpu] DivMod unit



On Thu, Dec 06, 2001 at 02:25:43PM +0100, Cedric BAIL wrote:

[signed division]
> > There's a general problem here: when the word size grows, the delay
> > will increase as well (O(log(n))).  A 64-bit divider will need 2 cycles for
> > each result bit, larger units will be even slower.  Another drawback
> > is that this kind of algorithm doesn't work with signed operands.
> It's sure. But i don't know if we need to do signed division, only a prestage
> to positive them and a postage to negative them back, if they are negative is
> needed. So i don't know, if we must do this in divide unit or not.

Separate abs/neg stages are an option (at least for FC0).  They'll
take two more cycles, but I don't really care for it.

[...]
> > If operating on 8-bit bytes, the simple approach is faster -- it can
> > perform an 8-bit division in 8 cycles --, but for larger word sizes,
> > you need additional cycles for the subtract operation, which makes the
> > SRT algorithm faster (1 cycle per bit, plus a few cycles for input
> > preparation -- operand normalization -- and output postprocessing).
> So, it's ok. I am know learning howto do a good SRT/RADIX division. The
> question that I have, is how big must I do it (I read some little things
> about 2-, 4-, 8-bits Raddix/SRT divider). Perhaps a solution is to do a C
> program that will generate the SRT/RADDIX divider that you want.

I've been playing with the SRT approach for quite a while, and my
conclusion was to stick with radix-2 (one result bit at a time).
You won't be able to calculate 2 or more bits in one F-CPU (6 gate)
pipeline stage, and the decision logic becomes much more complex quickly
when you increase the radix.  Maybe there will be a speed improvement if
you use radix-16 or greater, but you'll have to pay for it (complexity,
area, and possibility of errors).

On the other hand, a radix-2 SRT *can* do 1 result bit per cycle
independent of the operand size, while most other algorithms can't.

> > Since we can calculate eight 8-bit quotients at a time (SIMD),
> > pipelining doesn't seem really useful to me.
> I am really stupid, I forgot this ;-)
>  
> An other things, I am currently looking to the divider unit (idiv.vhdl). And
> I thing that it's possible to do 8, 16, 32 and 64bits divisions in parallel.

Yes and no.  You can do e.g. 8-but divisions with a 32-bit divider (just
set the unused bits to zero), but it seems to be impossible (within the
scope of the F-CPU) to split the divider and let both halves operate
independently, because the split logic will increase the delay too much.
I proposed a solution a while ago (YG said it was "crayzee as usual"),
but it requires a lot of code duplication (150% overhead, compared to a
"straight" 64-bit divider).

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/