[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [f-cpu] DivMod unit



On Fri, Dec 07, 2001 at 06:25:47PM +0100, Cedric BAIL wrote:
[...]
> > I've been playing with the SRT approach for quite a while, and my
> > conclusion was to stick with radix-2 (one result bit at a time).
> > You won't be able to calculate 2 or more bits in one F-CPU (6 gate)
> > pipeline stage, and the decision logic becomes much more complex
> I don't understood the problem, the division is not signed, so were is the
> problem with pipeline stage ?

In a SRT divider, the remainder is signed (that is, it may have a
negative value while the operation hasn't finished) even if the operands
themselves aren't.

But the real problem is that you have to fit too many things into a
single stage. For radix-2, the list contains:

	#1 - input selector (for looping)
	#2 - decision logic
	#3 - divisor selection (positive, negative or zero)
	#4 - 3-operand adder (a row of full adders, normally)

where all subunits depend on the one before.  You may get away with
duplicating #4, moving #3 to the end of the stage (making it a result
selector), and merging it with the input mux of the "next" stage.
But even then you'll have a 4:1 mux and at least four gates (including
two XOR gates, which are slower than AND/OR) in the critical data
path, which is already pretty close to the 6-gate limit.

In a radix-4 divider, both #2 and #4 won't fit into a single stage
any longer.  You'll have to split the core into two separate stages,
but that is nonsense -- why add extra circuitry to calculate 2 bits
every other cycle if you can calculate 1 bit per cycle with less effort?

[...]
> > > An other things, I am currently looking to the divider unit (idiv.vhdl). 
> > > And I thing that it's possible to do 8, 16, 32 and 64bits divisions in
> > > parallel.
> 
> > Yes and no.  You can do e.g. 8-but divisions with a 32-bit divider
> > (just set the unused bits to zero), but it seems to be impossible (within
> > the scope of the F-CPU) to split the divider and let both halves operate
> > independently, because the split logic will increase the delay too
> > much.
> So, if i correctly understand this, this is not the solution, right ? ;-)

Right -- unless you (or anybody else) can build a "split" SRT stage that
fits into a single stage.

> I currently instantiate 8 * 8bits divider, 4 * 16bits divider and so on. So
> we can use them in parallel. I dont know how it increase the size of the unit
> compared to the reuse of 32bits divider.
> 
> > I proposed a solution a while ago (YG said it was "crayzee as usual"),
> > but it requires a lot of code duplication (150% overhead, compared to
> > a "straight" 64-bit divider).
> And what is the solution ? I am really interested by it.

I planned to re-use the wide units for smaller chunks.  That is, the
64-bit divider also handles the uppermost chunk in all SIMD modes, and
separate (smaller) units handle the rest.  All you need is a 64-bit
divider, a 32-bit divider, two 16-bit and four 8-bit ones (that's
where the 150% overhead come from, btw).

If we label the units like this:

	A   : 64-bit unit
	B   : 32-bit unit
	C-D : 16-bit units
	E-H :  8-bit units

the operand bytes will be handled this way:

	76543210 | 
	=========#=======
	AAAAAAAA | 64-bit
	AAAABBBB | 32-bit
	AACCBBDD | 16-bit
	AECFBGDH |  8-bit

Compared to "full" duplication, this saves one 32-bit unit, two 16-bit
and four 8-bit ones.

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/