[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [f-cpu] DivMod unit



> Yep... standard "restoring" division.
> Minor improvement:
> 
> 	tmpB := B shl (WIDTH - 1);
> 	for i in WIDTH-1 downto 0 loop
> 		-- calculate result bit here
> 		tmpB := tmpB shr 1;
> 	end loop;
> This avoids the variable-size shift inside the loop (which would take
> at least 1 extra clock cycle).
Ok, that's change has been done.
 
> > As you can see all this unit need is a really good substraction function.
> > The one that I want to write will normally work for all size of data (if
> > it's a power of 4). The problem is that it doesn't work actually for a
> > other size than 8 (I have some porblem with index, it will be solved next
> > week).
> There's a general problem here: when the word size grows, the delay
> will increase as well (O(log(n))).  A 64-bit divider will need 2 cycles for
> each result bit, larger units will be even slower.  Another drawback
> is that this kind of algorithm doesn't work with signed operands.
It's sure. But i don't know if we need to do signed division, only a prestage
to positive them and a postage to negative them back, if they are negative is
needed. So i don't know, if we must do this in divide unit or not.
 
> > I think, that i will probably rewrite my substraction function. Because 
> > all the perfomance of this unit depend on it. I read a lot about the way
> > to jump over 0, do SRT divide, and I think that for F-CPU this type of
> > algorithm are perhaps not a good idea. Because if some case are accelerated
> > a lot of others are really slow down. And this type of algorithm are not
> > previsible, and the average speed of this unit is slower than this unit.
> > It's why i think it's not a good solution to use them.
> If operating on 8-bit bytes, the simple approach is faster -- it can
> perform an 8-bit division in 8 cycles --, but for larger word sizes,
> you need additional cycles for the subtract operation, which makes the
> SRT algorithm faster (1 cycle per bit, plus a few cycles for input
> preparation -- operand normalization -- and output postprocessing).
So, it's ok. I am know learning howto do a good SRT/RADIX division. The
question that I have, is how big must I do it (I read some little things
about 2-, 4-, 8-bits Raddix/SRT divider). Perhaps a solution is to do a C
program that will generate the SRT/RADDIX divider that you want.
 
> Since we can calculate eight 8-bit quotients at a time (SIMD),
> pipelining doesn't seem really useful to me.
I am really stupid, I forgot this ;-)
 
An other things, I am currently looking to the divider unit (idiv.vhdl). And
I thing that it's possible to do 8, 16, 32 and 64bits divisions in parallel.
So I am curently redesigning this unit to do it, so I change the entity
prototype. I had :
 - Some bits to say that the result is ready
 - Some others to say that's the result will be ready in LENGTH cycle (it's
a quick approximation, for the scheduler)
 - And some non-interesting bits to say wich unit is currently used.
I had the same bits for the divmod component (but not the non-intersting bits
;-)

So what did you think about that ?

A+
  Cedric

PS: the current eu_idiv on my account at epita did not work, I will finished it
 during the week end.
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/