
EU Report (was: Re: [f-cpu] Register set revised)



On Wed, Mar 19, 2003 at 10:10:01AM +0100, devik wrote:
[...]
> By the way F-CPU is missing SAD insn which is very useful for video.

How sad ;)

> And it can't be coded efficiently in F-CPU. At least a sum of the bytes
> inside a register would be needed.

What exactly is the SAD insn supposed to do?

A sum-of-bytes instruction could be integrated with EU_POPC -- that
unit typically also includes an N:1 byte adder (for popcount with wider
chunks).  I'm also considering making EU_POPC more general (and renaming it
to EU_AFU, for "additional functions unit", or maybe "a fun unit" ;-).
Since it's full of XORs anyway, it may also be able to perform things
like CRC and ECC generation or similar stuff (simple parity is already
there: it's in the LSB of the popcount result).
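
As a rough illustration of the relationship between popcount, parity and
a sum-of-bytes reduction, here is a small Python model.  It is only a
sketch: the function names are made up for this example (they are not
F-CPU mnemonics), and a 64-bit register is assumed.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def parity(x: int) -> int:
    """Simple parity: the LSB of the popcount result."""
    return popcount(x) & 1


def sum_of_bytes(x: int, width_bits: int = 64) -> int:
    """N:1 byte adder: add up all bytes held in one register.

    width_bits is an assumption of this sketch (one 64-bit register).
    """
    total = 0
    for shift in range(0, width_bits, 8):
        total += (x >> shift) & 0xFF
    return total
```

For example, sum_of_bytes(0x0102030405060708) adds the eight bytes 1..8.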

While we're talking about EUs... I revisited the ASU recently (after
more than a year), and I found a way to include some additional
functions.  We now also have

	Y = A + B + 1 (including correct carry-out)
	Y = A - B - 1 (including correct carry-out)

with fast 8-bit results (1 cycle) and

	Y = (A + B) / 2
	Y = (A + B + 1) / 2
	Y = (B - A) / 2
	Y = (B - A - 1) / 2

which always take two cycles.  The averaging functions (A + B) / 2 and
(A + B + 1) / 2, i.e. the third and fourth in the list of new functions,
are probably the most interesting (average with down/up rounding).  With a
minor modification (which I've not done yet), it should also be possible
to derive the `increment' flag from a third input port and calculate

	tmp = A ± (B + lsb(C))
	Y = tmp % pow(2, chunksize)
	Z = tmp / pow(2, chunksize)

Since that's a 3r2w operation (three register reads, two register writes),
don't expect it to be implemented in FC0.
It would be quite useful for adding/subtracting big numbers, however.
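
To make the arithmetic concrete, here is a small Python model of the
averaging functions and of the three-input add used as a carry chain.
This is a sketch under assumptions, not the VHDL: the function names are
invented for the example, only the addition case of the ± operation is
modelled, and big numbers are assumed to be stored as little-endian lists
of chunks.

```python
def avg_down(a: int, b: int, chunk_bits: int = 8) -> int:
    """Y = (A + B) / 2, rounding down; the extra carry bit is not lost."""
    mask = (1 << chunk_bits) - 1
    return ((a & mask) + (b & mask)) >> 1


def avg_up(a: int, b: int, chunk_bits: int = 8) -> int:
    """Y = (A + B + 1) / 2, i.e. the average rounded up."""
    mask = (1 << chunk_bits) - 1
    return ((a & mask) + (b & mask) + 1) >> 1


def add_with_carry(a: int, b: int, c: int, chunk_bits: int = 64):
    """tmp = A + (B + lsb(C)); returns (Y, Z) = (tmp % 2^n, tmp / 2^n)."""
    mask = (1 << chunk_bits) - 1
    tmp = (a & mask) + (b & mask) + (c & 1)
    return tmp & mask, tmp >> chunk_bits


def bignum_add(xs, ys, chunk_bits: int = 64):
    """Add two equal-length little-endian chunk lists; returns (sum, carry)."""
    carry = 0
    out = []
    for a, b in zip(xs, ys):
        y, carry = add_with_carry(a, b, carry, chunk_bits)
        out.append(y)
    return out, carry
```

The Z output of one step feeds the C input of the next, which is why the
operation needs three reads and two writes.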

I also have complete INC and CMP units available.  EU_INC performs
inc/dec/neg in 1 cycle, and abs/nabs/lsb0/lsb1 in 2 cycles.  In lsb mode,
two different results are available: a single-bit mask that selects the
bit searched for, or a binary encoded index.  The same is true for msb
(but that's handled by the CMP unit).  EU_CMP also performs cmpg (and its
inverse, cmple), or min and max (at the same time, via separate output
ports -- which also gives us minmax, aka `sort').  EU_CMP operations
always take two cycles.
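
The two result forms of the lsb/msb searches, and the minmax pairing, can
be modelled in a few lines of Python.  Again this is only a sketch: the
function names are hypothetical, a 64-bit chunk is assumed, and the
behaviour for an all-zero input (index -1 here) is an assumption, not
something taken from the units themselves.

```python
def lsb1_mask(x: int) -> int:
    """Single-bit mask selecting the least significant 1 bit (0 if none)."""
    return x & -x


def lsb1_index(x: int) -> int:
    """Binary encoded index of the least significant 1 bit (-1 if none)."""
    return lsb1_mask(x).bit_length() - 1


def lsb0_mask(x: int, width: int = 64) -> int:
    """Single-bit mask selecting the least significant 0 bit."""
    inverted = ~x & ((1 << width) - 1)
    return inverted & -inverted


def msb1_index(x: int) -> int:
    """Binary encoded index of the most significant 1 bit (-1 if none)."""
    return x.bit_length() - 1


def minmax(a: int, b: int):
    """Both results at once: (min, max), i.e. a two-element sort."""
    return (a, b) if a <= b else (b, a)
```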

All of them (that is, ASU/INC/CMP) auto-scale to integer multiples of
64 bits (the maximum chunk size of 64 bits doesn't increase, however).
I'll add this feature to the other EUs later (it's a bit more difficult
with EU_IMU and EU_SHL).
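
What auto-scaling means for a chunked operation can be shown with a small
Python model of a SIMD add.  This is a sketch under assumptions: the name
simd_add is invented, and the chunk sizes 8/16/32/64 are assumed for the
example; only the 64-bit maximum and the "multiple of 64 bits" register
width come from the description above.

```python
def simd_add(a: int, b: int, reg_bits: int = 128, chunk_bits: int = 16) -> int:
    """Add two registers chunk by chunk; carries never cross chunk borders."""
    if reg_bits % 64 != 0:
        raise ValueError("register width must be a multiple of 64 bits")
    if chunk_bits not in (8, 16, 32, 64):  # assumed set of chunk sizes
        raise ValueError("unsupported chunk size")
    mask = (1 << chunk_bits) - 1
    result = 0
    for shift in range(0, reg_bits, chunk_bits):
        lane = (((a >> shift) & mask) + ((b >> shift) & mask)) & mask
        result |= lane << shift
    return result
```

Widening the register only adds more lanes; the per-lane logic and the
largest chunk stay the same.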

EU_IMU is subject to change.  Since I managed to squeeze some delay out of
EU_ASU, I may also be able to make some room inside the multiplier (which
is basically a huge n-input adder).  I hope that will provide enough space
for output muxes that let us select between mul/amac results on one hand
and mac[l|h] results on the other (output port usage is pretty different
between these groups, and I don't want to add another set of 8 ports).
It may also improve the timing a bit.

EU_IDU still isn't finished.  I have a normalizer (a data-driven bit
shifter based on an omega network, which can also be used in int->fp
conversions) and a radix-2 SRT core.  I'm also investigating radix-16 SRT
but the lookup tables for the partial quotients seem to be too complex.
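
For readers unfamiliar with the normalizer's job, here is a behavioural
Python sketch: shift the operand left until its MSB is set and report the
shift count.  It uses a simple log-depth sequence of conditional shifts
and does not model the omega network itself; the function name, the
power-of-two width and the result for a zero input are assumptions of
this example.

```python
def normalize(x: int, width: int = 64):
    """Return (normalized value, shift count) for a width-bit operand.

    width must be a power of two in this sketch.
    """
    mask = (1 << width) - 1
    x &= mask
    if x == 0:
        return 0, width  # assumption: zero cannot be normalized
    shift = 0
    step = width // 2
    while step:
        if x >> (width - step) == 0:  # top `step` bits are all zero
            x = (x << step) & mask
            shift += step
        step //= 2
    return x, shift
```

The shift count is what an int->fp conversion would use to derive the
exponent.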

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"