EU Report (was: Re: [f-cpu] Register set revised)
On Wed, Mar 19, 2003 at 10:10:01AM +0100, devik wrote:
[...]
> By the way F-CPU is missing SAD insn which is very useful for video.
How sad ;)
> And it can't be coded efficiently in f-cpu. At least sum of bytes
> inside register would be needed.
What exactly is the SAD insn supposed to do?
A sum-of-bytes instruction could be integrated with EU_POPC -- that
unit typically also includes an N:1 byte adder (for popcount with wider
chunks). I'm also considering making EU_POPC more general (and renaming
it to EU_AFU, for "additional functions unit", or maybe "a fun unit" ;-).
Since it's full of XORs anyway, it may also be able to perform things
like CRC and ECC generation or similar stuff (simple parity is already
there: it's in the LSB of the popcount result).
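
To make the byte-sum idea concrete, here is a small Python model of the operations mentioned above. It is only a sketch: "SAD" is assumed to mean the usual "sum of absolute differences" from video coding (the thread does not spell it out), and all function names are hypothetical, not F-CPU mnemonics.

```python
def bytes_of(reg, width=64):
    """Split a register value into its 8-bit lanes, least significant first."""
    return [(reg >> shift) & 0xFF for shift in range(0, width, 8)]

def sum_of_bytes(reg, width=64):
    """N:1 byte adder: add up all byte lanes of one register."""
    return sum(bytes_of(reg, width))

def sad(a, b, width=64):
    """Sum of absolute differences over byte lanes (assumed meaning of SAD)."""
    return sum(abs(x - y) for x, y in zip(bytes_of(a, width), bytes_of(b, width)))

def popcount(reg):
    """Number of set bits in a non-negative register value."""
    return bin(reg).count("1")

def parity(reg):
    """Simple parity is the LSB of the popcount result."""
    return popcount(reg) & 1
```

In this model, SAD is a per-byte absolute difference followed by the same byte sum, which is why a sum-of-bytes instruction would cover the expensive half of it.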
While we're talking about EUs... I revisited the ASU recently (after
more than a year), and I found a way to include some additional
functions. We now also have
Y = A + B + 1 (including correct carry-out)
Y = A - B - 1 (including correct carry-out)
with fast 8-bit results (1 cycle) and
Y = (A + B) / 2
Y = (A + B + 1) / 2
Y = (B - A) / 2
Y = (B - A - 1) / 2
which always take two cycles. The third and fourth functions overall
(the two averages, rounding down and up) are probably the most
interesting. With a minor
modification (which I've not done yet), it should also be possible to
derive the `increment' flag from a third input port and calculate
tmp = A ± (B + lsb(C))
Y = tmp % pow(2, chunksize)
Z = tmp / pow(2, chunksize)
Since that's a 3r2w operation, don't expect it to be implemented in FC0.
It would be quite useful for adding/subtracting big numbers, however.
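
As a rough illustration of the functions listed above, the following Python sketch models the averages and the three-input add, then uses the latter to chain a multi-chunk addition. It is a behavioural model only, not the ASU implementation: the function names are hypothetical, only the "+" case of the three-input form is shown, and operands are assumed to be non-negative and smaller than 2**chunksize.

```python
def avg_down(a, b):
    """Y = (A + B) / 2, rounding down."""
    return (a + b) >> 1

def avg_up(a, b):
    """Y = (A + B + 1) / 2, rounding up."""
    return (a + b + 1) >> 1

def asu_add3(a, b, c, chunksize=64):
    """tmp = A + (B + lsb(C)); returns (Y, Z) = (tmp % 2**n, tmp / 2**n)."""
    tmp = a + (b + (c & 1))
    return tmp % (1 << chunksize), tmp >> chunksize

def bignum_add(xs, ys, chunksize=64):
    """Add two equal-length chunk lists (least significant chunk first).

    The carry-out Z of each step is fed back in as the third operand C
    of the next step.
    """
    result = []
    carry = 0
    for a, b in zip(xs, ys):
        y, carry = asu_add3(a, b, carry, chunksize)
        result.append(y)
    return result, carry
```

Python integers never overflow, so the averages are trivially exact here; in hardware the same result requires keeping the carry-out of the sum before the shift.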
I also have complete INC and CMP units available. EU_INC performs
inc/dec/neg in 1 cycle, and abs/nabs/lsb0/lsb1 in 2 cycles. In lsb mode,
two different results are available: a single-bit mask that selects the
bit searched for, or a binary encoded index. The same is true for msb
(but that's handled by the CMP unit). EU_CMP also performs cmpg (and its
inverse, cmple), or min and max (at the same time, via separate output
ports -- which also gives us minmax, aka `sort'). EU_CMP operations
always take two cycles.
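
For readers who want to see the two lsb/msb result forms and minmax side by side, here is a minimal Python model. It assumes that lsb1/lsb0 mean "find the least significant 1 bit / 0 bit"; the function names are hypothetical, and the results for inputs with no matching bit (0 for the mask, -1 for the index) are an arbitrary choice of this sketch, not documented hardware behaviour.

```python
def lsb1_mask(x):
    """Single-bit mask selecting the least significant 1 bit (0 if none)."""
    return x & -x

def lsb1_index(x):
    """Binary-encoded index of the least significant 1 bit (-1 if none)."""
    return lsb1_mask(x).bit_length() - 1

def lsb0_mask(x, width=64):
    """Single-bit mask selecting the least significant 0 bit (0 if none)."""
    inverted = ~x & ((1 << width) - 1)
    return inverted & -inverted

def msb1_mask(x):
    """Single-bit mask selecting the most significant 1 bit (0 if none)."""
    return 1 << (x.bit_length() - 1) if x else 0

def msb1_index(x):
    """Binary-encoded index of the most significant 1 bit (-1 if none)."""
    return x.bit_length() - 1

def minmax(a, b):
    """Return (min, max) together, like the two CMP output ports ('sort')."""
    return (a, b) if a <= b else (b, a)
```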
All of them (that is, ASU/INC/CMP) auto-scale to integer multiples of
64 bits (the maximum chunk size of 64 bits doesn't increase, however).
I'll add this feature to the other EUs later (it's a bit more difficult
with EU_IMU and EU_SHL).
EU_IMU is subject to change. Since I managed to squeeze some delay out of
EU_ASU, I may also be able to make some room inside the multiplier (which
is basically a huge n-input adder). I hope that will provide enough space
for output muxes that let us select between mul/amac results on one hand
and mac[l|h] results on the other (output port usage is pretty different
between these groups, and I don't want to add another set of 8 ports).
It may also improve the timing a bit.
EU_IDU still isn't finished. I have a normalizer (a data-driven bit
shifter based on an omega network, which can also be used in int->fp
conversions) and a radix-2 SRT core. I'm also investigating radix-16 SRT
but the lookup tables for the partial quotients seem to be too complex.
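
What the normalizer computes can be sketched in a few lines of Python. This models only the function (shift left until the most significant bit is set), not the omega network that implements it in hardware; the name `normalize`, the returned shift count, and the treatment of zero are assumptions of this sketch. The input is assumed to satisfy 0 <= x < 2**width.

```python
def normalize(x, width=64):
    """Shift x left until its top bit is set.

    Returns (normalized value, shift count). For x == 0 there is no bit to
    align, so this sketch returns (0, width) as an arbitrary convention.
    """
    if x == 0:
        return 0, width
    shift = width - x.bit_length()
    return (x << shift) & ((1 << width) - 1), shift
```

In an int->fp conversion the normalized value would supply the mantissa bits, and `width - 1 - shift` would give the unbiased exponent.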
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"