
[f-cpu] Alternative ROP2 Implementation



During those boring Easter holidays ;) I found another way to
implement the ROP2 unit.  It's based on the following formulas (in
pseudo-C, where `b ? x : y' selects bit by bit and 1 stands for a word
with all bits set):

      a &  b   ==  b ?  a :  0      // and
      a & ~b   ==  b ?  0 :  a      // andn
      a ^  b   ==  b ? ~a :  a      // xor
      a |  b   ==  b ?  1 :  a      // or
    ~(a |  b)  ==  b ?  0 : ~a      // nor
    ~(a ^  b)  ==  b ?  a : ~a      // xnor
      a | ~b   ==  b ?  a :  1      // orn
    ~(a &  b)  ==  b ? ~a :  1      // nand
                   b ?  a :  c      // mux
                   b ?  c :  a      // muxr (new: "reversed" mux)

Note the similarity between and/andn and mux/muxr.  The attached GIF
shows the actual implementation.  The five signals 0, 1, a, ~a and c
are "precomputed" (only ~a and c actually need any gates) and passed to
two n-bit wide 8:1 muxes that are directly controlled by the opcode's
function bits: one delivers the value used where b is 1, the other the
value used where b is 0.  The individual bits of b then select between
their outputs (that's a row of <n> 1-bit 2:1 muxes).

The main advantage of this kind of circuit is that the `b' operand signals
may arrive later than the rest.  That makes it possible to put a SIMD
<n>:2**<n> decoder in front of it, which helps to provide the full set of
`bitop' instructions:

    band    y =   a &  (1 << b)     // also called btst
    bandn   y =   a & ~(1 << b)     // also called bclr
    bxor    y =   a ^  (1 << b)     // also called bchg
    bor     y =   a |  (1 << b)     // also called bset
    bnor    y = ~(a |  (1 << b))    // new
    bxnor   y = ~(a ^  (1 << b))    // new
    born    y =   a | ~(1 << b)     // new
    bnand   y = ~(a &  (1 << b))    // new

with a latency of just 1 cycle (which the SHL unit can't achieve).

The fcpu-mr-rop2-20030421.tar.gz package (second attachment) contains a
rewrite of the ROP2 unit that supports all the instructions mentioned
above, as well as combine mode up to a chunk size of 64 bits (but only
for the ordinary logical operators, not for bitop; I doubt that it makes
sense for them).  Latency is critical in combine mode (I had to violate
the 6G rule again, but I still obey the 10T rule), so I'd like to
receive synthesis and speed reports.

The unit has been tested with both Simili and Vanilla; the testbench
I used is included in the package.  You'll also need some files from
the `common' directory; see eu_rop2/Makefile for details.

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"

Attachment: rop2.gif
Description: GIF image

Attachment: fcpu-mr-rop2-20030421.tar.gz
Description: application/gunzip