
[f-cpu] New EU_SHL Instruction



Hi guys!

While skimming through some AltiVec documentation the other day I
noticed that they have nice `permute' and `select' instructions that
let you shuffle the chunks inside a vector any way you like. `permute'
is a general operation that can replace both `sdup' and `sbyterev' - and
it can easily be implemented in the current SHL execution unit. The
basic function works as follows (beware, pseudo-VHDL!):

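    function permute (A, B : in F_VECTOR) return F_VECTOR is
        variable Y : F_VECTOR;
    begin
        -- chunk i of the result is the chunk of A selected by chunk i of B
        for i in 0 to NUMBER_OF_CHUNKS - 1 loop
            -- only the low bits of each selector chunk are significant
            chunk(Y, i) := chunk(A, to_integer(chunk(B, i)) mod NUMBER_OF_CHUNKS);
        end loop;
        return Y;
    end permute;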
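
For anyone who wants to play with the idea before touching the VHDL,
here is a rough software model of the function above in Python. It is
only a sketch under a few assumptions that the pseudo-code does not
state: a 64-bit register (REG_BITS), chunk 0 being the least
significant chunk, and selector values wrapping modulo the number of
chunks (which matches "only the required bits are used" in the
`vperm' description). The helper names split_chunks and join_chunks
are made up for this example.

```python
REG_BITS = 64  # assumed register width; hypothetical, not fixed by the post


def split_chunks(value, size):
    """Split a register value into `size`-bit chunks, chunk 0 = least significant."""
    mask = (1 << size) - 1
    return [(value >> (i * size)) & mask for i in range(REG_BITS // size)]


def join_chunks(chunks, size):
    """Inverse of split_chunks: pack the chunks back into one integer."""
    value = 0
    for i, c in enumerate(chunks):
        value |= c << (i * size)
    return value


def permute(a, b, size):
    """Y[i] = A[B[i] mod n], with n the number of chunks in a register."""
    src = split_chunks(a, size)
    sel = split_chunks(b, size)
    n = len(src)
    # n is a power of two, so `s % n` keeps only the low selector bits.
    return join_chunks([src[s % n] for s in sel], size)


if __name__ == "__main__":
    a = 0x0807060504030201
    # Selector chunks 7, 6, ..., 0 (from chunk 0 upwards) reverse the bytes.
    print(hex(permute(a, 0x0001020304050607, 8)))
    # An all-zero selector copies chunk 0 into every chunk.
    print(hex(permute(a, 0, 8)))
```

With 8-bit chunks the first call reverses the byte order of `a', and
the second one duplicates its lowest byte across the register.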

I therefore suggest the following ISA update:

    vperm.size r3, r2, r1       // new instruction: `vector permute'

        performs the `permute' function described above, with r3 being
        the selector (B), r2 being the source (A) and r1 the result
        (Y).  Only the required bits of the selector chunks are used
        (e.g. bits 2...0 if there are 8 chunks).

        `vperm' can perform chunk-wise shifts. It's not suitable
        as a replacement for `cshiftl', however - you have to
        set up the selector register somehow, and you'll need
        cshiftl to do that. `cshiftr', on the other hand, may
        be emulated by `vperm' (in a less efficient manner).

    vsel.size r3, r2, r1        // new instruction: `vector select'

        same as `vperm', but only the least significant chunk of
        the result is returned (with zero extension). Again, only the
        required bits of the selector are used. This instruction lets
        you read any chunk of a register with minimal effort.

        Note that `vperm' is the SIMD variant of `vsel', but I think
        that the name is more intuitive than `svsel'. We can keep the
        latter as an alias, however.

    vseli.size $imm8, r2, r1    // new instruction: `vector select immediate'

        same as `vsel', but with an 8-bit unsigned immediate
        selector. The SIMD variant `svseli' (or `vpermi') is
        probably less useful, but one never knows...

    sdup.size r2, r1            // changed instruction

        will survive as an alias for `vperm.size r0, r2, r1' which has
        exactly the same effect.

    [s]byterev.size r2, r1      // unchanged

        will stay the same. Note that `sbyterev' can be emulated
        with `vperm', but the non-SIMD `byterev' can't without an
        additional zero-extension instruction.
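
To make the relations between these instructions easier to check, here
is a small Python model of the list above. Treat it as a sketch, not a
specification: it assumes a 64-bit register, chunk 0 as the least
significant chunk, r0 reading as zero, and that `sbyterev.size'
reverses the bytes inside each size-bit chunk. The Python function
names simply mirror the mnemonics; sbyterev_selector is a hypothetical
helper that builds the selector constant.

```python
REG_BITS = 64  # assumed register width; hypothetical


def chunks(value, size):
    """Split a register value into `size`-bit chunks, chunk 0 = least significant."""
    mask = (1 << size) - 1
    return [(value >> (i * size)) & mask for i in range(REG_BITS // size)]


def vperm(sel, src, size):
    """vperm.size sel, src, dst: dst[i] = src[sel[i] mod n]."""
    s = chunks(src, size)
    n = len(s)
    out = 0
    for i, k in enumerate(chunks(sel, size)):
        out |= s[k % n] << (i * size)  # only the low selector bits matter
    return out


def vsel(sel, src, size):
    """vsel.size: least significant chunk of the vperm result, zero-extended."""
    return vperm(sel, src, size) & ((1 << size) - 1)


def vseli(imm8, src, size):
    """vseli.size: like vsel, with an 8-bit unsigned immediate selector."""
    return vsel(imm8 & 0xFF, src, size)


def sdup(src, size):
    """sdup.size as an alias for vperm.size with r0 (all zero) as selector."""
    return vperm(0, src, size)


def sbyterev_selector(size):
    """Byte selector that reverses the bytes inside each `size`-bit chunk."""
    per = size // 8  # bytes per chunk
    sel = 0
    for i in range(REG_BITS // 8):
        base = i - i % per
        sel |= (base + per - 1 - i % per) << (8 * i)
    return sel


if __name__ == "__main__":
    a = 0x0807060504030201
    print(hex(vsel(5, a, 8)))                        # read byte 5
    print(hex(sdup(a, 16)))                          # duplicate halfword 0
    print(hex(vperm(sbyterev_selector(16), a, 8)))   # sbyterev.16 emulation
```

Note that the emulated `sbyterev' leaves the upper chunks filled, so a
non-SIMD `byterev' still needs the extra zero extension (a mask with
0xFFFF for 16-bit chunks in this model).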

This change is so useful and so cheap to implement that I consider it
a must-have. Any objections?

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"