
[f-cpu] New EU_SHL Instruction

To: F-CPU Mailing List <f-cpu@seul.org>
Subject: [f-cpu] New EU_SHL Instruction
From: Michael Riepe <michael@stud.uni-hannover.de>
Date: Tue, 7 Jan 2003 06:47:10 +0100
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Tue, 07 Jan 2003 12:08:46 -0500
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

Hi guys!

While skimming through some AltiVec documentation the other day I
noticed that they have nice `permute' and `select' instructions that
let you shuffle the chunks inside a vector any way you like. `permute'
is a general operation that can replace both `sdup' and `sbyterev' -
and it can be easily implemented in the current SHL execution unit.
The basic function works as follows (beware, pseudo-VHDL!):

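    -- Y(i) := A(B(i)): each result chunk is picked by its selector chunk
    function permute (A, B : in F_VECTOR) return F_VECTOR is
        variable Y : F_VECTOR;
    begin
        for i in 0 to NUMBER_OF_CHUNKS - 1 loop
            -- only the required low bits of the selector chunk are used
            chunk(Y, i) := chunk(A, to_integer(chunk(B, i)) mod NUMBER_OF_CHUNKS);
        end loop;
        return Y;
    end permute;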
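
For anyone who wants to play with the idea without a VHDL simulator,
here is a minimal Python sketch of the same function. It is only a
behavioural model under my own assumptions: 64-bit registers, chunk 0
being the least significant chunk, and helper names (`chunks', `pack',
REG_BITS) that are made up for this example and are not part of any
F-CPU source.

```python
REG_BITS = 64  # assumed register width for this model


def chunks(value, size):
    """Split a register value into `size`-bit chunks, least significant first."""
    mask = (1 << size) - 1
    return [(value >> (i * size)) & mask for i in range(REG_BITS // size)]


def pack(parts, size):
    """Reassemble a register value from chunks (least significant first)."""
    value = 0
    for i, part in enumerate(parts):
        value |= part << (i * size)
    return value


def permute(a, b, size):
    """Result chunk i is the chunk of `a` selected by chunk i of `b`."""
    src = chunks(a, size)
    n = len(src)
    # Only the low bits of each selector chunk matter (index modulo chunk count).
    return pack([src[sel % n] for sel in chunks(b, size)], size)


# Selector chunks 7,6,...,0 (from chunk 0 upwards) reverse the byte order.
print(hex(permute(0x8877665544332211, 0x0001020304050607, 8)))
```

With byte-sized chunks the selector 0x0001020304050607 picks chunk 7
for position 0, chunk 6 for position 1 and so on, so the call above
reverses the eight bytes of the source.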

I therefore suggest the following ISA update:

    vperm.size r3, r2, r1       // new instruction: `vector permute'

        performs the `permute' function described above, with r3 being
        the selector (B), r2 being the source (A) and r1 the result
        (Y).  Only the required bits of the selector chunks are used
        (e.g. bits 2...0 if there are 8 chunks).

        `vperm' can perform chunk-wise shifts. It's not suitable
        as a replacement for `cshiftl', however - you have to
        set up the selector register somehow, and you'll need
        `cshiftl' to do that. `cshiftr', on the other hand, may
        be emulated by `vperm' (in a less efficient manner).
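
To illustrate the point about setting up the selector, here is a small
Python sketch that moves chunks around with a permute. Note that it
models a chunk-wise rotation, not the exact semantics of `cshiftl' or
`cshiftr', which I do not try to reproduce here. The 64-bit register
width, the chunk numbering (chunk 0 least significant) and the helper
name `rotation_selector' are assumptions made for the example.

```python
REG_BITS = 64  # assumed register width for this model


def permute(a, b, size):
    """Result chunk i is the chunk of `a` selected by chunk i of `b`."""
    n, mask = REG_BITS // size, (1 << size) - 1

    def chunk(value, index):
        return (value >> (index * size)) & mask

    return sum(chunk(a, chunk(b, i) % n) << (i * size) for i in range(n))


def rotation_selector(k, size):
    """Selector whose chunk i holds (i + k) mod n (hypothetical helper)."""
    n = REG_BITS // size
    return sum(((i + k) % n) << (i * size) for i in range(n))


sel = rotation_selector(1, 8)
print(hex(sel))                               # the constant that must be loaded first
print(hex(permute(0x8877665544332211, sel, 8)))  # every byte moves down one position
```

The selector itself is a full-width constant, so it has to be built
or loaded before the permute can run, which is the extra cost the
paragraph above refers to.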

    vsel.size r3, r2, r1        // new instruction: `vector select'

        same as `vperm', but only the least significant chunk of
        the result is returned (with zero extension). Again, only the
        required bits of the selector are used. This instruction lets
        you read any chunk of a register with minimal effort.

        Note that `vperm' is the SIMD variant of `vsel', but I think
        that the name is more intuitive than `svsel'. We can keep the
        latter as an alias, however.

    vseli.size $imm8, r2, r1    // new instruction: `vector select immediate'

        same as `vsel', but with an 8-bit unsigned immediate
        selector. The SIMD variant `svseli' (or `vpermi') is
        probably less useful, but one never knows...
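
        The two select forms can be modelled in a few lines of
        Python. This is a sketch under my own assumptions (64-bit
        registers, chunk 0 least significant, made-up function names
        and argument order), not a description of an actual
        implementation.

```python
REG_BITS = 64  # assumed register width for this model


def chunk(value, index, size):
    """Return chunk `index` of `value`, zero-extended."""
    return (value >> (index * size)) & ((1 << size) - 1)


def vsel(source, selector, size):
    """Chunk 0 of the permute result: the source chunk named by selector chunk 0."""
    n = REG_BITS // size
    return chunk(source, chunk(selector, 0, size) % n, size)


def vseli(source, imm8, size):
    """Same as vsel, but the index comes from an 8-bit unsigned immediate."""
    n = REG_BITS // size
    return chunk(source, (imm8 & 0xFF) % n, size)


print(hex(vsel(0x8877665544332211, 3, 8)))    # byte number 3 of the source
print(hex(vseli(0x8877665544332211, 1, 16)))  # 16-bit chunk number 1
```

        In both cases the upper bits of the result are zero, which
        is what "with zero extension" means above.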

    sdup.size r2, r1            // changed instruction

        will survive as an alias for `vperm.size r0, r2, r1', which has
        exactly the same effect: r0 always reads as zero, so every
        chunk of the result is a copy of chunk 0 of r2.
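
        A quick Python check of that equivalence, as a sketch only.
        It assumes 64-bit registers, chunk 0 being the least
        significant chunk, and r0 reading as zero; the function name
        is made up for the example.

```python
REG_BITS = 64  # assumed register width for this model


def permute(a, b, size):
    """Result chunk i is the chunk of `a` selected by chunk i of `b`."""
    n, mask = REG_BITS // size, (1 << size) - 1

    def chunk(value, index):
        return (value >> (index * size)) & mask

    return sum(chunk(a, chunk(b, i) % n) << (i * size) for i in range(n))


R0 = 0  # assumed: the selector register reads as all zeros

# An all-zero selector picks chunk 0 for every position.
print(hex(permute(0x8877665544332211, R0, 8)))
print(hex(permute(0x8877665544332211, R0, 16)))
```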

    [s]byterev.size r2, r1      // unchanged

        will stay the same. Note that `sbyterev' can be emulated
        with `vperm', but the non-SIMD `byterev' can't without an
        additional zero-extension instruction.
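
        A Python sketch of that emulation follows. It assumes that
        `sbyterev.size' reverses the byte order inside every chunk
        of the given size and that `byterev.size' does the same for
        the lowest chunk only, clearing the rest; it also assumes
        64-bit registers with chunk 0 least significant. The helper
        `byterev_selector' is hypothetical.

```python
REG_BITS = 64  # assumed register width for this model


def permute(a, b, size):
    """Result chunk i is the chunk of `a` selected by chunk i of `b`."""
    n, mask = REG_BITS // size, (1 << size) - 1

    def chunk(value, index):
        return (value >> (index * size)) & mask

    return sum(chunk(a, chunk(b, i) % n) << (i * size) for i in range(n))


def byterev_selector(size):
    """Byte selector that mirrors the bytes inside each `size`-bit chunk."""
    per_chunk = size // 8
    selector = 0
    for i in range(REG_BITS // 8):
        base = i - i % per_chunk                 # first byte of this chunk
        source = base + per_chunk - 1 - i % per_chunk
        selector |= source << (i * 8)
    return selector


def sbyterev(a, size):
    """SIMD form: a single byte-wise permute is enough."""
    return permute(a, byterev_selector(size), 8)


def byterev(a, size):
    """Non-SIMD form: the permute plus a separate zero-extension step."""
    return sbyterev(a, size) & ((1 << size) - 1)


print(hex(sbyterev(0x8877665544332211, 16)))
print(hex(byterev(0x8877665544332211, 16)))
```

        The final masking in `byterev' is the additional
        zero-extension step mentioned above.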

This change is so useful and so cheap to implement that I consider it
a must-have. Any objections?

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"