[f-cpu] New EU_SHL Instruction
Hi guys!
While skimming through some AltiVec documentation the other day, I
noticed that they have nice `permute' and `select' instructions that
let you shuffle the chunks inside a vector any way you like. `permute'
is a general instruction that can replace both `sdup' and `sbyterev',
and it can easily be implemented in the current SHL execution unit. The
basic function works as follows (beware, pseudo-VHDL!):
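function permute (A, B : in F_VECTOR) return F_VECTOR is
  variable Y : F_VECTOR;
begin
  for i in 0 to NUMBER_OF_CHUNKS - 1 loop
    -- chunk i of the result is the source chunk whose index is stored
    -- in chunk i of the selector; only the low log2(NUMBER_OF_CHUNKS)
    -- bits of the selector chunk are used, hence the "mod"
    chunk(Y, i) := chunk(A, to_integer(chunk(B, i)) mod NUMBER_OF_CHUNKS);
  end loop;
  return Y;
end permute;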
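For anyone who wants to play with the semantics, here is a minimal Python model of `permute' (a sketch, not a reference implementation). It assumes the chunks are given as a list with index 0 being the least significant chunk, and that the number of chunks is a power of two, so masking with `n - 1' keeps exactly the "required bits" of each selector chunk.

```python
def permute(a, b):
    """Chunk i of the result is chunk (b[i] mod n) of a, where n = len(a)."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("number of chunks must be a power of two")
    if len(b) != n:
        raise ValueError("source and selector must have the same number of chunks")
    # Only the low log2(n) bits of each selector chunk are used.
    return [a[b[i] & (n - 1)] for i in range(n)]


if __name__ == "__main__":
    print(permute([10, 11, 12, 13], [3, 3, 0, 1]))  # [13, 13, 10, 11]
    print(permute([10, 11, 12, 13], [7, 4, 5, 6]))  # [13, 10, 11, 12]
```

In the second call the selector values 7, 4, 5, 6 are reduced to 3, 0, 1, 2 because only two selector bits are needed for four chunks.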
I therefore suggest the following ISA update:
vperm.size r3, r2, r1 // new instruction: `vector permute'
performs the `permute' function described above, with r3 being
the selector (B), r2 being the source (A) and r1 the result
(Y). Only the required bits of the selector chunks are used
(e.g. bits 2...0 if there are 8 chunks).
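The same function at register level, with the operand order of `vperm.size r3, r2, r1', might look like the sketch below. It assumes 64-bit registers and takes the chunk size in bits; the helper names `to_chunks' and `from_chunks' are made up for illustration.

```python
REG_BITS = 64  # assumption: 64-bit registers


def to_chunks(value, size):
    """Split a REG_BITS-wide register into size-bit chunks (chunk 0 = least significant)."""
    mask = (1 << size) - 1
    return [(value >> (i * size)) & mask for i in range(REG_BITS // size)]


def from_chunks(chunks, size):
    """Reassemble size-bit chunks (chunk 0 = least significant) into one register value."""
    value = 0
    for i, c in enumerate(chunks):
        value |= c << (i * size)
    return value


def vperm(r3, r2, size):
    """Model of `vperm.size r3, r2, r1`: returns the value written to r1.

    r3 is the selector (B), r2 the source (A). Only the low log2(n) bits
    of each selector chunk are used, n being the number of chunks.
    """
    n = REG_BITS // size
    src = to_chunks(r2, size)
    return from_chunks([src[s & (n - 1)] for s in to_chunks(r3, size)], size)


if __name__ == "__main__":
    src = 0x1111222233334444
    # 16-bit chunks, selector chunks 3, 2, 1, 0 (chunk 0 first): reverse the chunk order
    print(hex(vperm(0x0000000100020003, src, 16)))  # 0x4444333322221111
    # 0xFFFC & 3 == 0, so the high selector bits are ignored: chunk 0 everywhere
    print(hex(vperm(0x000000000000FFFC, src, 16)))  # 0x4444444444444444
```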
`vperm' can perform chunk-wise shifts. It's not suitable
as a replacement for `cshiftl', however: you have to
set up the selector register somehow, and you'll need
`cshiftl' to do that. `cshiftr', on the other hand, may
be emulated by `vperm' (though less efficiently).
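A sketch of moving chunks down with `vperm', assuming 64-bit registers and chunk sizes in bits. Selector chunk i holds i + k; because only the required selector bits are used, the indices of the top k chunks wrap around, so this selector rotates rather than shifts. How a real `cshiftr' fills the vacated chunk isn't specified here, so treat this only as an approximation. The selector is built by a hypothetical helper `make_selector'; on the CPU itself this setup step is the part that needs `cshiftl'.

```python
REG_BITS = 64  # assumption: 64-bit registers


def vperm(sel, src, size):
    """Model of `vperm.size sel, src, dst` with size in bits; returns dst."""
    n = REG_BITS // size
    mask = (1 << size) - 1
    result = 0
    for i in range(n):
        # low log2(n) bits of selector chunk i pick the source chunk
        index = ((sel >> (i * size)) & mask) & (n - 1)
        result |= ((src >> (index * size)) & mask) << (i * size)
    return result


def make_selector(indices, size):
    """Hypothetical helper: pack chunk indices (chunk 0 first) into a selector."""
    return sum(idx << (i * size) for i, idx in enumerate(indices))


def rotate_chunks_right(src, k, size):
    """Move every chunk down by k positions via vperm (0 <= k < n assumed).

    The top k indices (i + k >= n) wrap around because only the low
    log2(n) selector bits are used, so this is a rotation, not a shift.
    """
    n = REG_BITS // size
    return vperm(make_selector([i + k for i in range(n)], size), src, size)


if __name__ == "__main__":
    print(hex(rotate_chunks_right(0x1122334455667788, 1, 8)))   # 0x8811223344556677
    print(hex(rotate_chunks_right(0x1122334455667788, 1, 16)))  # 0x7788112233445566
```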
vsel.size r3, r2, r1 // new instruction: `vector select'
same as `vperm', but only the least significant chunk of
the result is returned (with zero extension). Again, only the
required bits of the selector are used. This instruction lets
you read any chunk of a register with minimal effort.
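A minimal model of `vsel', assuming 64-bit registers and chunk sizes in bits. Since it is `vperm' with only the least significant result chunk kept, the source chunk is chosen by the low bits of the selector's chunk 0 and returned zero-extended (a Python int simply has no upper bits set).

```python
REG_BITS = 64  # assumption: 64-bit registers


def vsel(sel, src, size):
    """Model of `vsel.size sel, src, dst` (size in bits): returns dst.

    Only the low log2(n) bits of the selector are used; the selected
    source chunk is returned zero-extended to the full register.
    """
    n = REG_BITS // size
    index = sel & (n - 1)  # n - 1 always fits inside the selector's chunk 0
    return (src >> (index * size)) & ((1 << size) - 1)


if __name__ == "__main__":
    reg = 0x1122334455667788
    print(hex(vsel(5, reg, 8)))     # 0x33
    print(hex(vsel(0x0D, reg, 8)))  # 0x33 (0x0D & 7 == 5)
    print(hex(vsel(2, reg, 16)))    # 0x3344
    print(hex(vsel(1, reg, 32)))    # 0x11223344
```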
Note that `vperm' is the SIMD variant of `vsel', but I think
that the name `vperm' is more intuitive than `svsel'. We can keep the
latter as an alias, however.
vseli.size $imm8, r2, r1 // new instruction: `vector select immediate'
same as `vsel', but with an 8-bit unsigned immediate
selector. The SIMD variant `svseli' (or `vpermi') is
probably less useful, but one never knows...
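The immediate form can be sketched the same way, again assuming 64-bit registers and chunk sizes in bits. Eight immediate bits are more than enough: even with 8-bit chunks only three of them are needed.

```python
REG_BITS = 64  # assumption: 64-bit registers


def vseli(imm8, src, size):
    """Model of `vseli.size $imm8, src, dst` (size in bits): returns dst."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("immediate must fit in 8 unsigned bits")
    n = REG_BITS // size
    index = imm8 & (n - 1)  # only the required bits of the immediate are used
    return (src >> (index * size)) & ((1 << size) - 1)


if __name__ == "__main__":
    reg = 0x1122334455667788
    # read every byte of the register with vseli.8 $0 ... vseli.8 $7
    print([hex(vseli(i, reg, 8)) for i in range(8)])
```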
sdup.size r2, r1 // changed instruction
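will survive as an alias for `vperm.size r0, r2, r1', which has
exactly the same effect: r0 always reads as zero, so every selector
chunk is 0 and chunk 0 of r2 is copied into all chunks of r1.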
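A quick check of the `sdup' alias, assuming 64-bit registers, chunk sizes in bits, and r0 reading as zero. The selector is all zeros, so the lowest chunk of the source ends up in every chunk of the result.

```python
REG_BITS = 64  # assumption: 64-bit registers


def vperm(sel, src, size):
    """Model of `vperm.size sel, src, dst` with size in bits; returns dst."""
    n = REG_BITS // size
    mask = (1 << size) - 1
    result = 0
    for i in range(n):
        index = ((sel >> (i * size)) & mask) & (n - 1)
        result |= ((src >> (index * size)) & mask) << (i * size)
    return result


def sdup(src, size):
    """`sdup.size r2, r1` expressed as `vperm.size r0, r2, r1` (r0 == 0)."""
    return vperm(0, src, size)


if __name__ == "__main__":
    reg = 0x1122334455667788
    print(hex(sdup(reg, 8)))   # 0x8888888888888888
    print(hex(sdup(reg, 16)))  # 0x7788778877887788
    print(hex(sdup(reg, 32)))  # 0x5566778855667788
```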
[s]byterev.size r2, r1 // unchanged
will stay the same. Note that `sbyterev' can be emulated
with `vperm', but the non-SIMD `byterev' can't without an
additional zero-extension instruction.
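A sketch of the `sbyterev' emulation with byte-sized `vperm', assuming 64-bit registers and assuming that `sbyterev.size' reverses the byte order inside each size-bit chunk independently. Both `make_selector' and `sbyterev_via_vperm' are hypothetical helper names.

```python
REG_BITS = 64  # assumption: 64-bit registers


def vperm(sel, src, size):
    """Model of `vperm.size sel, src, dst` with size in bits; returns dst."""
    n = REG_BITS // size
    mask = (1 << size) - 1
    result = 0
    for i in range(n):
        index = ((sel >> (i * size)) & mask) & (n - 1)
        result |= ((src >> (index * size)) & mask) << (i * size)
    return result


def make_selector(indices, size):
    """Hypothetical helper: pack chunk indices (chunk 0 first) into a selector."""
    return sum(idx << (i * size) for i, idx in enumerate(indices))


def sbyterev_via_vperm(src, size):
    """Reverse the bytes inside every size-bit chunk with a single vperm.8."""
    per_chunk = size // 8
    # byte b of chunk c takes byte (per_chunk - 1 - b) of the same chunk
    indices = [c * per_chunk + (per_chunk - 1 - b)
               for c in range(REG_BITS // size) for b in range(per_chunk)]
    return vperm(make_selector(indices, 8), src, 8)


if __name__ == "__main__":
    reg = 0x1122334455667788
    print(hex(sbyterev_via_vperm(reg, 64)))  # 0x8877665544332211
    print(hex(sbyterev_via_vperm(reg, 32)))  # 0x4433221188776655
    print(hex(sbyterev_via_vperm(reg, 16)))  # 0x2211443366558877
```

For 32-bit chunks the selector works out to 0x0405060700010203, i.e. byte indices 3, 2, 1, 0, 7, 6, 5, 4 starting from chunk 0.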
This change is so useful and so cheap to implement that I consider it
a must-have. Any objections?
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"