
scatter/gather op ( was:Re: [f-cpu] New EU_SHL Instruction)

To: f-cpu@seul.org
Subject: scatter/gather op ( was:Re: [f-cpu] New EU_SHL Instruction)
From: nico <nicolas.boulay@ifrance.com>
Date: Fri, 10 Jan 2003 20:59:36 +0000
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Thu, 09 Jan 2003 14:59:03 -0500
In-reply-to: <20030109140151.50113@thrai.stud.uni-hannover.de>
References: <20030108084730.18996.qmail@web14910.mail.yahoo.com><3E1CC977.7070803@f-cpu.org><20030109140151.50113@thrai.stud.uni-hannover.de>
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

On Thu, 9 Jan 2003 14:01:51 +0100
Michael Riepe <michael@stud.uni-hannover.de> wrote:

> On Thu, Jan 09, 2003 at 01:59:35AM +0100, Yann Guidon wrote:
> [...]
> > and_reduce (or "combine" as written in ROP2) is not possible
> > for very wide data.
> > 
> > Furthermore, the xorn.and trick is useful for "detecting" that a
> > byte corresponds, but if you need to find the index of the
> > character, the "obvious" answer is to loop over the register.
> > if you have a result of 0x00FF000000000000, it's not a good
> > solution. So the idea is to "transpose" the bits in the word, that
> > would become 0x4040404040404040 and the last byte can then be
> > binary encoded in INC (if it's implemented).
> 
> Wouldn't it be sufficient to `collapse' each chunk into a single bit?

That's an intra-chunk gather operation. (Gather ops like this are missing
from the whole F-CPU ISA, because the inter-chunk operations were designed
for a 64-bit CPU instead of with a 256-bit version in mind.)

An add gather could be useful too!

gather.add.64 V1 V2 R3

R3 = V1[0]+V1[1]+V1[2]+V1[3]
    +V2[0]+V2[1]+V2[2]+V2[3]

(a big adder tree?)
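
To make the intended semantics concrete, here is a minimal C sketch of what such a gather add could compute. It assumes a 256-bit register seen as 4 chunks of 64 bits; the function name gather_add_64, the LANES constant and the wrap-around on overflow are assumptions for illustration, not part of any F-CPU specification. The pairwise summing only mimics the shape of an adder tree.

```c
#include <stdint.h>

#define LANES 4  /* assumption: a 256-bit register viewed as 4 x 64-bit chunks */

/* Hypothetical model of "gather.add.64 V1 V2 R3": sums all chunks of both
   vectors into one scalar, pairing operands level by level like an adder
   tree. Unsigned arithmetic, so overflow wraps around modulo 2^64. */
uint64_t gather_add_64(const uint64_t v1[LANES], const uint64_t v2[LANES])
{
    uint64_t level[2 * LANES];
    int n = 2 * LANES;

    /* Level 0 of the tree: the 8 input chunks. */
    for (int i = 0; i < LANES; i++) {
        level[i] = v1[i];
        level[LANES + i] = v2[i];
    }

    /* Each pass halves the number of partial sums: 8 -> 4 -> 2 -> 1. */
    while (n > 1) {
        n /= 2;
        for (int i = 0; i < n; i++)
            level[i] = level[2 * i] + level[2 * i + 1];
    }
    return level[0];
}
```

With 8 inputs the tree is 3 levels deep, which is the point of asking for it in hardware rather than doing 7 dependent scalar adds.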

This avoids the clumsy end-of-loop code in many mathematical operations
(imagine an unrolled MAC loop for a digital filter):

int X[100], Coeff[100], out;

init(Coeff);

out=0;
for(int i=0; i<100; i++)
{
 out+=X[i]*Coeff[i];
}

Such loops are a dream for SIMD (8*32 = 256-bit registers):

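/* (100+7)/8 = 13 vectors: 12 full ones plus 1 holding the last 4 elements */
V8i X[(100+7)/8], Coeff[(100+7)/8], Vout1, Vout2;
int out;
init(Coeff);

out=0;
Vout1=Vout2=0;
for(int i=0; i < 100/8; i+=2)   /* 100/8 = 12 full vectors = 96 elements */
{
 Vout1+=X[i]*Coeff[i];
 Vout2+=X[i+1]*Coeff[i+1]; /* second accumulator, to hide the internal
                              dependencies of the MAC op! */
}

for(int i=96; i < 100; i++)     /* the remaining 100%8 = 4 elements, as scalars */
{
	out+=((int*)X)[i]*((int*)Coeff)[i];
}

out+=gather_add(Vout1,Vout2);

return out;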
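
For anyone who wants to try the loop structure on an ordinary compiler, here is a plain C emulation. It is only a sketch: the 8-lane accumulators are ordinary arrays, gather_add() is a software stand-in for the proposed instruction, and the names dot, acc1 and acc2 are made up for the example. It assumes the products and sums fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* assumption: 8 x 32-bit lanes per 256-bit register */

/* Software stand-in for the proposed gather add:
   sums every lane of two accumulator vectors into one scalar. */
static int32_t gather_add(const int32_t a[LANES], const int32_t b[LANES])
{
    int32_t sum = 0;
    for (int l = 0; l < LANES; l++)
        sum += a[l] + b[l];
    return sum;
}

/* Dot product of x and coeff over n elements. */
int32_t dot(const int32_t *x, const int32_t *coeff, size_t n)
{
    int32_t acc1[LANES] = {0}, acc2[LANES] = {0};
    int32_t out = 0;
    size_t i = 0;

    /* Main loop: two vectors (16 elements) per iteration, each feeding
       its own accumulator so the two MAC chains are independent. */
    for (; i + 2 * LANES <= n; i += 2 * LANES) {
        for (int l = 0; l < LANES; l++) {
            acc1[l] += x[i + l] * coeff[i + l];
            acc2[l] += x[i + LANES + l] * coeff[i + LANES + l];
        }
    }

    /* Scalar tail: the elements that do not fill two whole vectors. */
    for (; i < n; i++)
        out += x[i] * coeff[i];

    /* One horizontal reduction at the very end. */
    return out + gather_add(acc1, acc2);
}
```

For n = 100 the main loop covers 96 elements in 6 iterations and the tail handles the last 4, as in the pseudo-code above.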

This kind of gather saves you from doing strange manipulations on the
vectors in registers. These are Vector-Vector->Scalar or
Vector-Scalar->Scalar operations. The inverse could be useful too
(scatter): Scalar-Scalar->Vector.

Add is the most obvious op for this kind of thing, but maybe other ops
could be useful too?

For bit-wise operations like and/or_reduce, this is an intra-chunk op,
because bit-wise ops are just SIMD on 1-bit integers :)

nicO


> That is, if the chunk's value is not zero, the corresponding bit will
> be set, otherwise it will be zero:
> 
> 	r2 = 0xab00cd00ef0000
> 	collapse.b r2, r1
> 	r1 <= 0x54
> 	collapse.d r2, r1
> 	r1 <= 0x0e
> 
> and so on. A complementary `uncollapse' instruction would be nice,
> too (it would allow you to generate chunk masks more easily):
> 
> 	r2 = 0x5a
> 	uncollapse.b r2, r1
> 	r1 <= 0x00ff00ffff00ff00
> 	uncollapse.d r2, r1
> 	r1 <= 0x0000ffff0000ffffffff0000ffff0000	// yes, that's 128 bits ;)
> 
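
The byte-sized variants quoted above can be modelled in C as follows. This is a sketch under assumptions: 64-bit registers, byte 0 taken as the least significant byte, and the function names collapse_b and uncollapse_b invented for the example. It reproduces the 0x54 and 0x00ff00ffff00ff00 results from the quoted examples.

```c
#include <stdint.h>

/* Hypothetical model of collapse.b: bit i of the result is set when
   byte i of x (byte 0 = least significant) is non-zero. */
uint8_t collapse_b(uint64_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((x >> (8 * i)) & 0xff)
            r |= (uint8_t)(1u << i);
    return r;
}

/* Hypothetical model of uncollapse.b: byte i of the result is 0xff when
   bit i of mask is set, and 0x00 otherwise. */
uint64_t uncollapse_b(uint8_t mask)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((mask >> i) & 1)
            r |= (uint64_t)0xff << (8 * i);
    return r;
}
```

Note that collapsing an uncollapsed mask gives the mask back, so the pair can be used to build chunk masks from a compact bit pattern.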
> -- 
>  Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
>  "All I wanna do is have a little fun before I die"
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu       in the body. http://f-cpu.seul.org/
