
Re: [f-cpu] Instruction census



On Wed, Jan 15, 2003 at 07:34:11PM +0000, cedric wrote:

> >   98104 instructions total
> >   13079 move            13.3% ( 13.3% total)
> >   10846 load            11.0% ( 24.3% total)
> >    3545 storei           3.6% ( 78.1% total)
> >    3533 loadi            3.6% ( 81.7% total)
> >    3459 store            3.5% ( 85.3% total)
> 
> >   21383 lsu             21.7% ( 21.7% total)
> 
> 	Did you use the new register allocator for this test?  I hope it will change
> a lot of results for RISC CPUs, and perhaps it will remove some loadi/storei,
> but I don't know the impact on load/store.  (Do you have any idea why we have
> 3 times more loads than stores, while loadi and storei are very close?)

Which register allocator?  Did I miss something?  I used what Martin
provided, plus my own bug fixes and backend extensions.

> > Goals for optimization (IMHO):
> > 	- reduce number of load/store instructions
> 
> Perhaps it's easier for loadi/storei, but I really want to know where all
> these loads come from.

Most of it comes from the code itself; with -O -fomit-frame-pointer,
save/restore instructions are reduced to a minimum.

> > 	- increase number of conditional moves (in favor of jmp{cc})
> > 	- avoid shift-and-add where mul/mac is faster
> 
> Hmm, what about a "mac"-shift instruction?

Martin proposed `shadd[i]', which calculates `r1 += r2 << r3_or_uimm8',
or similar.  But that is rather hard to do with separate SHL and ASU
execution units, and won't be faster than explicit shift and add
instructions.  In fact, explicit instructions are more flexible.
But an immediate version of `amac' would make sense, IMHO.  I'll add
one and see what happens.
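
To make the comparison concrete, here is a minimal C sketch of what
`shadd' would compute, next to the explicit two-step version.  The
function names and the 64-bit operand width are my assumptions for
illustration only; nothing here is taken from the F-CPU manual or from
Martin's proposal beyond the `r1 += r2 << r3_or_uimm8' formula.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposed shadd: returns acc + (val << shift).
 * The 64-bit width is an assumption made for this sketch. */
uint64_t shadd_model(uint64_t acc, uint64_t val, unsigned shift)
{
    assert(shift < 64);          /* shifting by >= the width is undefined in C */
    return acc + (val << shift);
}

/* The same result as two explicit steps, the way separate shift and
 * add instructions would produce it. */
uint64_t shift_then_add(uint64_t acc, uint64_t val, unsigned shift)
{
    assert(shift < 64);
    uint64_t tmp = val << shift; /* step 1: shift */
    return acc + tmp;            /* step 2: add   */
}
```

Both functions return the same value for every valid input; the
explicit form simply leaves the intermediate shifted value available
for reuse, which is the flexibility mentioned above.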

> > 	- make use of divrem[s] instruction
> > 	- make use of SIMD instructions
> 
> I think that gcc supports SIMD only for string functions; it's really hard to
> give real SIMD support to gcc.

Gcc 3.x has real SIMD support, but you'll have to use builtin functions
explicitly (or use the <altivec.h> interface, where available).  E.g.
with my patched version of the latest official fcpu-gcc release,
`fcpu-gcc -S -O -fomit-frame-pointer' translates

	/* use vectors that consist of two floats */
	typedef float __v2sf __attribute__((__mode__(__V2SF__)));

	__v2sf
	sfmac_f(__v2sf a, __v2sf b, __v2sf c) {
	    return __builtin_addv2sf(a, __builtin_mulv2sf(b, c));
	}

into

		.p2align 5
		.global sfmac_f
	sfmac_f:
		sfmac.32 r3,r2,r1
		jmp r63

which is all you can expect.
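
For anyone who wants to check results without the patched compiler,
the C function above can be modelled in plain scalar C.  This is only
a sketch: the struct `v2sf_ref' and both function names are
hypothetical stand-ins I made up for the vector type and the builtins,
and the code models the C source, not the hardware instruction itself.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for a vector of two floats. */
typedef struct { float lane[2]; } v2sf_ref;

/* Per-lane a + b * c, mirroring the add-of-multiply in sfmac_f. */
v2sf_ref sfmac_ref(v2sf_ref a, v2sf_ref b, v2sf_ref c)
{
    v2sf_ref r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = a.lane[i] + b.lane[i] * c.lane[i];
    return r;
}

/* Small self-check using values that are exactly representable
 * as floats, so the comparisons are exact. */
void sfmac_ref_check(void)
{
    v2sf_ref a = {{1.0f, 2.0f}};
    v2sf_ref b = {{2.0f, 3.0f}};
    v2sf_ref c = {{3.0f, 4.0f}};
    v2sf_ref r = sfmac_ref(a, b, c);
    assert(r.lane[0] == 7.0f);   /* 1 + 2 * 3 */
    assert(r.lane[1] == 14.0f);  /* 2 + 3 * 4 */
}
```

Each lane is computed independently, which is exactly what makes the
operation a candidate for a single SIMD instruction.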

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"