
Re: [f-cpu] Instruction census



> I compiled a set of C files (among them libelf and fctools) with fcpu-gcc
> and started to count instructions.  The most often used single instruction
> was an ordinary unconditional `move':

Wasn't it the one with the "zero_extend" comment? Most of them
will be gone, but we will have to duplicate almost ALL patterns
with an implicit zero extend (as fcpu zero-extends automatically,
like x86-64 from AMD does).
I'm trying to teach the combiner to do it by changing its code,
but it is not simple at all.
Fortunately one of the Czech core gcc developers has devoted his
time to communicating with me and helping me.
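
To make the zero-extend case concrete, here is a minimal C sketch
of source that asks for a zero extension. The function names are
made up for this example. On a target that already clears the
upper bits whenever it produces a narrower result, neither
function should need an instruction just for the widening; the
patterns only have to express that.

```c
#include <stdint.h>

/* Widening an unsigned 32-bit value to 64 bits is a zero extension
   in C.  If the hardware already leaves the upper bits zero, no
   separate instruction is needed for it. */
uint64_t widen(uint32_t x)
{
    return (uint64_t)x;
}

/* The sum is truncated to 32 bits first (it wraps modulo 2^32),
   then the 32-bit result is zero-extended to 64 bits. */
uint64_t add_then_widen(uint32_t a, uint32_t b)
{
    return (uint64_t)(uint32_t)(a + b);
}
```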

> The vast majority of instructions are load/store, add/sub/loadaddr[i],

yep ... gcc can't prefetch and "cache" data in registers too
early because of read-write ordering rules which can't be
resolved by aliasing analysis. That is why IA-64 has the ld.a
instruction. There is a flag which tells gcc to use "possibly
dangerous" early fetches - but then it doesn't follow the C
standard.
The add/sub case: CSE generates all addresses as a PLUS of the
first seen address and a constant. Then in the next pass it looks
for all load pairs and tries to use post-increment on those
loads. What often prevents it are labels in between (then we
could arrive here from more places, possibly with an unknown
value of the address register). This could be resolved by
complete CFG analysis, which gcc doesn't do just now - but I'd
not expect to find many more of them. I also see another problem:
before scheduling we don't know whether add or post-inc is better
for scheduling - both are possible.
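
A small C sketch of both points, with names invented for the
example. In scale_all the store to dst[i] may change *scale, so
the compiler has to treat *scale as possibly modified on every
iteration. scale_all_hoisted does the "early fetch" by hand and
gives a different result when the two pointers overlap. sum_pair
shows the kind of load pair that is a candidate for
post-increment addressing.

```c
/* The store to dst[i] may alias *scale (same type), so *scale
   cannot simply be kept in a register across the loop. */
void scale_all(int *dst, const int *scale, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = dst[i] * *scale;
}

/* Early fetch done by hand: only equivalent to scale_all when
   dst and scale do not overlap. */
void scale_all_hoisted(int *dst, const int *scale, int n)
{
    int s = *scale;
    int i;
    for (i = 0; i < n; i++)
        dst[i] = dst[i] * s;
}

/* Two consecutive loads from p and p+1: a load pair that could use
   post-increment addressing instead of a separate add. */
int sum_pair(const int *p)
{
    int a = *p++;
    int b = *p++;
    return a + b;
}
```

Calling both scale functions with scale pointing at dst[0] shows
why the hoisted version is not a legal transformation in general.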

> This is mostly the profile I expected from standard software.  Note that
> I compiled with `-O -fomit-frame-pointer'; otherwise, the result would

WARNING: -fomit-frame-pointer sometimes produces addi with an
invalid (out of range) imm. I'm still not sure why.

> Another interesting fact is that 1/4 of the multiplications are actually
> `mac' operations (most of them of the kind where all operands have the
> same size).  One can also observe that add, sub, xor and shift[lr] are

well, mac is not yet supported - we generally have problems
with ^1 register addressing.

> 	- reduce number of load/store instructions
> 	- increase number of conditional moves (in favor of jmp{cc})

how? it would probably help to manually find places
where movcc could be used and is not
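
As an illustration of what to look for, a short C sketch (the
function names are invented). Both functions compute the same
value; the second is written as a plain select with no side
effects, which is the shape a conditional move fits. Whether a
given compiler emits a jump or a movcc for either form depends on
its if-conversion, so this is only a hint of the pattern, not a
guarantee.

```c
/* Branchy form: typically a compare followed by a conditional jump
   around the assignment. */
int max_branch(int a, int b)
{
    if (a < b)
        a = b;
    return a;
}

/* Select form: both candidate values are already available and
   nothing else happens in either arm, so one conditional move
   could replace the jump. */
int max_select(int a, int b)
{
    return (a < b) ? b : a;
}
```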

> 	- avoid shift-and-add where mul/mac is faster

done.

> 	- make use of divrem[s] instruction

the same problem as with mac - but this one basically
works for some cases. Unfortunately there are not many
places where both rem & div are needed...
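
One of the few typical places is number-to-string conversion,
where the quotient and the remainder of the same operands are
both used. A minimal sketch, with a made-up function name; the
caller is assumed to supply a buffer that is large enough:

```c
/* Writes v in decimal into buf (NUL-terminated) and returns the
   number of digits.  Each iteration needs both v % 10 and v / 10,
   i.e. the remainder and the quotient of the same division. */
int to_decimal(unsigned v, char *buf)
{
    char tmp[24];   /* enough digits even for a 64-bit unsigned */
    int n = 0;
    int i;

    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    /* Digits were produced least significant first; reverse them. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```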

> 	- make use of SIMD instructions

in string ops .. well. for others, we can enable gcc to
use vector modes explicitly, but it works only with
programs which know how to use them. gcc will not
emit them itself.
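
For reference, this is roughly what such a program looks like
with gcc's generic vector extension. It is a sketch only: the
type and function names are invented, it assumes a 32-bit int,
and whether a particular fcpu-gcc build accepts the attribute and
maps it to SIMD instructions is an assumption.

```c
/* Four ints packed into one 16-byte vector (assumes 32-bit int). */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise add of two vectors; the programmer asks for the
   vector mode explicitly, gcc does not find it by itself. */
v4si add4(v4si a, v4si b)
{
    return a + b;
}
```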

devik

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/