
Re: [f-cpu] Instruction census

To: f-cpu@seul.org
Subject: Re: [f-cpu] Instruction census
From: Cyrano <cyrano@nerim.net>
Date: Wed, 15 Jan 2003 19:47:42 +0000
Delivered-to: archiver@seul.org
Delivered-to: f-cpu-outgoing@seul.org
Delivered-to: f-cpu@seul.org
Delivery-date: Tue, 14 Jan 2003 13:50:08 -0500
In-reply-to: <Pine.LNX.4.33.0301131055580.518-100000@devix>
References: <20030113000719.11954@thrai.stud.uni-hannover.de><Pine.LNX.4.33.0301131055580.518-100000@devix>
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

Hmm, I need some explanation to understand it all.

On Mon, 13 Jan 2003 12:25:25 +0100 (CET)
devik <devik@cdi.cz> wrote:

> > I compiled a set of C files (among them libelf and fctools) with
> > fcpu-gcc and started to count instructions.  The most often used
> > single instruction was an ordinary unconditional `move':
> 
> wasn't it the one with the comment "zero_extend"? Most of them will
> be gone, but we will have to duplicate almost ALL patterns
> with an implicit zero extend (as fcpu zero-extends automatically,
> as x86-64 from AMD does).
> I'm trying to teach the combiner to do it by changing its code,
> but it is not simple at all.
> Fortunately one of the Czech core gcc developers has devoted his time
> to communicating with me and helping me.
> 
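
A minimal C sketch of what an implicit zero extend means, assuming a 64-bit register that receives a 32-bit result. The function name is made up for illustration and is not taken from the gcc port; it only models the behaviour described in the quoted text.

```c
#include <stdint.h>

/* Hypothetical helper: models a 32-bit result landing in a 64-bit
 * register on a machine that zero-extends automatically. */
uint64_t write_low32_zero_extended(uint64_t old_reg, uint32_t result)
{
    (void)old_reg;            /* the old upper bits are discarded */
    return (uint64_t)result;  /* upper 32 bits become zero */
}
```

If the hardware already behaves like this, a separate move whose only job is the zero extend is redundant, which would explain why most of those moves could go away.
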
> > The vast majority of instructions are load/store,
> > add/sub/loadaddr[i],
> 
> yep ... gcc can't prefetch and "cache" data in registers too
> early because of read-write ordering rules which can't be
> resolved by alias analysis. That is why IA-64 has the ld.a
> instruction. There is a flag which tells gcc to use "possibly


Could you explain what ld.a does? What are the read-write ordering rules?
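
As far as the quoted paragraph goes, the ordering problem seems to be of the following kind. This is only a sketch in C with a made-up function name: two pointers that may or may not refer to the same object, so the load cannot be moved ahead of the store unless the compiler can prove they differ.

```c
#include <stdint.h>

/* Hypothetical example: p and q may point to the same object, so the
 * load of *p has to stay after the store to *q. */
int32_t store_then_load(int32_t *p, int32_t *q)
{
    *q = 1;      /* store */
    return *p;   /* load: hoisting it above the store is only safe
                    if p and q are known to be different */
}
```

Called with the same address twice the result is 1; called with two different objects it is whatever *p held before, so an early fetch of *p would be wrong in the first case.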

> dangerous" early fetches - but it doesn't follow the C standard then.
> The add/sub case: CSE generates all addresses as a PLUS of the
> first seen address and a constant. Then in the next pass it looks
> for all load pairs and tries to use post-increment on those
> loads. What often prevents it are labels in between (then we
> could arrive here from several places, possibly with an unknown
> value in the address register). This could be resolved by a complete
> CFG analysis, which gcc doesn't do right now - but I wouldn't expect
> it to find many more of them. I also see a problem here: before
> scheduling we don't know whether add or post-inc is better
> for scheduling - both are possible.
> 
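
The two addressing forms from the paragraph above can be sketched in C as follows. The function names are invented for illustration, and which machine instructions a compiler actually picks for either form is not something this sketch can show.

```c
#include <stdint.h>

/* Form 1: every address is the first-seen address plus a constant. */
int32_t sum3_base_plus_offset(const int32_t *base)
{
    return base[0] + base[1] + base[2];
}

/* Form 2: the pointer is bumped after each load (post-increment style). */
int32_t sum3_post_increment(const int32_t *p)
{
    int32_t s = *p++;
    s += *p++;
    s += *p++;
    return s;
}
```

Both functions compute the same sum; the difference is only in how the addresses are formed, which is the choice the quoted text says cannot be judged before scheduling.
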
> > This is mostly the profile I expected from standard software.  Note
> > that I compiled with `-O -fomit-frame-pointer'; otherwise, the
> > result would
> 
> WARNING: -fomit-frame-pointer sometimes produces addi with an
> invalid (out of range) imm. I'm still not sure why.
> 
> > Another interesting fact is that 1/4 of the multiplications are
> > actually `mac' operations (most of them of the kind where all
> > operands have the same size).  One can also observe that add, sub,
> > xor and shift[lr] are
> 
> well, mac is not yet supported - we generally have problems
> with ^1 register addressing.
> 

What kind of problem?

> > 	- reduce number of load/store instructions
> > 	- increase number of conditional moves (in favor of jmp{cc})
> 
> how? it would probably help to manually find places
> where movcc could be used but is not
> 
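
A small C sketch of the kind of place one would look for. The function names are made up; whether a given backend turns either form into a conditional move or a jump is an assumption that has to be checked in the generated assembly.

```c
#include <stdint.h>

/* Written with a branch: a candidate for jmp{cc} around the assignment. */
int32_t max_branch(int32_t a, int32_t b)
{
    if (a < b)
        a = b;
    return a;
}

/* Written as a select: the shape a conditional move fits naturally. */
int32_t max_select(int32_t a, int32_t b)
{
    return (a < b) ? b : a;
}
```
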
> > 	- avoid shift-and-add where mul/mac is faster
> 
> done.
> 
> > 	- make use of divrem[s] instruction
> 
> the same problem as with mac - but this one basically
> works in some cases. Unfortunately there are not many
> places where both rem & div are needed...
> 
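
One place where both results of the same division are needed is number-to-text conversion. The sketch below is an illustration only: the function name and the buffer convention are assumptions, and it handles bases 2 to 10.

```c
#include <stdint.h>

/* Writes the digits of n in the given base (2..10) into buf, least
 * significant digit first, and returns the digit count.
 * buf must hold at least 32 bytes (worst case: base 2). */
int to_digits_reversed(uint32_t n, uint32_t base, char *buf)
{
    int len = 0;
    do {
        uint32_t q = n / base;   /* quotient */
        uint32_t r = n % base;   /* remainder of the same division */
        buf[len++] = (char)('0' + r);
        n = q;
    } while (n != 0);
    return len;
}
```

Here the quotient and the remainder come from the same operands in every iteration, so a combined divide-and-remainder instruction could in principle serve both.
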
> > 	- make use of SIMD instructions
> 
> in string ops ... well. for others, we can enable gcc to
> use vector modes explicitly, but it works only with
> programs that know how to use them. gcc will not
> emit them itself.
> 

Is it impossible to detect even "obvious" patterns (like C[i]=A[i]*B[i]
or A+=V1[i]*V2[i])?
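
Written out as plain C, the two patterns look like this. The function names and the 32-bit element type are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* C[i] = A[i] * B[i] */
void elementwise_mul(int32_t *c, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* A += V1[i] * V2[i] */
int32_t dot_product(const int32_t *v1, const int32_t *v2, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v1[i] * v2[i];
    return acc;
}
```

Note that in the first loop c is allowed to overlap a or b, so the read-write ordering question from earlier in this thread shows up here as well.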

> devik
> 

Michael wrote:
> By the way: If we implement load-linked/store-conditional, we may also
> implement speculative loads à la Itanium. The mechanism is the same:
> keep an eye on the memory location and set a flag when it's been
> stored into after the load.  When the `completion load' finds the flag
> set, it re-fetches the data.  We could handle that kind of thing in
> the LSU.

(Be careful about the Itanium patents!)

I don't much like load-linked/store-conditional because it's not useful
in a multi-core environment. But in this case, do you want to support it
to keep gcc from making mistakes when preloading (or prefetching)?
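
To check my understanding of the mechanism Michael describes, here is a software model of it in C. Everything in it (the struct, the function names, watching a single location) is a made-up illustration and not the F-CPU LSU design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one watched memory location. */
typedef struct {
    const int32_t *addr;  /* location being watched */
    int32_t value;        /* value fetched by the early load */
    bool dirty;           /* set when a store hits addr after the load */
} spec_load;

/* Early (speculative) load: fetch now and start watching the address. */
void spec_load_begin(spec_load *s, const int32_t *addr)
{
    s->addr = addr;
    s->value = *addr;
    s->dirty = false;
}

/* Every store goes through here so the watcher can see it. */
void watched_store(spec_load *s, int32_t *addr, int32_t v)
{
    *addr = v;
    if (addr == s->addr)
        s->dirty = true;
}

/* Completion load: re-fetch only if the flag is set. */
int32_t spec_load_complete(spec_load *s)
{
    if (s->dirty) {
        s->value = *s->addr;
        s->dirty = false;
    }
    return s->value;
}
```

A store to some other address leaves the early value valid, while a store to the watched address forces the completion load to read memory again.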

nicO


Follow-Ups:
- Re: [f-cpu] Instruction census
  - From: devik <devik@cdi.cz>

References:
- [f-cpu] Instruction census
  - From: Michael Riepe <michael@stud.uni-hannover.de>
- Re: [f-cpu] Instruction census
  - From: devik <devik@cdi.cz>
