
Re: [f-cpu] Instruction census



> > yep ... gcc can't prefetch and "cache" data in registers too
> > early because of read-write ordering rules which can't be
> > resolved by aliasing analysis. That is why IA-64 has the ld.a
> > instruction. There is a flag which tells gcc to use "possibly
>
> Could you explain what ld.a does? What are the read-write ordering
> rules?

ld.a is the so-called advanced load of IA-64. It loads data
from an address and records that address in a TLB-like structure,
the ALAT (with 32 entries on Itanium IIRC).
A later check load (ld.c) tests whether the entry is still present.
If yes, the check load is a NOP. If no, the check load reloads
the data.
In C, if you write *a = b; c = *d; you have to preserve the RAW
ordering of the store through "a" and the load through "d",
because you often don't know whether they can alias. (The comma
form *a = b, c = *d; needs the same ordering, since the comma
operator is a sequence point too.)
IA-64 can do the advanced load before the store and then do the
check load after the store, where it is usually a NOP.
If the CPU stores to the same address, the store removes that
address from the disambiguation memory and thus forces the
later check load to reload.
A full disambiguation memory (an old entry gets dropped) or
another CPU dirtying the cache line has the same effect.
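
To make the mechanism concrete, here is a toy software model in C.
It is only a sketch: the names (load_advanced, store_word, load_check),
the tag scheme and the table size are made up for illustration, and
real hardware tracks entries by target register and physical address.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the disambiguation memory. Size is arbitrary. */
#define ALAT_ENTRIES 4

struct alat_entry { const int *addr; int valid; };
struct alat_entry alat[ALAT_ENTRIES];
int reload_count;   /* how many check loads had to reload */

/* Models ld.a: load early and remember the address under a tag. */
int load_advanced(unsigned tag, const int *p)
{
    assert(tag < ALAT_ENTRIES);
    alat[tag].addr = p;
    alat[tag].valid = 1;
    return *p;
}

/* Models a store: drop any entry for the same address, then write. */
void store_word(int *p, int v)
{
    for (unsigned i = 0; i < ALAT_ENTRIES; i++)
        if (alat[i].valid && alat[i].addr == p)
            alat[i].valid = 0;
    *p = v;
}

/* Models ld.c: keep the early value if the entry survived, else reload. */
int load_check(unsigned tag, const int *p, int early)
{
    assert(tag < ALAT_ENTRIES);
    if (alat[tag].valid && alat[tag].addr == p)
        return early;
    reload_count++;
    return *p;
}

/* "*a = b; c = *d;" with the load hoisted above the store. */
int store_then_load(int *a, int b, const int *d)
{
    int c = load_advanced(0, d);   /* issued before the store */
    store_word(a, b);
    return load_check(0, d, c);    /* no reload unless a and d alias */
}
```

When a and d point to different objects the check keeps the early
value; when they alias, the store invalidates the entry and the check
reloads, so the result is the same as in the unhoisted code.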

IA-64 also has a speculative load, ld.s. Its only difference is
that it doesn't trap: if it would trap, it records the condition
in a flag associated with the destination register.
Only when such a value is actually used does it really trap.
A later check instruction (chk.s) can test the flag and jump to
special stub code to handle it.
It allows you to hoist the load of *a above the test in
if (a) b = *a; without changing side effects.
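
A rough C model of that idea, as a sketch only: the struct and
function names are hypothetical, and a NULL pointer stands in for
"any address that would trap".

```c
#include <assert.h>
#include <stddef.h>

/* A register modelled as a value plus a deferred-fault flag. */
struct spec_reg { int value; int deferred; };

/* Models ld.s: never faults; a bad address only sets the flag. */
struct spec_reg load_speculative(const int *p)
{
    struct spec_reg r = { 0, 0 };
    if (p == NULL)
        r.deferred = 1;     /* remember the fault instead of trapping */
    else
        r.value = *p;
    return r;
}

/* "if (a) b = *a;" with the load hoisted above the test. */
int guarded_copy(const int *a, int b)
{
    struct spec_reg t = load_speculative(a);   /* issued early */
    if (a) {
        /* Check point: stub code would redo the load normally
         * if the flag were set. */
        if (t.deferred)
            t.value = *a;
        b = t.value;
    }
    return b;   /* when a is NULL the flagged value is never used */
}
```

With a NULL pointer the early load does not trap and b stays
untouched, which is exactly what the original if (a) b = *a; does.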

> >
> > well, mac is not yet supported - we generally have problems
> > with ^1 register addressing.
> >
> What kind of problem?

MR explained to me that MAC doesn't have that problem. But
MUX does - there is no clean way to tell gcc that an insn
places its result in register r^1 ...
It can be done for divrem because it uses an expander. But
MUX will have to be solved by gcc's 3-2 split point only,
because it consists of 3 RTL insns.
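
For readers unfamiliar with ^1 addressing, a small C sketch of the
pairing. The helper names are hypothetical, and which half of the
result lands in which register is an assumption made for illustration.

```c
#include <assert.h>

/* Partner register under "^1" addressing: flip the lowest bit
 * of the register number (r4 <-> r5, r6 <-> r7, ...). */
unsigned pair_reg(unsigned r)
{
    return r ^ 1u;
}

/* Toy two-result instruction: quotient to regs[rd], remainder to
 * regs[rd ^ 1]. The second destination is implicit, which is the
 * part that is hard to describe to the compiler. */
void divrem_pair(long regs[], unsigned rd, long num, long den)
{
    assert(den != 0);
    regs[rd] = num / den;
    regs[pair_reg(rd)] = num % den;
}
```

Note that the partner of an odd register is the even one below it,
so the second result does not always go to the "next" register.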

> > use vector modes explicitly but it works only with
> > programs that know how to use them. gcc will not
> > emit them itself.
> >
> Is it impossible to detect even an "obvious" pattern? (like C[i]=A[i]*B[i]
> or A+=V1[i]*V2[i] ?)

No. It requires a vectorizer, which is not present in gcc yet.
But the loop optimizer is planned to be rewritten and a vectorizer
will probably be added too.
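
For reference, the two patterns from the question written out in
plain C. This is just a sketch of what a vectorizer would have to
recognize; the function names and the 4-lane width are assumptions,
and the second function does by hand what a vectorizer would do
automatically.

```c
#include <assert.h>
#include <stddef.h>

/* C[i] = A[i] * B[i]: iterations are independent as long as c does
 * not overlap a or b, which restrict promises to the compiler. */
void mul_elementwise(int *restrict c, const int *restrict a,
                     const int *restrict b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* A += V1[i] * V2[i]: a reduction, split by hand into four partial
 * sums the way a 4-lane vector unit would accumulate them. */
int dot_4lanes(const int *v1, const int *v2, size_t n)
{
    assert(v1 != NULL && v2 != NULL);
    int lane[4] = { 0, 0, 0, 0 };
    size_t i = 0;
    for (; i + 4 <= n; i += 4)          /* whole groups of four */
        for (size_t k = 0; k < 4; k++)
            lane[k] += v1[i + k] * v2[i + k];
    int sum = lane[0] + lane[1] + lane[2] + lane[3];
    for (; i < n; i++)                  /* leftover elements */
        sum += v1[i] * v2[i];
    return sum;
}
```

The first loop also shows the aliasing issue from the start of this
mail: without restrict the compiler must assume that a store to c[i]
can change a later a[j] or b[j].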

> > By the way: If we implement load-linked/store-conditional, we may also
> > implement speculative loads á la Itanium. The mechanism is the same:
> > keep an eye on the memory location and set a flag when it's been
> > stored into after the load.  When the `completion load' finds the flag
> > set, it re-fetches the data.  We could handle that kind of thing in
> > the LSU.
>
> (Be careful with Itanium patents!)

Which one? So I know which things not to think about at all .. :-(

BTW, what is load-linked/store-conditional?

devik
