
Re: [f-cpu] Instruction census



> > yep ... gcc can't prefetch and "cache" data in registers too
> > early because of read-write ordering rules which can't be
> > resolved by aliasing analysis. That is why IA-64 has the ld.a
> > instruction. There is a flag which tells gcc to use "possibly
> > dangerous" early fetches - but it doesn't follow C standard then.
>
> It does.  If the prefetched memory location has been modified when the
> real load follows, an exception is thrown and the exception handler is
> supposed to re-fetch the data.

You mean the IA-64 advanced load, don't you? I know it - you don't
even need an exception if the only recovery is to refetch the
data: a check load (ld.c) reloads the register from memory when
its ALAT entry is missing.
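
To make the mechanism concrete, here is a toy software model of the
advanced-load / reload-on-miss scheme described above. It is only a
sketch: the one-entry table, the exact-address match and all names
(alat_entry, load_advanced, store_checked, load_check) are
assumptions made for illustration, not the real IA-64 hardware
behaviour or any existing API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy one-entry model of an advanced-load table (hypothetical names). */
typedef struct {
    const int *addr;  /* address watched by the advanced load */
    bool valid;       /* cleared when a store hits that address */
} alat_entry;

/* Advanced load: fetch the value early and remember the address. */
static int load_advanced(alat_entry *e, const int *p)
{
    e->addr = p;
    e->valid = true;
    return *p;
}

/* Every store checks the table and invalidates a matching entry. */
static void store_checked(alat_entry *e, int *p, int value)
{
    *p = value;
    if (e->valid && e->addr == p)
        e->valid = false;
}

/* Check load: keep the early value if the entry survived, else reload. */
static int load_check(const alat_entry *e, const int *p, int early)
{
    return (e->valid && e->addr == p) ? early : *p;
}
```

No exception handler is involved in this model: a store to an
unrelated address leaves the early value usable, and a store to the
watched address simply makes load_check() read memory again.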

But I was talking about -fsched-spec-load-dangerous, which allows
loads to be moved even on CPUs without disambiguation support,
such as F-CPU - and then the result is not strictly C-compliant
(it makes some assumptions about aliasing).
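
A small hand-written illustration of the hazard (not actual gcc
output; both function names are made up for this example): if the
load is moved above a store and the two pointers happen to alias,
the early load returns a stale value.

```c
/* Source order: the store to *p comes before the load from *q. */
static int in_order(int *p, int *q)
{
    *p = 1;
    return *q;
}

/* What a scheduler might emit if it hoists the load above the store. */
static int load_hoisted(int *p, int *q)
{
    int early = *q;   /* speculative early load */
    *p = 1;
    return early;     /* stale if p and q point to the same object */
}
```

With p == q and the object initially 0, in_order() returns 1 while
load_hoisted() returns 0. For distinct objects the two agree, which
is why the transformation is only safe when aliasing can be ruled
out or detected at run time.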

> > WARNING: -fomit-frame-pointer sometimes produces addi with
> > an invalid (out-of-range) imm. I'm still not sure why.
>
> fcpu-as didn't complain so far (only for the loadcons[x] case).  Which
> value did it produce?

I tried to fix it in fcpu.c by means of fcpu_need_fp_p, which should
detect the cases where we can't eliminate FP. Elimination is done
during reload, when we can no longer change addi to loadcons+add,
so when the elimination distance between SP and FP is >127 it is
likely that addi $372,... is produced (because the substituted
offset is not checked by recog at that point).
I hope fcpu_need_fp_p will catch all cases, but my gcc friend
told me to look at the IA64 port more closely - it handles the same
case and may do so more intelligently.
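
The kind of range test involved could look roughly like this. It is
a standalone sketch, not the real fcpu_need_fp_p from fcpu.c: the
helper names are hypothetical, and the lower bound of -128 (a signed
8-bit immediate) is my assumption; only the upper limit of 127 comes
from the discussion above.

```c
#include <stdbool.h>

/* Assumed signed 8-bit immediate; only the 127 limit is from the thread. */
enum { ADDI_IMM_MIN = -128, ADDI_IMM_MAX = 127 };

/* True if the offset can be encoded directly in an addi immediate. */
static bool addi_imm_ok(long offset)
{
    return offset >= ADDI_IMM_MIN && offset <= ADDI_IMM_MAX;
}

/* Hypothetical predicate: keep FP when the SP-FP distance would not
   fit, since addi cannot be split into loadcons+add during reload. */
static bool must_keep_frame_pointer(long sp_fp_distance)
{
    return !addi_imm_ok(sp_fp_distance);
}
```

For the failing case above, must_keep_frame_pointer(372) is true, so
the frame pointer would be kept instead of emitting addi $372,...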

> > > Another interesting fact is that 1/4 of the multiplications are actually
> > > `mac' operations (most of them of the kind where all operands have the
> > > same size).  One can also observe that add, sub, xor and shift[lr] are
> >
> > well, mac is not yet supported - we generally have problems
> > with ^1 register addressing.
>
> `mac' doesn't use a second output - it's 3r1w, not 2r2w.
> And it is supported by gcc now, in all variants :)

Ohh :) good news! I overlooked that r1 is both src and dst. BTW, how
will it be implemented in HW?
Has anyone thought about a shadd like the one ia64 has? I'd expect an
immediate form where the immediate gives the number of left shifts
applied to one operand - it could be much faster than mac and seems
to be used often in code... I will have to check its usage.
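
To show what I mean, here is the operation written out in C. This is
only a model of the idea: shadd, mac and element_address are
illustrative function names, not F-CPU instructions or an agreed
encoding, and the shift count is assumed to be a small immediate.

```c
#include <stdint.h>

/* Shift-and-add: (a << count) + b, with count given as an immediate. */
static uint64_t shadd(uint64_t a, unsigned count, uint64_t b)
{
    return (a << count) + b;
}

/* The same computation expressed as multiply-accumulate: a * scale + b. */
static uint64_t mac(uint64_t a, uint64_t scale, uint64_t b)
{
    return a * scale + b;
}

/* Typical use: byte address of element i in an array of 8-byte items. */
static uint64_t element_address(uint64_t base, uint64_t i)
{
    return shadd(i, 3, base);
}
```

Whenever the scale is a power of two, shadd(a, n, b) gives the same
result as mac(a, 1 << n, b) but needs only a shifter and an adder
instead of a multiplier, which is where the speed advantage would
come from.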

devik
