Re: [f-cpu] Instruction census
On Tue, Jan 14, 2003 at 07:21:00AM +0100, devik wrote:
[...]
> you are speaking about the IA-64 advanced load, aren't you? I know it - you
> don't even need to use an exception if the only recovery is to refetch
> the data - you can use a completion load, which will reload the register
> from memory if the ALOAD association entry is missing.
Didn't remember that one (it's been a while since I read the docs).
> But I was speaking about -fsched-spec-load-dangerous, which allows
> loads to be moved even on CPUs without disambiguation support,
> like F-CPU - and then it is not strictly C-compliant (it makes
> some assumptions about aliasing).
If the optimizer can prove that there's no aliasing that violates the
rules of the ISO C standard, the optimization would be correct.
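To make the aliasing point concrete, here is a minimal C sketch (the
function names are made up for illustration). It shows the original
order and what a scheduler would produce by moving the load above the
store. Two `int *' arguments are allowed to alias under ISO C, so the
second form is only a valid replacement for the first when the compiler
can prove that p and q point to different objects.

```c
/* Original order: store through p, then load through q. */
int store_then_load(int *p, int *q)
{
    *p = 1;
    return *q;          /* sees the store if p == q */
}

/* Hypothetical result of hoisting the load above the store. */
int load_hoisted(int *p, int *q)
{
    int tmp = *q;       /* load moved up */
    *p = 1;
    return tmp;         /* misses the store if p == q */
}
```

With distinct objects both versions return the same value; with
p == q they differ, which is exactly the case the optimizer has to
rule out.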
By the way: If we implement load-linked/store-conditional, we may also
implement speculative loads à la Itanium. The mechanism is the same:
keep an eye on the memory location and set a flag when it's been stored
into after the load. When the `completion load' finds the flag set,
it re-fetches the data. We could handle that kind of thing in the LSU.
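A rough software model of that flag mechanism, just to pin down the
idea. Everything here (watch_t, advanced_load, watched_store,
completion_load) is a hypothetical name, and the model tracks a single
address only; it is a sketch, not a description of how the LSU would
really do it.

```c
#include <stddef.h>

typedef struct {
    const int *addr;    /* watched location, NULL if none */
    int        hit;     /* set when addr was stored into after the load */
} watch_t;

/* Speculative load: remember the address and clear the flag. */
int advanced_load(watch_t *w, const int *addr)
{
    w->addr = addr;
    w->hit  = 0;
    return *addr;
}

/* Every store checks the watched address and sets the flag on a match. */
void watched_store(watch_t *w, int *addr, int value)
{
    *addr = value;
    if (w->addr == addr)
        w->hit = 1;
}

/* Completion load: re-fetch only if the flag is set. */
int completion_load(watch_t *w, int speculative)
{
    int v = w->hit ? *w->addr : speculative;
    w->addr = NULL;     /* stop watching */
    return v;
}
```

The common case (no intervening store to the same address) costs only
the flag test; the re-fetch happens just when the speculation failed.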
> > > WARNING: -fomit-frame-pointer sometimes produces addi with
> > > invalid (out of range) imm. I'm still not sure why.
> >
> > fcpu-as didn't complain so far (only for the loadcons[x] case). Which
> > value did it produce?
>
> I tried to fix it in fcpu.c by means of fcpu_need_fp_p, which should
> detect cases where we can't eliminate FP, because elimination is done
> during reload, when we can't change addi to loadcons+add. And when the
> elimination distance between SP and FP is >127, it is likely that
> addi $372,... is produced (because the elimination substitution is
> not checked by recog then).
I've never seen that before (and fcpu-as would have detected it).
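For reference, the kind of range check being discussed could look like
the sketch below. It takes the >127 figure quoted above at face value
and assumes a signed 8-bit immediate (-128...127); that range and the
name fits_simm8 are assumptions for illustration, not taken from
fcpu.c.

```c
#include <stdbool.h>

/* Assumed immediate range for addi: signed 8-bit. */
enum { SIMM8_MIN = -128, SIMM8_MAX = 127 };

/* True if the offset can be encoded directly as an addi immediate;
   otherwise a loadcons+add sequence would be needed instead. */
bool fits_simm8(long offset)
{
    return offset >= SIMM8_MIN && offset <= SIMM8_MAX;
}
```

Under that assumption an SP/FP distance of 372 fails the check, so the
substitution would have to be rejected or expanded before it reaches
the assembler.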
[...]
> > `mac' doesn't use a second output - it's 3r1w, not 2r2w.
> > And it is supported by gcc now, in all variants :)
>
> ohh :) good news ! I overlooked that r1 is both src and dst. BTW how
> will be it implemented in HW ?
s/will be/is/ :)
It's handled by the multiplier (see vhdl/eu_imu/). A `mac' will take
the same time as the corresponding `mul' (3...6 cycles, depending on
the chunk size).
> Has anyone thought about a shadd like IA-64 has? I'd expect an immediate
> form where the immediate gives the number of left shifts of one
> operand - it could be much faster than mac and seems to be used
> often in code .... I will have to check its usage.
You mean, as in `r1 += r2 << uimm8' or something like that?
That would require chaining of the SHL and ASU units. Therefore,
it won't be cheaper than an explicit
	shiftli $uimm8, r2, r2
	add r2, r1, r1
sequence. On top of that, EU chaining would make the IF/D hardware much
more complex (you need to find free slots for both EUs!).
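In C terms the two forms compute the same thing, which is why a fused
instruction only pays off if it is actually faster than the pair. A
small sketch on 64-bit values; the function names are hypothetical, and
the masking of the shift count to 0...63 is an assumption made here to
keep the C well defined, not a statement about what shiftli does.

```c
#include <stdint.h>

/* Hypothetical fused form: r1 += r2 << n */
uint64_t shadd(uint64_t r1, uint64_t r2, unsigned n)
{
    return r1 + (r2 << (n & 63));
}

/* The explicit two-instruction sequence. */
uint64_t shift_then_add(uint64_t r1, uint64_t r2, unsigned n)
{
    r2 = r2 << (n & 63);    /* shiftli $n, r2, r2 */
    r1 = r2 + r1;           /* add r2, r1, r1 */
    return r1;
}
```

Note that the explicit sequence overwrites r2 as written above; a real
code generator would pick a scratch register if r2 is still live.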
--
Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
"All I wanna do is have a little fun before I die"