
Re: [f-cpu] Instruction census



On Tue, Jan 14, 2003 at 07:21:00AM +0100, devik wrote:
[...]
> You're talking about the IA-64 advanced load, aren't you? I know it - you
> don't even need to use an exception if the only recovery is to refetch
> the data - you can use a completion load, which will reload the register
> from memory if the ALOAD association entry is missing.

Didn't remember that one (it's been a while since I read the docs).

> But I was speaking about -fsched-spec-load-dangerous, which allows
> loads to be moved even on CPUs without disambiguation support,
> like F-CPU - and then it is not strictly C-compliant (it makes
> some assumptions about aliasing).

If the optimizer can prove that there's no aliasing that violates the
rules of the ISO C standard, the optimization would be correct.
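
To make the aliasing condition concrete, here is a minimal C sketch. The
function names are made up for illustration and are not taken from gcc.
The second function is the first one with the load hoisted above the
store; the two agree only when p and q point to different objects.

```c
#include <assert.h>

/* Program order: store through p, then load through q. */
int store_then_load(int *p, int *q)
{
    *p = 1;
    return *q;
}

/* The same code after a scheduler has hoisted the load above the store.
 * This is only equivalent if p and q never alias. */
int load_hoisted(int *p, int *q)
{
    int tmp = *q;   /* load moved up */
    *p = 1;
    return tmp;
}
```

With two distinct objects both versions return the old value of *q.  With
p == q the original returns 1 while the hoisted version returns the stale
value, so the move is only safe if the compiler can rule that case out.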

By the way: If we implement load-linked/store-conditional, we may also
implement speculative loads à la Itanium. The mechanism is the same:
keep an eye on the memory location and set a flag when it's been stored
into after the load.  When the `completion load' finds the flag set,
it re-fetches the data.  We could handle that kind of thing in the LSU.
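
A rough software model of that mechanism, as a sketch only: the struct and
function names below are hypothetical, a single watch entry is assumed,
and nothing here describes an actual F-CPU LSU design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one watch entry in the LSU. */
struct watch {
    const long *addr;   /* watched location, NULL if unused */
    int dirty;          /* set when addr is stored into after the load */
};

/* Speculative load: fetch the data and start watching the address. */
long spec_load(struct watch *w, const long *addr)
{
    assert(addr != NULL);
    w->addr = addr;
    w->dirty = 0;
    return *addr;
}

/* Every store checks the watch entry and sets the flag on a hit. */
void watched_store(struct watch *w, long *addr, long value)
{
    *addr = value;
    if (w->addr == addr)
        w->dirty = 1;
}

/* Completion load: re-fetch only if the flag was set in the meantime. */
long completion_load(struct watch *w, long speculated)
{
    if (w->dirty) {
        w->dirty = 0;
        return *w->addr;
    }
    return speculated;
}
```

A store to an unrelated address leaves the speculated value valid; a store
to the watched address makes the completion load return the new contents.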

> > > WARNING: -fomit-frame-pointer sometimes produces addi with an
> > > invalid (out-of-range) imm. I'm still not sure why.
> >
> > fcpu-as didn't complain so far (only for the loadcons[x] case).  Which
> > value did it produce?
> 
> I tried to fix it in fcpu.c by means of fcpu_need_fp_p, which should
> detect cases where we can't eliminate FP, because elimination is done
> during reload, when we can't change addi to loadcons+add. And when the
> elimination distance between SP and FP is >127, it is likely that
> addi $372,... is produced (because the elimination substitution is
> not checked by recog then).

I've never seen that before (and fcpu-as would have detected it).
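
For reference, the range check in question would look roughly like this.
It is a sketch that assumes addi takes a signed 8-bit immediate (the
">127" above suggests that upper bound; the lower bound of -128 is my
assumption), and fits_simm8 is a made-up name, not a function from fcpu.c.

```c
#include <assert.h>

/* Assumed immediate range of addi: signed 8 bits. */
enum { SIMM8_MIN = -128, SIMM8_MAX = 127 };

/* Returns nonzero if the offset can be encoded directly in addi. */
int fits_simm8(long offset)
{
    return offset >= SIMM8_MIN && offset <= SIMM8_MAX;
}
```

Under that assumption an offset of 372 fails the check and would need the
loadcons+add form instead.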

[...]
> > `mac' doesn't use a second output - it's 3r1w, not 2r2w.
> > And it is supported by gcc now, in all variants :)
> 
> ohh :) good news ! I overlooked that r1 is both src and dst. BTW how
> will be it implemented in HW ?

s/will be/is/ :)

It's handled by the multiplier (see vhdl/eu_imu/).  A `mac' will take
the same time as the corresponding `mul' (3...6 cycles, depending on
the chunk size).

> Has anyone thought about a shadd like IA-64 has? I'd expect an immediate
> form where the immediate gives the number of left shifts of one
> operand - it could be much faster than mac and seems to be used
> often in code .... Will have to check its usage.

You mean, as in `r1 += r2 << uimm8' or something like that?
That would require chaining of the SHL and ASU units. Therefore,
it won't be cheaper than an explicit

	shiftli $uimm8, r2, r2
	add r2, r1, r1

sequence.  On top of that, EU chaining would make the IF/D hardware much
more complex (you need to find free slots for both EUs!).
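
In C terms the two forms compute the same thing.  This is only a model of
the arithmetic on 64-bit values (the register width and the function names
are assumptions for illustration); it says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fused form: r1 += r2 << shift. */
uint64_t shadd(uint64_t r1, uint64_t r2, unsigned shift)
{
    assert(shift < 64);   /* larger shift counts are undefined in C */
    return r1 + (r2 << shift);
}

/* The explicit two-instruction sequence. */
uint64_t shift_then_add(uint64_t r1, uint64_t r2, unsigned shift)
{
    assert(shift < 64);
    r2 = r2 << shift;   /* shiftli $shift, r2, r2 */
    r1 = r2 + r1;       /* add r2, r1, r1 */
    return r1;
}
```

Note that the explicit sequence overwrites r2, which a fused instruction
would not have to do.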

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"