Re: [f-cpu] latest gcc & immediate addressing [Was: BOUNCEf-cpu@seul.org:...] (fwd)



> > hi, you seem to be pretty familiar with gcc internals, did you
> > think about changing/fixing some things directly in the code?
>
> I wasn't sure whether your working version differs from the released
> one, and merging any changes might have been difficult. And I really
> prefer to hack my own code ;-)

heh :) not me, typically when I have an idea I fix/change any
code I have at hand :)
By the way, if someone needs it I can provide a CVS
server with a PHP web interface and accounts for all members.
It could help maintain some code.

> > DI to HI.
> > We can't use it on FCPU for 2 reasons:
> > 1) it would collide with our no-op extension, producing no code
> >    for the case { uint i; ushort s = (uchar)i; } where it should
> >    produce either a truncate or a zero-extend
>
> In that case, you MUST explicitly truncate `i' to 1 byte, probably
> with `move.8 reg(i), reg(s)', whether or not TRULY_NOOP_TRUNCATION is
> true.

that is what I mean - gcc emits an explicit truncation of i UNLESS
I define my special noop_zero_extend :) Then gcc's truncation
is killed by my optimization :-)
But I will try to create a special template for a generic insn
followed by zero_extend and emit it as a no-op in case they
can be combined. It is cleaner.
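
To make the problem case concrete, here is a minimal C sketch of
{ uint i; ushort s = (uchar)i; }. The function names are invented for
this illustration and are not part of the gcc port; the fixed-width
types stand in for uint, ushort and uchar.

```c
#include <stdint.h>

/* Models { uint i; ushort s = (uchar)i; }: the cast to unsigned char
   must discard bits 8 and up before the value is widened again. */
uint16_t narrow_then_widen(uint32_t i)
{
    uint8_t low = (uint8_t)i;   /* truncation: keeps i & 0xff */
    return (uint16_t)low;       /* zero extension to 16 bits */
}

/* What results if BOTH steps are wrongly treated as no-ops. */
uint16_t both_dropped(uint32_t i)
{
    return (uint16_t)i;         /* keeps i & 0xffff: bits 8..15 leak through */
}
```

For i = 0x1234 the first function yields 0x34 while the second yields
0x1234, so at least one of the truncate / zero-extend steps has to
produce real code.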

> > 2) it makes jmpz/nz problematic because it tests the full 64-bit
> >    register - TRULY_NOOP_TRUNCATION leaves garbage in the upper
> >    bits and if((uchar)i) where i==256 jumps even if the low 8 bits
> >    are zero. If we wanted TRULY_NOOP_TRUNCATION we would
> >    need to emit an "and" before each such test.
>
> I just wondered whether we can use `jmpl/nl' for conditional branches.
> Compare and x[n]or.{and|or} operations will always set the LSB to 1 if
> the result was true.

:) that is what my other mail was about :) it is nice that we
have similar ideas. for != it might be enough to use
xor.nn (non-SIMD, zero-extended) and jmpz - it will take the
same time as x[n]or.{and|or} + jmpl.
With cmpe it could be even faster but ....
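
A small C model of the branch problem and of the xor-based != test may
help. It only assumes what is said above, namely that the branch looks
at the whole 64-bit register; the helper names are hypothetical and do
not correspond to real F-CPU or gcc identifiers.

```c
#include <stdint.h>

/* Hypothetical model of a branch that tests the whole 64-bit register. */
int full_register_nonzero(uint64_t reg)
{
    return reg != 0;
}

/* if ((uchar)i) with truncation treated as a no-op: stale upper bits decide. */
int test_without_mask(uint64_t reg)
{
    return full_register_nonzero(reg);
}

/* The same test with an explicit "and" in front of the branch. */
int test_with_mask(uint64_t reg)
{
    return full_register_nonzero(reg & 0xff);
}

/* a != b via xor: the result is zero exactly when the operands are equal,
   provided both operands are properly zero-extended. */
int not_equal_via_xor(uint64_t a, uint64_t b)
{
    return full_register_nonzero(a ^ b);
}
```

With reg = 256 the unmasked test is taken although the low 8 bits are
zero, while the masked test is not.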

> > good point - it is another thing I didn't find in the manual. So
> > f-cpu needs natural alignment for all data?
>
> Yes, up to 8 bytes at least. It's not clear yet whether machines with
> larger registers will have even stricter alignment constraints. For
> the moment, we should concentrate on the 64-bit version.

"at least"? I suppose a 32-bit int needs to be "naturally"
(32-bit) aligned, ok?
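
For reference, "natural" alignment here means that an object's address
is a multiple of its size. A minimal C check, with a made-up function
name and assuming power-of-two sizes (1, 2, 4, 8 bytes):

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of size; size must be a power of two. */
int is_naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}
```

Under this rule address 0x1004 is fine for a 32-bit item but not for a
64-bit one.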

> > yes. I still haven't figured out what these macros are used for. I hope
> > to be able to undef them completely.
>
> The compiler uses them internally, IIRC.

hmm, I found them only in final.c, and some targets don't define them
at all - I'll have to look at it.

> > "More precisely, the two operands that match must include one input-only
> >  operand and one output-only operand."
> > So you can use the 0,0 constraint but not the 1,2 one.
>
> This kind of pattern is also used in other machine descriptions,
> but it really seems to be wrong - if I try to compile something like
> `a & b | ~c & d' with optimization, the reload pass fails :-(
> Ok, forget it...

:) for your information, these targets use more complex predicates
to ensure that when the insn is selected, exactly one of the 1,2
constraints will be met for sure. Constraints are then only used to
select the correct insn format.
There is a warning in the manual that you can't use constraints to decide
whether an insn can be matched by a template - this is the job of predicates.

> > :) They are still under development but they really describe what
> > loadcons does !!
> > They should also be able to translate the code:
> > a = b & 0xffffffff0000ffff | 0x45330000;
> > to a single loadcons.1.
>
> Isn't that a rather rare case? It only seems suitable for access to
> certain bitfields.

yes, of course it is. But if you want combine to be able to use and
split them you need to describe their behaviour. And the description
I used is perfectly valid.
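
To show why the and/or expression above collapses into one operation,
here is a C sketch. It ASSUMES that loadcons.1 replaces 16-bit field
number 1 (bits 16..31) of the register with an immediate and leaves the
other bits alone; insert16 is a hypothetical model of that behaviour,
not the real instruction definition.

```c
#include <stdint.h>

/* Hypothetical model: replace 16-bit field `pos` (0 = lowest) of reg
   with imm and keep all other bits. pos must be in 0..3. */
uint64_t insert16(uint64_t reg, unsigned pos, uint16_t imm)
{
    unsigned shift = 16u * pos;
    uint64_t mask = (uint64_t)0xffff << shift;
    return (reg & ~mask) | ((uint64_t)imm << shift);
}

/* The expression from the mail; note that & binds tighter than |. */
uint64_t mask_and_or(uint64_t b)
{
    return (b & 0xffffffff0000ffffULL) | 0x45330000ULL;
}
```

Under that assumption mask_and_or(b) and insert16(b, 1, 0x4533) give
the same result for every b, which is what lets the pair be matched as
a single insn.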

> > Also we really need to tell gcc about all loadcons insns because the splitter
> > will need them to do correct scheduling. Loadcons are ideal "stuffers"
> > for free schedule slots.
>
> That's true. But then you also should tear apart any bigger
> loadcons[x] sequence.

this is what the splitter does. we can emit assembler directives
to get particular parts of a larger constant.
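
As a rough picture of what splitting a constant means, the C sketch
below cuts a 64-bit value into four 16-bit parts and rebuilds it one
part at a time. The 16-bit granularity and both function names are
assumptions made for this example; the assembler directives themselves
are not shown.

```c
#include <stdint.h>

/* Extract 16-bit part `pos` (0 = lowest) of a 64-bit constant. */
uint16_t part16(uint64_t value, unsigned pos)
{
    return (uint16_t)(value >> (16u * pos));
}

/* Rebuild the constant one part at a time; each loop iteration stands
   for one independent load that could fill a free schedule slot. */
uint64_t rebuild(uint64_t value)
{
    uint64_t reg = 0;
    for (unsigned pos = 0; pos < 4; pos++)
        reg |= (uint64_t)part16(value, pos) << (16u * pos);
    return reg;
}
```

Because each part touches a different field, the four steps do not
depend on each other's order.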

> > Ok about the offset. But SYMBOL_REF doesn't imply loadaddrid ! SYMBOL_REFs
> > are used for both external data & code pointers and there is currently
> > no way to distinguish them (unfortunately) ! Thus the warning comment in
> > the asm. LABEL_REF is used ONLY for internal labels BUT it can also be
> > used with loadaddrid sometimes ! (gcc allows you to take a pointer to a
> > label, switch() statements need a jump table at a label, and gcc also labels
> > private data tables.)
>
> The documentation seemed to be pretty clear that a LABEL_REF always
> refers to code, while a SYMBOL_REF refers to data. It even mentions
> that:
>
> 	"The reason for using a distinct expression type for code label
> 	references is so that jump optimization can distinguish them."

ok, here is part of a generated switch statement:
(insn 101 99 102 (set (reg:DI 83)
        (label_ref:DI 107)) -1 (nil)
    (expr_list:REG_EQUAL (label_ref:DI 107)
        (nil)))

(label_ref:DI 107) is the label of the jump table. Here we need loadaddrid
even if it is not obvious.
And below is an excerpt of code loading the address of
function f (a function pointer) and of data item g:

(insn 13 11 15 (set (reg/f:DI 72)
        (symbol_ref:DI ("f"))) -1 (nil)
    (expr_list:REG_EQUAL (symbol_ref:DI ("f"))
        (nil)))

(insn 15 13 17 (set (reg/f:DI 74)
        (symbol_ref:DI ("g"))) -1 (nil)
    (expr_list:REG_EQUAL (symbol_ref:DI ("g"))
        (nil)))

As you can see, both are symbol_refs without any info on whether
they are data or code pointers.
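
For anyone who wants to reproduce dumps like these, C source along the
following lines should give the same kind of RTL. This is a
hypothetical reconstruction, not the test case actually used: only the
names f and g are taken from the dumps, and whether gcc builds a jump
table for the switch depends on the target and on the optimization
settings.

```c
/* f and g match the names in the RTL dumps; everything else is invented. */
void f(void) {}
int g;

typedef void (*code_ptr)(void);

/* Taking the address of a function: a code pointer. */
code_ptr address_of_f(void) { return f; }

/* Taking the address of a variable: a data pointer. */
int *address_of_g(void) { return &g; }

/* A dense switch, the kind of statement a compiler may turn into a
   jump table placed at an internal label. */
int pick(int n)
{
    switch (n) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    case 4: return 50;
    default: return 0;
    }
}
```

The first two functions differ only in what kind of object the symbol
names, which is exactly the information missing from the symbol_ref.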

devik

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/