[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [f-cpu] latest gcc & immediate addressing [Was: BOUNCEf-cpu@seul.org:...] (fwd)



Ehh I forgot .. I fwd my reply to group too as someone might
like to comment something.
---------

> I finally got it up and running now, but the assembler still doesn't
> grok the code, for several reasons:

hi, you seem to be pretty familiar with gcc internals, did you
think about changing/fixing some things directly in a code ?
I'll try to address all of these but later when I get some more
time.
Just now only some comments/questions. PLEASE read and comment
it. It would help to decide how to do some things.

> 	- The ranges for immediate operands are wrong. umin/umax take
> 	  unsigned bytes, smin/smax signed bytes, for example (btw:
> 	  smin/smax are missing).

ahh yes - there was a lot of experiments with ranges and this code
is not checked yet. And smin/smax was not there because I finshed
that part before you told me that there will be signed versions ..

> 	- For shift/rotate ops, the second operand should always have
> 	  QImode.  That avoids an unnecessary explicit zero extension
> 	  when the operand is not a full register.  Immediate operands
> 	  are unsigned, BTW.

ok, one comment which is VERY important conceptional thing. Because
gcc doesn't handle SIMD implicitly (yet) I use only non-simd (except
for instrinstic funtions - not done yet). When we do some operation
in SI mode we can rely on CPU to zero extend it to 64bits (for FC0).
This holds for all insns I know about.
It seems natural to use it and define zero_extend as no-op (I tested
it using subregs). Other CPUs in gcc often define TRULY_NOOP_TRUNCATION
to tell that we can truncate just by pretending register as already
truncated - ie. accessing 64bit register with .16 insn truncates
DI to HI.
We can't use it on FCPU because of 2 reasons:
1) it would colide with our no-op extension, producing no code
   for case { uint i; ushort s = (uchar)i; } where is should
   produce either truncate or zextend
2) it makes jmpz/nz problematic because it tests full 64bit
   register - TRULY_NOOP_TRUNCATION leaves garbage in upper
   bits and if((uchar)i) where i==256 jumps even if low 8bits
   is zero. If we would want TRULY_NOOP_TRUNCATION we would
   need to emit "and" before each such test.

If we will decide to use noop zero_extend then the code will
be shorter and also there is no problem with rotation size above.

> 	- logic operations take 9-bit signed immediate operands.

uhh .. my 2.7 manual doesn't state it :( Where can I find it ?

> 	- STACK_BOUNDARY must be at least 64 (fullword!). FUNCTION_BOUNDARY
> 	  should be the size of a cache line (256 bits).  And even more
> 	  important: STRICT_ALIGNMENT must be 1 (unaligned memory access
> 	  is strictly forbidden).

good point - it is another thing I didn't find in manual. So
that f-cpu needs natural align on all data ?

> 	- We pass arguments in r1...r15, not r2...r15.	Leaving r1 unused
> 	  is pointless.  On the other hand, we also pass arguments in
> 	  registers when the function has variable arguments - currently,
> 	  gcc puts the `fixed' arguments in registers and pushes the
> 	  rest on the stack?!

then the manual pg 223 is wrong again. About variable arguments,
yes they go to the stack just now - I have to look how can I force
gcc to find them in registers. Again this convention seems not to be
documented.

> 	- The function prologue is wrong. `storei $-8, r62, r61' uses
> 	  *post*decrement, not predecrement.  I suggest you calculate
> 	  the required amount of memory, subtract it from r62, and then
> 	  use a temporary register with postincrement for storing
> 	  registers and initializing locals.

I know. I simply assume that stack pointer points to next free
slot instead of last used one. Then in prolog I can use post
decrement and save several insn and additional register (which
can save us troubles with register-cache aliasing).
Epilog first substracts 8 from SP - it is ok because at end of
fn there is often a lot of free scheduler slots where this
substract insn fits at no expense (which is not case in prolog).

However I just realized that it is not interrupt/signal safe :-(
I really have to decrement in prolog ...

> 	- ASM_OUTPUT_REG_PUSH and ASM_OUTPUT_REG_POP are wrong. The
> 	  former because it assumes predecrement again, the latter
> 	  because the instruction is called `loadi', not `load'.

yes. I still didn't reallized what are these macros used for. I hope
to be able to undef them completely.

> 	- There's a much easier way to specify the pattern for `mux'
> 	  (note the constraints for operands 3 and 4):
>
> 		(define_insn "*mux"
> 			[(set (match_operand 0 "register_operand" "=r,r")
> 				(ior
> 					(and
> 						(match_operand 1 "register_operand" "r,r")
> 						(match_operand 2 "register_operand" "r,r"))
> 					(and
> 						(not (match_operand 3 "register_operand" "1,2"))
> 						(match_operand 4 "register_operand" "0,0"))))
>      [....]
> 			"@
> 			mux %1,%2,%0
> 			mux %2,%1,%0"
> 			[(set_attr "type" "logic")])

Are you sure ? Constraints are tested AFTER matching. So that your
version will probably took: a&b | c&~d
as one which matches and the will try to use consraints to ensure
correct register allocation. It will found that none matches and
will abort. See this except fro manual:
"More precisely, the two operands that match must include one input-only
 operand and one output-only operand."
So that you can use that 0,0 but not 1,2 constraint.

> 	- `loop' branches when the register is not zero. That is:
>
> 		(ne (match_operand:DI 0 "register_operand" "+r") (const_int 0))
> 	  This instruction uses DImode, not SImode.

ehh yes it was there for teysting only because loop optimizer still
dies weird things :(

> 	- Conditional branches (`jmpz'/`jmpnz') also use DImode for the
> 	  condition register.  If the operand is smaller, it needs
> 	  zero extension (unless you're sure that the upper bits are
> 	  zero anyway - but that's not always true).

I use VOIDmode because it is needed by loop optimizer ... See
extension ideas above.

> 	- Your patterns for `*movM' allow the destination to be constant 0.

yes I know. However this will never be emited by gcc and if yes
it will abort in constraint selection. If I will disallow it, it
would abort elsewhere then ... So I allowed it because I had no
regmem_operand fn at that time. I'd focus on more problematic parts
just now.

> 	- The `*loadcons' patterns look suspicious to me. Better drop them.
> 	  In fact, better don't use loadcons at all, use loadconsx
> 	  instead (and rely on the assembler to optimize the instruction
> 	  sequence).

:) They are under developement still but the really describe what
loadcons does !!
They should be also able to translate code:
a = b & 0xffffffff0000ffff | 0x45330000;
to single loadcons.1.
Also we really need to tell gcc about all loadcons isns because splitter
will need them to do correct scheduling. Loadcons are ideal "stuffers"
for free schedule slots.

> 	- `loadaddri' (without `d') is correct for a LABEL_REF (while a
> 	  SYMBOL_REF would require `loadaddrid'). But you got the offset
> 	  wrong: it should read `loadaddri $label-.-4, reg'.

Ok about the offset. But SYMBOL_REF doesn't imply loadaddrid ! SYMBOL_REFs
are used both for external data & code pointers and there is currently
no way to distinguish them (unfortunately) ! Thus the warning comment in
the asm. LABEL_REF is used ONLY for internal labels BUT also it can be
used with loadaddrid sometimes ! (gcc allows you to take pointer to
label, switch() statements needs jump table at label and gcc also labels
private data tables.
Thus we will need to analyze insn dataflow to found which kind of
loadaddr to use - and there are further possible cases where (G)CSE
will try to use resulting pointer for both data & code access (it
is legal to read from function pointer AFAIK).

devik


*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/