
Re: [f-cpu] GCC 3.1 for F-CPU port



> there are several F-CPU assemblers now,
> though I don't know which one to trust :-P
> each with its own syntax, unique features, etc ...

:) Are there reasons other than personal ego, like
everyone wanting their own assembler? And are they
documented, so that I can pick one?

> the emulator is a big problem. it will take time
> before it's completely solved, but I am confident.

I understand that it is a big piece of code. But is
the principle really so hairy? I'd expect relatively simple
code... but there may be things I'm overlooking.
Memory and cache simulation is probably not so simple either.

> compiler ... well ... you seem to have taken over
> the past efforts :-)))

I didn't know about past efforts! Huh... I could have
started from them instead of from scratch :( I spent
three whole days learning gcc internals before I coded
it...

> >linux kernel for example and count cycles it takes
> >until /sbin/init is launched).
>
> i don't think that it is a good metric.
> On top of that, there is no external HW ready.

Yep... I just wanted to see Linux boot on F-CPU ;-)
Probably because I'm familiar with large parts of Linux
and wanted to understand the arch-dependent parts too.
I planned to steal the HW emulation from Bochs :)

> However it can be interesting to code and run the
> "primary boot monitor" (see at
> http://f-cpu.seul.org/new/F-CPU_boot.txt )
> and start bootstrapping stuffs from that point,
> making simple "toy" or useful software, etc ...

Good idea. Maybe it could be a starting point for other tests.
I also wanted to port Linux because once it "runs" you
can benchmark other code on it (like compiling gcc on it).

> >Also insns other than add and shift should be added (right
> >now gcc uses its libs).
>
> ? i don't understand what that means ....
> we can't do boolean or shift operations ?

No, no... :-) I was just lazy and implemented only the
mandatory patterns like movM, jump, indirect_jump and call,
plus a few optional ones like addM3, shiftM, extendM, compareM
and all the branches.
So I meant that I still have to add all the others, like
logic ops, other shifts, rotates...
Also, I recently found that I need reload_inM and reload_outM.
When I tried to compile "vsprintf", it crashed while reloading
registers, probably because I used gen_reg_rtx in movM, which
is unfortunately not allowed while reload_in_progress is true.
Have you played with these?

> >There is a problem with the jump optimizer because it needs
> >labels tied to jumps, but we have them in registers.
> the 'trick' is maybe to use a "macro" and the instruction
> can be rescheduled by Cédric's assembler ...

I did it exactly the way you said :) The problem is that
this may disable certain optimizations. If we could
instruct gcc to emit the address load as a separate insn
along with the jump, it would be able to CSE the load and
hoist it out of loop bodies. That is hard to do in the
assembler because it would have to redo loop detection and
expression lifetime analysis.
I've seen something like this in ia64.md; I'll look at it.

> >Also conditional branching is not tuned - it suppresses
> >loop optimization :(
>
> it seems that you do not use the same set of conditions
> as is implemented (LSB, MSB, zero, instead of greater, etc.).

I use only zero and non-zero, together with cmpxx. I didn't
use MSB because there was a discussion about removing it, nor LSB
because I don't know a good use for it yet ;-)
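A small C model of how all six C relations can be mapped onto a compare followed by a branch that only tests zero/non-zero. The compare primitives here (less-than, less-or-equal, and an xor used for equality) and their "nonzero means true" result are assumptions for illustration, not the F-CPU cmpxx definition.

```c
#include <assert.h>
#include <stdint.h>

/* Source-level relations the compiler has to branch on. */
typedef enum { REL_EQ, REL_NE, REL_LT, REL_LE, REL_GT, REL_GE } rel_t;

/* Hypothetical compare primitives: the result register is nonzero when
   the condition holds (for OP_XOR: nonzero when the operands differ). */
typedef enum { OP_XOR, OP_LT, OP_LE } op_t;

static uint64_t do_cmp(op_t k, int64_t a, int64_t b)
{
    switch (k) {
    case OP_XOR: return (uint64_t)a ^ (uint64_t)b;
    case OP_LT:  return a < b;
    default:     return a <= b;
    }
}

/* Which primitive to use, whether to swap operands, and whether to
   branch on a nonzero (jmp.nz) or zero (jmp.z) result. */
struct lowering { op_t op; int swap; int on_nonzero; };

static struct lowering lower(rel_t r)
{
    switch (r) {
    case REL_EQ: return (struct lowering){ OP_XOR, 0, 0 }; /* a ^ b == 0   */
    case REL_NE: return (struct lowering){ OP_XOR, 0, 1 }; /* a ^ b != 0   */
    case REL_LT: return (struct lowering){ OP_LT,  0, 1 };
    case REL_LE: return (struct lowering){ OP_LE,  0, 1 };
    case REL_GT: return (struct lowering){ OP_LT,  1, 1 }; /* b < a        */
    default:     return (struct lowering){ OP_LE,  1, 1 }; /* b <= a       */
    }
}

/* Simulates "compare ; jmp.nz/jmp.z" and reports whether it is taken. */
int branch_taken(rel_t r, int64_t a, int64_t b)
{
    struct lowering l = lower(r);
    uint64_t t = l.swap ? do_cmp(l.op, b, a) : do_cmp(l.op, a, b);
    return l.on_nonzero ? (t != 0) : (t == 0);
}
```

Swapping operands lets two ordered compares plus the zero/non-zero test cover all orderings, which is why MSB/LSB conditions are not strictly required for correctness (only possibly for speed).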

> >    jmp_direct.nz a0,@L6
> >    widen.d a3,a1
> >    bseti log2(1048576),r0,a0
>
> wow .... uh ....

I was too lazy to use a "*" template instead of "@" in the
define_expand to compute "exact_log2()", so I simply emit the
expression and let the assembler handle it ;-) I use bseti C,r0,r
for constants above 0xffff that are powers of two.
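For reference, the constant-selection rule described above could look roughly like this in C. exact_log2_u64() follows the contract of GCC's exact_log2() (the exponent for a power of two, -1 otherwise); emit_const() is a hypothetical helper, not code from the actual port, and it only covers the two cases mentioned here: 16-bit constants and powers of two above 0xffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns n if x == 2^n, else -1 (same contract as GCC's exact_log2). */
int exact_log2_u64(uint64_t x)
{
    int n = 0;
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    while ((x >>= 1) != 0)
        n++;
    return n;
}

/* Hypothetical emitter: returns the F-CPU-style sequence that loads the
   non-negative constant c into reg, or NULL for cases not sketched here.
   Uses a static buffer, so the result is overwritten by the next call. */
const char *emit_const(uint64_t c, const char *reg)
{
    static char buf[64];
    int n = exact_log2_u64(c);

    if (c <= 0xffff) {
        /* Clear the register, then insert the low 16 bits. */
        snprintf(buf, sizeof buf, "move r0,%s\nloadcons.0 %u,%s",
                 reg, (unsigned)c, reg);
        return buf;
    }
    if (n >= 0) {
        /* Set bit n of r0 (always zero) and write the result to reg. */
        snprintf(buf, sizeof buf, "bseti %d,r0,%s", n, reg);
        return buf;
    }
    return NULL;
}
```

With this rule, emit_const(1048576, "a0") yields "bseti 20,r0,a0", which is what the assembler ends up with after evaluating log2(1048576) in the snippet quoted above.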
I played a lot with the code that generates constants. It is
now relatively clean; it just doesn't handle negative numbers
efficiently. I'll need to do either:
nand r0,r0,t0
; STALL
loadcons.0 C,t0

or

loadcons.0 C,t0
widen.d t0,t0
; STALL

while for positive ones the best is probably:
move r0,t0
loadcons.0 C,t0

which has no possible stall. Of course the compiler will
probably fill the "stall" slot, but not always. Do you
know better code for small negative constants?
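To check that the three sequences above produce the intended values, here is a small C model of them. The register semantics (in particular that loadcons.0 keeps bits 63..16 of the destination and that widen.d sign-extends from 16 bits) are assumptions inferred from how the sequences are used in this mail, not taken from the F-CPU manual; the stall slots are only marked in comments, not timed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics, modelled only as far as the sequences need:
   nand a,b,d     : d = ~(a & b); with r0 == 0 this gives all ones (-1)
   loadcons.0 c,d : replaces bits 15..0 of d with c, keeps bits 63..16
   widen.d s,d    : sign-extends bits 15..0 of s to 64 bits
   move s,d       : d = s                                              */
static uint64_t op_nand(uint64_t a, uint64_t b) { return ~(a & b); }

static uint64_t op_loadcons0(uint16_t c, uint64_t d)
{
    return (d & ~(uint64_t)0xffff) | c;
}

static uint64_t op_widen_d(uint64_t s)
{
    uint64_t low = s & 0xffff;
    return (low & 0x8000) ? (low | ~(uint64_t)0xffff) : low;
}

#define R0   ((uint64_t)0)                     /* r0 always reads as zero */
#define JUNK ((uint64_t)0x0123456789abcdefu)   /* stale contents of t0    */

/* nand r0,r0,t0 ; (stall) ; loadcons.0 C,t0 */
uint64_t seq_nand(int16_t c)
{
    uint64_t t0 = op_nand(R0, R0);
    return op_loadcons0((uint16_t)c, t0);
}

/* loadcons.0 C,t0 ; widen.d t0,t0 ; (stall) */
uint64_t seq_widen(int16_t c)
{
    uint64_t t0 = op_loadcons0((uint16_t)c, JUNK);
    return op_widen_d(t0);
}

/* move r0,t0 ; loadcons.0 C,t0   (non-negative C only) */
uint64_t seq_move(uint16_t c)
{
    uint64_t t0 = R0;
    return op_loadcons0(c, t0);
}
```

Under these assumptions the nand variant covers small negative constants, the widen.d variant covers the whole signed 16-bit range, and the move r0 variant covers 0 to 0xffff.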

I'm thinking about this: if the RTL passes of the compiler allocate
fewer than half of the temporary registers, some will still be
free even after optimization. In that case, allocate one physical
register and assume in the patterns that it holds the value -1.
Then use it during the combiner pass just like we use r0, but for
other things. If it really was used, generate a new insn in the
prolog to assign -1 to it and let the second scheduler pass
reschedule it.
Then we save 1 cycle per use with no runtime penalty. Hehe.
Take it as interesting reading only; I don't plan to implement
this beast soon, but it is interesting, isn't it?
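The trade-off of the -1 register idea can be written as a tiny cost model. It assumes one cycle per insn, that a small negative constant built with a hypothetical -1 register m1 uses "move m1,t0 ; loadcons.0 C,t0" (by analogy with the r0 sequence for positives), and that the single prolog insn setting m1 is hidden by the second scheduling pass; these figures come from the reasoning above, not from measurements.

```c
#include <assert.h>

/* Assumed cycle costs per small negative constant:
   without m1:  nand r0,r0,t0 ; stall ; loadcons.0 C,t0  -> 3
   with m1:     move m1,t0 ; loadcons.0 C,t0             -> 2
   The prolog insn setting m1 = -1 is assumed free (scheduled away). */
enum { CYCLES_WITH_NAND = 3, CYCLES_WITH_M1 = 2 };

/* Reserve m1 only if a temporary is still free after optimisation
   and at least one small negative constant is actually built. */
int want_minus_one_reg(int free_temps, int small_neg_loads)
{
    return free_temps > 0 && small_neg_loads > 0;
}

/* Estimated cycles spent materialising small negative constants. */
int const_cycles(int free_temps, int small_neg_loads)
{
    int per_use = want_minus_one_reg(free_temps, small_neg_loads)
                      ? CYCLES_WITH_M1 : CYCLES_WITH_NAND;
    return small_neg_loads * per_use;
}
```

For example, a function with two spare temporaries and five small negative constants would drop from 15 to 10 cycles under these assumptions, i.e. the "1 cycle per use" saving.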

> i have put the sources at
> http://f-cpu.seul.org/new/gccfcpu.20021203.tgz

I'll definitely look at it! However, I have to postpone
any other hacking until Christmas because I have an important
project at work.

best regards,
devik

PS: do you know about the new scheduler in gcc 3.3?? It should
    schedule F-CPU almost perfectly.
