
Re: [f-cpu] latest gcc & immediate addressing [Was: BOUNCE f-cpu@seul.org:...](fwd)



hi !

devik wrote:

Ehh, I forgot... I am forwarding my reply to the group too, as someone might
like to comment on something.
---------

<snip>

- logic operations take 9-bit signed immediate operands.

Uhh... my 2.7 manual doesn't state that :( Where can I find it?

IIRC the 9-bit immediate for ROP2 is composed of a byte plus a sign bit.
This can be important when dealing with ASCII characters or bytes from the video framebuffer,
because 7 bits are not enough there.
The 9th bit is not used when SIMD mode is used with byte width, but it is useful
for wider modes (non-SIMD or non-byte modes).
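
To make the range argument concrete, here is a minimal C sketch. It assumes the field is a plain two's-complement 9-bit value, which is only recalled from memory above; the helper names fits_simm9 and fits_simm8 are hypothetical and come from neither the manual nor the gcc port.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v fits a 9-bit two's-complement field
 * (assumed encoding: 8 value bits plus a sign bit). */
static bool fits_simm9(int64_t v)
{
    return v >= -256 && v <= 255;
}

/* Same test for an 8-bit signed field, for comparison only. */
static bool fits_simm8(int64_t v)
{
    return v >= -128 && v <= 127;
}
```

Under that assumption every byte value 0..255 fits, while an 8-bit signed field would stop at 127 and miss the upper half of the byte range.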

- STACK_BOUNDARY must be at least 64 (fullword!). FUNCTION_BOUNDARY
should be the size of a cache line (256 bits). And even more
important: STRICT_ALIGNMENT must be 1 (unaligned memory access
is strictly forbidden).

Good point, that is another thing I didn't find in the manual. So
the f-cpu needs natural alignment for all data?

*all* data :-)
Otherwise it traps.

One could program a trap handler for this, but it is left as an exercise for later (much later).

Concerning the 128+ bit versions, a smarter version of the LSU could be made,
but there still is a problem when a word crosses a page...
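
As a rough illustration of the alignment rule and the page-crossing concern, here is a small C sketch. The three macro values are the ones quoted in this thread. SKETCH_PAGE_BYTES and both helper functions are hypothetical, since no page size is given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread for the gcc target description. */
#define STACK_BOUNDARY    64   /* bits */
#define FUNCTION_BOUNDARY 256  /* bits, one cache line */
#define STRICT_ALIGNMENT  1    /* unaligned accesses trap */

/* Hypothetical page size, chosen only for this sketch. */
#define SKETCH_PAGE_BYTES 4096u

/* True if an access of `size` bytes (a power of two) at `addr`
 * is naturally aligned. */
static bool is_naturally_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}

/* True if the bytes [addr, addr + size) span more than one page.
 * Assumes size > 0. */
static bool crosses_page(uint64_t addr, uint64_t size)
{
    return (addr / SKETCH_PAGE_BYTES) != ((addr + size - 1) / SKETCH_PAGE_BYTES);
}
```

A naturally aligned access whose size divides the page size can never cross a page, so the page problem only appears once unaligned accesses are allowed.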


- The function prologue is wrong. `storei $-8, r62, r61' uses
*post*decrement, not predecrement. I suggest you calculate
the required amount of memory, subtract it from r62, and then
use a temporary register with postincrement for storing
registers and initializing locals.

I know. I simply assume that the stack pointer points to the next free
slot instead of the last used one. Then in the prolog I can use post
decrement and save several insns and an additional register (which
can save us trouble with register-cache aliasing).
The epilog first subtracts 8 from SP. That is ok because at the end of a
fn there are often a lot of free scheduler slots where this
subtract insn fits at no expense (which is not the case in the prolog).

However I just realized that it is not interrupt/signal safe :-(
I really have to decrement in prolog ...

I don't see why there would be a problem.

If an IRQ happens, the user stack and its pointer will not be used.
Each IRQ should have its own stack space, unless you program it otherwise.
And if IRQs are nested, they allocate their stack progressively.

So don't worry about the epilogues.
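
For readers following the post-decrement discussion, here is a toy C model of a store with post-modify, in the spirit of `storei $-8, r62, r61'. It is only a sketch: struct toy_cpu, storei_post and peek64 are hypothetical names, the memory is a small array, and the epilogue side is deliberately not modeled.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy machine state: a small byte-addressed stack area and a pointer. */
struct toy_cpu {
    uint8_t  mem[STACK_BYTES];
    uint64_t sp;   /* plays the role of r62, as an offset into mem */
};

/* Store with post-modify: write the data at the current pointer first,
 * then add the immediate to the pointer. */
static void storei_post(struct toy_cpu *cpu, int64_t imm, uint64_t data)
{
    memcpy(&cpu->mem[cpu->sp], &data, sizeof data);
    cpu->sp += (uint64_t)imm;
}

/* Read back a 64-bit slot for inspection. */
static uint64_t peek64(const struct toy_cpu *cpu, uint64_t off)
{
    uint64_t v;
    memcpy(&v, &cpu->mem[off], sizeof v);
    return v;
}
```

With the pointer starting at the next free slot, two such stores fill that slot and the one below it, and the pointer again ends up on a free slot.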


- The `*loadcons' patterns look suspicious to me. Better drop them.
In fact, better not to use loadcons at all; use loadconsx
instead (and rely on the assembler to optimize the instruction
sequence).

:) They are still under development, but they really describe what
loadcons does !!
They should also be able to translate code like:
a = b & 0xffffffff0000ffff | 0x45330000;
to a single loadcons.1.
Also, we really need to tell gcc about all loadcons insns because the splitter
will need them to do correct scheduling. Loadcons are ideal "stuffers"
for free schedule slots.
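
The mask-and-or expression above can be checked against a small C model of loadcons.n. This is a sketch under the assumption that loadcons.n replaces the n-th 16-bit field of a 64-bit register and leaves the other bits alone, which is what the example implies; the function name and signature are hypothetical.

```c
#include <stdint.h>

/* Sketch of loadcons.n: replace 16-bit field n (0..3) of reg with imm16
 * and keep all other bits (assumed semantics, see the text above). */
static uint64_t loadcons(uint64_t reg, unsigned n, uint16_t imm16)
{
    unsigned shift = 16u * (n & 3u);
    uint64_t mask = (uint64_t)0xFFFF << shift;
    return (reg & ~mask) | ((uint64_t)imm16 << shift);
}
```

With this model, loadcons(b, 1, 0x4533) gives the same result as b & 0xffffffff0000ffff | 0x45330000 for any 64-bit b, which is the pattern the gcc description would have to recognize.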

BTW I don't know the latest news about loadcons and its "shifted" versions.
The original loadcons(x) is, as devik said, an ideal "stuffer" because it
should add no delay, which is often critical.

One example is the code snippet I showed at 19C3:
it compares many characters (bytes) in parallel, and the loadcons
is very useful for filling the slots between the sdup and the xorn.and
(sdup has at least one cycle of latency).
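
For readers without the 19C3 slides, the general idea of comparing many bytes in parallel can be sketched in portable C. This is not the F-CPU snippet and does not model sdup, xorn.and or loadcons scheduling; dup_byte and match_bytes are hypothetical names, and the zero-byte test is a standard SWAR trick swapped in for illustration.

```c
#include <stdint.h>

/* Replicate one byte into all eight byte lanes of a 64-bit word. */
static uint64_t dup_byte(uint8_t c)
{
    return (uint64_t)c * 0x0101010101010101ULL;
}

/* Returns 0x80 in each byte lane of `word` that equals c, 0x00 elsewhere. */
static uint64_t match_bytes(uint64_t word, uint8_t c)
{
    uint64_t x = word ^ dup_byte(c);          /* matching lanes become zero */
    uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    uint64_t t = ((x & low7) + low7) | x;     /* bit 7 set iff lane is non-zero */
    return ~t & 0x8080808080808080ULL;
}
```

The duplicate step must finish before the XOR can start, which is the kind of gap an independent constant load can fill.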


I don't have time to discuss other aspects
but i hope this helps.

devik

YG

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/