
Re: [f-cpu] Stack handling



hi !

Thomas Lavergne wrote:
> I've reread the manual and found a problem in the stack handling suggestion
> 
> The manual says :
> 
> pop = load 8, r3, r2
> push = store -8, r3, r2
> 
> With r3 a stack pointer and r2 a value to be pushed/popped to/from the stack.
> 
> but this is wrong, for example try to do :
> push r2
> pop r2
> 
> you obtain this code
> 
> store -8, r3, r2
> load 8, r3, r2
> 
> you store r2 at the r3 address and then decrement r3,
> and afterwards you load into r2 the value at the new r3 and increment the pointer
> so after executing this you don't have the same value in r2... Bug
> 
> We can't manage a stack without a pre-decrement instruction, or we need a
> lot of tricks and get very bad code...

there is something very important to note here :
 we CAN'T use pre-inc/decrement in F-CPU.

the obvious conclusion is that we have to use some tricks.
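
To make the problem concrete, here is a toy Python model of the two
instructions quoted above. It assumes what the thread describes: the
pointer is updated AFTER the memory access (post-modify only). The
Machine class, the start address 1024 and the "add then load" pop are
illustrative assumptions, not the manual's recommended sequence, and
uninitialised memory reads as 0 in this model.

```python
WORD = 8  # stack slot size in bytes, matching the 8 / -8 immediates above


class Machine:
    """Toy model: memory accesses update the pointer AFTER the access."""

    def __init__(self, sp):
        self.mem = {}                      # address -> value
        self.reg = {"r2": 0, "r3": sp}     # r3 plays the stack pointer

    def store(self, imm, ptr, src):
        # store imm, ptr, src : mem[ptr] = src, then ptr += imm
        self.mem[self.reg[ptr]] = self.reg[src]
        self.reg[ptr] += imm

    def load(self, imm, ptr, dst):
        # load imm, ptr, dst : dst = mem[ptr], then ptr += imm
        self.reg[dst] = self.mem.get(self.reg[ptr], 0)
        self.reg[ptr] += imm

    def add(self, imm, reg):
        # plain pointer adjustment, no memory access
        self.reg[reg] += imm


def naive_push_pop(value):
    m = Machine(sp=1024)
    m.reg["r2"] = value
    m.store(-WORD, "r3", "r2")   # push: writes at 1024, r3 becomes 1016
    m.reg["r2"] = 0              # clobber r2 so the reload is visible
    m.load(WORD, "r3", "r2")     # pop: reads at 1016, the wrong slot
    return m.reg["r2"], m.reg["r3"]


def adjusted_push_pop(value):
    m = Machine(sp=1024)
    m.reg["r2"] = value
    m.store(-WORD, "r3", "r2")   # push: same as before
    m.reg["r2"] = 0
    m.add(WORD, "r3")            # trick: move r3 back to the written slot
    m.load(0, "r3", "r2")        # then load with no pointer update
    return m.reg["r2"], m.reg["r3"]
```

In this model the naive pair returns the pointer to its start value but
reads the slot below the one that was written, while the adjusted pop
costs one extra instruction and recovers the pushed value.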

This does not annoy me for the simple reason that F-CPU works
best when global optimisations are applied (all the program
is flattened and cross-routine optimisations and allocation
are done). In this context, your remark is not an issue.

However, i think that the people who discussed parameter
passing have forgotten a VERY IMPORTANT DETAIL,
but i doubt they would care to listen, even though this is absolutely
critical for performance. If they don't want to lose a factor
of roughly 5 on their code, they MUST specify which registers
will be used as pointers. FC0 uses 8 pointers to data and 8 pointers
to instructions : a software-managed "return stack" and "data stack"
MUST be allocated (that is : 48 registers remain for the rest).

Of course this is unusual and it might confuse some people.
However, it is much easier for GCC, which will only use
these pointers to access the memory, preventing unacceptable
latencies. GCC has a parameter (in the machine description
files) that says whether there are pointer-only registers, and we
can statically allocate 8 for data and 8 for instructions.

Please take into account that a SW-managed stack is an excellent
place for doing optimisations. What is written in the latest manual
is a big issue for F-CPU : the unique function return address
register is a critical "bottleneck" because a pointer can't
be moved to another register (the value can, but the hidden
flags won't be moved, so the next use will cause a stall
of maybe 5 or 10 cycles).

I know that there are several "inconsistencies",
but the key is to NOT think the way you do with usual CPUs.
Ask yourself : what does the CPU have to do, and what is the simplest,
fastest way to perform the task ?

F-CPU is not MIPS : r63 is not hardwired in the instructions
when a call or return is performed. We can choose any of the
63 registers to store the return address. 8 of them can be used
"statically", whatever their register numbers, to reduce the re-fetching
overhead. Same for data. It is simpler and faster to allocate 8 data and
8 instruction pointers, rather than trying to reuse the scheme
used by other CPUs "because that's all people are used to".

If somebody cares...

> Tom
> Thomas Lavergne
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~