
Re: [f-cpu] Stack handling



On Tue, Jul 23, 2002 at 09:17:26PM +0200, Yann Guidon wrote:
> hi !
> 
> Thomas Lavergne wrote:
> > I've reread the manual and found a problem in stack handling suggestion
> > 
> > The manual says:
> > 
> > pop = load 8, r3, r2
> > push = store -8, r3, r2
> > 
> > With r3 a stack pointer and r2 a value to be pushed/popped to/from the stack.
> > 
> > but this is wrong; for example, try:
> > push r2
> > pop r2
> > 
> > you obtain this code
> > 
> > store -8, r3, r2
> > load 8, r3, r2
> > 
> > you store r2 at the address in r3 and then add -8 to r3,
> > and afterwards you load r2 from the new r3 address and add 8 to the pointer,
> > so after executing this you don't have the same value in r2... Bug
> > 
> > We can't manage a stack without pre-decrement instruction, or we need a
> > lot of tricks and obtain very bad code...
> 
> there is something very important to note here :
>  we CAN'T use pre-inc/decrement in F-CPU.
> 
    F-CPU doesn't really need pre-inc or pre-dec either. The most
efficient way to push a lot of values on the stack is to:
1) decrement the stack pointer by the amount of space we need for
arguments
2) just store all the arguments in parallel to the stack space

On a superscalar CPU, this scheme is even better: the processor doesn't
have to track the dependencies that pre-dec stores create on the stack
pointer from one store to the next, and the grouped stores can also make
for better cache usage.
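
A rough sketch of the two steps above, as a toy Python model rather than
F-CPU code. The dict standing in for memory, the `push_block` name and
the 8-byte word size (taken from the 8-byte step in the quoted manual
example) are assumptions for illustration only:

```python
WORD = 8  # assumed bytes per stack slot, matching the +/-8 in the manual example

def push_block(mem, sp, values):
    """Reserve room for all values with one subtraction, then store each
    value at a fixed offset from the new stack pointer."""
    sp -= WORD * len(values)           # step 1: single stack-pointer update
    for i, value in enumerate(values):
        mem[sp + WORD * i] = value     # step 2: each store depends only on sp
    return sp

mem = {}
sp = push_block(mem, 0x1000, [11, 22, 33])
```

Since every store uses the same base plus a constant offset, none of them
has to wait for a pointer update made by the previous one.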

    On the other hand, post-inc can be useful for popping stuff from the
stack, but I would not fret at all over pre-dec.
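
To make the asymmetry concrete, here is a small Python model of the
push/pop pair quoted above. It assumes, as the thread describes, that
both `store imm, rP, rD` and `load imm, rP, rD` access memory at rP
first and add imm to rP afterwards; the helper names and the dict-based
memory are hypothetical:

```python
WORD = 8  # assumed slot size, from the +/-8 in the manual example

def store_post(mem, regs, imm, ptr, src):
    """Assumed semantics: write regs[src] at regs[ptr], then add imm to regs[ptr]."""
    mem[regs[ptr]] = regs[src]
    regs[ptr] += imm

def load_post(mem, regs, imm, ptr, dst):
    """Assumed semantics: read regs[dst] from regs[ptr], then add imm to regs[ptr].
    Unwritten memory reads as 0 in this toy model."""
    regs[dst] = mem.get(regs[ptr], 0)
    regs[ptr] += imm

# The manual's pair: both post-update, so the load misses the stored slot.
mem, regs = {}, {"r2": 42, "r3": 0x1000}
store_post(mem, regs, -WORD, "r3", "r2")
load_post(mem, regs, WORD, "r3", "r2")
broken = regs["r2"]

# Explicit subtract, plain store, post-increment load: push and pop mirror each other.
mem, regs = {}, {"r2": 42, "r3": 0x1000}
regs["r3"] -= WORD                       # separate subtract stands in for pre-dec
store_post(mem, regs, 0, "r3", "r2")     # store without moving the pointer
regs["r2"] = 0                           # clobber r2 so the pop has to restore it
load_post(mem, regs, WORD, "r3", "r2")   # pop with post-increment
fixed = regs["r2"]
```

In the first sequence r2 comes back with whatever sat one slot below the
stored value; in the second it comes back as 42 and r3 returns to its
starting address.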

> the obvious conclusion is that we have to use some tricks.
> 
> This does not annoy me for the simple reason that F-CPU works
> best when global optimisations are applied (all the program
> is flattened and cross-routine optimisations and allocation
> are done). In this context, your remark is not an issue.
> 
    The tricks that need to be applied to the F-CPU really aren't
tricks. They are sensible architecture constraints. As someone who
regularly writes optimizing backends, I can say nothing in the F-CPU
fazes me in the least. :)

> However, i think that the people who discussed
> parameter passing have forgotten a VERY IMPORTANT DETAIL,
> but i doubt they would care to listen, even though this is absolutely
> critical for performance. If they don't want to lose a factor
> of roughly 5 on their codes, they MUST specify which registers
> will be used as pointers. FC0 uses 8 pointers to data and 8 pointers
> to instructions : a software-managed "return stack" and "data stack"
> MUST be allocated (that is : there remain 48 registers for the rest).
> 
> Of course it is not usual and it might confuse some people.
> However, it is much easier for GCC, which will only use
> these pointers to access the memory, preventing unacceptable
> latencies. GCC has a parameter (in the machine description
> files) that says if there are pointer-only registers, and we
> can statically allocate 8 for data and 8 for instructions.
> 
    GCC is fairly old stuff, though. You could easily write algorithms
to make better use of this idiom without artificially setting a boundary
between what is a data register and what is an address register.

> Please take into account that a SW-managed stack is an excellent
> place for doing optimisations. What is written in the last manual
> is a big issue for F-CPU : the unique function return address
> register is a critical "bottleneck" because a pointer can't
> be moved to another register (the value can, but the hidden
> flags won't be moved, so the next use will create a stall
> during maybe 5 or 10 cycles).
> 
> I think and know that there are several "inconsistencies".
> but the key is to NOT think like with usual CPUs.
> Ask yourself : what the CPU has to do and what is the simplest,
> fastest way to perform the task.
> 
> F-CPU is not MIPS : r63 is not hardwired in the instructions
> when a call or return is performed. We have the choice of using
> 63 registers for storing the return address. 8 of them can be used
> "statically", whatever their number, to reduce the re-fetching overhead.
> Same for data. It is simpler and faster to allocate 8 data and
> 8 instruction pointers, rather than trying to reuse the scheme
> used by other CPUs "because that's all people are used to".
> 
    Interprocedural register allocation is great stuff. When the call
chain is known, it's a simple matter to determine the best place to
store the return addresses without ever resorting to stack use.
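
One possible way to do that, sketched in Python under some loud
assumptions: the call graph is fully known and has no recursion, and a
fixed pool of registers (8 here, borrowing the figure quoted above) is
set aside for return addresses. The function name and the example graph
are made up for illustration. Each function gets the register whose
index equals its longest call depth from the root, so no two functions
on the same call chain ever share one:

```python
def assign_link_registers(calls, root, num_regs=8):
    """Map each function to a return-address register index equal to its
    longest call depth from root. Raises ValueError if the graph is
    recursive or a chain is deeper than num_regs."""
    depth = {}

    def visit(fn, d, active):
        if fn in active:
            raise ValueError("recursive call chain through " + fn)
        if d >= num_regs:
            raise ValueError("call chain deeper than the register pool")
        depth[fn] = max(depth.get(fn, 0), d)
        for callee in calls.get(fn, ()):
            visit(callee, d + 1, active | {fn})

    visit(root, 0, frozenset())
    return depth

# Hypothetical call graph: caller -> list of callees.
calls = {"main": ["parse", "emit"], "parse": ["lex"], "emit": ["lex", "flush"]}
regs = assign_link_registers(calls, "main")
```

A callee always ends up with a higher index than any of its callers, so
a call never overwrites a return address that is still live further up
the chain. Recursive code falls outside this sketch and would still need
a real stack.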

> If somebody cares...
> 
> > Tom
> > Thomas Lavergne
> WHYGEE
    Lee
