Re: [f-cpu] New suggestion about call convention
> > It's an interesting idea, but for each instruction in each function in
> > each module you need to search a "database" for the chunk to replace and
> > then replace it with a mask. And before making that decision you need to
> > run an allocation algorithm. Currently, with only relocations on x86, a
> > lot of people already think that linking is slow...
>
> You mean run-time (dynamic) linking? We're going to do it at compile
> time; there will be no overhead when the program starts.
OK, I thought you were doing this for every linking process. Now I understand
your idea better. So we still need some idea for dynamic libraries.
> You don't need a complicated database look-up. You just do the
> following:
>
>     get the instruction word
>     for all register fields do
>         if the register number has to be replaced
>             replace it with the new register number
> You can use a table-based translation (64 table entries) or an algorithmic
> approach (add a constant to the register number if it's inside the range
> that is supposed to be `pitch-shifted'). The number of register fields is
> available from the opcode (another table with 256 entries). Nothing more,
> nothing less. In either case, transposing an instruction will take only
> a handful of CPU cycles.
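To make the transposition loop concrete, here is a minimal C sketch of both variants (the 64-entry translation table and the `pitch-shift' by a constant). The instruction layout is an assumption made up for illustration: opcode in bits 0..7 and up to three 6-bit register fields at bits 8, 14 and 20. The names reg_field_count, reg_map, transpose_table and transpose_shift are hypothetical, not part of any real F-CPU tool.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL encoding, for illustration only: opcode in bits 0..7,
   up to three 6-bit register fields starting at bits 8, 14 and 20. */
#define OPCODE(w) ((w) & 0xFFu)
static const unsigned field_shift[3] = { 8, 14, 20 };

/* 256-entry table: number of register fields used by each opcode (0..3).
   Contents would come from the assembler/linker; zero by default here. */
static uint8_t reg_field_count[256];

/* 64-entry translation table: new register number for each old one. */
static uint8_t reg_map[64];

static void init_identity_map(void)
{
    for (unsigned r = 0; r < 64; r++)
        reg_map[r] = (uint8_t)r;
}

/* Table-based variant: every register field is looked up in reg_map. */
static uint32_t transpose_table(uint32_t w)
{
    unsigned n = reg_field_count[OPCODE(w)];
    assert(n <= 3);
    for (unsigned i = 0; i < n; i++) {
        unsigned s = field_shift[i];
        unsigned r = (w >> s) & 0x3Fu;
        /* clear the old 6-bit field, then insert the translated number */
        w = (w & ~(0x3Fu << s)) | ((uint32_t)reg_map[r] << s);
    }
    return w;
}

/* Algorithmic variant: registers in [lo, hi] are moved up by `offset'. */
static uint32_t transpose_shift(uint32_t w, unsigned lo, unsigned hi,
                                unsigned offset)
{
    assert(hi < 64 && hi + offset < 64);   /* shifted range must stay valid */
    unsigned n = reg_field_count[OPCODE(w)];
    assert(n <= 3);
    for (unsigned i = 0; i < n; i++) {
        unsigned s = field_shift[i];
        unsigned r = (w >> s) & 0x3Fu;
        if (r >= lo && r <= hi)
            w = (w & ~(0x3Fu << s)) | ((uint32_t)((r + offset) & 0x3Fu) << s);
    }
    return w;
}
```

The table variant handles arbitrary renamings; the shift variant only needs the range bounds and the offset, which is enough when each function's registers form a contiguous block.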
For me the 256-entry table is a database. It's still a lot. I think a good
idea would be to preload each possible jump destination into a register,
then look up a constant in the table and do a conditional call to the
right destination. That can be quite quick and a good approach for the
linking process.
The problem is that it will be an intrusive patch to gcc and ld...
> The allocation algorithm isn't complicated either (I outlined it in my
> other mail).
Perhaps a little bit, but we know how to handle it correctly now. You need
to see your functions more as a graph than a tree, and for every function
where you don't know whether it calls something else, you must assume that
it calls everybody!
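One way to apply the conservative "calls everybody" rule on a call graph (which may contain cycles because of recursion) is a fixed-point iteration over 64-bit register masks. This is a minimal C sketch, assuming a hypothetical struct func record built by the linker; it computes, for every function, the set of registers that a call to it may clobber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CALLEES 8

/* Hypothetical per-function record built by the linker. */
struct func {
    uint64_t own_regs;             /* registers the body itself writes (bit n = rn) */
    bool     calls_unknown;        /* indirect or external call we cannot analyse */
    size_t   ncallees;
    size_t   callees[MAX_CALLEES]; /* indices into the function array */
    uint64_t clobbers;             /* result: registers a call may destroy */
};

/* Propagate masks along call edges until nothing changes.
   Works on arbitrary graphs, including cycles, not only trees. */
static void compute_clobbers(struct func *f, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(f[i].ncallees <= MAX_CALLEES);
        /* unknown callee: assume it uses every register */
        f[i].clobbers = f[i].calls_unknown ? ~(uint64_t)0 : f[i].own_regs;
    }

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            uint64_t m = f[i].clobbers;
            for (size_t k = 0; k < f[i].ncallees; k++)
                m |= f[f[i].callees[k]].clobbers;
            if (m != f[i].clobbers) {
                f[i].clobbers = m;
                changed = true;
            }
        }
    }
}
```

Because the masks only grow and are bounded at 64 bits, the loop terminates; anything that reaches an unknown call ends up with the all-ones mask.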
> > But the idea is good. What do you think about a combined solution: for
> > .o files (not libraries) you use your technique, and for the rest the
> > technique suggested by Arnaud?
> Inside a shared library or a program, reallocation can be used as well.
> It just won't work at the interface between programs and shared libraries.
That's where I was thinking of using the register mask...
> > In the prologue you use the mask technique and put a second entry point
> > for the optimal call. (I don't see why you need contiguous registers;
> > you only need a 64-bit mask per function, so the 2 techniques can be used
> > together.)
> Contiguous register numbers make the allocation and transposition easier,
> especially when register pairs are involved. And it's what compilers
> usually do: start with the first, and always pick the next available
> register.
So here you don't want to change the compiler ;-) I think it's possible to
have a register allocator that selects registers "randomly" and still gives
good results (pairs aren't a problem, because you know where to put the
second register).
> On the other hand, the `double jump' allows us to put multiple wrappers
> around the same function without touching the function itself.
My idea was that the quick entry doesn't save/restore registers, so that
each wrapper can use whatever method it wants to enter this function
(with or without saving, depending on the information it has).
> > With this you have both possibilities. In fact, with your method you
> > don't need to add a jump for the entry point either, because you can
> > add the 2 entry points at compile time.
> The link time register reallocation relies on the fact that any number
> of wrappers can be added to a function. That has to be done at link time
> because the compiler doesn't know how many wrappers it has to provide.
We can have a default wrapper and many others that use the normal entry.
> > low_entry_point:
> > loadcons.1 0xFFFF, r0
> > loadcons.2 0xFFFF, r0
> > loadcons.3 0xFFFF, r0
> > and r0, mr, r1
> > maskstore r1, [sp]
; oops:
move qe, m0
not r0, qe
> > quick_entry_point:
> > // My function
> > jmpz qe, r63
> > restore_code:
> > loadcons.1 0xFFFF, r0
> > loadcons.2 0xFFFF, r0
> > loadcons.3 0xFFFF, r0
> > and r0, mr, r1
> > maskload r1, [sp]
> > jmp r63
> Nice idea, but it still allows only a single wrapper per function.
You point to quick_entry_point if you want another wrapper...
> BTW: Your restore code is broken; you'll have to use the modified mask
> when you restore the registers, and then put the original mask back in
> place.
I forgot to set qe !
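A minimal C model of the save/restore sequence, following the fix described above: save with the modified mask (function mask AND mr), keep that mask, restore with it, then put the original mr back. The semantics of maskstore/maskload and of mr as a "live registers" mask are assumptions for illustration, not documented F-CPU behaviour; struct cpu, struct frame, enter and leave are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NREGS 64

/* Hypothetical machine model; not real F-CPU semantics. */
struct cpu {
    uint64_t reg[NREGS];
    uint64_t mr;                 /* assumed: registers holding live caller values */
    uint64_t stack[NREGS * 4];
    size_t   sp;                 /* stack grows upward, counted in words */
};

/* Assumed maskstore: push every register whose bit is set, lowest first. */
static void maskstore(struct cpu *c, uint64_t mask)
{
    for (unsigned r = 0; r < NREGS; r++)
        if ((mask >> r) & 1u) {
            assert(c->sp < sizeof c->stack / sizeof c->stack[0]);
            c->stack[c->sp++] = c->reg[r];
        }
}

/* Assumed maskload: pop in reverse order so it mirrors maskstore. */
static void maskload(struct cpu *c, uint64_t mask)
{
    for (unsigned r = NREGS; r-- > 0; )
        if ((mask >> r) & 1u) {
            assert(c->sp > 0);
            c->reg[r] = c->stack[--c->sp];
        }
}

struct frame { uint64_t saved; uint64_t old_mr; };

static struct frame enter(struct cpu *c, uint64_t func_mask)
{
    struct frame f = { func_mask & c->mr, c->mr };
    maskstore(c, f.saved);   /* save only live registers we will clobber */
    c->mr |= func_mask;      /* our registers are now live for our callees */
    return f;
}

static void leave(struct cpu *c, struct frame f)
{
    maskload(c, f.saved);    /* restore with the mask used at entry */
    c->mr = f.old_mr;        /* then put the original mask back */
}
```

With mr = 0x0F and a function mask of 0x24, only r2 is saved; r5 is clobbered freely because no caller holds a live value in it.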
> But the register numbers are always encoded in the same 1...3 fields
> of the instruction word - and that's all that matters. We don't have to
> touch the instructions themselves, only the register numbers.
Of course.
> > The problem is that an optimising linker will really improve the code,
> > but it will not work for shared code (libraries are loaded only once) and
> > will certainly be really slow for big code... The mr is not as optimal as
> > your proposition, but it doesn't cost as much each time you run an application.
> Au contraire, mon ami! (I always wanted to say that ;)
;-)
> Link time register allocation is done *once* for each program - before
> it is run. At run time, you'll only waste cycles when you have to
> save/restore registers. The `mask' approach also needs some cycles for
> bookkeeping (that is, the mask operations), which makes it slower than
> link time allocation in the worst case.
Of course, but I was only thinking about shared libraries, sorry.
> One might consider the `mask' approach as an alternative for the
> program/library interface (after all, that looks like the weak point
> of my approach). But there is a good argument against it: A reasonably
> complex program will use *all* registers internally.
Perhaps for a program, but not for a library, perhaps not between libraries,
and certainly not every time.
> That is, you always have to save registers when you enter a shared
> library - with either
> approach. Both link time allocation (when the library is built) and
> register masking (at run time) help to reduce the number of registers
> you have to save, but link time allocation is superior because it has
> less run time overhead.
I agree.
Cedric
*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu in the body. http://f-cpu.seul.org/