
Re: [f-cpu] New suggestion about call convention



> > It's an interesting idea, but for each instruction in each function in
> > each module you need to search in a "database" which chunk to replace and
> > then replace it with a mask. But before taking your decision you need to
> > execute an allocation algorithm. Currently, with only relocation on x86, a
> > lot of people already think that linking is too slow...
>
> You mean run-time (dynamic) linking? We're going to do it at compile
> time; there will be no overhead when the program starts.

Ok, I thought you meant to do this at every link step. I understand your idea
better now. So we still need a solution for dynamic libraries.

> You don't need a complicated database look-up. You just do the
> following:
>
> 	get the instruction word
> 	for all register fields do
> 		if the register number has to be replaced
> 			replace it with the new register number

> You can use a table-based translation (64 table entries) or an algorithmic
> approach (add a constant to the register number if it's inside the range
> that is supposed to be `pitch-shifted'). The number of register fields is
> available from the opcode (another table with 256 entries). Nothing more,
> nothing less. In either case, transposing an instruction will take only
> a handful of CPU cycles.
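
A minimal sketch of the transposition loop described above, written in Python for readability. The instruction layout used here (32-bit word, 8-bit opcode in the top bits, up to three 6-bit register fields in the low 18 bits) and the contents of REG_FIELDS are assumptions made for illustration, not the real encoding; shift_table and transpose are hypothetical names.

```python
# Assumed layout, for illustration only: 32-bit instruction word,
# opcode in bits 31..24, up to three 6-bit register fields in bits 17..0.
REG_BITS = 6
REG_MASK = (1 << REG_BITS) - 1  # 0x3F, selects one of 64 registers

# 256-entry table: number of register fields used by each opcode.
# Real values would come from the ISA; these entries are made up.
REG_FIELDS = [3] * 256
REG_FIELDS[0x01] = 2  # hypothetical opcode with two register fields
REG_FIELDS[0x02] = 1  # hypothetical opcode with one register field


def shift_table(first, last, offset):
    """Build a 64-entry table that moves registers first..last by offset."""
    table = list(range(64))
    for reg in range(first, last + 1):
        new = reg + offset
        if not 0 <= new < 64:
            raise ValueError("shifted register number out of range")
        table[reg] = new
    return table


def transpose(word, table):
    """Rewrite only the register fields of one instruction word."""
    opcode = (word >> 24) & 0xFF
    for field in range(REG_FIELDS[opcode]):
        pos = field * REG_BITS
        old = (word >> pos) & REG_MASK
        word = (word & ~(REG_MASK << pos)) | (table[old] << pos)
    return word
```

The table-based and the algorithmic variant meet in shift_table: the "add a constant" rule is evaluated once per register, and the per-instruction work is then two table look-ups and a few shifts.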

For me, the 256-entry table is a database. It's still a lot. I think a good
idea would be to preload each possible jump destination into a register,
then look up a constant in the table and do a conditional call to the
right destination. That can be quite quick and a good approach for the linking
process.
	The problem is that it will be an intrusive patch to gcc and ld...

> The allocation algorithm isn't complicated either (I outlined it in my
> other mail).

Perhaps a little bit, but we know how to handle it correctly now. But you need
to treat your functions as a graph rather than a tree, and for every function
whose callees you don't know, you must assume that it calls everybody!
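
To make the "calls everybody" rule concrete, here is a small sketch in Python. It is not the allocation algorithm from the other mail, only the conservative call graph it would have to work on. The function names and the convention that None means "callees unknown" are assumptions for illustration, and every callee is assumed to appear as a key of the input.

```python
def conservative_call_graph(calls):
    """calls maps each function to a set of callees, or to None if unknown.

    A function with unknown callees is assumed to call every function.
    """
    everybody = set(calls)
    return {
        func: set(everybody) if callees is None else set(callees)
        for func, callees in calls.items()
    }


def reachable(graph, start):
    """Functions that may run, directly or indirectly, once start is called.

    The 'seen' set makes this safe on cycles (recursion), which is why the
    program has to be treated as a graph and not as a tree.
    """
    seen = set()
    stack = [start]
    while stack:
        for callee in graph[stack.pop()]:
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen
```

Any register used by a function in reachable(graph, f) may be clobbered while f is active, so a single function with unknown callees makes that set as large as the whole program.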

> >   But the idea is good. What do you think about a double solution: for
> > .o files (not libraries), you use your technique, and for the rest the
> > technique suggested by Arnaud.

> Inside a shared library or a program, reallocation can be used as well.
> It just won't work at the interface between programs and shared libraries.

That is where I was thinking of using the mask register...

> >   In the prologue you use the mask technique and put a second entry point
> > for the optimal call. (I don't see why you need contiguous registers;
> > you only need a 64-bit mask per function, so the 2 techniques can be used
> > together).

> Contiguous register numbers make the allocation and transposition easier,
> especially when register pairs are involved. And it's what compilers
> usually do: start with the first, and always pick the next available
> register.

So here you don't want to change the compiler ;-) I think it's possible to
have a register allocation that selects them "randomly" and still gives good
results (pairs aren't a problem, because you know where to put the second
register).

> On the other hand, the `double jump' allows us to put multiple wrappers
> around the same function without touching the function itself.

I was thinking that the quick entry doesn't save/restore registers, so the
wrappers can use whatever method they want to enter the function
(with or without saving, depending on the information they have).

> > With this you have both possibilities. In fact, you don't need to add a
> > jump for the entry point with your method either, because you can add the
> > 2 entry points when you compile.

> The link time register reallocation relies on the fact that any number
> of wrappers can be added to a function. That has to be done at link time
> because the compiler doesn't know how many wrappers it has to provide.

We can have a default wrapper and many others that use the normal entry.

> > low_entry_point:
> > 	loadcons.1 0xFFFF, r0
> > 	loadcons.2 0xFFFF, r0
> > 	loadcons.3 0xFFFF, r0
> > 	and r0, mr, r1
> > 	maskstore r1, [sp]
; oops, these two were missing:
	move qe, m0
	not r0, qe
> > quick_entry_point:
> > 	// My function
> > 	jmpz qe, r63
> > restore_code:
> > 	loadcons.1 0xFFFF, r0
> > 	loadcons.2 0xFFFF, r0
> > 	loadcons.3 0xFFFF, r0
> > 	and r0, mr, r1
> > 	maskload r1, [sp]
> > 	jmp r63
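
The effect of the two entry points can be modelled in a few lines of Python. This is a rough sketch under assumptions: mr is taken to hold one bit per register that currently contains a live caller value, the constant loaded into r0 is taken to be the set of registers the function uses, and the helper names are hypothetical. The bookkeeping of mr itself (the point raised below about the modified mask) is left out.

```python
def registers_to_save(used_mask, live_mask):
    """Models 'and r0, mr, r1': registers the function clobbers that are live."""
    return used_mask & live_mask


def mask_store(regs, save_mask):
    """Models maskstore: save only the registers selected by save_mask."""
    return {n: regs[n] for n in range(64) if (save_mask >> n) & 1}


def mask_load(regs, saved):
    """Models maskload: put the saved values back in place."""
    for n, value in saved.items():
        regs[n] = value


def call(regs, live_mask, used_mask, body, quick=False):
    """Enter body through the saving entry point, or through the quick one."""
    if quick:
        saved = {}  # quick entry: the caller has already dealt with saving
    else:
        saved = mask_store(regs, registers_to_save(used_mask, live_mask))
    body(regs)
    mask_load(regs, saved)
```

With live registers r1..r3 and a function that uses r2..r5, only r2 and r3 are saved and restored; r4 and r5 are clobbered, which is harmless because nothing live was stored there.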

> Nice idea, but it still allows only a single wrapper per function.

You point to quick_entry_point if you want another wrapper...

> BTW: Your restore code is broken; you'll have to use the modified mask
> when you restore the registers, and then put the original mask back in
> place.

I forgot to set qe !

> But the register numbers are always encoded in the same 1...3 fields
> of the instruction word - and that's all that matters. We don't have to
> touch the instructions themselves, only the register numbers.

Of course.

> > The problem is that an optimising linker will really improve the code,
> > but it will not work for shared code (libraries are loaded only once) and
> > will certainly be really slow for big code... The mr is not as optimal as
> > your proposal but doesn't cost much each time you run an application.

> Au contraire, mon ami! (I always wanted to say that ;)

;-)

> Link time register allocation is done *once* for each program - before
> it is run. At run time, you'll only waste cycles when you have to
> save/restore registers. The `mask' approach also needs some cycles for
> bookkeeping (that is, the mask operations), which makes it slower than
> link time allocation in the worst case.

Of course, but I was only thinking about shared libraries, sorry.

> One might consider the `mask' approach as an alternative for the
> program/library interface (after all, that looks like the weak point
> of my approach). But there is a good argument against it: A reasonably
> complex program will use *all* registers internally. That is, you always

Perhaps for a program, but not for a library, perhaps not between libraries,
and certainly not every time.

> have to save registers when you enter a shared library - with either
> approach. Both link time allocation (when the library is built) and
> register masking (at run time) help to reduce the number of registers
> you have to save, but link time allocation is superior because it has
> less run time overhead.

I agree.

Cedric