Re: [f-cpu] New suggestion about call convention



> First, a decent F-CPU compiler should not select registers randomly.
> It should analyze every function and assign register numbers so that
> chances for a collision are minimized.

It depends on whether you know what the other modules and libraries do.
You will certainly agree that if you know nothing about their register
usage, random selection is the best solution.

> That is, in a module containing several functions that call each other,
> each function should use a different set of registers. Additionally,
> functions can have both an "internal" entry point for intra-module calls
> (which need not save any registers) and a "public" entry point that
> saves all registers used inside it, prepares for restoring them, and
> then dispatches to the internal entry point. Unless there are recursive
> functions, you'll have to save registers only when you enter the module
> (and restore them when you leave it).

It's a good idea for making intra-module calls quick. The problem is in the 
epilogue: how do you know where you came from?

> Ideally, each function will use a contiguous set of registers, and there
> will be an entry in the object file reporting which registers it uses.
> That way it becomes possible to further minimize collisions at link time
> by renumbering the registers inside a function - you may also call it
> "deferred register allocation" if you like.

It's an interesting idea, but for each instruction in each function in each 
module you need to look up in a "database" which chunk to replace, and then 
replace it using a mask. And before you can decide, you need to run an 
allocation algorithm. Currently, with only relocation to do on x86, a lot of 
people already think that linking is slow...
  But the idea is good. What do you think about a combined solution: for .o 
files (not libraries) you use your technique, and for the rest the technique 
suggested by Arnaud.
  In the prologue you use the mask technique and add a second entry point for 
the optimal call. (I don't see why you need contiguous registers; you only 
need a 64-bit mask per function, so the two techniques can be used together.)
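
To show why contiguity is not needed, here is a minimal sketch in Python. It 
assumes that bit i of a mask stands for register i, and that mr has a bit set 
for every register the caller wants preserved (that is my reading of the mask 
proposal, not a fixed definition). The helper names are made up for the 
example.

```python
NUM_REGS = 64  # one bit per register fits in a single 64-bit word


def regs_to_mask(regs):
    """Build a mask with bit i set for each register number i in regs."""
    mask = 0
    for r in regs:
        if not 0 <= r < NUM_REGS:
            raise ValueError(f"register number out of range: {r}")
        mask |= 1 << r
    return mask


def mask_to_regs(mask):
    """List the register numbers whose bit is set in mask."""
    return [r for r in range(NUM_REGS) if (mask >> r) & 1]


def regs_to_save(function_mask, mr):
    """Registers the prologue has to save: used by the callee AND marked in mr."""
    return function_mask & mr


# The callee's registers are scattered, not contiguous (assumed values).
callee_mask = regs_to_mask([3, 17, 18, 40])
mr = regs_to_mask([1, 2, 3, 40, 41])

print(mask_to_regs(regs_to_save(callee_mask, mr)))  # [3, 40]
```

The AND is the same whether the callee's registers are adjacent or not, so 
the mask costs nothing extra for a scattered register set.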

> 	new_entry_point:
> 		// allocate stack space here
> 		storem xyz
> 		move r63, saved_reg	// save return address
> 		loadaddri old_entry_point, temp_reg
> 		jump temp_reg, r63	// call original function
> 	restore_code:
> 		move saved_reg, r63	// restore return address
> 		loadm xyz
> 		// deallocate stack space here
> 		jump r63

The problem with this is that you have a double jump each time you call a 
function outside of your module. Why not use something like this:

low_entry_point:
	storei +8, [sp], r63
	loadaddri restore_code, r63
	loadcons.1 0xFFFF, r0
	loadcons.2 0xFFFF, r0
	loadcons.3 0xFFFF, r0
	and r0, mr, r1
	maskstore r1, [sp]
quick_entry_point:
	// My function
	jmp r63
restore_code:
	loadcons.1 0xFFFF, r0
	loadcons.2 0xFFFF, r0
	loadcons.3 0xFFFF, r0
	and r0, mr, r1
	maskload r1, [sp]
	loadi -8, [sp], r0 	; I don't like this piece of code
	loadi -8, [sp], r63
	jmp r63

With this you have both possibilities. In fact, you don't need to add a jump 
for the entry point with your method either, because you can add the two 
entry points at compile time. One way to improve your idea is to dedicate 
another register, Quick Entry (qe): if qe == r0 => don't restore, else 
restore. The prologue/epilogue would then look like:

low_entry_point:
	loadcons.1 0xFFFF, r0
	loadcons.2 0xFFFF, r0
	loadcons.3 0xFFFF, r0
	and r0, mr, r1
	maskstore r1, [sp]
quick_entry_point:
	// My function
	jmpz qe, r63
restore_code:
	loadcons.1 0xFFFF, r0
	loadcons.2 0xFFFF, r0
	loadcons.3 0xFFFF, r0
	and r0, mr, r1
	maskload r1, [sp]
	jmp r63
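
The control flow of this second version can be modelled in a few lines of 
Python. This is only a sketch under my own assumptions: the caller sets qe 
before the call, qe == 0 means "entered at quick_entry_point", any other value 
means "entered at low_entry_point", and the stack is replaced by a plain 
dictionary. The function and variable names are hypothetical.

```python
NUM_REGS = 64


def call_function(regs, body, used_mask, mr, qe):
    """Model one call; regs is a list of 64 register values."""
    saved = {}
    if qe != 0:
        # low_entry_point: and r0, mr, r1 / maskstore r1, [sp]
        save_mask = used_mask & mr
        saved = {r: regs[r] for r in range(NUM_REGS) if (save_mask >> r) & 1}
    # quick_entry_point: the function body itself
    body(regs)
    if qe != 0:
        # jmpz qe, r63 is not taken, so we fall into restore_code (maskload)
        for r, value in saved.items():
            regs[r] = value
    return regs


def clobber(regs):
    """Example body that overwrites r5 and r6."""
    regs[5] = -1
    regs[6] = -1


used = (1 << 5) | (1 << 6)  # the callee uses r5 and r6
mr = 1 << 5                 # the caller only cares about r5

normal = call_function(list(range(NUM_REGS)), clobber, used, mr, qe=1)
quick = call_function(list(range(NUM_REGS)), clobber, used, mr, qe=0)
print(normal[5], normal[6])  # 5 -1
print(quick[5], quick[6])    # -1 -1
```

With qe set, only r5 comes back intact because r6 was not marked in mr; with 
qe clear, nothing is saved or restored at all.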

> The same algorithm works with both functions and modules, and it only
> needs the set of used registers per function/module and a dependency
> graph. Both can be easily constructed from intermediate code. Register
> renumbering is done on the object code directly (thanks for the uniform
> register model and instruction format of the F-CPU which allow us to
> `transpose' the object code -- note to guitar players: think `capo' ;-).

I don't understand you here. We have a lot of different instruction formats: 
2r1w, 1i1w, 1i1r, 1i1r1w, 3r2w... Each one needs a special case.
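
Here is a sketch in Python of what I mean by a special case per format. The 
operand order inside each format (immediate first, then the registers read, 
then the registers written) is an assumption made up for the example, not the 
real F-CPU encoding, and instructions are shown as tuples instead of binary 
words.

```python
# Which operand slots hold register numbers, per instruction format.
# ASSUMPTION: slot order is immediate, then reads, then writes.
REGISTER_SLOTS = {
    "2r1w": (0, 1, 2),
    "1i1w": (1,),
    "1i1r": (1,),
    "1i1r1w": (1, 2),
    "3r2w": (0, 1, 2, 3, 4),
}


def renumber(instruction, mapping):
    """Apply a register renumbering to one (format, operands) instruction.

    Only slots listed for the format are rewritten, so immediates are
    left alone even when their value happens to match a register number.
    """
    fmt, operands = instruction
    new_operands = list(operands)
    for slot in REGISTER_SLOTS[fmt]:
        new_operands[slot] = mapping.get(operands[slot], operands[slot])
    return (fmt, tuple(new_operands))


mapping = {5: 20, 6: 21, 8: 22}

# The immediate 8 in slot 0 must survive although r8 is being renumbered.
print(renumber(("1i1r1w", (8, 5, 6)), mapping))  # ('1i1r1w', (8, 20, 21))
print(renumber(("2r1w", (8, 5, 6)), mapping))    # ('2r1w', (22, 20, 21))
```

The table lookup is exactly the per-format special case: without it the 
linker cannot tell a register field from an immediate.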

> Conclusion: Using a smart linker, we can resolve register collision
> issues statically. We can also take profiling (feedback) information
> into account, like you suggested, to optimize the number and placement of
> save/restore points. Therefore, I see no real need to do it at run time.

The problem is that an optimising linker will really improve the code, but it 
will not work for shared code (libraries are loaded only once) and will 
certainly be really slow for big programs... The mr is not as optimal as your 
proposal, but it doesn't cost much each time you run an application.

Cedric