[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [f-cpu] New suggestion about call convention




Hi,

Just my two cents... About the idea that register optimization
could be done at linking time. I don't know if F-CPU is meant
to work as an isolated architecture, with only dedicated tools.
But if you want to benefit from existing software, you must think
in terms of compatibility with the Gnu tools.

That means :
- compiler capabilities
- linker capabilities
- object file format (ELF, etc.)

I'd bet (but I may be wrong ;-)) that none of those are ready
for such a thing as link-time or load-time register reallocation.
I'd also bet that if nobody does it at the moment, maybe it's
because there are serious problems with it (despite the fact that
compilation time is cheap so there is much incentive to optimize
as far as possible in this phase). For once the optimization
problem itself doesn't seem very simple, especially when programs
get big.


An example is the mysql daemon (which is a respectable database,
but not a big and featureful monster as Postgres or Oracle).

[root@fsol antoine]# strip -s /usr/sbin/mysqld 
[root@fsol antoine]# ls -la /usr/sbin/mysqld 
-rwxr-xr-x    1 root     root      1957780 nov  5 21:38 /usr/sbin/mysqld*


This is quite a big executable indeed. Of course there is not only
code. Given that stuff other than code (strings, empty data...) is
likely to be highly compressable, we can still estimate an order of
magnitude of the actual code :

[antoine@fsol antoine]$ gzip -c /usr/sbin/mysqld > mysqld
[antoine@fsol antoine]$ ls -la mysqld 
-rw-rw-r--    1 antoine  antoine    871030 nov  5 21:48 mysqld

So we can, quite safely, assume that the mysqld code (again, not
that a big database program compared to others) is several hundreds
of kilobytes big (and this would probably be worse on a RISC
machine, with lots of small instructions using lots of registers !).
I don't think it's reasonable to expect processing some global
register allocation (how do you do that ? ;-)) on such a big
piece of code. Also, keep in mind that registers may have a
special role and not be easily exchangeable (not really the case
in F-CPU, except for instructions that also write to the register 
next to the output register).

(note : mysqld is a practical example because it is completely
standalone - even a custom glibc is statically linked -, so there
is no "hidden cost")


Another strong argument is that you can't even be sure that there
is a good solution to the optimization problem (even if you have
invented an outstanding algorithm that makes you sure to find the
optimum, and maybe it also helps you demonstrate that P==NP at the
same time :-)). The optimum may be not much better than the common
case.

For example, if you have three functions (A, B, C) that each
allocate 32 different registers. B calls C. A calls in turn :
C then B. So sometimes C is called from B which is called from
A, sometimes C is called directly from A. How do you think the
optimal solution will look like, and will that optimality be
satisfactory ?

(if you optimize C for being called by B, it won't be optimized
for being called by A, and vice-versa ; of course, you can 
duplicate C in two versions but beware of cache thrashing, given
that this is a very simple example)



Antoine.


*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/