
Rep:Re: [f-cpu] registers



-----Original message-----
From: Yann Guidon <whygee@f-cpu.org>
To: f-cpu@seul.org
Date: 03/10/02
Subject: Re: [f-cpu] registers

hello,

Nicolas Boulay wrote:

>Just a little speech.
>
>For my job, I work with a SPARC V7 clone (ERC32 from ESA) and a V8 one (LEON).
>These CPUs have no FPU, but you can add an FPU called Meïko from Sun.
>
>The SPARC V8 architecture uses register windows (>100 registers, but only 32
>are visible at a given time).
>
AFAIK, the "windows" have a granularity of 8 registers. Only 24 
registers from the "sliding window"
can be seen at a time. I wouldn't count your >100 registers as 
interesting because only a small
fraction is used at a time.

>>>>There are 32 registers at a time: 24 inside the window and 8 globals.
Windowed registers have the advantage of a lightning-fast calling
convention, which is much quicker than the mess F-CPU needs for a
call. But there is the trap handler problem.
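
To make the register counts above concrete, here is a minimal Python sketch of how a windowed register file can map the 32 visible register numbers onto physical registers. It assumes the usual SPARC V8 layout (r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins, with the outs of one window shared with the ins of its neighbour) and a hypothetical 8-window implementation. `NWINDOWS` and `physical_index` are names chosen for illustration only.

```python
NWINDOWS = 8  # assumption: the window count is implementation-dependent


def physical_index(cwp, r, nwindows=NWINDOWS):
    """Map visible register r (0..31) in window cwp to a physical register."""
    if not 0 <= r < 32:
        raise ValueError("visible register number must be in 0..31")
    if r < 8:
        return r                      # globals are shared by every window
    windowed_size = 16 * nwindows     # each window adds 16 new registers
    cwp %= nwindows
    if r < 16:                        # outs
        offset = 16 * cwp + (r - 8)
    elif r < 24:                      # locals
        offset = 16 * cwp + 8 + (r - 16)
    else:                             # ins overlap the neighbour's outs
        offset = 16 * (cwp + 1) + (r - 24)
    return 8 + offset % windowed_size


# Total physical registers reachable across all windows: 8 + 16 * NWINDOWS.
total = len({physical_index(w, r) for w in range(NWINDOWS) for r in range(32)})
print(total)
```

With 8 windows this gives 136 physical registers, of which only 32 (8 globals plus 24 windowed) are visible at any moment, which is consistent with the ">100" figure quoted above.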

> When you add the FPU you get 32 registers
>dedicated to it (the 32 FPU registers are 32 bits wide, but you can
>access them in pairs as 64-bit doubles).
>
Is it in a "sliding window" too ?

>>>Nope, it's completely separate.
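
As an illustration of the "access them in pairs" remark, the following Python sketch models a bank of 32-bit FP registers in which a 64-bit double occupies an even/odd pair. It assumes that the even-numbered register holds the most significant word, which is my understanding of the SPARC V8 convention; `write_double` and `read_double` are hypothetical helper names.

```python
import struct


def write_double(fregs, n, value):
    """Store a 64-bit double into the 32-bit registers fregs[n], fregs[n+1]."""
    if n % 2:
        raise ValueError("a double must start at an even register number")
    high, low = struct.unpack(">II", struct.pack(">d", value))
    fregs[n], fregs[n + 1] = high, low   # assumption: even register = high word


def read_double(fregs, n):
    """Rebuild the 64-bit double held in fregs[n], fregs[n+1]."""
    if n % 2:
        raise ValueError("a double must start at an even register number")
    return struct.unpack(">d", struct.pack(">II", fregs[n], fregs[n + 1]))[0]


fregs = [0] * 32              # 32 registers of 32 bits each
write_double(fregs, 2, 1.5)   # occupies f2 and f3
print(hex(fregs[2]), hex(fregs[3]), read_double(fregs, 2))
```

Seen this way, the 32 single-precision registers provide at most 16 double-precision ones.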

>I asked our expert why they added a new register set instead of using the
>integer register bank. I had the F-CPU approach in mind.
>
>The answer was quite simple: "to use more registers"!
>
>Think about it for F-CPU!
>  
>
Think that F-CPU has 3x more "useful" registers than SPARC !
Think that the 64 registers of F-CPU amounts to the same total
register number as ALPHA 21064 or MIPS R4000.
We have 63 registers while SPARC has only 24 (+32, if I understood
correctly).
So the argument doesn't hold.

>>>I never tried to compare F-CPU with the little 50,000-gate LEON, which runs
at 200 MHz, or with the ERC32 at 25 MHz! I am trying to compare the philosophy
of register use. They think: we only have 5 bits for register
addressing, so let's split the bank to get more space.

>That's not a bad point, because there are very few cases where you have to
>use integer operations on floating-point numbers.
>
But when you have to do it, you are often caught in a critical place 
where moving registers from one set to another is the slowest thing,
indeed because those smart engineers thought that this operation doesn't
occur often...

>>>Even with a single register set you will need to manipulate the
vector, because you can't access every chunk individually!

> So these are 2 classes of
>numbers with no real crossover in operations. But this is also the case for
>F-CPU: why mix simple integers and vectors in one register bank?
>  
>
If you want such a computer, use a Cray or a derived architecture.
BTW, F-CPU is not a "vector" computer, it's "SIMD" (or "very short
vector", compared to the 64 numbers per register of CRAY).

>>>Or compare with the 144 KB of registers in the NEC ESS supercomputer, which
is a little newer than the Cray (
http://www.nec-ess.com/newsroom/attachments/SX-6-Single-node.pdf ).

>Nowadays we too often think in terms of 64-bit registers, so 64 bits == the
>biggest int. But in fact, the most interesting vector size is
>256 bits.
>  
>
This can change in the future....
if compilers get better, if programs are better written...
It will take some time, but when it's done, then a 256-bit-only CPU
will be outperformed by more flexible computers.

>>>This size has been found to be a good one for double-intensive
applications. With more chunks in a register you couldn't use the vector
any more if there are too many dependencies.
And when that time comes, why would keeping binary compatibility be important?

>As you can see, using a 256-bit register to store a 64-bit integer
>is a big waste! Look at our programming API. We always have 3 or
>4 pointers in the register set, so 3/4 of the register will never be
>used 95% of the time. What a waste, don't you think?
>  
>
Hey! Speaking of waste, what about adding an FPU and FP registers to a
computer that does ints most of the time: you _lose_ 1/2 of the silicon.
I would like to see the _real_ use of the FPU in your space computers:
do they do FP 50% of the time?

>>>I forgot that you are an expert in SCAO and real-time computing... :-/
It's true, the leader of the space industry in Europe only has stupid
experts, I forgot that...

And by the way, LEON is a single issue CPU so by definition, one half of
the core is not used (or available) at any time.
Compare ALPHA 21064 (just an example, to show that I am not too ignorant to
speak) and a single-issue CPU with FPU:
 ALPHA can issue 2 instructions at a time, one FP and one INT (for example).
 This is the best case and in that condition, it achieves 100% of performance
 (well, I don't speak about memory access etc...)
 LEON can issue 1 inst/cycle ==> at any cycle, the decoder (and the program)
 can use only one of the int or FP units. So even in the best case, 50%
 of silicon use is the most one can achieve.
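
The argument above boils down to simple arithmetic, sketched here in Python under a deliberately crude model: every instruction keeps exactly one execution unit busy, and memory access and pipelining are ignored. `peak_unit_utilisation` is a hypothetical name used only for this illustration.

```python
def peak_unit_utilisation(issue_width, unit_count):
    """Best-case fraction of execution units that can be fed each cycle."""
    if issue_width < 1 or unit_count < 1:
        raise ValueError("issue_width and unit_count must be positive")
    return min(issue_width, unit_count) / unit_count


# Dual issue with one INT and one FP unit, as in the ALPHA 21064 example.
dual_issue = peak_unit_utilisation(issue_width=2, unit_count=2)
# Single issue with the same two units, as in the LEON example.
single_issue = peak_unit_utilisation(issue_width=1, unit_count=2)
print(dual_issue, single_issue)
```

Under these assumptions the dual-issue case reaches 1.0 and the single-issue case 0.5, which matches the 100% and 50% figures given above.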

>>>I believe that the operators in F-CPU don't share much silicon,
so while something is being computed, 80% of the chip sleeps...

By "silicon use", i mean : accessing data in the register set and 
computing it.
Computing can be pipelined, but accessing data is what hurts most if
there is a split register set.


FC0 is the current implementation of F-CPU.
I believe that EVEN though newer and better architectures will come,
FC0 will be used a lot for power-saving applications for example.

So yes we must think about the future but we must not harm FC0 too.
FC0 uses a single monolithic register set which is probably very slow,
but it's compact and efficient for a single-instruction pipeline.

>>>If it's slow, it's not efficient. I prefer "simple".

We can later make more "intelligent" cores that will have different
register sizes and split banks, but it's much too complex now,
because we don't even do a single-issue CPU...

>I don't think we should split FP and int registers, but rather SIMD and scalar
>numbers.
>  
>
Cray went even further (and a lot followed) : there are 3 (sometimes 
even more) banks,
1 for ints, 1 for FP and 1 for vectors. However, the compilers are not 
in the public domain...

>>>Current x86 architectures are even worse: the registers are far from
equally useful...

More importantly, i don't think it was an issue in the CRAYs but in a 
single-chip,
split sets means separate execution units. dooohoh ! this means that 
some units
will be duplicated ! more headaches...

>>>> ???? I don't see why, you could reuse whatever you want!

>Why not have 2 register banks, one SIMD, the other scalar? This adds
>1/4 of memory content inside the CPU but doubles the number of usable
>registers.
>  
>
And what about the case when there is no need of SIMD ?

>>>only 1/4 of the waste. 
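
For reference, the "1/4" and "3/4" figures in this exchange can be checked with a few lines of Python. The sketch assumes 64 registers per bank, 256-bit SIMD registers and 64-bit scalar registers; these sizes are taken from the numbers mentioned in the thread, not from any F-CPU specification.

```python
from fractions import Fraction

REGISTERS = 64     # assumption: registers per bank
SIMD_BITS = 256    # assumption: width of a SIMD register
SCALAR_BITS = 64   # assumption: width of a scalar register


def bank_bits(registers, width_bits):
    """Total storage of one register bank, in bits."""
    return registers * width_bits


unified = bank_bits(REGISTERS, SIMD_BITS)
split = unified + bank_bits(REGISTERS, SCALAR_BITS)

# Extra storage needed to add the scalar bank next to the SIMD bank.
extra_storage = Fraction(split - unified, unified)
# Share of a 256-bit register left unused when it holds a 64-bit pointer.
unused_per_pointer = Fraction(SIMD_BITS - SCALAR_BITS, SIMD_BITS)
print(extra_storage, unused_per_pointer)
```

So the second bank costs 1/4 more register storage for twice as many register names, while a pointer kept in a wide register leaves 3/4 of it idle.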

And what about the problems of saving the whole CPU state ?

>>>Not really the hard point for me.

And what about the cases when ints must be compared to SIMDs or
stuffs like that ?

>>>>I don't understand: why not access the 2 banks in the same op?

And what about the additional instruction that is necessary
to at least help with the last question?
And what about the modification of the non-computational instructions
(load/store etc)? Does this mean that it creates new coding
restrictions?
(not being able to do a "GET" or "PUT" to/from SIMD ...)

And will you have the courage to modify all the manual to reflect
these "additions" ?

>>>The biggest side effect is when you use mixed instructions. But this
can be detected, to know which register to load.


>I don't think this change will need a lot of modification in
>existing code.
>  
>
of course it will ! it will affect the register allocation algorithms, 
the compilers, the instruction set etc.... 

>>>In the expression "existing code", there is the word "existing", from
the verb "to exist", which means that the code is already there on
f-cpu.seul.org. Did I miss a complete port of gcc?

And more importantly : you will go back to the programming environment
of those "multimedia extensions" such as MMX, SSE, Altivec...

>>>not really.

you will "split" the core in parts that will not be able to communicate
as easily as if it was unified.

>>>You are thinking about things I didn't mention.

Now one question:
why go back almost 4 years into the past and want to modify such a
critical characteristic of the F-CPU ISA?
I don't think it's constructive. Let me remind you that if F-CPU was
only developed by myself, there would be at least 128 registers.....

>>>Not enough space in the 32-bit instruction world. Big register sets
are really interesting for decreasing memory pressure, and vector registers
could be used as a preload area in really non-SIMD code.
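
The "not enough space" point can be illustrated with a small Python calculation. It assumes a plain three-operand format in a 32-bit instruction word, with whatever is left after the register fields available for the opcode and flags; this is a simplification for illustration and not the actual F-CPU encoding. The function names are hypothetical.

```python
INSTRUCTION_BITS = 32


def register_field_bits(register_count):
    """Bits needed to name one register out of register_count."""
    return (register_count - 1).bit_length()


def bits_left_for_opcode(register_count, operands=3):
    """Bits remaining once every operand field has been allocated."""
    return INSTRUCTION_BITS - operands * register_field_bits(register_count)


for count in (32, 64, 128):
    print(count, register_field_bits(count), bits_left_for_opcode(count))
```

Going from 64 to 128 registers costs one more bit per operand, so three bits of opcode and flag space per instruction under this model.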

So we have to make some "compromises" and once the difficult and
long decision is taken, it is a waste of time to discuss about it again.
Rather, it is more important to implement it. A single large set might
be underused, but the situation is often worse with split sets. I guess
that FC0 has reached a point of diminishing returns on modification.

>>>The usual speech...

 If you really want to exercise your skills usefully, maybe you can make
a draft about FC1 ?

>>>I should do that, to avoid getting too frustrated.
nicO

see you tonight,

>nicO
>  
>
YG

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/