
Re: [f-cpu] registers



hello,

Nicolas Boulay wrote:

>Just a little speech.
>
>For my job, I work with a SPARC V7 clone (ERC32 from ESA) and a V8 one (LEON).
>These CPUs have no FPU, but you can add an FPU called Meïko from Sun.
>
>The SPARC V8 architecture uses windows of 32 registers (more than 100
>registers in total, but only 32 visible at a given time).
>
AFAIK, the "windows" have a granularity of 8 registers. Only 24 registers
from the "sliding window" can be seen at a time. I wouldn't count your >100
registers as interesting because only a small fraction is used at a time.
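
To make the window arithmetic concrete, here is a minimal sketch in Python of
how a visible SPARC V8 register number could map onto the physical file. It
assumes 8 windows (NWINDOWS is a parameter; the V8 architecture allows from 2
to 32), and the function names and the flat indexing scheme are illustrative
choices, not taken from any particular implementation.

```python
NWINDOWS = 8  # assumption: SPARC V8 allows 2..32 windows; 8 is only an example


def total_registers(nwindows=NWINDOWS):
    """Physical integer registers: 8 globals + 16 per window."""
    return 8 + 16 * nwindows


def physical_index(reg, cwp, nwindows=NWINDOWS):
    """Map visible register r0..r31 to a (bank, index) pair.

    r0-r7 are globals; r8-r15 (outs), r16-r23 (locals) and r24-r31 (ins)
    come from the windowed file. Illustrative flat indexing only.
    """
    if not 0 <= reg < 32:
        raise ValueError("visible registers are r0..r31")
    if reg < 8:
        return ("global", reg)
    return ("windowed", (cwp * 16 + (reg - 8)) % (16 * nwindows))


def save(cwp, nwindows=NWINDOWS):
    """SAVE decrements the current window pointer (modulo NWINDOWS)."""
    return (cwp - 1) % nwindows


if __name__ == "__main__":
    caller = 3
    callee = save(caller)
    # The caller's outs are the callee's ins: adjacent windows overlap by 8.
    print(physical_index(8, caller), physical_index(24, callee))
    print(total_registers())
```

With 8 windows this gives 136 physical registers, of which only the 8 globals
plus 24 windowed registers are visible at any time.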

>When you add the FPU you get 32 registers
>dedicated to the FPU (the 32 FPU registers are 32 bits wide, but you can
>access them in pairs as 64-bit doubles).
>
Are they in a "sliding window" too?

>I asked our expert why a new register set was added instead of using the
>integer register bank. I had the F-CPU approach in mind.
>
>The answer was quite simple: "to use more registers"!
>
>Think about it for F-CPU!
>  
>
Consider that F-CPU has 3x more "useful" registers than SPARC!
Consider that the 64 registers of F-CPU amount to the same total
register count as the ALPHA 21064 or MIPS R4000 (32 integer + 32 FP each).
We have 63 registers while SPARC has only 24 (+32 FP registers, if I
understood you correctly).
So the argument doesn't hold.

>That's not a bad point, because there are very few cases where you have to
>use integer operations on floating-point numbers.
>
But when you have to do it, you are often caught in a critical place where
moving registers from one set to another is the slowest thing, precisely
because those smart engineers thought that this operation doesn't occur
often...

>So there are 2 classes of
>numbers with no real operation crossover. But this is also the case for
>F-CPU: why mix the register bank for simple integers and vectors?
>  
>
If you want such a computer, use a Cray or a derived architecture.
BTW, F-CPU is not a "vector" computer, it's "SIMD" (or "very short vector",
compared to the 64 numbers per register of CRAY).

>Nowadays we too often think in terms of a 64-bit register length, so 64 ==
>the biggest int number. But in fact, the most interesting vector size is
>256 bits.
>  
>
This can change in the future....
if compilers get better, if programs are better written...
It will take some time, but when it's done, then a 256-bit-only CPU
will be outperformed by more flexible computers.

>As you can see, using a 256-bit register to store a 64-bit integer
>is a big waste! Look at our programming API. We always have 3 or
>4 pointers in the register set, so 3/4 of the register will never be
>used 95% of the time. What a waste, don't you think?
>  
>
Hey! Speak about the waste of adding an FPU and FP registers to a computer
that does ints most of the time: you _lose_ 1/2 of the silicon. I would like
to see the _real_ use of the FPU in your space computers: do they do FP 50%
of the time?
And by the way, LEON is a single-issue CPU, so by definition one half of the
core is not used (or available) at any time.
Compare the ALPHA 21064 (just an example that I know well enough to speak
about) and a single-issue CPU with an FPU:
 ALPHA can issue 2 instructions at a time, one FP and one INT (for example).
 This is the best case and in that condition it achieves 100% of performance
 (well, I'm not speaking about memory access etc...).
 LEON can issue 1 inst/cycle ==> at any cycle, the decoder (and the program)
 can't use one of the INT or FP units. So even in the best case, 50% of
 silicon use is the most one can achieve.
By "silicon use", I mean: accessing data in the register set and computing
on it. Computing can be pipelined, but accessing data is what hurts most if
there is a split register set.
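
The best-case argument can be put in numbers with a deliberately crude model.
The sketch below (Python) assumes that each execution unit accepts at most one
instruction per cycle and that all units cost the same silicon; the function
name is made up for the illustration, and memory stalls and dependencies are
ignored.

```python
def peak_unit_utilization(issue_width, num_units):
    """Best-case fraction of execution units that can be fed in one cycle."""
    if issue_width < 1 or num_units < 1:
        raise ValueError("issue_width and num_units must be >= 1")
    return min(issue_width, num_units) / num_units


if __name__ == "__main__":
    # Dual issue over INT + FP units (the ALPHA 21064 case in the text).
    print(peak_unit_utilization(2, 2))
    # Single issue over INT + FP units (the LEON + FPU case in the text).
    print(peak_unit_utilization(1, 2))
    # Single issue over one unified unit.
    print(peak_unit_utilization(1, 1))
```

Under this model the dual-issue case peaks at 1.0, the single-issue split
design at 0.5, and a single-issue core with one unified unit at 1.0.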

FC0 is the current implementation of F-CPU.
I believe that EVEN though newer and better architectures will come,
FC0 will be used a lot, for power-saving applications for example.
So yes, we must think about the future, but we must not harm FC0 either.
FC0 uses a single monolithic register set which is probably very slow,
but it's compact and efficient for a single-issue pipeline.

We can later make more "intelligent" cores that will have different
register sizes and split banks, but it's much too complex now,
because we don't even have a working single-issue CPU yet...

>I don't think we should split FP and int registers, but rather SIMD and
>scalar numbers.
>  
>
Cray went even further (and a lot of others followed): there are 3 (sometimes
even more) banks, 1 for ints, 1 for FP and 1 for vectors. However, the
compilers are not in the public domain...

More importantly, I don't think it was an issue in the CRAYs, but in a single
chip, split sets mean separate execution units. D'oh! This means that some
units will be duplicated! More headaches...

>Why not have 2 register banks, one SIMD, the other scalar? This adds
>1/4 of memory content inside the CPU but doubles the number of usable
>registers.
>  
>
And what about the case when there is no need for SIMD?
And what about the problems of saving the whole CPU state?
And what about the cases when ints must be compared to SIMDs or
stuff like that?
And what about the additional instruction that is needed
to at least help with the previous question?
And what about the modification of the non-computational instructions
(load/store etc.)? Does this mean that it creates new coding restrictions
(not being able to do a "GET" or "PUT" to/from SIMD ...)?
And will you have the courage to modify the whole manual to reflect
these "additions"?
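
The cost side of the quoted proposal can be checked with simple arithmetic.
The sketch below (Python) assumes 64 registers per bank, 256-bit SIMD
registers and 64-bit scalar registers, which is how the "1/4" figure seems to
be derived; these sizes and the helper name are assumptions for illustration,
not FC0 parameters.

```python
def split_bank_cost(num_regs=64, simd_bits=256, scalar_bits=64):
    """Compare a single SIMD-wide bank with a SIMD bank plus a scalar bank."""
    simd_bank = num_regs * simd_bits
    scalar_bank = num_regs * scalar_bits
    return {
        "single_bank_bits": simd_bank,
        "split_banks_bits": simd_bank + scalar_bank,
        "extra_storage_ratio": scalar_bank / simd_bank,
        "register_names": 2 * num_regs,
    }


if __name__ == "__main__":
    cost = split_bank_cost()
    for key, value in cost.items():
        print(key, value)
```

Under these assumptions the state to save on a context switch grows from
16384 to 20480 bits, and 128 register names would need either a wider
register field than the 6 bits that address 64 registers, or a bank that is
implied by the opcode.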

>I don't think this change would need a lot of modification in
>existing code.
>  
>
Of course it will! It will affect the register allocation algorithms, the
compilers, the instruction set etc.... And more importantly: you will go back
to the programming environment of those "multimedia extensions" such as MMX,
SSE, Altivec... You will "split" the core into parts that will not be able to
communicate as easily as if it were unified.

Now one question:
why go back almost 4 years into the past and want to modify such a critical
characteristic of the F-CPU ISA?
I don't think it's constructive. Let me remind you that if F-CPU were
developed only by myself, there would be at least 128 registers.....
So we have to make some "compromises", and once the long and difficult
decision is taken, it is a waste of time to discuss it again.
Rather, it is more important to implement it. A single large set might
be underused, but the situation is often worse with split sets. I guess
that FC0 has reached a point of diminishing returns on modification. If you
really want to exercise your skills usefully, maybe you can make
a draft about FC1?


See you tonight,

>nicO
>  
>
YG
