
Re: [f-cpu] SIMD register



hi,

Nicolas Boulay wrote:
> After reading a idct MMX code i don't think it really easy to use simd
> register without knowing there size. Such idct use 8 word chunk because
> it fill the right data.

If you read Intel's documents, or code designed for Intel computers,
I understand your doubts. However, some Zen, a clear and clean mind
and some patience will help you read the "basic" books in a different
way.

I have already started writing DCT code optimised for F-CPU.
It is surprisingly easy when you know a few tricks. More importantly,
I started from already good-looking code, so it was almost
straightforward.

> A study on theoretical cpu on a new and simple algorythme of compiling
> to use vector instruction on spec program give it's maximum around
> 128-256 bits register (4*64 bits float). With bigger register inter
> register dependancies increase and code speed DECREASE.
> 
> So size independant code will slow done even more the code.

I don't agree with you, because you assume that the study is perfect.
It is based on prototype code run in very specific conditions, and the
algorithm is probably badly chosen. Add to that a memory system that is
probably not adapted, and you see that the result is probably misleading.

Don't forget that in the past, most people said "32-bit registers
are too wide, we don't need all these bits" or "8 registers are enough
for any algorithm". Since then, the balance and architecture of computers
have radically changed: it's not wise to say that we won't need 256-bit
registers in the future. If the use of embedded DRAM increases, your study
might well become a geek's joke.

Most importantly, nobody today writes code that is independent of
the platform (except in C, where the size of the ints is unknown).
So it's easy to say now that size-independent code is not worth the effort.
However, with a few programming habits, you could write code once
and have it execute as is, and as fast as possible, on any compliant
platform. It's "just" a matter of complying with a programming model,
so you don't have to touch old code.

I think that FC0 is easily scalable to 256 bits and 2 instructions per cycle.
By the time such a CPU is implemented, we will have already started FC1, I guess.
But we will have to deal with 3 kinds of code, if I follow your idea correctly.
There is however a simpler solution: in computation-intensive code
(or bandwidth-stressing code), use the maximum-width register (in SIMD mode).
Execute a "get SR_MAX_SIZE, rd" and divide your loop counter by rd
(or play with the loop counter, subtracting rd instead of just 1).
This way, your code can compile and execute on any version of the CPU.
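
To make the loop-counter trick concrete, here is a minimal sketch in C
rather than F-CPU assembly. Everything in it is an assumption made for
illustration: get_sr_max_size() is a hypothetical stand-in for
"get SR_MAX_SIZE, rd", the value is assumed to be a register width in
bytes, and the inner loop only models what a single SIMD add would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for "get SR_MAX_SIZE, rd".
 * Assumption: the value is the register width in bytes (8 = 64 bits). */
static size_t get_sr_max_size(void)
{
    return 8;
}

/* Adds b[] into a[] one register's worth of 16-bit elements at a time.
 * Returns the number of full-width iterations that were executed. */
static size_t add_i16(int16_t *a, const int16_t *b, size_t n, size_t reg_bytes)
{
    size_t lanes = reg_bytes / sizeof(int16_t); /* elements per register */
    size_t i = 0;
    size_t steps = 0;

    assert(lanes > 0);

    /* Main loop: the counter advances by `lanes` instead of by 1. */
    for (; i + lanes <= n; i += lanes) {
        /* This inner loop models one SIMD add on a full register. */
        for (size_t k = 0; k < lanes; k++)
            a[i + k] = (int16_t)(a[i + k] + b[i + k]);
        steps++;
    }

    /* Tail: leftover elements when n is not a multiple of `lanes`. */
    for (; i < n; i++)
        a[i] = (int16_t)(a[i] + b[i]);

    return steps;
}
```

The caller never hard-codes the width: add_i16(a, b, n, get_sr_max_size())
does the same work whether the register holds 4, 8 or 16 elements. Only
the number of iterations changes.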

I don't think it's too complex to do.

> nicO
WHYGEE