
Re: [f-cpu] Status quo

Hi Nicolas,

On 03/30/2015 11:28 AM, Nicolas Boulay wrote:
2015-03-29 6:26 GMT+02:00 <whygee@xxxxxxxxx>: ...

On 2015-03-28 19:05, Nikolay Dimitrov wrote:

Hi Yann,

Even if you implemented such a CPU, it would be incredibly expensive,
and in most use cases the users wouldn't touch most of its features.
Talk about inefficiency.

Today all high-performance designs have either a co-processor or
special application accelerator (usually an IP-core attached to one
of the buses). For your case, you can design an arbitrary precision
BCD accelerator. This will both allow you to calculate faster, and
leave the CPU design... well, generic :D.

You raise the question of coprocessors and it's a very important
matter that also influences my new "vision" of the F-CPU evolution.
More and more designs couple a CPU with coprocessors (more or less
generic ones) and GPGPU is spreading all over the map.

Be careful: for the last 10 years I have worked on the TI OMAP 3.
Coprocessor use has many annoying drawbacks. Embedded code for some
cryptographic functions uses the coprocessor when the packets are big,
and the CPU when the packets are too small. When you use a coprocessor,
you lose a lot of time in communication and register setup. Besides
that, there is always a missing mode or feature, so the CPU must be
used anyway. I think that a kind of specialized (one-cycle) instruction
is preferable, even with its own "power domain".

I agree that each and every SoC has its quirks, and vendors don't seem
too enthusiastic about releasing newer silicon revisions of their
buggy chips. But the fact that a specific implementation has issues
doesn't allow us to generalize that the idea itself is bad.
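The trade-off Nicolas describes (big packets go to the coprocessor,
small ones stay on the CPU, because register setup and communication
have a fixed cost) can be sketched roughly like this. This is only an
illustrative sketch: the function names, the threshold value, and the
XOR "cipher" stand-in are all made up, not from any real driver:

```c
/* Hypothetical sketch of size-based offload dispatch. The fixed cost of
 * programming an accelerator (register writes, DMA setup, completion
 * wait) only pays off above some payload size; below it, plain CPU code
 * wins. All names and the threshold are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>

#define COPROC_THRESHOLD 512  /* bytes; would be tuned per platform */

/* Toy "cipher": XOR with a key byte, a stand-in for real crypto. */
static void encrypt_on_cpu(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Placeholder for the offload path: a real driver would write the
 * accelerator's registers, start DMA, and wait for completion. Here it
 * just produces the same result so the sketch is self-contained. */
static void encrypt_on_coproc(uint8_t *buf, size_t len, uint8_t key)
{
    encrypt_on_cpu(buf, len, key);
}

/* Dispatch: amortize the setup cost on big packets, avoid it on small
 * ones -- exactly the pattern described for the OMAP 3 crypto code. */
void encrypt_packet(uint8_t *buf, size_t len, uint8_t key)
{
    if (len >= COPROC_THRESHOLD)
        encrypt_on_coproc(buf, len, key);
    else
        encrypt_on_cpu(buf, len, key);
}
```

The interesting part is not the cipher but the branch: the threshold
encodes the break-even point between offload overhead and CPU cost,
which is why a missing mode or feature in the accelerator forces the
CPU fallback path to exist anyway.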

There are tons of ways to design and integrate a coprocessor or
accelerator. The only approach that works well is when:
- Hardware and software teams are working *together* to define all
interfaces and the actual implementation
- The design is evaluated against tons of real-world workloads and the
test results are fed back to the design team, to close the loop and make
sure the design works well (whatever this means)

This can work well in the F-CPU case. The (few remaining) enthusiasts
behind F-CPU are a unified hw/sw team; we want and need to test any
future implementation on real workloads. We can also release fixes for
the FPGA faster than an ASIC re-spin, thus avoiding scrapping expensive
ASICs, or even worse, being forced to make and support workarounds for
years to come.

F-CPU was meant for "high performance" and number crunching, but
this is and should now be relegated to the array of generic
coprocessors. This leaves the real "application" to the F-CPU;
this is how and why I aim at a leaner FC0.

Use of shaders for HPC is also decreasing. It's always harder to
make code fast on a shader compared to a CPU. The latest Xeons have 20
cores and 256-bit SIMD. That's almost the same speed as a big GPU.
The flexibility of CPU programming makes writing fast code even
simpler.


Well, it looks like balance in nature is still preserved - smarter CPUs, less smart developers :D.

But I totally agree that having smarter CPU cores is always a big benefit. Still, there are situations where you need not just high performance but incredibly high performance, while satisfying some ugly constraints (power budget, thermal design, price, integration). In such cases you are forced to rely on application-specific accelerators: video compression/decompression units, GPUs, image scaling, color space conversion, hardware video overlays with alpha-blending, asynchronous audio sample rate conversion, raw NAND ECC controllers, DMA controllers. All of these tasks can be implemented in software, but that would either be less efficient or make it impossible to meet the requirements.

To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/