
Re: [f-cpu] Status quo

2015-03-30 13:05 GMT+02:00 Nikolay Dimitrov <picmaster@xxxxxxx>:
Hi Nicolas,

On 03/30/2015 11:28 AM, Nicolas Boulay wrote:
2015-03-29 6:26 GMT+02:00 <whygee@xxxxxxxxx
<mailto:whygee@xxxxxxxxx>>: ...

On 2015-03-28 19:05, Nikolay Dimitrov wrote:

Be careful: over the last 10 years I have worked on the TI OMAP 3, and
using a coprocessor has many stupid drawbacks. Embedded code for some
cryptographic functions uses the coprocessor when the packets are big,
and the CPU when the packets are too small. When you use a coprocessor,
you lose a lot of time in communication and register setup. Besides
that, there is always a missing mode or feature, so the CPU must be
used anyway. I think that some kind of specialised (one-cycle)
instruction is preferable, even with its own "power domain".

I agree that each and every SoC has its quirks, and vendors don't seem
to be too enthusiastic about releasing newer silicon revisions of their
buggy chips. But the fact that a specific implementation has issues
doesn't allow us to conclude that the idea itself is bad.

There are tons of ways to design and integrate a coprocessor or
accelerator. The only one which works well is when:
- Hardware and software teams are working *together* to define all
interfaces and the actual implementation
- The design is evaluated against tons of real-world workloads and the
test results are fed back to the design team, to close the loop and make
sure the design works well (whatever this means)

You forgot setup and programming time. You also forgot that you _need_ OS support, not only an assembler that issues the right instructions. So you need a way to share the device between programs, and it will be hard to virtualize. When you use it, you consume DRAM bandwidth or internal bus/network bandwidth, and you don't always have the same kind of cache system as for a CPU. If the data is then needed by the CPU, there is little chance that it is in the cache.

A specific instruction does not have any of those drawbacks; it is even smaller and always more efficient than a coprocessor. You only waste power if you can't cut the power to the unit when the operator is not needed.
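To make the small-packet versus big-packet trade-off mentioned above concrete, here is a minimal sketch of the break-even calculation. Everything in it is an assumption for illustration: the function names are hypothetical, and the setup cost and throughput figures are invented, not measured on OMAP 3 or any other chip.

```python
# Toy cost model: the coprocessor pays a fixed setup cost (register
# programming, communication) before it starts processing, the CPU does not.
# All figures below are made-up illustration values.

SETUP_US = 20.0          # assumed fixed coprocessor setup cost, microseconds
CPU_BYTES_PER_US = 50.0  # assumed CPU throughput
COP_BYTES_PER_US = 400.0 # assumed coprocessor throughput


def cpu_time_us(nbytes, cpu_rate=CPU_BYTES_PER_US):
    """Time for the CPU to process nbytes (no setup cost)."""
    return nbytes / cpu_rate


def coprocessor_time_us(nbytes, setup=SETUP_US, cop_rate=COP_BYTES_PER_US):
    """Time for the coprocessor: fixed setup plus streaming time."""
    return setup + nbytes / cop_rate


def break_even_bytes(setup=SETUP_US, cpu_rate=CPU_BYTES_PER_US,
                     cop_rate=COP_BYTES_PER_US):
    """Packet size at which both engines take the same time.

    Returns None if the coprocessor is not faster than the CPU,
    in which case it never pays off.
    """
    if cop_rate <= cpu_rate:
        return None
    return setup / (1.0 / cpu_rate - 1.0 / cop_rate)


def choose_engine(nbytes):
    """Pick the faster engine for a packet of nbytes under this model."""
    if coprocessor_time_us(nbytes) < cpu_time_us(nbytes):
        return "coprocessor"
    return "cpu"


if __name__ == "__main__":
    print("break-even size (bytes):", break_even_bytes())
    for size in (64, 512, 4096):
        print(size, "->", choose_engine(size))
```

Under these invented numbers, small packets stay on the CPU and only large ones amortise the setup cost. The model ignores the OS, sharing, and cache effects listed above, which would push the break-even point higher.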

This can work well in the F-CPU case. The (few remaining) enthusiasts
behind F-CPU are a unified hw/sw team, and we want and need to test any
future implementations on real workloads. We can also release fixes for
the FPGA faster than an ASIC re-spin, and thus avoid scrapping expensive
ASICs, or even worse, being forced to make and support workarounds for
years to come.

HW and SW timescales are not the same. SW stays around much longer, with a fast development cycle. HW has a shorter lifetime, with a slower development cycle.

You will have a hard time if you need to synchronise both. Apple killed Nokia because Nokia developed its SW like HW, starting from almost zero for each device. iOS was quite independent of the device.