
Re: [f-cpu] Status quo



On 2015-03-30 10:28, Nicolas Boulay wrote:
Be careful: for 10 years I have worked on the TI OMAP 3. Using a
coprocessor has many stupid drawbacks.
I think that the term "coprocessor" is very broad and is used for a very wide spectrum of circuits, more or less specific (like a packet accelerator vs. a
GPGPU), with different interfaces.

Embedded code for some cryptographic
functions uses the coprocessor when the packets are big, and the CPU when the
packets are too small. When you use a coprocessor, you lose a lot of
time in communication and register setup.
Setup/start time is a common problem, I agree.
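
A rough way to put numbers on that trade-off, as a sketch only: assume the CPU costs a fixed time per byte, and the coprocessor costs a fixed setup time plus a smaller time per byte. The break-even packet size is then setup / (cpu_per_byte - cop_per_byte). All names and figures below are hypothetical, not measurements from the OMAP 3 or any other chip.

```python
def breakeven_bytes(setup_ns, cpu_ns_per_byte, cop_ns_per_byte):
    """Packet size above which the coprocessor wins, or None if it never does.

    Model (an assumption, not a measurement):
      cpu_time(n) = n * cpu_ns_per_byte
      cop_time(n) = setup_ns + n * cop_ns_per_byte
    """
    if cop_ns_per_byte >= cpu_ns_per_byte:
        return None  # the coprocessor is not faster per byte, so it never pays off
    return setup_ns / (cpu_ns_per_byte - cop_ns_per_byte)


def choose_engine(n_bytes, setup_ns, cpu_ns_per_byte, cop_ns_per_byte):
    """Return "cpu" or "coprocessor" for a packet of n_bytes under the model above."""
    threshold = breakeven_bytes(setup_ns, cpu_ns_per_byte, cop_ns_per_byte)
    if threshold is None or n_bytes <= threshold:
        return "cpu"
    return "coprocessor"


# Hypothetical figures: 2000 ns setup, 10 ns/byte on the CPU, 2 ns/byte offloaded.
print(breakeven_bytes(2000, 10, 2))      # break-even packet size in bytes
print(choose_engine(64, 2000, 10, 2))    # small packet
print(choose_engine(1500, 2000, 10, 2))  # large packet
```

With these made-up figures, the setup cost is only amortised on packets larger than a few hundred bytes, which matches the "big packets to the coprocessor, small packets to the CPU" split described above.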

Besides that, there is
always a missing mode or feature, so the CPU must then be used. I think
that specialised (one-cycle) instructions are preferable, even
with their own "power domain".

"One cycle" is not really possible with modern ultra-pipelined CPUs.
"One instruction" can be done if it is sufficiently generic (let's
keep it RISC) and small. For other purposes, the choice of interface
depends indeed on the size of the dataset. In your example of AES,
it fits well into fixed 128-bit blocks (with keys of up to 256 bits)
and it can be a pretty intensively used feature, so it makes sense
for Intel to add it, as well as other features like CRC32 and RNG.
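
As a small illustration of why CRC32 suits the "one instruction" model: each step takes a 32-bit state plus one input byte, returns a new 32-bit state, and cannot fail. The Python sketch below is a plain bitwise software model, not anyone's hardware. It assumes the CRC-32C (Castagnoli) polynomial, which is the variant implemented by the SSE4.2 crc32 instruction, and the function names are made up for the example.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c_update(crc, byte):
    """One accumulate step: fold a single byte into the 32-bit CRC state.

    This is roughly the work a byte-wide CRC instruction would do;
    it has no error path and no hidden state.
    """
    crc ^= byte
    for _ in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
        else:
            crc >>= 1
    return crc


def crc32c(data):
    """CRC-32C of a bytes object, with the usual initial and final inversion."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32c_update(crc, b)
    return crc ^ 0xFFFFFFFF


# "123456789" is the conventional check string for CRC catalogues.
print(hex(crc32c(b"123456789")))
```

The initial value and final inversion stay in software here. Only the small, generic update step is the part that would map onto an instruction.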

For more complex operations, there is the big old hairy complex problem
of interrupts, errors and restarts. AES, CRC and RNG don't throw
errors. Other "complex accelerators" might encounter problems,
for example TCP/IP.

I think I have found an interesting approach, that looks a bit like
Intel's Larrabee. I'm defining and refining it these days...
The good part is that the existing instruction set is already
designed for massive SIMD, so there would be a single unified
instruction set across the system, tasks could be spawned
like function calls, and moved across cores to match the
requested features.
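
To make the "moved across cores to match the requested features" idea concrete, here is a toy placement sketch. It is not the design being refined above, which is not described in this thread. It only assumes that every core advertises a set of feature names, that a task lists the features it needs, and that the least capable core that fits is preferred. Core names, feature names and the policy are all hypothetical.

```python
# Hypothetical cores sharing one instruction set but with different optional units.
CORES = {
    "core0": {"int"},
    "core1": {"int", "simd"},
    "core2": {"int", "simd", "crypto"},
}


def eligible_cores(cores, required):
    """Names of the cores whose feature set covers everything the task requires."""
    return sorted(name for name, feats in cores.items() if required <= feats)


def place_task(cores, required):
    """Pick a core for a task, or None if no core offers the requested features.

    Policy (an assumption): prefer the least capable core that fits,
    so richer cores stay free for tasks that really need them.
    """
    candidates = eligible_cores(cores, required)
    if not candidates:
        return None
    return min(candidates, key=lambda name: (len(cores[name]), name))


print(place_task(CORES, {"int"}))             # plain integer task
print(place_task(CORES, {"simd", "crypto"}))  # needs the richest core
print(place_task(CORES, {"fpu"}))             # nothing fits
```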

Use of shaders for HPC is also decreasing. It's always harder to make
code fast on shaders compared to a CPU. The latest Xeons have 20 cores
and 256-bit SIMD. That's almost the same speed as a big GPU.
The latest multi-vector single-chip for NEC is a crazy beast too...

The flexibility of CPU programming makes writing fast code even
simpler.
It's relative... I think it's always hard to do things right :-D

Nicolas
yg


On 2015-03-30 13:05, Nikolay Dimitrov wrote:
There are tons of ways to design and integrate a coprocessor or
accelerator. The only one which works well is when:
- Hardware and software teams are working *together* to define all
interfaces and the actual implementation
- The design is evaluated against tons of real-world workloads and the
test results are fed back to the design team, to close the loop and make
sure the design works well (whatever this means)

This can work well in the F-CPU case. The (few left) enthusiasts
behind F-CPU are a unified hw/sw team, and we want and need to test any
future implementations on real workloads. We can also release fixes for
the FPGA faster than an ASIC re-spin, thus avoiding scrapping expensive
ASICs (or, even worse, being forced to make and support workarounds for
years to come).

Agreed.

* cohesive work is important. I should find/make a better environment
than this pre-Y2K mailing list...

* testing against real projects is critical as well, but this means that
we have actual projects, with a deliverable... For now, in my work,
I only have actual applications for the 16/32-bit microcontroller.
I can put the YASEP in small cheap FPGAs to add some intelligence
to devices that fit my clients' needs.

But who needs an F-CPU today ?

I have understood, with the YASEP, that the actual architecture
is not what matters in the end. It's important for the subscribers
of this list because it's what we have to deal with in our work,
but what's the final purpose ?

The design needs to be bootstrapped in many ways.
An ideal project would be sufficiently well funded,
have no stringent delivery time, require little material investment,
would be low-volume, have many possible evolution paths,
and would provide something that would excite even people
who don't care about CPU design, so they would want to eventually
buy one.

Writing this reminds me that I had examined such a system, indeed !
http://ygdes.com/HSF2009/HSF2009_GPL.html
Short version : a handheld game console fits most of the criteria I listed above
 - small, so relatively inexpensive,
 - graphics require tons of computations, we can hack accelerators too,
 - low-power, real-time, embedded design
 - has a "kernel" side and runs third-party "applications" (games)
I'm not saying we should start a gaming company. I mean :
this is the kind of realistic target system that could
both prove the points we make with F-CPU (freedom etc.)
AND give us something to test our designs against.

Any idea of another application ?
Sound processing ?
A friend told me about his project of a flight controller
but I don't think it fits most of our criteria ;-)

But I totally agree that having smarter CPU cores is always a big
benefit to have.
what is "smart" ?

And still there are some situations where you would
need not just high performance, but incredibly high performance, while
satisfying some ugly constraints (power budget, thermal design, price,
integration). In such cases you are forced to rely on
application-specific accelerators - video compression/decompression
unit, GPU, image scaling, color space conversion, hardware video
overlays with alpha-blending, asynchronous audio sample rate
conversion, raw NAND ECC controller, DMA controllers. All of these
tasks can be implemented in software, but that will either be less
efficient or make it impossible to meet the requirements.

yup.

Regards,
Nikolay
thanks,
yg