
Re: [f-cpu] Status quo



Hello,

On Mon, Mar 30, 2015 at 9:38 PM,  <whygee@xxxxxxxxx> wrote:
> On 2015-03-30 14:21, Nicolas Boulay wrote:
>> 2015-03-30 13:42 GMT+02:00 <whygee@xxxxxxxxx>:
> Another kind of pure HW accelerator is JPEG/MPEG block DCT "accelerator".

That's a subject I have been playing with for a decade now while
writing a 2D toolkit. I have yet to find a setup where using one of
those JPEG blocks was actually useful. In the best-case scenario they
save a little bit of battery, but the software maintenance pain is far
bigger than the small win they give you (especially once you make the
effort to open a file only once and share it across processes). Even
Intel with their libva doesn't really help much. It's more something
you use for fun than for any real reason.
  Note that this is only true for still images; as soon as you have an
animation or a movie, a dedicated block makes sense and is actually
useful... But you still end up implementing many codecs in software,
as new codecs appear almost every day and you need to decode them.
That's where Intel CPUs have a serious lead over ARM ones: you can
find optimized assembly code for almost any new codec. Providing
blocks that are easy to use from the CPU for any codec is useful.
Whether it is an instruction or an "external" block is not really the
issue here. If you can easily detect it and use it, that will be fine.
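  To illustrate what I mean by "easily detect it", here is a minimal
sketch of runtime feature detection, assuming GCC on Linux with the
kernel headers installed; the have_simd() helper is just an
illustrative name, not code from any existing toolkit:

/* Minimal sketch of runtime detection of CPU acceleration features.
 * x86 uses the GCC builtin __builtin_cpu_supports(), ARM reads the
 * HWCAP bits the kernel exposes through getauxval(). */
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
static int have_simd(void)
{
    return __builtin_cpu_supports("sse4.2");
}
#elif defined(__arm__) || defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
static int have_simd(void)
{
#if defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#endif
}
#else
static int have_simd(void) { return 0; }
#endif

int main(void)
{
    /* Pick the optimized codec path only when the instructions are
     * there, and fall back to the plain C implementation otherwise. */
    printf("SIMD path available: %s\n", have_simd() ? "yes" : "no");
    return 0;
}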

  For the "crypto" stuff, I guess that AES and CRC are the basis
things you need. I would argue that helper for ECC is also very useful
this day ( https://tools.ietf.org/html/rfc5656 ). Them being
accessible from outside of the kernel space is an absolute must
(That's why instruction are usually better suited for that task).
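  As a small illustration of why the instruction route is nice, here
is a sketch of a CRC-32C helper that runs entirely in user space, with
no kernel round trip at all; it assumes an x86-64 CPU with SSE4.2 and
a GCC/Clang build with -msse4.2, and the function name is mine, not
taken from any existing library:

/* CRC-32C over a buffer using the SSE4.2 crc32 instruction. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>   /* _mm_crc32_u8 / _mm_crc32_u64 */

uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = ~0u;                  /* CRC-32C starts from all ones */

    while (len >= 8) {                   /* 8 bytes per crc32 instruction */
        uint64_t v;
        memcpy(&v, buf, sizeof v);       /* avoid unaligned accesses */
        crc = (uint32_t)_mm_crc32_u64(crc, v);
        buf += 8;
        len -= 8;
    }
    while (len--)                        /* remaining tail bytes */
        crc = _mm_crc32_u8(crc, *buf++);
    return ~crc;
}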

  Actually, one of the things I would have loved to see many times is
a way to tell the MMU to map a memory area as tiles: the process uses
it linearly, but the physical memory is spread block by block over a
surface. I am probably not being clear here, but the main purpose is
to be able to swizzle memory at no cost before uploading it to the
GPU. You would allocate the memory using a specific syscall, and that
memory, which appears linear to the process, could be used directly by
a GPU texture unit. You could, for example, decompress a JPEG straight
into that memory and the GPU could use it as-is, at no cost and with
the best possible frame rate. As an additional benefit, you could use
that same texture with a software renderer and benefit from better
cache locality. Of course, that requires cooperation with the GPU
block...
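  To make the idea a bit more concrete, here is a purely hypothetical
sketch; the MAP_TILED flag and the implied tile geometry do not exist
in any kernel I know of, they only show what the allocation side could
look like:

/* Hypothetical tiled allocation: the process sees a linear buffer,
 * the kernel lays the physical pages out tile by tile so the GPU
 * texture unit can consume them without an extra swizzle pass. */
#include <stddef.h>
#include <sys/mman.h>

#define MAP_TILED 0x80000   /* hypothetical flag, not a real kernel flag */

static void *alloc_tiled_surface(size_t width, size_t height, size_t bpp)
{
    size_t len = width * height * bpp;

    /* The kernel would pick the tile size matching the GPU and set up
     * the page tables so that linear CPU writes land in the swizzled
     * physical layout. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_TILED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

A JPEG decoder could then write its output straight into such a buffer
and hand it over as a texture, with no copy and no CPU swizzle pass.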
  Well, in fact anything that makes it possible to increase the
available memory bandwidth is going to help here. Most 2D rendering
operations are limited by just that; only smooth up- and down-scaling
is CPU-intensive enough that you may be able to use two cores before
you saturate your memory. As a reference, using a C implementation to
draw RLE glyphs is faster than any MMX/NEON code once the glyph's
vertical size is above 20 pixels. I don't know what other kind of
light compression we could do to improve the available memory
bandwidth.
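  For those who have not seen that trick, it works roughly like this
(a sketch assuming a simple (length, alpha) run encoding and an 8-bit
destination mask; the structure and function names are illustrative,
not the actual toolkit code):

/* One scanline of a glyph stored as runs of constant coverage. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct rle_run {
    uint16_t length;   /* number of consecutive pixels */
    uint8_t  alpha;    /* coverage for the whole run */
};

/* Expand one scanline of runs into the destination mask. Long runs of
 * constant coverage collapse into a single memset, which is why plain
 * C beats per-pixel SIMD once glyphs get tall and their runs get long:
 * far less memory is read than with an uncompressed mask. */
static void draw_rle_scanline(uint8_t *dst, const struct rle_run *runs,
                              size_t nruns)
{
    for (size_t i = 0; i < nruns; i++) {
        memset(dst, runs[i].alpha, runs[i].length);
        dst += runs[i].length;
    }
}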

  Also, I am not sure we want to tackle the subject of GPUs here, but
I still think it is worth looking at Vulkan
(http://en.wikipedia.org/wiki/Vulkan_%28API%29) and SPIR-V in
particular (http://en.wikipedia.org/wiki/Standard_Portable_Intermediate_Representation).
The latter is likely to become a backend for OpenGL and OpenCL, so
maybe it makes sense to study it and take it into account while
designing any interaction with external blocks.
-- 
Cedric BAIL