
Re: [f-cpu] Status quo



Hi Cedric,

On 03/31/2015 06:37 PM, Cedric BAIL wrote:
Hello,

On Tue, Mar 31, 2015 at 5:25 PM, Nikolay Dimitrov <picmaster@xxxxxxx>
wrote:
On 03/31/2015 05:13 PM, Cedric BAIL wrote:
On Tue, Mar 31, 2015 at 1:02 AM, Nikolay Dimitrov
<picmaster@xxxxxxx> wrote:

On 03/30/2015 11:30 PM, Cedric BAIL wrote:

On Mon, Mar 30, 2015 at 9:38 PM,  <whygee@xxxxxxxxx> wrote:

On 2015-03-30 14:21, Nicolas Boulay wrote:

2015-03-30 13:42 GMT+02:00 <whygee@xxxxxxxxx>:

For the "crypto" stuff, I guess that AES and CRC are the basic
things you need. I would argue that a helper for ECC is also very
useful these days ( https://tools.ietf.org/html/rfc5656 ). Making
them accessible from outside of kernel space is an absolute must
(that's why instructions are usually better suited for that task).


Linux CAAM and cryptodev provide such device abstraction.


Not really, if I remember correctly, as they require a system call
to actually access the device. This can have a huge impact on
what and when you can use them.

It's a virtual device, accessed by ioctls. You can have 0 or more
physical devices abstracted by the driver; the decision on how to
implement the interface is yours. The only limitation which always
applies is to make sure the driver supports multiple contexts
(users) at the same time. But again, I agree, the crypto stuff can
be implemented anywhere, including in userspace - a custom library,
openssl, UIO, wherever you want.
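
To make the cost concrete, here is roughly what the ioctl path
looks like, assuming the cryptodev-linux interface (/dev/crypto
with CIOCGSESSION/CIOCCRYPT); note that every request is a full
syscall round trip:

    #include <crypto/cryptodev.h>  /* cryptodev-linux module */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* AES-128-CBC encrypt a buffer through /dev/crypto. */
    static int encrypt_buf(const uint8_t key[16], const uint8_t iv[16],
                           const uint8_t *src, uint8_t *dst, size_t len)
    {
        int fd = open("/dev/crypto", O_RDWR);
        if (fd < 0)
            return -1;

        struct session_op sess;
        memset(&sess, 0, sizeof(sess));
        sess.cipher = CRYPTO_AES_CBC;
        sess.keylen = 16;
        sess.key    = (void *)key;
        if (ioctl(fd, CIOCGSESSION, &sess) < 0)     /* syscall #1 */
            goto fail;

        struct crypt_op op;
        memset(&op, 0, sizeof(op));
        op.ses = sess.ses;
        op.op  = COP_ENCRYPT;
        op.len = len;
        op.src = (void *)src;
        op.dst = dst;
        op.iv  = (void *)iv;
        if (ioctl(fd, CIOCCRYPT, &op) < 0)          /* syscall #2 */
            goto fail;

        ioctl(fd, CIOCFSESSION, &sess.ses);
        close(fd);
        return 0;
    fail:
        close(fd);
        return -1;
    }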

That's exactly the issue. Having a device that requires a syscall
to access it kills performance massively. This means you are going
to use it only for large buffers and stick to a CPU implementation
for smaller buffers. That is why it is part of the ISA on Intel
chips.

Actually, one of the things that I would have loved to see many
times is a way to tell the MMU to map a memory area as a tile:
the process uses it linearly, but the physical memory is spread
in blocks over a surface. I am pretty sure I am not being clear
here, but the main purpose is to be able to swizzle memory for no
cost before uploading it to the GPU. So you would just allocate
the memory using a specific syscall, and that memory, which
appears linear to the process, could directly be used by a GPU
texture unit. You would be able to directly decompress a JPEG in
memory, for example, and the GPU could use that memory directly
with no cost and at the best possible frame rate. As an additional
benefit, you could use that same texture with a software renderer
and benefit from better cache locality. Of course, that requires
cooperation with the GPU block...
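
A rough sketch of the kind of mapping meant here (the tile size
and the tiled_offset() helper are illustrative, not an existing
API): if every 32x32 tile of a 32-bit surface occupies exactly
one 4 KiB page, the page tables alone could present the tiled
physical layout as a linear buffer to the process.

    #include <stddef.h>
    #include <stdint.h>

    #define TILE      32      /* 32x32 texels * 4 bytes = 4 KiB */
    #define PAGE_SIZE 4096

    /* Physical byte offset of texel (x, y) in a tiled surface
       whose width is a multiple of TILE: one tile per page. */
    static size_t tiled_offset(uint32_t x, uint32_t y, uint32_t width)
    {
        uint32_t tiles_per_row = width / TILE;
        uint32_t tile    = (y / TILE) * tiles_per_row + (x / TILE);
        uint32_t in_tile = (y % TILE) * TILE + (x % TILE);
        return (size_t)tile * PAGE_SIZE + (size_t)in_tile * 4;
    }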

Isn't this what Linux DRI2 is already doing?

No, DRI is a direct path to send GPU commands from a user space
application (it actually still requires interception and analysis
by the kernel before being sent to the GPU). Here I am talking
about the texture upload operation, which usually requires
converting the memory layout before uploading/exposing it to the
GPU.

DRI uses memory managed by DRM, which does exactly what you need.

Absolutely not! The memory allocation, sure, but it never provides
a simple way to adapt to the GPU memory layout. That's why
uploading textures is usually heavy on the CPU: you have to memcpy
every pixel into a different layout before uploading them to the
GPU, or they end up on the GPU in a format that will be noticeably
slower. I have never seen a system where you could get
linear-looking memory on the CPU side via the MMU and a tile-based
layout in physical memory.
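
For comparison, the CPU cost being described is essentially this
kind of per-texel reordering loop on every upload (tile size and
layout are illustrative; real GPUs use their own swizzle patterns):

    #include <stdint.h>

    #define TILE 32   /* illustrative tile edge, in texels */

    /* Copy a linear 32-bit image into a tiled layout, touching
       every texel once; width and height are multiples of TILE. */
    static void swizzle_upload(const uint32_t *linear, uint32_t *tiled,
                               uint32_t width, uint32_t height)
    {
        uint32_t tiles_per_row = width / TILE;
        for (uint32_t y = 0; y < height; y++)
            for (uint32_t x = 0; x < width; x++) {
                uint32_t tile    = (y / TILE) * tiles_per_row + (x / TILE);
                uint32_t in_tile = (y % TILE) * TILE + (x % TILE);
                tiled[tile * TILE * TILE + in_tile] = linear[y * width + x];
            }
    }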

Ahh, now I see. Thanks for explaining. I don't see a solution that
can provide transparent access to this data. It's probably possible
to use something like a video-data reordering adapter (something
like a weird DMAC) that can quickly convert data between the GPU
and linear formats and back on request.

Regards,
Nikolay