
Re: [f-cpu] Status quo



Hello,

On Tue, Mar 31, 2015 at 5:25 PM, Nikolay Dimitrov <picmaster@xxxxxxx> wrote:
> On 03/31/2015 05:13 PM, Cedric BAIL wrote:
>> On Tue, Mar 31, 2015 at 1:02 AM, Nikolay Dimitrov <picmaster@xxxxxxx>
>> wrote:
>>>
>>> On 03/30/2015 11:30 PM, Cedric BAIL wrote:
>>>>
>>>> On Mon, Mar 30, 2015 at 9:38 PM,  <whygee@xxxxxxxxx> wrote:
>>>>>
>>>>> On 2015-03-30 at 14:21, Nicolas Boulay wrote:
>>>>>>
>>>>>> 2015-03-30 13:42 GMT+02:00 <whygee@xxxxxxxxx>:
>>>>
>>>> For the "crypto" stuff, I guess that AES and CRC are the basic
>>>> things you need. I would argue that helpers for ECC are also very
>>>> useful these days ( https://tools.ietf.org/html/rfc5656 ). Them
>>>> being accessible from outside of kernel space is an absolute
>>>> must (that's why instructions are usually better suited for that
>>>> task).
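
(As an aside, for illustration: what instruction-level access looks
like from userspace. A minimal CRC32C sketch, assuming an x86 CPU with
SSE4.2 and compiled with -msse4.2; the function name is just for the
example. No kernel involvement at any point.)

    #include <nmmintrin.h>  /* SSE4.2 CRC32 intrinsics */
    #include <stddef.h>
    #include <stdint.h>

    /* CRC32C over a buffer, one byte per crc32 instruction. */
    static uint32_t crc32c(uint32_t crc, const uint8_t *buf, size_t len)
    {
        crc = ~crc;
        while (len--)
            crc = _mm_crc32_u8(crc, *buf++);
        return ~crc;
    }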
>>>
>>>
>>> Linux CAAM and cryptodev provide such device abstraction.
>>
>>
>> Not really, if I remember correctly, as they require a system call
>> to actually access the device. This can have a huge impact on what
>> you can use them for, and when.
>
> It's a virtual device, accessed by ioctls. You can have zero or more
> physical devices abstracted by the driver; the decision of how to
> implement the interface is yours. The only limitation that always
> applies is to make sure the driver supports multiple contexts (users)
> at the same time. But again, I agree: the crypto stuff can be
> implemented anywhere, including in userspace - custom library,
> openssl, UIO, wherever you want.
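
(To make the cost concrete: a minimal sketch of that ioctl path,
assuming the cryptodev-linux module is loaded so /dev/crypto exists;
error handling omitted. Every operation crosses the kernel boundary.)

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <crypto/cryptodev.h>

    /* AES-128-CBC encryption of one buffer through /dev/crypto. */
    static void cryptodev_encrypt(uint8_t *key, uint8_t *iv,
                                  uint8_t *src, uint8_t *dst, size_t len)
    {
        int fd = open("/dev/crypto", O_RDWR);
        struct session_op sess = {
            .cipher = CRYPTO_AES_CBC, .keylen = 16, .key = key,
        };
        ioctl(fd, CIOCGSESSION, &sess);      /* syscall: set up session */

        struct crypt_op op = {
            .ses = sess.ses, .op = COP_ENCRYPT,
            .len = len, .src = src, .dst = dst, .iv = iv,
        };
        ioctl(fd, CIOCCRYPT, &op);           /* syscall: one per buffer */

        ioctl(fd, CIOCFSESSION, &sess.ses);  /* syscall: tear down */
        close(fd);
    }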

That's exactly the issue. Having a device that requires a syscall to
access it kills performance massively. This means you are going to use
it only for large buffers and stick to a CPU implementation for
smaller buffers. That is why this functionality is part of the ISA on
Intel chips (AES-NI).
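
By contrast, a sketch of the instruction path, assuming x86 AES-NI and
compiled with -maes; the key schedule is omitted and rk is assumed to
already hold the 11 expanded AES-128 round keys. There is no syscall,
so even a single 16-byte block is worth accelerating:

    #include <wmmintrin.h>  /* AES-NI intrinsics */

    /* Encrypt one AES-128 block entirely in userspace. */
    static __m128i aes128_encrypt_block(__m128i block, const __m128i rk[11])
    {
        block = _mm_xor_si128(block, rk[0]);         /* initial AddRoundKey */
        for (int i = 1; i < 10; i++)
            block = _mm_aesenc_si128(block, rk[i]);  /* rounds 1-9 */
        return _mm_aesenclast_si128(block, rk[10]);  /* final round */
    }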

>>>> Actually, one of the things that I would have loved to see many
>>>> times is a way to tell the MMU to map a memory area as tiles:
>>>> the process uses it linearly, but the physical memory is spread
>>>> block by block over a surface. I am pretty sure I am not being
>>>> clear here, but the main purpose is to be able to swizzle memory
>>>> at no cost before uploading it to the GPU. You would just
>>>> allocate the memory using a specific syscall, and that memory,
>>>> which appears linear to the process, could directly be used by a
>>>> GPU texture unit. You would be able to decompress a JPEG, for
>>>> example, directly in memory, and the GPU could use that memory
>>>> with no extra cost and at the best possible frame rate. As an
>>>> additional benefit, you could use that same texture with a
>>>> software renderer and benefit from better cache locality. Of
>>>> course, that requires cooperation with the GPU block...
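
(For illustration: the address translation such a mapping would
perform, sketched here with an assumed layout of 4x4-pixel tiles of
32-bit pixels; the tile size is an arbitrary choice for the example,
and width_px is assumed to be a multiple of the tile width. The
process would see the plain y * width + x offset, while the MMU would
resolve it to this tiled physical address:)

    #include <stddef.h>
    #include <stdint.h>

    #define TILE_W 4
    #define TILE_H 4

    /* Physical byte offset of pixel (x, y) in the tiled surface. */
    static size_t tiled_offset(unsigned x, unsigned y, unsigned width_px)
    {
        unsigned tiles_per_row = width_px / TILE_W;
        unsigned tile  = (y / TILE_H) * tiles_per_row + (x / TILE_W);
        unsigned inner = (y % TILE_H) * TILE_W + (x % TILE_W);
        return ((size_t)tile * TILE_W * TILE_H + inner) * sizeof(uint32_t);
    }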
>>>
>>> Isn't this what Linux DRI2 is already doing?
>>
>> No, DRI is a direct path for sending GPU commands from a user space
>> application (it actually still requires interception and analysis
>> by the kernel before being sent to the GPU). Here I am talking about
>> the texture upload operation, which usually requires converting the
>> memory layout before uploading/exposing it to the GPU.
>
> DRI uses memory managed by DRM, which does exactly what you need.

Absolutely not! The memory allocation, sure, but it never provides a
simple way to adapt to the GPU memory layout. That's why uploading
textures is usually heavy on the CPU: drivers have to memcpy every
pixel into a different layout before uploading it to the GPU, or the
textures end up on the GPU in a format that will be noticeably slower.
I have never seen a system where you could get linear-looking memory
on the CPU side via the MMU and a tile-based layout in physical
memory.
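
For reference, a sketch of the conversion pass that happens today,
under the same assumed 4x4-tile layout as in the earlier sketch; this
per-pixel copy is exactly the cost that an MMU-level tile mapping
would remove:

    #include <stddef.h>
    #include <stdint.h>

    #define TILE_W 4
    #define TILE_H 4

    /* Copy a linear image into the tiled layout, pixel by pixel. */
    static void upload_linear_to_tiled(const uint32_t *linear, uint32_t *tiled,
                                       unsigned width_px, unsigned height_px)
    {
        unsigned tiles_per_row = width_px / TILE_W;
        for (unsigned y = 0; y < height_px; y++) {
            for (unsigned x = 0; x < width_px; x++) {
                unsigned tile  = (y / TILE_H) * tiles_per_row + (x / TILE_W);
                unsigned inner = (y % TILE_H) * TILE_W + (x % TILE_W);
                tiled[(size_t)tile * TILE_W * TILE_H + inner] =
                    linear[(size_t)y * width_px + x];
            }
        }
    }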
-- 
Cedric BAIL