Re: [pygame] pygame.pixelcopy.array_to_surface(Dest, Src) not working

On 25 December 2016 at 05:10, Ian Mallett <ian@xxxxxxxxxxxxxx> wrote:

Even if I were to write a commercial arcade game, it would not suffer
much from being 8-bit color. Most of the time it is not
about smooth gradients; it is about action and fun.

What I mean is, you can have an "8-bit" game emulated using 24-bit graphics. You just only use 256 (or fewer) colors instead of the whole 16777216. Once again, this is more performant, both in software (SDL, PyGame) and in hardware (GL, Vulkan).

Sure. The question, however, is how the data is initially generated.
E.g. if I visualize numeric data, I can transform the palette or
switch between palette pages. So it is an elegant model
for many cases; one cannot just ignore it.
It is all very easy to implement in NumPy, but I just wonder if one could
benefit from hardware acceleration for color mapping.
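For reference, the NumPy side of palette mapping is usually a lookup table: the image stays an array of 8-bit indices, and a (256, 3) palette is applied with fancy indexing, so switching palette pages only swaps the table. A minimal sketch; the two palettes are arbitrary examples, and this is plain NumPy on the CPU, not the hardware-accelerated path asked about. For 8-bit pygame surfaces, `Surface.set_palette()` plays a similar role.

```python
import numpy as np

# Hypothetical 8-bit indexed image: each pixel stores a palette index 0..255.
height, width = 4, 6
indices = (np.arange(height * width, dtype=np.uint8) * 10).reshape(height, width)

# Two palette "pages" of 256 RGB entries each, shape (256, 3).
ramp = np.arange(256, dtype=np.uint8)
grey = np.stack([ramp, ramp, ramp], axis=1)             # greyscale
heat = np.stack([ramp, ramp // 2, 255 - ramp], axis=1)  # arbitrary blue-to-red map

def apply_palette(idx, palette):
    """Map an (h, w) index array to an (h, w, 3) RGB array via fancy indexing."""
    return palette[idx]

rgb_grey = apply_palette(indices, grey)
rgb_heat = apply_palette(indices, heat)  # switching pages: index data is untouched
```

Transforming the palette (e.g. rotating or rescaling it) touches only 256 entries, regardless of the image size.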

> The copy buffer trick you're using looks a little suspect to me.

But there is no other method that is compliant with NumPy order,
and, as I see now, it works as fast as pixelcopy.
Here is what I mean: if I use get_buffer() and then simply copy a
memory chunk from one place to another (if I understand it correctly), it works and the result
shows up correctly. So one NumPy *row* is mapped to one image *row*
on the screen.
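The row-to-row mapping follows from C (row-major) order: `tobytes()` emits row 0, then row 1, and so on, so for an 8-bit surface without row padding each array row lands on one scanline. A small check of that layout in plain NumPy; the sizes are arbitrary and `row_bytes` is a hypothetical helper.

```python
import numpy as np

height, width = 3, 5
# C-order array indexed [y, x]: element [y, x] is the pixel in row y, column x.
src = np.arange(height * width, dtype=np.uint8).reshape(height, width)
raw = src.tobytes()  # rows are emitted one after another, top to bottom

def row_bytes(raw, y, width):
    """Bytes of row y in an unpadded 8-bit raster: [y * width, (y + 1) * width)."""
    return raw[y * width:(y + 1) * width]

for y in range(height):
    print(y, list(row_bytes(raw, y, width)))
```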

> In particular, if the internal layouts differ at all (for example, if the NumPy
> array is contiguous, and the SDL surface has row padding, both of which are likely)
> then the copy will fail (fatally crash) in some cases. I don't know whether PyGame
> is smart enough to anticipate that. Odd width/height surfaces,
> and especially truecolor surfaces, would be a reassuring test.

Yes, there are a lot of alignment nuances here. As I understand it,
GPU surfaces must be memory-aligned, so odd shapes must be padded
into aligned rectangles. 24-bit formats in particular make it
even harder to convert.

It's not alignment so much (IIRC SDL uses malloc on uint8, so it's byte-aligned) as row stride. If the row size doesn't work out to a nice number, then SDL will insert padding bytes. So copying from a contiguous array might create weird striated patterns, while copying from a surface could actually segfault the underlying NumPy backend. So it's not just inelegant--it's unsafe.

Once again, I don't know if PyGame is smart enough to detect this and handle it in the buffer API (and if it did handle it, it would require an allocation+copy, which would make it slow), so I would err on the side of caution.
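To make the row-stride point concrete, here is a NumPy-only simulation of an 8-bit surface whose rows are padded: 5 pixels per row, but a hypothetical pitch of 8 bytes (in pygame the real value comes from `Surface.get_pitch()`). Writing the contiguous array bytes in one go shifts every row after the first, while copying row by row at multiples of the pitch reproduces the image. This only models the memory layout; it does not call pygame.

```python
import numpy as np

height, width = 3, 5
pitch = 8  # hypothetical row stride in bytes: 5 pixels + 3 padding bytes (8-bit)

src = np.arange(1, height * width + 1, dtype=np.uint8).reshape(height, width)

def naive_copy(src, pitch):
    """Copy the contiguous array bytes straight into the padded buffer (wrong)."""
    buf = np.zeros(src.shape[0] * pitch, dtype=np.uint8)
    data = np.frombuffer(src.tobytes(), dtype=np.uint8)
    buf[:data.size] = data
    return buf

def strided_copy(src, pitch):
    """Copy row by row, starting row y at byte y * pitch (respects padding)."""
    h, w = src.shape
    buf = np.zeros(h * pitch, dtype=np.uint8)
    for y in range(h):
        buf[y * pitch:y * pitch + w] = src[y]
    return buf

def visible(buf, h, w, pitch):
    """What would be displayed: the first w bytes of every pitch-sized row."""
    return buf.reshape(h, pitch)[:, :w]

sheared = visible(naive_copy(src, pitch), height, width, pitch)
correct = visible(strided_copy(src, pitch), height, width, pitch)
```

With these sizes the naive copy shows pixels 9..13 on the second row instead of 6..10 and leaves the third row empty.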

I have tried it now: a buffer write raises an error if the array's total byte size is bigger than the surface buffer:
> ValueError: 'buffer' object length is too large

So it is probably not so dangerous, and if you use the correct shape and know what you
are doing, nothing bad should happen. As for padded row length, one
would just use arrays of specific shapes according to the surface format.
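One way to follow that approach, sketched under the assumption of an 8-bit surface: allocate the array with the padded row length (the pitch) so its bytes match the surface buffer exactly, and draw only into a view of the visible columns. The pitch of 8 is made up, and `make_padded_array` is a hypothetical helper. A length check like the ValueError above catches a buffer that is too large, but presumably not one with the right size and the wrong row layout, so matching the shape to the pitch remains the caller's job.

```python
import numpy as np

def make_padded_array(width, height, pitch):
    """8-bit only: allocate a (height, pitch) uint8 array whose bytes match a
    surface buffer with row stride `pitch`; return it plus the visible view."""
    if pitch < width:
        raise ValueError("pitch must be at least width for an 8-bit surface")
    backing = np.zeros((height, pitch), dtype=np.uint8)
    return backing, backing[:, :width]  # the view shares memory with backing

backing, pixels = make_padded_array(width=5, height=3, pitch=8)
pixels[:] = 7                  # draw only into the visible area
payload = backing.tobytes()    # exactly height * pitch bytes, padding included
```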

On Sat, Dec 24, 2016 at 5:12 PM, Mikhail V <mikhailwas@xxxxxxxxx> wrote:
Probably there are more criteria here that I am not aware of,
and objective arguments to prefer "FORTRAN" order apart
from having the more traditional [x,y] notation?
The argument I think comes from building/slicing matrices out of (column) vectors. You see this a lot in numerical work. If the row is of pointers, you can build sparse systems that reference underlying vector without doing any copying (you can do this with row data instead, but then you need row vectors, and that would be morally wrong). This is important since building sparse systems can be very slow if you're not careful.
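A small illustration of the column-vector argument: in Fortran (column-major) order each column of a matrix is one contiguous block, so a column slice can be handed on without copying, whereas in C order the same slice is strided. This only shows the memory layout in NumPy; it is not a sparse-matrix construction.

```python
import numpy as np

# Three column vectors of length 4, holding 0s, 1s and 2s.
cols = [np.full(4, k, dtype=np.float64) for k in range(3)]

# Fortran (column-major) order: each column is contiguous in memory.
a_f = np.asfortranarray(np.column_stack(cols))
# C (row-major) order: each row is contiguous, columns are strided.
a_c = np.ascontiguousarray(np.column_stack(cols))

col_f = a_f[:, 1]  # stride of one float64 (8 bytes) between elements
col_c = a_c[:, 1]  # stride of one whole row (3 * 8 = 24 bytes)
print(col_f.strides, col_c.strides)
```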

I still avoid FORTRAN order because it's not mathy. E.g., the matrix element "a_{0,2}" should be accessed as "a[0][2]". For an objective argument, I'll note that graphics hardware--in particular VGA/VBE hardware, which influenced later standards, e.g. HDMI--is row-major, top-to-bottom raster order. This has been hugely influential, and is more-or-less expected today by graphics programmers. It explains everything from most windowing systems today having GUI controls at the top and left, to why GL takes padded scanlines as texture input.
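The row-major raster convention boils down to one offset formula: pixel (x, y) starts at `y * pitch + x * bytes_per_pixel`, where pitch is the row stride including any padding. A sketch that checks the formula against a C-order NumPy array, where `a[0][2]` is row 0, column 2; `pixel_offset` is a hypothetical helper.

```python
import numpy as np

def pixel_offset(x, y, pitch, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a top-to-bottom, row-major raster
    whose rows are `pitch` bytes apart (pitch may include padding)."""
    return y * pitch + x * bytes_per_pixel

# a[row][col]: a[0][2] is row 0, column 2, i.e. pixel x=2, y=0.
a = np.arange(12, dtype=np.uint8).reshape(3, 4)  # 3 rows, 4 columns, 8-bit
flat = a.tobytes()                               # unpadded, so pitch == 4

print(a[0][2], flat[pixel_offset(2, 0, pitch=4, bytes_per_pixel=1)])
```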

One way or another, at this point, changing the order in PyGame is probably a bad idea (backwards compatibility and suchlike). At the very least, it would need to be deferred to a major update with breaking API changes.

So you kind of agree that surfarray/pixelcopy would do better to deal with C order?

I am curious whether it is worth proposing methods that do so.
I agree, one should not touch the existing API.

I have now tested the performance once more, comparing
three ways to copy data from an array to a surface:
1.    buf = Dest.get_buffer()
      buf.write(Src.tobytes(), 0)
2.    pygame.pixelcopy.array_to_surface(Dest, Src)
3.    pygame.pixelcopy.array_to_surface(Dest, Src.T)
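Why variant 3 is cheap: as far as I know, pixelcopy and surfarray index arrays as [x, y], i.e. with shape (width, height), and `Src.T` produces exactly that from a (height, width) C-order array by swapping shape and strides, without copying any pixels. The result has the same layout as an array created with `order="F"`. A NumPy-only check:

```python
import numpy as np

height, width = 600, 800
src = np.zeros((height, width), dtype=np.uint8)  # C order, indexed [y, x]

src_t = src.T  # shape (width, height), indexed [x, y]; a view, nothing copied

# An array allocated directly in Fortran order with shape (width, height)
# has the same strides as the transposed view.
src_f = np.zeros((width, height), dtype=np.uint8, order="F")
print(src_t.shape, src_t.strides, src_f.strides)
```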

And it turned out that I was wrong about transposing being expensive.
Actually, the transpose itself does not add significant overhead. The first time
I tested it, I did something wrong.

For method 2, if I define order="FORTRAN" for the original array,
there is no difference compared to method 3. But if I leave the default (C)
order, the performance degrades with bigger arrays
(ca. 20% slower for an 800x600 8-bit array).
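A rough NumPy-only model of why the default C order could hurt method 2: if the surface memory is row-major and exposed through an [x, y] view (roughly what surfarray-style arrays look like), then copying from a Fortran-ordered (width, height) source walks both arrays in the same memory order, while a C-ordered source has to be read with large strides. The sketch times both; it does not call pygame, and the size of the gap, if any, depends on hardware and NumPy version, so it is not a reproduction of the 20% figure.

```python
import timeit
import numpy as np

height, width = 600, 800
# Stand-in for the surface's pixel memory: row-major, seen through an
# [x, y]-indexed transposed view of shape (width, height).
surface_mem = np.zeros((height, width), dtype=np.uint8)
dest = surface_mem.T

rng = np.random.default_rng(0)
src_f = np.asfortranarray(rng.integers(0, 256, (width, height), dtype=np.uint8))
src_c = np.ascontiguousarray(src_f)  # same values, C order

def copy_into(dest, src):
    """Element-wise copy into the view; NumPy handles any stride mismatch."""
    dest[...] = src

t_f = timeit.timeit(lambda: copy_into(dest, src_f), number=50)
t_c = timeit.timeit(lambda: copy_into(dest, src_c), number=50)
print(f"matching layout: {t_f:.4f}s  mismatched layout: {t_c:.4f}s")
```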

So memory order is indeed important.

Most interesting is that method 1, with the buffer write, seems to be always faster
than the others, by ca. 5%. Not a big win, but still interesting...

And if I try it with FORTRAN order, it becomes twice as slow!
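One plausible explanation for that slowdown: `tobytes()` (the current name of `tostring()`) defaults to `order='C'`, so for a Fortran-ordered array NumPy has to gather the elements into row-major order first, an extra reordering pass, whereas for a C-ordered array it is a plain memory copy. The sketch shows that both calls produce the same bytes, while the array's actual memory (`order='A'`) is column-major.

```python
import numpy as np

height, width = 3, 4
src_c = np.arange(height * width, dtype=np.uint8).reshape(height, width)
src_f = np.asfortranarray(src_c)   # same values, column-major memory

bytes_c = src_c.tobytes()          # C-order array: bytes copied as stored
bytes_f = src_f.tobytes()          # default order='C': elements reordered first
raw_f = src_f.tobytes(order="A")   # 'A' keeps Fortran order for an F-contiguous array

print(list(bytes_c))
print(list(raw_f))
```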

So I would still look forward to having methods that deal with C order,
just to avoid writing extra transposes and to stay fully compliant
with default NumPy notation.

Any comments or opinions about it?

It would be good to know first which of those things
people use more often, and to collect some use-case examples.

Mikhail