Re: [pygame] pygame.pixelcopy.array_to_surface(Dest, Src) not working

On Sun, Dec 25, 2016 at 12:53 PM, Mikhail V <mikhailwas@xxxxxxxxx> wrote:

On Sat, Dec 24, 2016 at 5:12 PM, Mikhail V <mikhailwas@xxxxxxxxx> wrote:
Probably there are more criteria here that I am not aware of,
and objective arguments to prefer "FORTRAN" order, apart
from having the more traditional [x,y] notation?
The argument, I think, comes from building/slicing matrices out of (column) vectors. You see this a lot in numerical work. If the matrix is stored as a row of pointers to column vectors, you can build sparse systems that reference the underlying vectors without doing any copying (you can do this with row data instead, but then you need row vectors, and that would be morally wrong). This is important since building sparse systems can be very slow if you're not careful.

I still avoid FORTRAN order because it's not mathy. E.g., the matrix element "a_{0,2}" should be accessed as "a[0][2]". For an objective argument, I'll note that graphics hardware--in particular VGA/VBE hardware, which influenced later standards, e.g. HDMI--is row-major, top-to-bottom raster order. This has been hugely influential, and is more-or-less expected today by graphics programmers. It explains everything from most windowing systems today having GUI controls at the top and left, to why GL takes padded scanlines as texture input.
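
To make the indexing point concrete, here is a small NumPy sketch. The 3x4 shape and the variable names are made up for illustration. Both layouts answer a[0][2] with the same value; what differs is how far apart neighbouring elements sit in memory.

```python
import numpy as np

# A hypothetical 3x4 "image": 3 rows (y), 4 columns (x), one byte per pixel.
a_c = np.arange(12, dtype=np.uint8).reshape(3, 4)  # C (row-major) layout
a_f = np.asfortranarray(a_c)                       # same values, column-major layout

# The matrix element a_{0,2} is accessed the same way in both layouts.
assert a_c[0][2] == a_f[0][2] == 2

# Strides are the byte steps taken per index.
c_strides = a_c.strides  # next row is 4 bytes away, next column 1 byte
f_strides = a_f.strides  # next row is 1 byte away, next column 3 bytes

# order="K" flattens in the order the elements actually occur in memory.
c_memory = a_c.ravel(order="K").tolist()  # scanline after scanline
f_memory = a_f.ravel(order="K").tolist()  # column after column
```

In the C layout a whole scanline is contiguous, which is the raster order described above.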

One way or another, at this point, changing the order in PyGame is probably a bad idea (backwards compatibility and suchlike). At the very least, it would need to be deferred to a major update with breaking API changes.

So you kind of agree that surfarray/pixelcopy would be better off dealing with C order?

Definitely.

I am curious whether it is worth proposing new methods that do so.
I agree, one should not touch the existing API.

Now I have tested the performance one more time, namely
comparing 3 variants to copy data from array to surface:
1.  buf = Dest.get_buffer()
    buf.write(Src.tostring(), 0)
2.  pygame.pixelcopy.array_to_surface(Dest, Src)
3.  pygame.pixelcopy.array_to_surface(Dest, Src.T)

And it turned out that I was wrong about transpose being expensive.
Actually, the transpose itself does not add significant overhead. The first
time I tested it, I did something wrong.
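
A minimal NumPy-only sketch of why the transpose is cheap, assuming a hypothetical 8-bit array of 600 rows by 800 columns: `.T` returns a view with the strides swapped, so no pixel data is copied at that point.

```python
import numpy as np

src = np.zeros((600, 800), dtype=np.uint8)  # rows x columns, default C order
src_t = src.T                               # shape becomes (800, 600)

shares = np.shares_memory(src, src_t)       # the view reuses the same pixel data
t_is_fortran = src_t.flags["F_CONTIGUOUS"]  # the view is column-major
t_strides = src_t.strides                   # the strides of src, swapped
```

Any cost shows up later, when something has to walk the data in an order that does not match its memory layout.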

For method 2, if I define order="FORTRAN" for the original array,
there is no difference compared to method 3. But if I leave the default (C)
order, then the performance degrades with bigger arrays
(ca. 20% slower for an 800x600 8-bit array).
So it is indeed an important thing.

Makes sense. For bigger arrays, caching becomes more important in the copying, and implicit transposes of the order mean you thrash on reading.

Most interesting is that method 1, the buffer write, seems to be always faster
than the others, by ca. 5%. Not a big win, but still interesting...
And if I try it with FORTRAN order, it becomes 2 times slower!

I'm not sure I fully parse what you're doing here. As long as it's safe, copying buffers should be slightly faster since it's 1D--maybe the buffer API is smart enough to step in larger chunks that might potentially straddle a scanline, and you also have one fewer loop variable. When you try it with FORTRAN order, producing a buffer of the same format requires an allocation and then a copy, so that's probably why it's slower.
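
A small sketch of the NumPy side of that guess, using a made-up 2x3 array. It uses `tobytes()`, the current name for `tostring()`. By default it serializes in C order whatever the memory layout, so a Fortran-ordered array has to be reordered on the way out. Whether this fully accounts for the 2x slowdown measured above is an assumption; the sketch only shows the NumPy behaviour.

```python
import numpy as np

a_c = np.arange(6, dtype=np.uint8).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]], C order
a_f = np.asfortranarray(a_c)                      # same values, Fortran order

# Default order="C": both produce the same row-major byte string,
# so the Fortran-ordered array cannot simply hand over its memory as-is.
same_bytes = a_c.tobytes() == a_f.tobytes()

# order="A" keeps the memory order of an F-contiguous array instead.
raw_f = a_f.tobytes(order="A")
```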

The NumPy internals documentation has salient things to say on this issue.

So I would still look forward to having methods that deal with C order,
just to avoid writing the extra transpose and to comply fully
with the default numpy notation.

Any comments or opinions about it?
It would be good to know first which of those things
people use more often, and to collect some use case examples.

Personally, I would like C order just because it's "expected" in graphics*. Under this assumption, I wrote all my code e.g. looping over "y" first, using the buffer API for GL interop, etc. This is optimal in the C order every graphics programmer would expect, but in FORTRAN order, it's exactly wrong. I never profiled both options because it's a nearly fundamental assumption.
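
For illustration, this is roughly the "y first" pattern I mean, as a pure-Python sketch. The function name, the sizes, and the `pitch` parameter (bytes per padded scanline) are hypothetical and not part of any PyGame API.

```python
def fill_gradient(width, height, pitch):
    """Write each pixel's x coordinate into a flat, row-major 8-bit buffer."""
    buf = bytearray(pitch * height)   # pitch >= width; the extra bytes are padding
    for y in range(height):           # outer loop: one scanline at a time
        row_start = y * pitch         # byte offset where scanline y begins
        for x in range(width):        # inner loop: consecutive bytes in memory
            buf[row_start + x] = x & 0xFF
    return buf

# Two scanlines of 5 pixels, each padded to 8 bytes.
buf = fill_gradient(width=5, height=2, pitch=8)
```

With the loops swapped (x outside, y inside), every write would jump a full `pitch` ahead, which is the access pattern that FORTRAN order makes natural.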

I mean, it's not terribly important. Python is not a fast language. One writes stuff in Python because the program running 5x/50x slower is a non-issue and one wants the expressivity. But free perf is free, so it's a bit annoying.

*In the interest of fairness, it should be noted that there is an offshoot of image processing (a subset of graphics) that might disagree. They're very FORTRAN-y, using langs with 1-based indexing and both array orders. They also tend to be non-CS/non-math types who work in industry, generating appalling code.

Mikhail

Ian