Re: [pygame] Cheap COW clones for surfaces

On Sun, 17 Jun 2018, 19:47 DR0ID, <dr0id@xxxxxxxxxx> wrote:

On 17.06.2018 17:48, Daniel Pope wrote:

I have been thinking for some time about how to optimise Pygame Zero games on Raspberry Pi. Most Pi models have multiple cores and an obvious improvement is to parallelise. Logic has to be synchronous but I would like to offload painting the screen to a separate thread.

The problem I would have is in passing references to mutable objects between threads. Primarily this applies to Surface objects, which are mutable and expensive to copy. If I have a draw thread that maintains a queue of screen blit operations, I want the queue to hold references to surface data that won't change even if I later mutate the surfaces in the logic thread.
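To make the hazard concrete, here is a minimal sketch. It assumes a `bytearray` as a stand-in for a Surface's pixel data; `draw_queue`, `draw_worker` and `drawn` are hypothetical names, not Pygame or Pygame Zero API. The worker is deliberately started only after the mutation, so the outcome is deterministic; in a real game it would be a race.

```python
import queue
import threading

draw_queue = queue.Queue()
drawn = []  # what the draw thread actually "painted"


def draw_worker():
    # Stand-in for the draw thread: it "blits" by recording the bytes it sees.
    while True:
        item = draw_queue.get()
        if item is None:  # sentinel: no more work
            break
        pixels, pos = item
        drawn.append((bytes(pixels), pos))


sprite = bytearray(b"\x01\x01\x01\x01")  # stand-in for a mutable Surface

draw_queue.put((sprite, (0, 0)))   # queued by reference, no copy taken
sprite[:] = b"\x09\x09\x09\x09"    # logic thread mutates it afterwards
draw_queue.put(None)

worker = threading.Thread(target=draw_worker)
worker.start()
worker.join()

print(drawn)  # the draw thread saw the mutated pixels, not the queued ones
```

Queuing `bytes(sprite)` instead of `sprite` would avoid this, but that is exactly the eager copy I would like to avoid.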

If I were writing a game myself, I could work around this easily by taking copies of the surfaces that I want to mutate again in a different thread. But as a game framework author I don't know how users will use the surfaces in their code. I would have to copy everything in case the logic thread updates it, and that is unacceptable overhead given that I expect mutation to be very rare.

A solution could be copy-on-write clones for mutable objects. For example, Surface.__copy__() would return a new PyObject referring to the same SDL surface, incrementing a refcount for that surface data. All mutation methods would create a single-user copy of the surface if the refcount was higher than 1. Code that doesn't use clones would pay a small overhead in performing refcount checks. Refcount checks could be guarded by the GIL.
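As a rough illustration of the idea, here is a pure-Python sketch. It assumes a `bytearray` in place of the SDL surface data and a manual `users` counter in place of a C-level refcount; `SharedBuffer`, `CowSurface` and `_make_unique` are hypothetical names, and the real thing would presumably live in Pygame's C code.

```python
import copy


class SharedBuffer:
    """Hypothetical stand-in for SDL surface data plus a share count."""

    def __init__(self, data):
        self.data = bytearray(data)  # always takes its own copy of the bytes
        self.users = 1


class CowSurface:
    """Hypothetical copy-on-write handle; not part of Pygame."""

    def __init__(self, data=b"", _shared=None):
        self._buf = _shared if _shared is not None else SharedBuffer(data)

    def __copy__(self):
        # Cheap clone: share the buffer and bump the user count.
        self._buf.users += 1
        return CowSurface(_shared=self._buf)

    def _make_unique(self):
        # Every mutating method calls this first. The expensive copy only
        # happens if somebody else still shares the buffer.
        if self._buf.users > 1:
            self._buf.users -= 1
            self._buf = SharedBuffer(self._buf.data)

    def fill(self, value):
        self._make_unique()
        for i in range(len(self._buf.data)):
            self._buf.data[i] = value

    def tobytes(self):
        return bytes(self._buf.data)


logic_surface = CowSurface(b"\x00" * 4)
queued = copy.copy(logic_surface)   # what would be handed to the draw thread
logic_surface.fill(255)             # triggers the one real copy

print(queued.tobytes(), logic_surface.tobytes())
```

The sketch never decrements `users` when a clone is garbage collected; a real implementation would do that on deallocation, otherwise a surface whose clones are long gone would still pay for a copy on its next mutation.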

This solves my problem, as I can then clone every surface when passing it to a draw thread, and this becomes a very cheap operation. If the logic thread does mutate a surface it creates a copy in that thread.

How feasible is this? Are there other applications that could benefit from it? I could do it in Pygame Zero by wrapping all Surface objects in a Python class but this makes Pygame Zero a thicker and slower layer around Pygame (the goal is to be "training wheels" for Pygame) and creates more pitfalls when users dig into Pygame proper.

Hi there

I think if you truly want to use the other cores on your CPU then you will have to use processes (unless you use threads from a C extension, like threads in SDL itself or similar), or another Python runtime with real threads.

Either way this approach should work with threads or processes:

One other way to offload rendering to another core would be to separate the rendering logic from the game logic. The user-written render logic would run on one core/thread and the game logic on another. The communication between them then consists of sending the current game state to the render logic, so the main problem is storing and sending that state efficiently (with processes you will have to serialize it somehow, I guess; with threads you would at least have to copy it). Keep in mind that you need one process to control the other sub-processes, but that should be fairly easy. The next thing to consider is that your input is now caught by the render process, which effectively acts as the client in a client-server setup, so you have to send the input back to the process executing the logic.
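A very rough sketch of that message flow, under some assumptions: the game state is a plain dict serialized with `json`, and both sides run in one process connected by `queue.Queue` objects so the example stays deterministic. `multiprocessing.Queue` offers the same `put`/`get` calls if the two sides are moved into separate processes. `logic_step`, `render`, `state_q` and `input_q` are hypothetical names.

```python
import json
import queue


def logic_step(state, inputs):
    """Game logic side: advance one tick using inputs sent by the renderer."""
    dx = inputs.count("right") - inputs.count("left")
    return {"frame": state["frame"] + 1, "player_x": state["player_x"] + dx}


def render(snapshot):
    """Render side: only ever sees a deserialized copy of the state."""
    state = json.loads(snapshot)
    return "frame {frame}: player at x={player_x}".format(**state)


state_q = queue.Queue()  # logic -> render: serialized game state
input_q = queue.Queue()  # render -> logic: input events

# The render side catches the input and forwards it to the logic side.
input_q.put("right")
input_q.put("right")

# The logic side drains the pending input, steps, and publishes a snapshot.
state = {"frame": 0, "player_x": 0}
inputs = []
while not input_q.empty():
    inputs.append(input_q.get())
state = logic_step(state, inputs)
state_q.put(json.dumps(state))

# The render side draws from the snapshot, never from the live state.
line = render(state_q.get())
print(line)
```

Because the renderer works on a serialized snapshot, later changes to `state` on the logic side cannot affect a frame that is already queued. The cost is the serialization on every frame, which is the part I am unsure how to do efficiently.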

I have never used this approach in a game, although Gummbum and I have experimented with rendering in another process (render and game logic performance was very good, but it might introduce some extra lag). Also, I'm not sure how to store the game state so that it can be sent (probably as a copy) to the rendering process/thread efficiently.

Here is a simple diagram showing what I tried to describe (pygame is only needed in the client, but for rect speedups you can of course use it in the game logic too).

There might be more pitfalls I haven't seen yet. Frankly, this is a fairly advanced design and I'm not sure how well suited it is to Pygame Zero.

~DR0ID

PS: I have attached the graphics source so you can extend and elaborate on it. It was created with https://www.draw.io/