
Re: [pygame] Re: Python optimization help



On 1/20/2012 1:09 AM, Weeble wrote:
> Step 1: measure!
> 
> Unless you measure, you can't tell what's costing you time here. It could
> be rendering the circles. It could be processing the physics. It could be
> overhead in interprocess communication. Both of the latter appear to be
> O(N^2), and the communication costs cannot be parallelized. However, it
> does seem likely the calculations are the significant cost unless you have
> a ridiculous number of cores and amazing memory bandwidth.
> 
> Your calculations look reasonably amenable to use of numpy. This will do
> the numeric calculations in well-optimized C code. The for-loop in attract
> is embarrassingly parallel and shouldn't be hard to convert to an
> array-based form. In addition, since numpy releases the GIL during most
> array operations, you might find that multi-threading is good enough, and
> if you do go down that route you could avoid some of the cost of
> inter-process communication.
> 
> It might be obvious, but pick a number of workers that does not exceed your
> available number of cores. There's no value in having the workers fight
> each other for time on the same cores. multiprocessing.Pool will by default
> pick a number of workers equal to your number of cores, so you probably
> don't need to override it.
> 
> If after all that it's still not fast enough, I suspect you'll need to go
> to GPU-based solutions. Given the nature of the problem, I'd imagine you
> could get a good further speed boost out of this route, but you may well
> need to spend days or weeks getting familiar with these technologies.
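
For what it's worth, the array-based form Weeble describes could look
roughly like this. The (N, 2) position array and the simple
inverse-square force law are illustrative assumptions on my part, not
the original poster's code:

    import numpy as np

    def attract_all(positions, G=1.0):
        # positions: float array of shape (N, 2), one row per gravitar.
        # deltas[i, j] = positions[j] - positions[i]
        deltas = positions[np.newaxis, :, :] - positions[:, np.newaxis, :]
        dist_sq = (deltas ** 2).sum(axis=2)
        np.fill_diagonal(dist_sq, np.inf)    # no self-attraction
        inv_dist_cubed = dist_sq ** -1.5     # 1/r**3, zero on the diagonal
        # force on i = sum over j of G * deltas[i, j] / r**3
        return G * (deltas * inv_dist_cubed[:, :, np.newaxis]).sum(axis=1)

That replaces the whole O(N^2) Python loop with a handful of C-level
array operations.
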
On 20.01.2012 04:45, Robert Xiao wrote:
> If you are CPU-bound, extra threads won't help in Python due to the GIL (it limits an entire Python process to one core, even with multiple threads). 
He doesn't use threads, though; he uses multiprocessing, which makes your
point somewhat moot. Also, from my own experiments, multiple threads DO
use multiple cores; they just interfere with (i.e. block) each other due
to the GIL. But you're right that multiple processes must also compete
for the available processing power, even if they do so more efficiently
than threads in Python.

It would be nice, when you post code for others to examine, to

a) remove any unused code (i.e. "Consumer" and "Task"),
b) remove comments that refer to old versions of the code,
c) include a clean way to exit the program and
d) use more descriptive variable names.

Bad examples: "x", "lastx", "itemlist", "result" (when you really mean
"gravitars"), and "i" in the for loop in "attract" ("body" or
"other_gravitar" would have been better there).

> Oddly enough, lowering the amount of extra workers in the pool results
> in a higher speed?
How do you measure speed? Where's your FPS counter? Have you timed your
"attract" function with the timeit module?

Your example uses 12 workers, but your code is totally CPU-bound. How
many CPUs / cores does your computer have? Adding workers doesn't
magically add new cores. Communication between the main process and the
worker processes adds overhead, as does process switching (though the
latter should be very fast on Unices), so even if your system had 12
cores, the speed wouldn't scale linearly from 1 to 12 workers.
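
You can ask Python itself, e.g.:

    import multiprocessing

    print('cores: %d' % multiprocessing.cpu_count())
    # One worker per core is a sensible upper bound -- and exactly what
    # Pool() picks when you don't pass "processes" at all.
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())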

One way to speed up your program without resorting to hardware
acceleration would be to implement your main calculation function
"attract" in C with the help of Cython (cython.org). Adding a little
static type information can sometimes do wonders.
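
A minimal sketch of what that could look like (the function body is an
illustrative inverse-square loop, not your actual code):

    # attract.pyx -- compile with Cython; "cdef" declares C-level types,
    # so the inner loop runs without Python object overhead.
    from libc.math cimport sqrt

    def attract(double x, double y, others):
        cdef double fx = 0.0, fy = 0.0
        cdef double ox, oy, dx, dy, d
        for ox, oy in others:
            dx = ox - x
            dy = oy - y
            d = sqrt(dx * dx + dy * dy)
            if d > 0.0:
                fx += dx / (d * d)
                fy += dy / (d * d)
        return fx, fy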

Finally a few oddities in your code, which aren't necessarily speed related:

- Why do you use pool.map_async() when you then call the blocking
res.get() immediately anyway? Just use pool.map() (first sketch after
this list).

- The way you construct the list of arguments for the workers, with its
self-referencing list, isn't very clear. Also, passing the full list
with each job in each iteration certainly adds some overhead (though
probably negligible compared to the time spent in the "attract"
function). Consider sharing that list via multiprocessing.Manager()
instead (second sketch below).

- The "genlist" function seems a bit pointless. Why not create a
two-item list directly in your __main__ statement block and maintain the
list item counter "x" also there? No need for ugly globals.

    gcount = 2
    gravitars = [Gravitar(i) for i in range(gcount)]

- The use of (screen) "width" and "height" as global variables in
"Gravitar" and "attract" is unclean and prone to subtle errors in a
multiprocessing environment. I would introduce a world object which is
passed to "attract" (or made accessible via shared state); the third
sketch below shows the idea.
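
For the map_async() point, the two forms below behave the same, but the
second one says directly what happens ("args" stands in for whatever
argument list you build):

    # res = pool.map_async(attract, args)
    # results = res.get()              # blocks until all jobs finish anyway
    results = pool.map(attract, args)  # same blocking behaviour, clearer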
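
The Manager() idea, roughly; "init_worker" and the overall layout are my
own illustration, not your code:

    import multiprocessing

    def init_worker(shared):
        # Runs once in every worker process; keeps the proxied list as a
        # module global so "attract" can read it without per-job pickling.
        global gravitars
        gravitars = shared

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        shared_gravitars = manager.list()   # proxy visible to all workers
        pool = multiprocessing.Pool(initializer=init_worker,
                                    initargs=(shared_gravitars,))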
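
And the world object can stay trivially simple (again just a sketch):

    class World(object):
        """Bundles the screen/world parameters instead of module globals."""
        def __init__(self, width, height):
            self.width = width
            self.height = height

    world = World(800, 600)
    # attract(gravitar, gravitars, world) then needs no globals at all,
    # and each worker process sees exactly the state it was handed.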

Attached is an IMHO much cleaner, but not yet optimized, version of your
script.

HTH, Chris



Thanks, this will help me. I managed to get numpy installed on my
haphazard system, so now I'm going to learn how to time stuff properly.