
Re: Re: [pygame] Python and Speed



Hi!

    No, this is the place to discuss it, because if we wish to make games,
work with existing platforms, and want speed, that is the way to go. Now
that we have had this discussion and found solutions, we have a list of
ways to resolve it.

    This is the place to discuss all of this; it brings to the front the
issues of speed, connections, and overall solutions. That is the only way to
make Pygame better, faster, and competitive with the world...

    Just like the question I had about tts (text to speech): even though I
do not use the video end yet, I do use the sound end. So Ian's question about
speed is a very good one; take a look at what came of it below.

    I learn by doing. Examples help because I get to use them, tweak them,
and eventually, like many have done, come up with a better solution or a
firm conclusion.

    I now understand how to add options to my setup.py file, or as I now
call it, setup4tts.py, or anything for any need...

        Bruce

From: "Richard Jones"
I think this is the wrong forum to be having this discussion :)

     Richard


From: Jason Ward


The way I speed up my Python code is ctypes.
I just build a DLL in C or asm and then call it with ctypes, and presto.
I have tons of speed at my fingertips.
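
To make that concrete, here is a minimal sketch of the ctypes approach -
the library name "fastmath" and the sum_ints function are made-up examples,
you would build them yourself with gcc or nasm:

    import ctypes

    # Load the shared library you built in C or asm.  On windows this
    # would be "fastmath.dll", on linux "./libfastmath.so".
    lib = ctypes.CDLL("./libfastmath.so")

    # Tell ctypes the C signature:  int sum_ints(int *data, int n);
    lib.sum_ints.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
    lib.sum_ints.restype = ctypes.c_int

    # Build a C array from python values and call into the fast code.
    values = (ctypes.c_int * 5)(1, 2, 3, 4, 5)
    print(lib.sum_ints(values, 5))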

Just my 2 cents :)



On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing <greg.ewing@xxxxxxxxxxxxxxxx>
wrote:
> René Dudfield wrote:
>
>
> > 2. - asm optimizations.  There seems to be
> >
> > almost no asm optimizations in CPython.
> >
>
>  That's a deliberate policy. One of the goals of CPython
>  is to be very portable and written in a very straightforward
>  way. Including special pieces of asm for particular
>  architectures isn't usually considered worth the
>  maintenance effort required.
>

Other, more portable software has proved this to be somewhat wrong, I
think.  Optional asm paths are used in a lot of software today to
good effect.  Also, that decision was made a while ago, and things have
changed since then.

- python now has unittests.  So testing that the asm code works, and
keeps working correctly, is much easier.
- x86 is now very common.  Most mainstream servers and desktops use
x86.  So just targeting x86 gives you a lot more benefit now.
- SIMD instructions are the fast ones... so you don't actually have to
learn all that much to write fast asm - you only have to learn a
subset of asm.  You can get compilers to generate the first pass of
the function, and then modify it.  Of course writing the fastest
possible asm still requires effort - but it is fairly easy for a
novice to beat a compiler with SIMD code.
- advanced compilers can generate asm, which can then be used by weaker
compilers.  eg, the Intel compiler or the VectorC compiler can be used to
generate asm, which is then included in C code compiled by gcc.
- libraries of fast, tested asm code are available, eg from AMD,
Intel and others.
- python and FOSS now have a much larger development community with
more asm experts.



>
> > CPython could use faster threading
> > primitives, and more selective releasing of the GIL.
> >
>
>  Everyone would love to get rid of the GIL as well, but
>  that's another Very Hard Problem about which there has
>  been much discussion, but little in the way of workable
>  ideas.
>

Yeah, not getting rid of the GIL entirely - but selectively releasing
it.  As an example, pygame releases the GIL around certain C
functionality like pygame.transform.scale.
FreeBSD and Linux have also followed this method - adding more
fine-grained locking where it is worth it - and improving their threading
primitives.  I think there has already been work on fixing a lot of
python threading issues in the last year - but there's lots more to
do.

I'm using python on 8 core machines for my workloads just fine today.
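
As a rough illustration of what releasing the GIL buys you (assuming pygame
does release it inside transform.scale, as described above), something like
this lets several scales overlap on a multi-core box - the surface sizes and
counts are arbitrary:

    import threading
    import pygame

    pygame.init()
    src = pygame.Surface((2048, 2048))   # arbitrary size, just for load

    def scale_many(count):
        # pygame.transform.scale is C code; while it runs with the GIL
        # released, the other python threads can keep running.
        for _ in range(count):
            pygame.transform.scale(src, (1024, 1024))

    threads = [threading.Thread(target=scale_many, args=(50,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()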


> > A way to know how much memory is being used.
> > Memory profiling is the most important way to optimize since memory
> > is quite slow compared to the speed of the cpu.
> >
>
>  Yes, but amount of memory used doesn't necessarily
>  have anything to do with rate of memory accesses.
>  Locality of reference, so that things stay in the
>  cache, is more important.
>

If each int uses 200 bytes, then you can hold and process roughly 50x
less data than with an int that takes up 4 bytes.

Say your data needs 2 gigs in a normal dict, and a kjDict uses half the
memory of a normal dict, so it needs only 1 gig.  On a machine with 1 gig
of available memory, the kjDict would be massively faster than the normal
dict, because the normal dict would be swapping.

I think memory is one of the most important areas in optimising a
program these days.  So python should provide tools to help measure
memory use (how much memory things use, and how things are allocating
memory).
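
As a small example of what such a tool could look like: sys.getsizeof
(new in python 2.6) reports the size of a single object - though only the
object's own size, not whatever it references:

    import sys

    # Per-object sizes; exact numbers vary by python version and platform.
    print(sys.getsizeof(42))           # a python int object
    print(sys.getsizeof({}))           # an empty dict
    print(sys.getsizeof([0] * 1000))   # a list holding 1000 references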



>
> > perhaps releasing
> > a patch with a few selected asm optimizations might let the python
> > developers realise how much faster python could be...
> >
>
>  Have you actually tried any of this? Measurement
>  would be needed to tell whether these things address
>  any of the actual bottlenecks in CPython.
>

You can try it easily yourself - compile python with machine-specific
optimisations (eg add -mtune=athlon to your gcc arguments).  You can
run this python binary and get faster benchmarks.  That provides
proof that more optimised assembly can run faster.

Also the link I gave to a commonly used memcpy function running 5x
faster should give you another proof of the possibilities.
Other software being sped up by asm optimisation is further
proof (including SDL, pygame, Linux etc).  The Pawn language's virtual
machine written in nasm is a lot faster than the version written in C -
another proof.  Psyco is yet another proof that asm can
speed up python (Psyco is a run-time assembler).

The idea is you only optimise key stable functions in asm - not everything.

For example, in SDL the blit functions are written in asm - with C
fallback implementations.  It's using the best tool for the job: Python for
the highest level, then C, then asm.

I think a patch for CPython, with benchmarks, would be needed as
proper proof though - but hopefully the list above gives you
theoretical grounds to believe that adding asm optimisations would speed up
CPython.



However the recompilation with cpu-specific compiler flags would only need:
    - cpu detection code (widely available, eg in SDL and elsewhere).
    - python compilation and distutils modifications for each compiler,
eg gcc and visual studio, to generate different .so/.dll's on that
platform.
    - import changes so it loads the correct .dll for that machine,
eg.  if athlon: load("_array_athlon.so")  (see the rough sketch after
this list).
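
A rough sketch of that last point - the module names and the string check
are hypothetical, and real detection would use cpuid-style code like SDL's:

    import platform

    def load_array_module():
        # Hypothetical: pick the .so built for this cpu, and fall back
        # to a generic build.  Real code would detect SSE/3DNow etc.
        try:
            if "athlon" in platform.processor().lower():
                return __import__("_array_athlon")
        except ImportError:
            pass
        return __import__("_array_generic")

    _array = load_array_module()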

I've been wanting to add this to pygame distutil_mods.py for a while,
but of course have never had the time to finish it.

I think this is the easiest way to give the largest speed increase to
CPython.



>
>  > a slot int attribute takes up 4-8 bytes, whereas a python int
>
> > attribute takes up (guessing) 200 bytes.
> >
>
>  Keep in mind that the slot only holds a reference --
>  the actual int object still takes up memory elsewhere.
>  Slots do reduce memory use somewhat, but I wouldn't
>  expect that big a ratio.
>

Ah ok.  The cgkit slots handle the attributes differently to the
python slots, I think.  It might allocate a structure with the
attribute details, plus memory for the data.  It's a more memory-efficient
way of storing stuff per class.  You only store the string
for the attribute name, the type, and the memory indexes in the class, and
allocate memory for each instance.

eg, say you have 1 million objects...  Adding one python int attribute
would be 1000000 * 200 = 200000000 extra bytes (200MB).  Using a method
which only adds 8 bytes for the int slot-attribute is only 8MB extra.
Now if you add 4 attributes, then that is 800MB vs 32MB.  It's an
extreme example, just to illustrate the point - but not an uncommon
use case.  Python object attributes just don't scale as well because
of the memory used.  Note the 200 bytes for a python object is just a
guess; it might be 100 or something, I can't remember exactly.
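
For a rough feel of the difference with plain python classes (exact numbers
vary by python version; this only counts the instance and its dict, not the
int objects themselves):

    import sys

    class WithDict(object):
        def __init__(self):
            self.x = 0

    class WithSlots(object):
        __slots__ = ("x",)
        def __init__(self):
            self.x = 0

    a = WithDict()
    b = WithSlots()
    print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))  # instance + dict
    print(sys.getsizeof(b))                              # no per-instance dict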