
Re: [pygame] Faster OBJ loader



On Tue, Sep 28, 2010 at 2:05 PM, Ian Mallett <geometrian@xxxxxxxxx> wrote:
On Mon, Sep 27, 2010 at 8:05 PM, Christopher Night <cosmologicon@xxxxxxxxx> wrote:
I was able to improve the rendering performance significantly using vertex arrays in the test I did a few months ago. I was still using a display list as well, but I greatly reduced the number of GL commands within the display list. The trick was to triangulate all the faces, and render all the faces for a given material using a single call to glDrawArrays(GL_TRIANGLES...). I realize this is hardware-dependent, but the speedup was dramatic on 2 out of 2 systems that I've tried. Maybe it's not the vertex arrays that matter, and triangulating all the faces and using a single call to glBegin(GL_TRIANGLES) would yield the same speedup. Either way, it's worth looking into I think....
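To make the triangulate-and-batch idea concrete, here is a minimal sketch in plain Python. It assumes every face is a convex polygon given as a list of (x, y, z) tuples and tagged with a material name; the names fan_triangulate and batch_by_material are made up for illustration and are not part of the wiki loader. The flat list built for each material is what would be handed to glVertexPointer(), followed by a single glDrawArrays(GL_TRIANGLES, 0, len(flat) // 3).

```python
from collections import defaultdict

def fan_triangulate(verts):
    """Split a convex polygon (list of vertices) into a fan of triangles."""
    return [(verts[0], verts[i], verts[i + 1]) for i in range(1, len(verts) - 1)]

def batch_by_material(faces):
    """faces: iterable of (material_name, [vertex, ...]) pairs.

    Returns {material_name: flat list of coordinates}, three vertices per triangle.
    """
    batches = defaultdict(list)
    for material, verts in faces:
        for tri in fan_triangulate(verts):
            for vertex in tri:
                batches[material].extend(vertex)
    return dict(batches)

# Hypothetical input: two "red" faces (a quad and a triangle) and one "blue" quad.
quad = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
tri = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]
batches = batch_by_material([("red", quad), ("red", tri), ("blue", quad)])
for name in sorted(batches):
    print("%s: %d vertices" % (name, len(batches[name]) // 3))
```

Fan triangulation is only valid for convex faces; a loader that has to accept concave polygons would need a different splitting scheme.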
You must not be using the calls as you think you are, then.  Graphics cards have a "graphics bus" that handles data transfer to and from the card.  Unfortunately, this graphics bus is slower than either the CPU or the GPU themselves. 

Immediate mode (glVertex3f(...), glNormal3f(...), etc.) sends the data to the card on each call. So, if you have 900 such calls, you send 900 commands to OpenGL across the graphics bus each time the data is drawn. This can get slow.

Vertex arrays work similarly, except the data is stored as an array, and the equivalent of the 900 immediate-mode calls is sent across the graphics bus as one batch. Although this batched approach is faster than immediate mode, all the data is still transferred to the card each time the data is drawn.

Display lists work by caching operations on the graphics card. You can specify nearly anything "inside" a display list, including immediate-mode calls and (I think) vertex arrays. To use display lists, you allocate a list with glGenLists() and wrap the drawing code in glNewList() and glEndList() calls. The code inside, after being transferred to the GPU, is stored for later use. Later, you can call glCallList() with the appropriate list argument. The list's NUMBER is transferred to the GPU, and the relevant set of cached operations is executed. The practical upshot is that each time you draw the object, you pass only a single number to the graphics card. This is the way the Wiki .obj loader works.

Excellent, thank you very much for the explanation. You're absolutely right that I'm likely to not be using the calls like I think I am. :-)

I understand now that no matter what's in the display list, only a single number is passed to the graphics card, so there can't be any optimization on the outside. However, wouldn't it be possible for some display lists to execute faster within the graphics card than others?

Attached below is a script that demonstrates what I'm talking about. It renders a torus repeatedly in two 60-second tests: first without vertex arrays (the way the objloader on the wiki does it), then with vertex arrays. For me the output is:

Without arrays: 58.0fps
With arrays: 140.0fps

It takes several minutes to run, because it first has to generate a torus with a huge number of faces (600 x 400 = 240,000 quads). I had to do that to get the framerate down. Anyway, this is the kind of test that suggests to me that vertex arrays might help. Do you see something wrong with it?

As for VBOs, I know I should learn them. If I can figure them out, and I can get the same performance from them, that would be preferable. However, that's just for the sake of using non-deprecated techniques: for an OBJ loader, there wouldn't seem to be much need for dynamic data.
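For reference, here is an untested sketch of what a static VBO upload might look like. The helper names pack_floats and upload_static_vbo are hypothetical. It assumes PyOpenGL is installed and that a GL context supporting buffer objects is current when upload_static_vbo is called; only the packing step runs without a context.

```python
import struct

def pack_floats(values):
    """Pack a flat sequence of numbers into tightly packed 32-bit floats."""
    return struct.pack("%df" % len(values), *values)

def upload_static_vbo(data):
    """Copy packed bytes into a new buffer object (needs a current GL context)."""
    # Imported here so the packing helper above works without PyOpenGL.
    from OpenGL.GL import (glGenBuffers, glBindBuffer, glBufferData,
                           GL_ARRAY_BUFFER, GL_STATIC_DRAW)
    buf = glGenBuffers(1)
    glBindBuffer(GL_ARRAY_BUFFER, buf)
    # GL_STATIC_DRAW hints that the data is uploaded once and drawn many times.
    glBufferData(GL_ARRAY_BUFFER, len(data), data, GL_STATIC_DRAW)
    return buf

# Three vertices of one triangle: 9 floats, 4 bytes each.
vertex_bytes = pack_floats([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
print(len(vertex_bytes))
```

With the buffer bound, glVertexPointer(3, GL_FLOAT, 0, None) should read from the buffer rather than from client memory, so the data is uploaded once, which is all a static OBJ model needs.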

-Christopher


import pygame
from pygame.locals import *
from math import sin, cos, pi
from OpenGL.GL import *
from OpenGL.GLU import *

tmax = 60.  # Time to run each test for

# Generate the torus faces
nx, ny = 600, 400
xcos = [cos(x * 2 * pi / nx) for x in range(nx+1)]
xsin = [sin(x * 2 * pi / nx) for x in range(nx+1)]
ycos = [cos(y * 2 * pi / ny) for y in range(ny+1)]
ysin = [sin(y * 2 * pi / ny) for y in range(ny+1)]

def coords(x, y):
    return xcos[x] * (2 + ycos[y]), ysin[y], xsin[x] * (2 + ycos[y])
def normals(x, y):
    return xcos[x] * ycos[y], ysin[y], xsin[x] * ycos[y]
faces, vlist, nlist = [], [], []
for x in range(nx):
    for y in range(ny):
        vs = (coords(x,y), coords(x+1,y), coords(x+1,y+1), coords(x,y+1))
        ns = (normals(x,y), normals(x+1,y), normals(x+1,y+1), normals(x,y+1))
        faces.append((vs, ns))
        for v in vs: vlist.extend(v)
        for n in ns: nlist.extend(n)

for usearray in (False, True):

    # Initialize pygame and OpenGL
    pygame.init()
    pygame.display.set_mode((640, 480), DOUBLEBUF | OPENGL)

    glLightfv(GL_LIGHT0, GL_POSITION,  (10,10,10, 0.0))
    glLightfv(GL_LIGHT0, GL_DIFFUSE, (0.5, 0.5, 0.5, 1.0))
    glEnable(GL_LIGHT0)
    glEnable(GL_LIGHTING)
    glEnable(GL_COLOR_MATERIAL)
    glEnable(GL_DEPTH_TEST)
    glEnableClientState(GL_VERTEX_ARRAY)
    glEnableClientState(GL_NORMAL_ARRAY)
    glShadeModel(GL_SMOOTH)

    glMatrixMode(GL_PROJECTION)
    glLoadIdentity()
    gluPerspective(45.0, 4./3., 1, 10000.0)
    glMatrixMode(GL_MODELVIEW)

    # Generate the display list
    gl_list = glGenLists(1)
    glNewList(gl_list, GL_COMPILE)
    glColor3fv((0, 1, 0))
    if usearray:
        glVertexPointer(3, GL_FLOAT, 0, vlist)
        glNormalPointer(GL_FLOAT, 0, nlist)
        glDrawArrays(GL_QUADS, 0, nx*ny*4)
    else:
        for verts, norms in faces:
            glBegin(GL_POLYGON)
            for v, n in zip(verts, norms):
                glNormal3fv(n)
                glVertex3fv(v)
            glEnd()
    glEndList()

    playing, jframe, t = True, 0, 0
    clock = pygame.time.Clock()
    while playing and t < tmax:
        dt = clock.tick() * 0.001
        t += dt
        jframe += 1
        # Stop early on window close or Escape
        for e in pygame.event.get():
            if e.type == QUIT or (e.type == KEYDOWN and e.key == K_ESCAPE):
                playing = False

        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()
        gluLookAt(5*sin(t),5*cos(t),5,0,0,0,0,1,0)
        glCallList(gl_list)

        pygame.display.flip()

    glDeleteLists(gl_list,1)
    pygame.quit()
    # Parenthesize the whole message so this works as a Python 3 print() call too
    print(("With" if usearray else "Without") + " arrays: %.1ffps" % (jframe / t))