
Re: how to add assembler?



Jeff Read wrote:

> > Assembler and direct buffers are both rather old-fashion nowadays. With
> > hardware blitters, there is no way you can do better.
> 
> I suppose, if all you're doing is blitting simple images or sprites to
> the screen. However, many 2D accelerators do not have support for things
> like alpha-blending, scaling, rotation, shearing, distortion, lighting
> effects. If you had a 3D card these could be accelerated, but as yet
> there's no real elegant way to use a 3D accelerator's functions in a 2D
> context, e.g., in a window or framebuffer. I know, for example, that
> StarCraft and Jazz Jackrabbit 2 do not use hardware accel for much of
> their graphics (which often involve translucency and/or lighting).

This is a big problem we are having with 2D graphics. There is *no* easy
way out.

For example, writing directly to a DirectDraw surface in video memory on
a Riva TNT card is *very* slow. Every programmed I/O access to the video
memory of this card causes the accelerated pipeline to be flushed. A
single blit going through the accelerated pipeline is much faster. This
is a case that will probably be encountered very often with many
next-generation cards.

On the other hand, with "regular" cards, a write to video memory stalls
(and thus takes time) only when another write is in progress. Thus,
it pays off to do effects "in place" in the blit, using the time that
would otherwise be lost to bus stalls in doing the calculations for the
effect. Doing the effect in main memory then blitting to video memory
would be a disaster.
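
As a rough sketch of what doing an effect "in place" in the blit can
look like, the loop below applies a trivial effect (halving brightness)
while copying, with no intermediate buffer in main memory. The function
name blit_dim, the 8-bit pixel format and the idea that dst points at a
mapped video memory buffer are all assumptions made for illustration.
Whether this actually wins depends on the card, as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 8-bit pixels from src to dst, halving the
 * brightness on the way. The arithmetic happens between the stores, so
 * no second pass over an intermediate buffer is needed.
 * dst is assumed to stand in for a mapped video memory buffer. */
static void blit_dim(volatile uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] >> 1);
}
```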

For 3D video acceleration, which is a rather new domain, no effect is
left out of the hardware: 3D is very expensive to do on the main CPU,
and once you start doing it in hardware, most effects cannot be added in
software. They have to be done by the hardware.

But 2D video acceleration is often very "lazy", as most effects not
supported by the card can easily be done in software (more slowly, of
course). Even when the hardware supports an effect, the libraries and
drivers often don't (accelerated filled rectangles are very new to DGA,
for example, and this is a very small feature). And even when both the
hardware and the library/driver support are there, game developers are
wary of directly using functions in Xlib, for example.

But you do not *have* to use assembler for this reason. X11 gives you
*no* real access to a video memory buffer (except with DGA, but this is
rather different), so you *will* have to do a separate pass for the SFX
and blit to video *anyway*.

And when you *do* have access to a real video memory buffer, why do you
want to use assembler? For bounded addition using MMX? You could still
get by simply with libmmx and regular C. Instruction scheduling on newer
generations of processors is quite a mess to program. Are you *sure* you
can do better than a compiler? If you can, great, but when there are
great libraries such as Hermes, you'd be better off using them and
improving the Hermes library itself. And by isolating the actual effects
and treating them as black boxes of "accelerated" functions, you can
easily switch them to actual hardware acceleration when it becomes
available.
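
To make the "black box" idea concrete, here is a minimal sketch of
bounded (saturating) addition in plain C, reached through a function
pointer so the implementation can be swapped later. The names
add_sat_fn, add_sat_c and add_sat are made up for this example and are
not part of libmmx, Hermes or SDL. An MMX or hardware-backed routine
with the same signature could be installed in place of the C fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "black box" signature:
 * dst[i] = min(255, dst[i] + src[i]) for n 8-bit values. */
typedef void (*add_sat_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable C fallback. The sum is computed in a wider type and then
 * clamped, which is what bounded addition means here. */
static void add_sat_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + (unsigned)src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Callers only ever go through this pointer, so a faster routine can be
 * dropped in without touching them. */
static add_sat_fn add_sat = add_sat_c;
```

Calling add_sat(dst, src, n) then works the same whichever routine is
behind the pointer, and any speedup made inside the library benefits
every program that uses it.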

I'd say this is one of the biggest differences between DOS/Windows
development and Unix/Linux development. DOS people write their own
optimized assembler routines, and they are great. Unix people contribute
to a great library of optimized assembler routines and make it
*excellent*. And share it.

If you take SDL and think it isn't fast enough and you can write a
better blit routine, don't start your own project, just improve the SDL
blit routine! Sometimes goals are different and contradictory, but when
all you want is a faster blit routine, we can do that without forking
off a new project, can't we?

-- 
Pierre Phaneuf
http://ludusdesign.com/