[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

GTC: Status Report & Roadmap



Marcelo de Azevedo Wilbert Camelo  <camelo@led.ufsc.br> has asked me for a 
roadmap so I wrote one and here it is. I will put it on the web page eventually.

Sebastien Loisel -- McGill University -- Sun Microsystems
http://www.math.mcgill.ca/~loisel/

GTC 0.2 development plan
========================
Thu Jul 22 1999

This is a short overview of where we're going, where we're going and
what needs to be done. I will start each section by commenting on its
modularity and state. If you want to work on something which is not
very modular, you need to discuss API issues with me first, you will
also need a good understanding of the rest of the library. "High"
modularity means that most likely you can hack that right away without
telling me. "Medium" modularity means that you can hack that mostly
without telling me but there may be some constraints or dependencies
on other parts of the code which are easy to manage. For instance,
there's a suggested API for the object repository which hasn't been
done yet. "Low" modularity means that I'd rather do it myself, I
have specific ideas about how to do it. Low modularity items are
often very basic or, once preliminary work is done, will become more
modular. I would prefer if no one but me touched low modularity aspects
of the library.

Networking
==========

Modularity: VERY LOW.
Status: stable, mostly complete.

If you implement networking over alternate protocols, that can probably
be included modularly.

At this point in time, I've decided that lawyer/diplomat is a failure
and that gtcSpeaker is very good. I believe that the gtcSpeaker API
is mostly frozen. The main entry points are:

gtc_packet_make and gtc_packet_alloc:
The first makes a packet by copying data from a buffer, the second
merely allocates the packet.

gtc_speaker_put and gtc_speaker_get:
Put or get a packet from the speaker, respectively. If there are no
packets to get, gtc_speaker_get returns 0 (NULL).

gtc_connect_2_host:
Connects to the named host on the given port, returns a gtcSpeaker
or 0 on failure.

gtc_speaker_fds:
Adds the speaker's filedes to the fd_set used in select. It will not
add the speaker's filedes to the write fd_set if there is no outgoing
data in the queue.

gtc_speaker_read, gtc_speaker_write:
After select, call the first if select says there's stuff to read
and the second if select says there's stuff to write.

We also have a refcount garbage collector for packets, which I intend
to use in the next iteration of the client/server stuff.

As for the server, the gtcHost API is also very good and works well.

gtc_new_host:
Gets a new host to listen on the given port. You also get to choose
how verbose the host is (messages go to stdout).

gtc_host_fds:
Sets the filedes in the fd_sets given as parameter as appropriate
for the gtcHost and all its clients.

gtc_host_process:
Will process all fds returned by select that pertain to the host.

Scene graph (kennel)
====================

Modularity: medium.
Status: API is relatively stable, innards not quite as much so.

Maybe my naming scheme for scene graph is a little odd (kennel and
dogs?)

A scene graph is a description of everything in the virtual world.
A world is composed of several objects (nodes). A node can be a
simple mesh or a more complicated object (eg, a skeleton, a subworld,
a fractal object). Each node specifies its position with respect to
its parent node. The root node (the world) has no parent.

In the GTC terminology (which might need to be changed to make it more
usable), a "kennel" is a world and a node of the scene graph is a
"dog". (The actual data type for a dog is gtcDog.)

A dog has a 4x4 position+orientation matrix, some flags, some user
data, a pointer to a gtcMetaGfx object (which represents 3d objects
of all types, including meshes) and some internal data like a gtcBBox
used for collision detection.

The scene graph (kennel) is just a handy way for you to describe a
world to the GTC library. The GTC library will use it for collision
detection, networkin and other purposes. Having everything stored
in a scene graph makes it easy to automate many things; in particular,
networking is automatized. Eventually, we should also make it easy
to save a scene graph to disk and restore it.

A gtcMetaGfx is used to reference any type of 3d object. This is
an object-oriented interface. A skeleton needs a special rendering
function, different from that of a simple mesh. Therefore, a table
of necessary functions for each MetaGfx type is made. It is called
a gtcMetaGfxVTbl and currently it looks like this:

struct _gtcMetaGfxVTbl
{
  void (*put)(gtcMetaGfx *who, gtcReal where[4][4]);
  float (*getsize)(gtcMetaGfx *who, gtcReal origin[3]);
  void (*destroy)(gtcMetaGfx *who);
};

The put function is used to send the gtcMetaGfx down the
OpenGL pipeline. However, as specified above, the put function
will be insufficient for many applications, like a skeleton, which
needs more state information than just the position. Therefore, we
will need to pass a void *userdata with the MetaGfx-specific
information in it. Currently, the gtcMetaGfx structure looks like this:

struct _gtcMetaGfx
{
  void *data;
  gtcMetaGfxVTbl *f;
  int res_id; /* mainly for networking */
};

We can dynamically check the type of a MetaGfx at run-time by checking
where f points to. For instance, in the case of a simple mesh,
gtcMetaGfx.f will point to mgfx_mesh_vtbl, which is a (static) global
variable. It is static because neither me nor Quentin though of using
that for dynamic typing when we wrote metagfx.c, it needs to be
renamed, made global and declared (as extern) in gtcMetaGfx.h.

The data field is used for storing completely static information, such
as vertex coordinates or, in the case of a skeleton, perhaps a bone
hierarchy or something like that. A new field (perhaps called
dynamic_data) needs to be introduced (also a void *) for data which
is transparent to the kennel (scene graph) library but contains
object type specific information that can change (eg, which mesh
to bind to a skeleton or the position of the bones of the skeleton...)

Client/Server
=============

The generic client/server architecture is new but is already sort
of the way I want it. This aspect of the library is peculiar in that
the application programmer is meant to know as little as possible
about it.

The main idea is, the server knows how to make a binary encoding
of a scene graph and send it over the wire, and the client knows how
to take a binary encoding of a scene graph and display it.

There's a wrapper for kennel (the scene graph thing) that
will accept clients and send them the scene graph information as
required. This wrapper is exemplified in the current "duckserver"
and "gfxclient" demos; however, many features in
demos/duckserver.c will need to be cleaned up and moved to the library
side. The goal is to make demos/duckserver.c as oblivious to the
net as can be.

It is also important that the data be encoded in a platform-neutral
format. For instance, integers might be encoded as little-endian
numbers or possibly using some sort of variable length encoding,
for bandwidth reasons. The example code presently does not do this,
my priority was to get it to work for version 0.2.

First step
----------

Modularity: low.
Status: Partially done. Should be done in a couple of weeks.

The library side of the server will walk through the scene graph
(kennel) and make a binary encoding of it. Object ID's are not
transmitted, they are implicit in the ordering of the binary encoding;
that is, an object's ID is the order in which it occurs in the binary
encoding. The first object transmitted has object ID 0, the second
has object ID 1, the third has object ID 2 and so on.

This data is the same for all clients so it is only encoded once. We
can use the very same gtcPacket for all clients on the host, using the
refcount garbage collector.

Some per-client data will need to be encoded and sent. In particular,
the camera position has to be sent. For first-person games
(or spaceship games), the object in which the camera sits should not
be rendered.

Therefore, one small extra packet per client needs to be sent. It will
contain the position and orientation of the camera for that client,
as well as one (or more?) object ID for object(s) that should not be
displayed.

Some participation is required by the server side of the game. Given
a gtcDog (a node in the scene graph), it needs to be able to place
a camera.

The client also needs to be able to send information to the server.
At the moment, it grabs all keypresses (X events of type KeyPress
and KeyRelease) and sends them to the server. In addition, a special
event is sent after each glXSwapBuffers; this is used for flow control.

(NOTE: X11 is REALLY annoying. For some reason, someone somewhere
thought it would be smart to implement key repeat by inserting
bogus KeyRelease/KeyPress events in the event stream. If the X server
is local then these two events will be really close together and will
occur quickly enough that we won't be affected. However, if we blindly
send these events over the wire on some slow connection, it may be
that the server will think the player released the key for some long
period of time. This is very bad. For version 0.2, my temporary patch
was to do an XAutoRepeatOff. Unfortunately, and this is the most
ridiculous part of the story, XAutoRepeatOff is a global property
for the display -- your xterms lose their autorepeat! At home on my
machine, I currently have an alternate implementation which discards
KeyRelease/KeyPress events for the same key which occur inside the
same event processing loop without any other events in between.
Unfortunately that is a hack and will probably be useless for remote
X displays. If anyone has a fix for this, please share it with me.

Hypothesis: perhaps a good solution would be to set XAutoRepeatOff
when we get the focus and set XAutoRepeatOn when we lose the focus.)

On the server side, something has to be done to handle this user input.
At the moment, demos/duckserver.c directly reads the packets and
decodes the contents. I want to replace this by a callback in the
gtcKennelHost structure. There will be a callback for KeyPress and
another one for KeyRelease. glXSwapBuffer acknowledgement can be
handled transparently by the library so they will not be passed to
the game server program. In the future, other type of events will be
handled similarly, ranging from mouse clicks to joystick position and
all that garbage.

Second step
-----------

Modularity: medium-high.
Status: not even started.

After the first step is done, we need to start reducing bandwidth.
A simple mesh is now sent using a 32 bit "resource id" (to tell which
mesh it is) and a 4x4 array of floats for the position and orientation.

This can be drastically reduced. The resource id can be encoded using
a variable size coding (7 bits of the number, one bit to say if
there is more to come, 7 more bits of data, 1 bit to say if there's
more to come, etc...) The 4x4 array of floats is much overkill. In
most cases, the 3x3 orientation matrix will be a right handed
orthogonal matrix. In that case, we can encode it with three 12 or
16 bit fixed point numbers using quaternions (for instance).

The position vector can either remain 32 bit floats or can be replaced
with 16, 24 or 32 bit fixed point integers. In any case, floats pose
portability problems (we need to guarantee the client and the server
have the same binary encoding for floats) so we may need to make our
own float type. One easy solution is to have 16 bits of mantissa, 7
bits of exponent and one bit of sign; that's 24 bits per float and
should be just fine, except for very large scenes.

In the future there may be other types of objects in the scene graph
(eg, skeletons). Each object type will need some special state
information (which I called dynamic_data in my blurb on scene graphs
above) to be encoded as well. For that reason, an encoding
function will need to be added to the gtcMetaGfxVTbl (see the
section on scene graphs).

Third step
----------

Modularity: high.
Status: not started yet.

"Predictive coding". In many cases, a simple scheme used to predict
changes to the scene graph will result in massive performance gains.
For small scenes, we're not worried about performance. For large scene,
we can suppose that most meshes will remain stationnary, or perhaps
move at a constant speed.

Therefore, given the previous two frames, assuming a constant speed
and no changes in the other data (orientation or dynamic_data), we
can predict the next frame. The server should make that prediction and
check its veracity. Then, the difference is encoded and sent over the
wire. Each object can start with a single bit. If the bit is 0, the
object is as predicted and is not encoded. If the bit is a one then
the object is encoded in the stream.

Fourth step
-----------

Modularity: high.
Status: not started yet.

This step may or may not happen, for the following two reasons.
If we do this then each client will need to have the scene graph
encoded separately because each client will receive different
information. The second reason is that it requires a bit of cooperation
on the part of the game server program.

For each connected client, the game server program would be asked to
tag which gtcDogs need to be sent, and only those would be sent. This
provides a way to send very huge worlds to clients by only sending what
is close enough to be seen. This will eventually need to be done
if we intend to support large persistent RPG type worlds, I expect
those to have zillions of moving objects in the scene.

Collision detection
===================

Modularity: high.
Status: API 1/2 complete, implementation is complete. Alternate
implementations would be good. The remaining 1/2 of the API needs to
be done too.

I am implementing collision detection as a separate module which has
no dependencies with the rest of the library. Callbacks are used when
actual collisions are found, that kind of stuff.

As it is now, gtccollider.h has a very simple and useful API,
unfortunately it is not complete yet. Right now, all it does is it
takes a list of axis aligned bounding boxes and checks all pairs
to see if they collide. An output sensitive algorithm is used, meaning
that not all pairs need to be checked, a large proportion of them will
be proven not to collide. What I do is I sort the boxes along some axis
and split that axis down the middle. If the two groups do not overlap
along that axis then they can be processed independantly and we have
pruned many pairs. If they do overlap then more checking needs to be
done. This is done recursively and seems to do well for very large
number of boxes (like 1000). When you have less than 200 boxes, the
naive algorithm (which checks all pairs in the world) performs better,
mainly due to its very simple implementation.

The running time for both when less than 200 boxes are
present is very small so it should not matter which algorithm is
used in that case.

I did not exploit frame-to-frame coherence in my algorithm, even
though that is the most popular method right now. If someone is
willing to implement this, we could bechmark it, but I was hoping
not to use frame-to-frame coherence. I find it useful to be able to
switch scene very quickly without any overhead.

What has not been done yet is a mesh-to-mesh collision detection
algorithm. This needs to be done in a flexible way since we are
planning to use it for many types of weird 3d objects (eg, fractal
terrains).

Probably the easiest thing to do would be to use some sort of sphere
tree algorithm. In most cases that interest us, it is probably easy to
find a bounding sphere for an object (or piece of an object). Also,
dynamically splitting an object into two pieces and making a bounding
sphere for each piece seems doable for our objects of interest.

So perhaps we can use callbacks in collider.h to traverse the sphere
tree hierarchy for an object. That way, objects can either use
preprocessing to generate the sphere tree (eg for simple meshes) or
they can dynamically generate the sphere tree for more complicated
stuff. When we reach leaf nodes, each object has the option of either
generating a single triangle for that leaf node, or leave it as a
sphere. This allows us to have approximate collision detection or
collision detection for objects which do not really have triangles
for leaf nodes. For instance, for a given simple mesh, we may choose
to generate an approximate sphere tree which does not go all the way
down to the triangle level. Collision detection can then be performed
against that sphere tree to get very fast approximate collision
detection.

Other algorithms include AABB trees, OBB trees, swept sphere volumes
and also algorithms which aren't based on hierarchical bounding
volumes. Of those which do not depend on hierarchical bounding volumes,
most use something similar to Ming C. Lin's PhD thesis (as implemented
in I-Collide) which does collision detection for convex objects only
using frame-to-frame coherence by tracking pairs of closest features.
Other algorithms decompose complicated objects as a sequence of
differences of convex pieces (convex caps). All these algorithms are
often horribly complex to implement and do not always lend themselves
to other applications. For instance, David Hsu et al. describe a
probabilistic roadmap planner which depends on a flexible collision
detection algorithm. Eric Larsen et al. described an implementation
of that algorithm using Swept Sphere Volumes (which I believe is
implemented in V-Collide) (http://www.cs.unc.edu/~geom/SSV) They also
claim that SSV trees are better suited than the other types of trees
to motion planning.

If you work on this, I would ask you to make your code as independant
as possible from the rest of the library. For an example, see my
current implementation, which doesn't even refer to gtcSimpleMesh.

Object repository
=================

Modularity: medium, assuming the preliminary API below works.
Status: not started yet.

We need to have a small API which can fetch resources by name. If the
resource isn't loaded, it needs to be able to load it from disk. If
it is loaded, it just returns a pointer to the object. An API like
this would be ideal:

typedef void *(*gtcLoader)(char *res_name);
void *gtc_fetch_resource(char *res_name, gtcLoader loader,
	gtcRepository *r);

gtc_fetch_resource will get the resource from the given repository
(a map from res_names to void *'s). If the repository does not hold
that resource, gtcLoader is called to try to load it from disk.
loader will return 0 if it can't load it for any reason. In that case,
gtc_fetch_resource will also return 0. If loader succeeds, the object
is added to the repository with the given name. Internally, the
repository can initially be implemented using a linked list but a
hash table or a balanced binary tree will eventually be preferred.
If we're going to use a balanced binary tree, we should use the
implementation from GLIB, balanced binary trees are a bitch to
implement. A hash table can have its local implementation, they are
easy to do.

As an example, if you want to load a 3ds mesh, you might do something
like this:

gtcSimpleMesh *duck;
duck=gtc_fetch_resource("duck.3ds",gtc_3ds_loader,r);

If r already has that resource, then it will return a pointer to it.
If not, gtc_3ds_loader("duck.3ds") will be called. Presumably,
gtc_3ds_loader will look in the current directory for the file
"duck.3ds". If it finds it, it will load it and return a gtcSimpleMesh
holding all the information. If it does not find it, it will return 0.

3d object types and importers
=============================

Modularity: high (once the object repository is finished).
Status: barely started. The API isn't well defined but it's got
to be very simple so little feedback is required from me.

This is possibly the most modular part of the library. If you don't
know what you want to implement and want to start right away without
having to fight with me over the API, start by implementing crazy
object types like fractal meshes or subdivision surfaces or whatever.
Make sure they're wrapped with a gtcMetaGfx interface (very easy)
and you've extended the library. Simlarly, write functions to load
these object types, or write functions to import a simple mesh from a
foreign (eg, 3ds) format, that's very independant from the rest of
the API.

Many object types will also need to load textures or other objects.
These should be fetched from a gtcRepository, ensuring that resources
are shared as much as possible.

**************

Semi Independant Projects
=========================

There are a number of projects that can be spun off relatively
independantly. I list some here.

* Metaservers. Someone should code a meta-metaserver; a server
that you can dial to to get the IP address of game servers. Minimal
changes will be required to the server code when the metaserver is
working, so that the metaserver can interrogate the server to see if
it's up, number of players, etc...

* Audioconferencing. We need some way of letting the players speak
to each other. We should start with one of the several internet phone
programs. We would need a stand-alone server that can accept iphone
connections and deliver iphone packets to and from clients.
As a first go, we should make it work like a police radio thing,
where only one person at a time gets to speak on a given channel.

* Authentication/setup server. This guy will ask you for a login name
and password or whatever crap. If there are further choices, it can
present them to you and let you pick. If we get this to work, it would
also control connections to the audioconferencing and maybe even talk
to metaservers. This may require hooks into the server code at one
point.

* Modelling. We'll need a source of content. I've been speaking with
James Dean Palmer of Extreme Wave. It looks like Extreme Wave will
be our chosen modeller. It is very likely, however, that we will
have to make additions to the modeller to get content for some of our
more advanced graphics objects.

* Sound library. I do not intend for GTC to move into the sound
library world. However, it would be good if someone took a look at
what's out there. When we start making better games, we need to add
sound to them. We should choose one (or a few) libraries and support
them in our demos.

*************************************************************
To unsubscribe, send an e-mail to majordomo@gtc.seul.org with
unsubscribe gtc-dev       in the body. http://gtc.seul.org/