… though you might be forgiven for thinking that at first 😉. Why do so many things in computing have such weird names?
In my post about trees, I mentioned that having too many trees in the scene can make the game engine run pretty slowly, because each one contains a lot of polygons and vertices. This could be a problem for me because some of the areas of my game are going to be quite big and contain quite a lot of trees, and I want the game to perform reasonably well even on quite modest computers. So in this post I’m going to talk about some of the tricks that can be used to speed up the rendering of complex 3D scenes, which I’ve been spending a lot of time lately coding up for my game engine.
The first trick is a pretty simple one: don’t waste time drawing things that aren’t going to be visible in the final scene. This might seem obvious, but in fact it’s quite a common approach in simple 3D graphics applications just to throw everything onto the screen and let the GPU (Graphics Processing Unit) sort out what’s visible and what isn’t. (My game engine, as described in the earlier posts, used this method.) This still works fine because the GPU is smart enough not to try and draw anything that shouldn’t be there, but it’s inefficient because we’ve wasted time processing objects and sending them to the GPU when we didn’t need to. It would be better if we could avoid as much of this work as possible.
This is where culling comes in. It refers to the process of removing items from the graphics pipeline as early as possible so as not to waste time on them. There are various methods of doing this, because there are various reasons why items might not be visible:
- They’re behind the viewer.
- They’re too far to the side to be visible.
- From the viewer’s point of view they’re completely hidden behind other objects.
The first two cases aren’t too hard to deal with. We can imagine the area of the world that’s visible to the viewer as a big sideways pyramid shape projecting out into 3D space (often called the view frustum; strictly speaking, a frustum is a pyramid with its tip sliced off). We can then immediately cull anything that falls completely outside of this pyramid, because it can’t be visible. The details of how this is done are quite complicated and involve projections and various different co-ordinate systems, but it’s reasonably efficient to do.
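To give a flavour of the “projections and co-ordinate systems” involved, here is a minimal sketch in plain JavaScript (an assumption, chosen because the engine targets WebGL). It is not the engine’s actual code, and the function names are made up for illustration. It builds a standard perspective projection matrix, transforms a point from camera space into clip space, and keeps the point only if it lands inside the visible volume.

```javascript
// Column-major 4x4 perspective matrix, the layout WebGL expects.
// fovY is the vertical field of view in radians.
function perspective(fovY, aspect, near, far) {
  const f = 1 / Math.tan(fovY / 2);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (far + near) / (near - far), -1,
    0, 0, (2 * far * near) / (near - far), 0,
  ];
}

// Multiply a column-major 4x4 matrix by the point (x, y, z, 1).
function toClipSpace(m, [x, y, z]) {
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
    m[3] * x + m[7] * y + m[11] * z + m[15],
  ];
}

// A point is inside the frustum when x, y and z all lie between -w and +w.
function pointInFrustum(m, point) {
  const [x, y, z, w] = toClipSpace(m, point);
  return -w <= x && x <= w && -w <= y && y <= w && -w <= z && z <= w;
}

// The camera sits at the origin looking down the negative z axis.
const projection = perspective(Math.PI / 3, 1, 0.1, 100);
const inFront = pointInFrustum(projection, [0, 0, -5]);     // straight ahead
const behind = pointInFrustum(projection, [0, 0, 5]);       // behind the viewer
const offToSide = pointInFrustum(projection, [100, 0, -5]); // too far to the side
```

The single comparison against `w` covers all the cases at once: behind the viewer, too far to the side, too close and too far away.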
There are a couple of ways of making the culling even more efficient:
- Instead of examining every vertex of an object to see if it’s in or out of the frustum, it’s common to work with the object’s bounding box instead. This is an imaginary cuboid that’s just big enough to contain all of the object’s 3D points within it. It’s much faster just to test the 8 corners of the bounding box against the frustum, and it still gives us nearly all the same benefits as testing the vertices individually.
- If you arrange your 3D scene in a hierarchical form (often called a scene graph), then you can cull large parts of the hierarchy with very little effort. For example, if your scene graph contains a node that represents a house, nodes within that representing individual rooms, and nodes in each room representing the furniture, then you can start by testing the top level “house” node against the frustum. If it’s outside, you can immediately cull all of the room nodes and furniture nodes lower down the hierarchy and not have to spend any more time dealing with them.
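Both ideas can be sketched together. The code below is illustrative only: the node layout (`name`, `box`, `children`), the hand-built 90-degree frustum and all the function names are assumptions, not the engine’s real data structures. It also assumes each node’s box is big enough to enclose everything beneath it in the hierarchy. A box is culled only when all 8 of its corners are on the wrong side of the same frustum plane, which is a cheap and slightly conservative test.

```javascript
// Hypothetical frustum in camera space (camera at the origin looking down -z)
// with a 90-degree field of view. A point p is on the inside of a plane
// when n.p + d >= 0.
function makeFrustum(near, far) {
  return [
    { n: [1, 0, -1], d: 0 },      // left
    { n: [-1, 0, -1], d: 0 },     // right
    { n: [0, 1, -1], d: 0 },      // bottom
    { n: [0, -1, -1], d: 0 },     // top
    { n: [0, 0, -1], d: -near },  // near
    { n: [0, 0, 1], d: far },     // far
  ];
}

// The 8 corners of an axis-aligned bounding box.
function corners({ min, max }) {
  const out = [];
  for (const x of [min[0], max[0]])
    for (const y of [min[1], max[1]])
      for (const z of [min[2], max[2]]) out.push([x, y, z]);
  return out;
}

// True only when all 8 corners lie behind one plane, so the box cannot be visible.
function boxOutside(box, planes) {
  const pts = corners(box);
  return planes.some(({ n, d }) =>
    pts.every(([x, y, z]) => n[0] * x + n[1] * y + n[2] * z + d < 0));
}

// Walk the scene graph, skipping a whole subtree as soon as its box is outside.
function collectVisible(node, planes, visible = []) {
  if (boxOutside(node.box, planes)) return visible;
  visible.push(node.name);
  for (const child of node.children || []) collectVisible(child, planes, visible);
  return visible;
}

const house = {
  name: "house",
  box: { min: [-4, 0, -20], max: [4, 3, -10] },
  children: [{
    name: "kitchen",
    box: { min: [-4, 0, -20], max: [0, 3, -10] },
    children: [{ name: "table", box: { min: [-3, 0, -15], max: [-2, 1, -14] } }],
  }],
};
const visibleNames = collectVisible(house, makeFrustum(0.1, 100));
```

If the house’s box had been behind the camera, the traversal would have returned straight away without ever looking at the kitchen or the table.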
(The view frustum only extends a limited distance from the viewer, so it’s also common to cull things that are too far away. However, if this distance is too short, it can cause far-away objects that should be visible to disappear from the scene.)
The case where an object is hidden behind another object is a bit trickier to deal with, because there’s usually no easy way to tell for sure whether this is the case or not. We don’t want to get into doing complicated calculations to work it out, because the whole point of culling things in the first place was to avoid doing too many calculations! However, there are exceptions. Indoor scenes are a bit more amenable to this sort of optimisation because (for example) if you’ve got a completely solid wall separating one room of a building from another, you know straight away that when the viewer is in the first room, nothing in the second room is ever going to be visible (and vice versa).
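The indoor case can be as simple as a lookup table. This sketch assumes a hand-written table saying which rooms can ever be seen from which. The room names, the `canSee` table and the `cullByRoom` function are all invented for illustration.

```javascript
// Hypothetical floor plan: for each room, the rooms that can ever be seen from it.
// The cellar is behind solid walls, so it is only visible from inside itself.
const canSee = {
  hall: new Set(["hall", "kitchen"]),
  kitchen: new Set(["kitchen", "hall"]),
  cellar: new Set(["cellar"]),
};

// Keep only the objects in rooms that are potentially visible to the viewer.
function cullByRoom(objects, viewerRoom, visibility) {
  const visibleRooms = visibility[viewerRoom];
  return objects.filter((object) => visibleRooms.has(object.room));
}

const objects = [
  { name: "coat rack", room: "hall" },
  { name: "cooker", room: "kitchen" },
  { name: "wine barrel", room: "cellar" },
];
const drawnFromHall = cullByRoom(objects, "hall", canSee).map((o) => o.name);
```

There are no geometry calculations at all here, which is why this kind of culling is so cheap when the scene allows it.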
Sometimes, though, even when we’ve culled everything we realistically can, things still run too slowly. For example, imagine a 3D scene looking down from a hill over a big city spread out down below. There could be hundreds or even thousands of buildings and trees and other objects visible to the viewer, and we can’t just start removing them without the player noticing, but on the other hand it’s a hell of a lot of work for the computer to render them all. What can we do?
One other option is depth cuing (more commonly known in 3D graphics as level of detail, or LOD). This involves using less detailed models for certain objects when they’re further away from the viewer. For example, I can instruct my tree generator code to use fewer vertices on the stems and trunks, and simpler shapes made up of fewer triangles for the leaves. This wouldn’t look good for trees close to the camera, because you’d notice the shapes looking less curved and more blocky, but for trees in the distance it’s not too bad.
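To see how much difference “fewer vertices” can make, here is a small sketch that builds a trunk as a simple tube. It is not the actual tree generator; `tubeVertices` and its parameters are hypothetical. It shows how the vertex and triangle counts fall when the number of rings and segments is reduced.

```javascript
// Vertices for an open tube: `rings` circles stacked along the y axis,
// each made of `segments` points. Needs at least 2 rings.
function tubeVertices(radius, height, rings, segments) {
  const vertices = [];
  for (let r = 0; r < rings; r++) {
    const y = (height * r) / (rings - 1);
    for (let s = 0; s < segments; s++) {
      const angle = (2 * Math.PI * s) / segments;
      vertices.push([radius * Math.cos(angle), y, radius * Math.sin(angle)]);
    }
  }
  return vertices;
}

// Each pair of neighbouring rings is joined by `segments` quads, two triangles each.
function tubeTriangleCount(rings, segments) {
  return 2 * (rings - 1) * segments;
}

const closeTrunk = tubeVertices(0.3, 4, 8, 16);  // 8 rings of 16 points = 128 vertices
const distantTrunk = tubeVertices(0.3, 4, 2, 5); // 2 rings of 5 points = 10 vertices
```

With these made-up numbers the close-up trunk needs 224 triangles and the distant one only 10, but from far enough away a five-sided trunk is hard to tell apart from a round one.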
MakeHuman can also use less detailed “proxy” meshes which would be an option for adding depth cuing to human models.
Full detail MakeHuman model (left), and with low resolution proxy mesh (right)
Ideally it’s better if we can generate the less detailed models of the objects automatically, but it’s also possible to make them manually in Blender if necessary.
In 3D graphics terms, billboards are a bit like depth cuing taken to the extreme. In this case, instead of replacing a 3D model with a less detailed 3D model, we replace it with a flat rectangle with the object “painted” onto it via a texture – just like a billboard!
Obviously this is quite a drastic step and it only really looks acceptable for objects that are pretty far away from the camera, but the speed improvement can be dramatic. We’re going from having to render a tree model that might contain thousands of vertices and polygons to rendering a single flat surface composed of four points and two triangles!
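Those four points and two triangles might be computed something like this. The sketch assumes the billboard should always face the camera and that the camera’s “right” and “up” directions are available as unit vectors. `billboardQuad` is a made-up name and the engine may well do this differently.

```javascript
// Build a flat rectangle centred on `center`, spanned by the camera's
// right and up directions so that it always faces the viewer.
function billboardQuad(center, right, up, width, height) {
  const halfW = width / 2;
  const halfH = height / 2;
  const corner = (sx, sy) =>
    center.map((c, i) => c + sx * halfW * right[i] + sy * halfH * up[i]);
  return {
    // bottom-left, bottom-right, top-right, top-left
    vertices: [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)],
    // two triangles sharing the diagonal from vertex 0 to vertex 2
    indices: [0, 1, 2, 0, 2, 3],
  };
}

// A 2 x 4 metre billboard, 30 metres in front of an upright camera.
const quad = billboardQuad([10, 2, -30], [1, 0, 0], [0, 1, 0], 2, 4);
```

However complicated the original tree was, this is all the geometry the GPU has to deal with once it has been swapped for a billboard.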
In fact, older 3D games used to make extensive use of “billboard sprites” – all of the enemies and power-ups in Doom were drawn this way, as were the trees and some other things in Super Mario 64. The downsides are that they can look quite pixellated and blocky close up, and also that (unless the game creators included images of the objects from different angles) they look the same no matter what angle you view them from.
Creating texture images for every object that we might want to turn into a billboard would be a lot of work, and the resulting images would take up a lot of space as well. Fortunately, we don’t have to do this; WebGL is quite capable of creating the billboard images on-the-fly when they’re required, using a technique called render-to-texture. Basically, this means that instead of drawing a 3D scene directly onto the screen like normal, we draw it into an image stored on the GPU, and that image can then be used as a texture when drawing future scenes.
That little pixellated tree was my very first attempt at a billboard sprite!
This is an incredibly useful technique. As well as making billboards, it can also be used for implementing things like display screens and mirrors in games, and some 3D systems use it extensively for doing multiple rendering passes so that they can do clever stuff with lights and shading. I’d never used it myself before, but once I’d coded it up for generating the billboards, I was pleased that it seemed to work pretty well.
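For anyone curious what render-to-texture looks like in WebGL, here is a minimal sketch using standard WebGL 1 calls. It is not the engine’s actual code: the function names and the `drawObject` callback are hypothetical, and a real version would also need to set up a camera that frames the object nicely.

```javascript
// Create an off-screen render target: a texture to hold the colours, plus a
// depth buffer so that the object's own polygons still hide each other correctly.
function createRenderTarget(gl, width, height) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // Allocate empty storage; the GPU fills it in when we render into it.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                gl.RGBA, gl.UNSIGNED_BYTE, null);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

  const depth = gl.createRenderbuffer();
  gl.bindRenderbuffer(gl.RENDERBUFFER, depth);
  gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT16, width, height);

  const framebuffer = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                          gl.TEXTURE_2D, texture, 0);
  gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT,
                             gl.RENDERBUFFER, depth);
  const complete =
    gl.checkFramebufferStatus(gl.FRAMEBUFFER) === gl.FRAMEBUFFER_COMPLETE;
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  if (!complete) throw new Error("Framebuffer is incomplete");
  return { framebuffer, texture, width, height };
}

// Draw into the texture instead of onto the screen.
// `drawObject` is a hypothetical callback that issues the usual draw calls.
function renderToTexture(gl, target, drawObject) {
  gl.bindFramebuffer(gl.FRAMEBUFFER, target.framebuffer);
  gl.viewport(0, 0, target.width, target.height);
  gl.clearColor(0, 0, 0, 0); // transparent background around the object
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  drawObject();
  gl.bindFramebuffer(gl.FRAMEBUFFER, null); // back to drawing on the screen
  return target.texture;
}
```

The caller would need to reset the viewport to the canvas size afterwards, and the returned texture can then be bound like any other when drawing the billboard.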
Up close, it’s pretty obvious which tree is the 3D model and which is the billboard…
… but from a bit of a distance the billboard looks a lot more convincing
One potential problem with both depth cuing and billboards is known as “pop in”. This is the effect you sometimes see when you’re walking forwards in a game and you see a sudden visible “jump” in the scenery coming towards you, because you’ve now got close enough to it that the billboard (or less accurate model) being used for speed has been replaced by the proper 3D model. It’s difficult to get rid of “pop in” altogether, because no matter how good the billboard is, it’s never going to look exactly the same as the original model, even from quite a distance; but we can minimise it by using as good a substitute as possible and by only using it for objects a long way from the viewer.
Phew! That was pretty long and quite technical this time, but I’m really pleased to have got all of this stuff into the game engine and working. (It’s swelled the engine code up to a much larger 3,751 lines, but it’ll be worth it.) I’ve tried to make it all as general as possible – there’s a mechanism in the code now for any object in the game world to say to the engine, “Hey, you can replace me with a 256×256 pixel billboard once I’m 20 metres away from the camera!” or “Here’s a less detailed model you can use once I’m 10 metres away!”, so it should be useful for speeding up all sorts of things in the future. Hopefully next time I should be back doing something a bit more fun… I haven’t quite decided what yet, but it’ll probably involve adding more elements to the game world, so stay tuned for that.
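As a rough idea of how the “replace me once I’m this far away” mechanism could work, here is a small sketch. The `substitutes` list, the rule fields and `chooseRepresentation` are all hypothetical names rather than the engine’s real interface. The distances and billboard size are the ones quoted above.

```javascript
// Pick what to draw for an object, given its distance from the camera in metres.
// The rule with the largest threshold that the distance has reached wins.
function chooseRepresentation(object, distance) {
  const rules = [...object.substitutes].sort((a, b) => b.minDistance - a.minDistance);
  const rule = rules.find((r) => distance >= r.minDistance);
  return rule ? rule.use : { kind: "full" }; // close up: the proper 3D model
}

const tree = {
  name: "tree",
  substitutes: [
    { minDistance: 10, use: { kind: "model", mesh: "tree-low-detail" } },
    { minDistance: 20, use: { kind: "billboard", size: 256 } },
  ],
};

const closeUp = chooseRepresentation(tree, 5);    // full model
const midRange = chooseRepresentation(tree, 12);  // less detailed model
const farAway = chooseRepresentation(tree, 25);   // 256x256 billboard
```

Keeping the rules on the object itself means the engine doesn’t need to know anything special about trees, people or buildings. It just asks each object what it is happy to be replaced with.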
But why now?
You might reasonably ask why I chose to do all this optimisation work so early on in the project. After all, there were plenty of more interesting (to most people anyway!) things I could have been working on instead, like adding streets and buildings to my town. Also, the general advice given to programmers is not to get caught up in optimising code too early, because it complicates the code and because you might end up wasting your time if it turns out it would have run fast enough anyway. I had three main reasons for disregarding this advice:
- I already knew from similar projects I’d done recently that I was going to need these optimisations or the engine would be nowhere near fast enough.
- In my experience it’s usually easier to build fast code from the start than it is to try and “retrofit” speed to slow code later on. Some optimisations require a certain code architecture to work properly, and it’s not ideal if you find you’ve already written 10,000 lines of code using a completely different architecture.
Anyway, I’m happy. It’s all working now and the coding difficulty should hopefully be mostly downhill from this point onwards.