Sunday 31 August 2014

Rendering Post: Cascaded Shadow Maps

I'm going to talk about the very initial implementation of shadows I've done for the engine so far. To be honest, I'm not even sure if this exact method is actually used in games, because I just sat down and coded something up over a weekend. But here goes! If you've never heard of cascaded shadow maps before, an excellent starting point is here.

Because the engine I'm building only needs to support wide outdoor environments (it's an RTS), the most important light type I need to cater for is the directional light from the sun. So no omni-directional lights or spotlights need to be done for now.

Let's outline the basics of the algorithm.
Part 1. Generate the shadow maps.
Bind a framebuffer with render-to-texture for the depth buffer and switch the GPU to double-speed z-only rendering (see the sketch after these steps).
for each cascade 
1). Calculate the truncated view frustum of the main camera in world space, and use it to determine the position, bounds, and projection matrix of the light. 
2). Render all relevant geometry of the scene into the shadow map.
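Concretely, Part 1 looks roughly like the sketch below in plain OpenGL. This is only an illustrative sketch under my own assumptions (the handles and the helper name are made up, not the engine's actual code); the important bits are that no color attachment is bound or written, which is what gets you the double-speed z-only path on most hardware, and that each cascade renders into its own depth texture.

 #include <GL/glew.h>

 // Sketch: render the four cascade shadow maps (hypothetical names, not the engine's code).
 void RenderShadowCascades(GLuint shadowFbo, GLuint cascadeDepthTex[4], int shadowMapSize)
 {
     glBindFramebuffer(GL_FRAMEBUFFER, shadowFbo);
     glDrawBuffer(GL_NONE);                                // no color attachment is written...
     glReadBuffer(GL_NONE);
     glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // ...so the GPU can take the z-only fast path
     glViewport(0, 0, shadowMapSize, shadowMapSize);

     for (int i = 0; i < 4; ++i)
     {
         // Each cascade gets its own depth texture (shadowCascade0..3 in the shader further down).
         glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, cascadeDepthTex[i], 0);
         glClear(GL_DEPTH_BUFFER_BIT);

         // 1). Fit the light's orthographic volume to this cascade's slice of the camera frustum.
         // 2). Render all shadow casting geometry that intersects that volume.
     }

     glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
     glBindFramebuffer(GL_FRAMEBUFFER, 0);
 }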

Part 2. During the directional light shading pass of the deferred lighting stage.
for each pixel
1). Read the corresponding depth, linearize it, and reconstruct the world space position of the pixel.
2). Project the world space position into the appropriate cascade's light space, and run the depth comparison and filtering against the depth stored in the shadow map. 

That's literally it. The beautiful thing about shadow mapping, as opposed to something like shadow volumes, is the conceptual simplicity of the technique. But it comes at the expense of grievances like having to manage shadow map offsets and deal with issues like duelling frusta. Another benefit of doing it this way (during the deferred lighting stage) is that you get free self shadowing that's completely generic over the entire world. Any point in the world will get shadowed correctly (provided your shadow maps are generated correctly as well). 

Now let's get into the details.

Calculate the view volume of the light for each cascade slice.
The most important thing you need to do here is ensure that all points enclosed in the frustum slice of the main camera will be included in the light's viewing volume. Not only that, but the light's viewing volume should extend from the camera frustum back towards the light itself, so that any points outside the camera's viewing volume that could cast shadows onto visible points are also accounted for.

This simple diagram should explain.

The way I defined the light's viewing volume was to define the light's camera space and then transform all corner points of the cascade slice frustum into that space. Since we're dealing with an orthographic projection, that makes it easy to find the maximum and minimum extents along each axis (with the exception of the near z plane, which is always set to 1.0f). From that, you can create your light basis clipping planes, and then, for each element in the world, if it's inside both pairs of clipping planes, it gets rendered into the shadow map.
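Here's a minimal sketch of that fitting step, using GLM as a stand-in for the engine's own math types (the function and variable names are mine, not the engine's). It takes the eight world space corners of a cascade slice, transforms them into the light's view space, and builds an orthographic projection from the min/max extents. Note that instead of fixing the near plane at 1.0f and moving the light back as described above, this version simply extends the near plane towards the light, which achieves the same goal of catching off-screen casters.

 #include <glm/glm.hpp>
 #include <glm/gtc/matrix_transform.hpp>
 #include <limits>

 // Sketch: fit the directional light's orthographic volume to one cascade slice.
 // sliceCornersWS: the 8 world space corners of the truncated camera frustum for this cascade.
 // lightDirection: normalized direction the sunlight travels.
 glm::mat4 BuildCascadeLightViewProj(const glm::vec3 sliceCornersWS[8],
                                     const glm::vec3& lightDirection,
                                     float casterPushBack)
 {
     // Build the light's camera space: look along the light direction at the slice centre.
     glm::vec3 centre(0.0f);
     for (int i = 0; i < 8; ++i)
         centre += sliceCornersWS[i];
     centre /= 8.0f;

     // (Pick a different up vector if the light is close to vertical.)
     glm::mat4 lightView = glm::lookAt(centre - lightDirection, centre, glm::vec3(0.0f, 1.0f, 0.0f));

     // Transform the slice corners into light space and take the min/max along each axis.
     glm::vec3 minExt(std::numeric_limits<float>::max());
     glm::vec3 maxExt(-std::numeric_limits<float>::max());
     for (int i = 0; i < 8; ++i)
     {
         glm::vec3 p = glm::vec3(lightView * glm::vec4(sliceCornersWS[i], 1.0f));
         minExt = glm::min(minExt, p);
         maxExt = glm::max(maxExt, p);
     }

     // Right handed view space looks down -z, so near/far are the negated z extents.
     // Pull the near plane back towards the light so geometry behind the slice still casts into it.
     glm::mat4 lightProj = glm::ortho(minExt.x, maxExt.x, minExt.y, maxExt.y,
                                      -(maxExt.z + casterPushBack), -minExt.z);
     return lightProj * lightView;
 }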

A note here on level of detail:
You want to make sure that the level of detail selected for the mesh or terrain node you're rendering into the shadow map is based on the main camera (not the light) to avoid artifacts, although depending on how you do things those artifacts may be tolerable, and using the light-based LOD will certainly be faster.

Reconstructing the world space position.

If you read the previous article, this is an example of why you need to be able to correctly read the linear depth from the depth buffer. If you don't, your reconstructed world space positions will swim around and you'll notice horribly weird artifacts. A good debug view is to visualize the generated world space positions normalized to the bounds of the current map (for me, that meant calculating the world space position and dividing by roughly 4000, since the map is 4 km squared).
You should get something like this:



And note that as the camera moves around the scene, the colors in the scene should not change... at all. If they do, that's the first sign you're doing something wrong.

The way I reconstructed the world space position was to calculate the vectors from the camera frustum's near corners to the far corners, transform those vectors into world space, and then pass them as attributes to be interpolated over the screen during the full screen pass. From there you take the interpolated view vector, multiply it by the linear depth read from the depth buffer, add the camera position, and you've got your world space position.
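As a concrete sketch (again with GLM standing in for the engine's math types, and names that are mine), the per-corner vectors can be computed on the CPU like this. I'm computing camera-to-far-corner rays here, which pair up with the shader below: it multiplies the interpolated ray by the linear depth normalized against the far plane and then adds the camera position.

 #include <cmath>
 #include <glm/glm.hpp>

 // Sketch: build the four world space rays attached to the corners of the full screen quad.
 void ComputeFrustumCornerRays(const glm::vec3& camForward,  // normalized camera basis vectors in world space
                               const glm::vec3& camRight,
                               const glm::vec3& camUp,
                               float verticalFovRadians, float aspect, float farDistance,
                               glm::vec3 outRays[4])
 {
     float halfHeight = farDistance * tanf(verticalFovRadians * 0.5f);
     float halfWidth  = halfHeight * aspect;

     glm::vec3 toFarPlane = camForward * farDistance;
     outRays[0] = toFarPlane - camRight * halfWidth - camUp * halfHeight; // bottom left
     outRays[1] = toFarPlane + camRight * halfWidth - camUp * halfHeight; // bottom right
     outRays[2] = toFarPlane + camRight * halfWidth + camUp * halfHeight; // top right
     outRays[3] = toFarPlane - camRight * halfWidth + camUp * halfHeight; // top left
 }

In the shader these arrive as the vFrustumVector varying, and worldSpacePosition = cameraPosition + vFrustumVector * zView falls out directly.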

Doing the final lighting pass.
After you've got your shadow maps and you have your world space positions for each pixel, inside the final shader all you need to do is transform the pixel from world space into the light space and run your comparison. Here's the really basic fragment shader I used. There are a bunch of optimizations (and modernizations; it's running off the old GLSL 1.10 spec) still to do, and the filtering is only hardware PCF at the moment, but it should be a good reference point to just get something on screen.

 #version 110   
   
 //#define DEBUG   
   
 /// Texture units.  
 uniform sampler2D albedoBuffer;  
 uniform sampler2D normalBuffer;  
 uniform sampler2D depthBuffer;  
   
 uniform sampler2DShadow shadowCascade0;  
 uniform sampler2DShadow shadowCascade1;  
 uniform sampler2DShadow shadowCascade2;  
 uniform sampler2DShadow shadowCascade3;  
   
 varying vec2 vTexCoord0;  
   
 /// Contains the components A, B, n, f in that order.  
 /// Used for depth linearization.  
 uniform vec4 ABnf;  
   
 /// World space reconstruction.  
 varying vec3 vFrustumVector;  
 uniform vec3 cameraPosition;  
   
 /// Lighting values.  
 uniform vec3 viewspaceDirection;  
 uniform vec3 lightColor;  
   
 uniform mat4 cascade0WVP;  
 uniform mat4 cascade1WVP;  
 uniform mat4 cascade2WVP;  
 uniform mat4 cascade3WVP;  
   
 void ProcessCascade0(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(2.0, 0.5, 0.5);  
     #endif  
   
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.0005;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade0, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void ProcessCascade1(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(0.5, 2.0, 0.5);  
     #endif  
   
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.001;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade1, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void ProcessCascade2(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(0.5, 0.5, 2.0);  
     #endif  
       
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.002;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade2, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void ProcessCascade3(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(2.0, 0.5, 0.5);  
     #endif  
   
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.0025;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade3, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void StartCascadeSampling(in vec4 worldSpacePosition)  
 {  
     vec4 cascadeClipSpacePosition;  
     cascadeClipSpacePosition = cascade0WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade0(cascadeClipSpacePosition);  
         return;  
     }  
       
     cascadeClipSpacePosition = cascade1WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade1(cascadeClipSpacePosition);  
         return;  
     }  
     cascadeClipSpacePosition = cascade2WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade2(cascadeClipSpacePosition);  
         return;  
     }  
       
     cascadeClipSpacePosition = cascade3WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade3(cascadeClipSpacePosition);  
         return;  
     }  
 }  
   
 void main()  
 {  
     float A = ABnf.x;  
     float B = ABnf.y;  
     float n = ABnf.z;  
     float f = ABnf.w;  
   
     // Get the initial z value of the pixel.  
     float z = texture2D(depthBuffer, vTexCoord0).x;  
     z = (2.0 * z) - 1.0;  
       
     // Transform into view space.      
     float zView = -B / (z + A);  
     zView /= -f;  
   
     // Normalize zView.  
     vec3 intermediate = (vFrustumVector * zView);  
       
     vec4 worldSpacePosition = vec4(intermediate + cameraPosition, 1.0);  
       
     vec3 texColor = texture2D(albedoBuffer, vTexCoord0).rgb;  
       
     // Do lighting calculation.  
     vec3 normal = texture2D(normalBuffer, vTexCoord0).rgb * 2.0 - 1.0;  
       
     float dotProduct = dot(viewspaceDirection, normal);  
   
     gl_FragColor.rgb = (max(texColor * dotProduct, 0.0));  
     gl_FragColor.rgb *= lightColor;  
       
     // Now we can transform the world space position of the pixel into the shadow map spaces and   
     // see if they're in shadow.  
     StartCascadeSampling(worldSpacePosition);  
       
     gl_FragColor.rgb += texColor.rgb * 0.2;  
 }  

The final result.

Here's a couple of screenshots:


Test scenario showing how the buildings cast shadows onto one another. 

Closer in, details are still preserved with cascaded shadow maps. Self shadowing Tiger tank FTW!


Here you can see cascades 1-3 illustrated.


And here is cascade 0 included as well for the finer details close to the camera.
Just a thanks here to everyone from Turbosquid and the internet for the free models! :)

Friday 29 August 2014

Rendering Post: Linearizing your depth buffer and simple fog.

Right, so enough about tools for now; let's do some rendering work. I hadn't really devoted much time to the renderer for the project, since I've been focusing on the back end and tools. But tools can be boring to talk about and not terribly useful to the reader, so I added a couple of features to the engine. Let's get started.

Linearizing Your Depth Buffer
An important fundamental step for a lot of graphics techniques is being able to read the depth of the current pixel from your depth buffer. Well... no shit, but you need to be able to do it properly. When sampling from your depth buffer in the shader, you get a value back in the range [0, 1], and you might assume that 0 is the near plane and 1 is the far plane distance... but nope, there's a characteristic here that we need to be aware of. What we need is to understand the transformation that the z component of a vertex undergoes, and how and why it's interpolated across the surface of the triangle the way it is.

Now here is where it gets potentially confusing. When it comes to the handling of z values, there are two places where we end up with 1/z, and they are not related. The first is interpolating vertex attributes correctly during rasterization. The second is a consequence of the perspective divide operation.

Let's start with the first (I'm going to assume that you're familiar with perspective, our perception of the world, and why we need projection to replicate it in computer graphics. If not, see here). You see, when we go from the abstract three-dimensional description of a triangle to the two-dimensional triangle that gets interpolated across the screen, we suffer a loss of dimensionality that makes it hard to determine the correct values we need for rendering. We project the three-dimensional triangle onto a two-dimensional plane and then fill it in, but we need to be able to reconstruct the three-dimensional information from the interpolated two-dimensional information.

Here on the left is the standard similar triangles diagram that explains why you divide by z to shrink the x value the farther the point is from the camera. Easy stuff.


For this quick explanation let's assume that your near plane is set to 1.0, which means we can remove it from the equation. We have:

x' = x / z

which we can manipulate to get the equation

x = x' * z

which is great because it means that we can reconstruct the original view space x from the projected x' by multiplying by the original z value. So as we're interpolating x' during the rasterization process, if we can find z we can recalculate the original x value; we'd be able to recover a 3D attribute from a 2D interpolation. But this assumes that you have z handy, which we don't. We need to find a way to interpolate z linearly across the screen. A linear equation is an equation of the form y = Ax + B.

Ignoring what A and B's actual values are for the moment, and noting that along the edge of a triangle the view space x is a linear function of z, we can write:

x = Az + B

but x = x' * z, so

x' * z = Az + B

which, after some manipulation, gets us

z = B / (x' - A)

Which is hardly linear.  But now for the magic trick. Take the reciprocal of both sides.

(1 / z) = (1 / B) * x' - (A / B)

Hey, that's linear!

So, we can interpolate z's reciprocal in terms of x' (we get x' by interpolating across the screen during rasterization). From there, we can just take the interpolated value and take its reciprocal again, and we'll have z. And from that we can recalculate the interpolated x and y in 3D space. So we've overcome the dimension loss of 2D rasterization. The same trick is applied to all attributes associated with a vertex (texture coordinates etc.): interpolate attribute/z across the screen, then multiply by the recovered z.
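A quick worked example (the numbers are mine, just to sanity check the idea): take an edge with one endpoint at z = 1, u = 0 and the other at z = 3, u = 1, and look at the pixel halfway between their projections on screen.

interpolated 1/z = (1/1 + 1/3) / 2 = 2/3, so z = 1.5
interpolated u/z = (0/1 + 1/3) / 2 = 1/6, so u = (1/6) * 1.5 = 0.25

Naively interpolating u in screen space would have given 0.5 instead, which is exactly the warping you see with affine texture mapping.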

Secondly, when we read from the depth buffer after a perspective projection, we get an odd value back. We need to understand where it comes from and how to reverse it to recover the proper depth value. Look at the summary below:

We can see the journey of the vertex from model space through view space into clip space. Here it gets projected into a cube centered on the origin where clipping is performed (hence the name). As an aside, it's also here that the GPU will generate more triangles if it needs to as part of the clipping process.

I use the stock standard OpenGL transformation conventions, so a right handed coordinate system where the visible part of view space lies along the negative z axis. Looking at the symmetric perspective projection matrix definition for OpenGL, we have the following:



Renaming the items in the third row to A and B and multiplying a 4D homogeneous vector
(x, y, z, 1.0) by this matrix, our vector would be

(don't care, don't care, Az + B, -z)

Two things to note: Az + B is still a linear transform of z (together with the divide by w it maps the visible z range to [-1, 1], with -1 at the near plane and 1 at the far plane), but w has become -z. The negation is just because in our right handed view space the visible portion is on the negative z axis.
When the perspective w divide occurs we'll have a value of the form:

(1 / z) * C

Where C is -(Az + B). I'd also add that somewhere here the pipeline sneakily remaps that result from the range [-1, 1] to [0, 1] (the depth range transform), so the value that actually lands in the depth buffer is probably:

((1 / z) * C) * 0.5 + 0.5

And then, depending on your depth buffer format, that value might be stored as a fixed point integer or as floating point, but either way it's all hidden away by the API for you.

Using this information we can reconstruct the proper depth when we read from the depth buffer.
Firstly, we need to calculate some values and pass them into our shader as uniforms.
We'll calculate:

     float n = camera.GetNearClipPlaneDistance();  
     float f = camera.GetFarClipPlaneDistance();  
     float A = -(f + n) / (f - n);  
     float B = (-2.0f * f * n) / (f - n);  
     shader->SetUniform4f("ABnf", A, B, n, f);  

And in the shader, we first undo the [-1, 1] to [0, 1] remapping, so we get the value back in NDC space. Then we apply the inverse of the C / z equation, which can be best understood by this diagram:


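In case the diagram doesn't come through, here's the same inversion written out using the A and B from above (z is the view space z, zNDC is the value after the perspective divide):

zNDC = (Az + B) / -z
-z * zNDC = Az + B
z * (-zNDC - A) = B
z = -B / (zNDC + A)

which is exactly the zView = -B / (z + A) line in the shader below.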
So there's our reverse formula that we use to go from normalized device coordinates back to view space and we're done! In GLSL this is:
 #version 110  

uniform sampler2D textureUnit0;  

 /**  
 * Contains the components A, B, n, f in that order.  
 */  
 uniform vec4 ABnf;  

 varying vec2 vTexCoord0;  

 void main()  
 {  
     float A = ABnf.x;  
     float B = ABnf.y;  
     float n = ABnf.z;  
     float f = ABnf.w;  

     // Get the initial z value of the pixel.  
     float z = texture2D(textureUnit0, vTexCoord0).x;  
     z = (2.0 * z) - 1.0; 
 
     // Transform into view space.      
     float zView = -B / (z + A);  

     // Get normalized value.  
     zView /= -f;  

     gl_FragColor = vec4(zView, zView, zView, 1.0);  
 }  

I negate it because, remember, in OpenGL's view space, visible z values lie in the negative z axis.
Here's a screenshot of what this outputs in a test level:



To sum up the findings:
1) The odd nonlinear values you read from the depth buffer HAVE NOTHING TO DO WITH PERSPECTIVE CORRECT INTERPOLATION. They are caused entirely by the perspective divide that comes with perspective projection.

2) There still needs to be an interpolation of 1/z somewhere in order to enable perspective correct interpolation of vertex attributes; we just don't have to care about it. Your vertex attributes go in on one end and come out perspective correct on the other; you don't need to divide by z again.

Bonus Feature: Simple exponential fog.
So what are the uses of linear depth? Well, as you'll see in future posts it's really used all over the place, but for starters I just coded up a simple exponential fog function. All it really does is take the normalized linear distance (a value in [0, 1]) and square it, with some artist-configurable parameters of course, like a starting distance and a density multiplier. It's as simple a shader as you can get, and takes about 30 minutes to get integrated... ok, maybe a bit more, because I coded up the beginnings of the post processing framework at the same time. Anyway, the point is that for a simple shader, the effects can be quite dramatic:
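For reference, the fog factor boils down to something like this (the parameter names are mine, and the exact curve in the engine may differ slightly):

d = clamp((normalizedLinearDepth - fogStart) / (1.0 - fogStart), 0.0, 1.0)
fog = clamp(density * d * d, 0.0, 1.0)
finalColor = sceneColor * (1.0 - fog) + fogColor * fog

In other words, a squared falloff on the normalized linear distance, blended over the lit scene color.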


Useful resources:

http://www.songho.ca/opengl/gl_projectionmatrix.html
A beautifully comprehensive guide to the derivations of both orthographic and perspective projection matrices for OpenGL camera systems. Also where I grabbed the projection matrix image from.

http://chrishecker.com/Miscellaneous_Technical_Articles#Perspective_Texture_Mapping
All hail! The original and unsurpassed articles on perspective texture mapping. A little outdated now but if you're looking for the fundamentals of texture mapping and manage to survive them, they're still very informative. In fact, while you're at it, go read all of Chris Hecker's articles, he's up there with Michael Abrash in terms of ability to bring extremely technical topics down to a casual conversation level without dumbing it down.

http://www.amazon.com/Mathematics-Programming-Computer-Graphics-Edition/dp/1584502770
I actually have two copies of this book. One I used to keep at work and one for home.  Chapters 1-3 are what you need if you need an introduction to matrices, vectors, affine transformations, projections, quaternions, and so on. Integrates nicely with OpenGL engine development as all the conventions are based off of that API.




Saturday 9 August 2014

Tool Post: Static Mesh Compiler

Ok another post, and another tool to cover :)

We've done textures, so let's look at another asset type that we'll need to provide to the game.
Defined in the terminology of my engine, static meshes are essentially non-deformable meshes, i.e. their vertex attributes won't change; they're uploaded to the GPU once and never modified again. You see examples of these in almost any game out there today: a rock mesh or a cup on a table, for example.

Let's think on the tool pipeline here. An artist/modeler is going to use their preferred tool, Blender, 3ds Max, or whatever, to generate a mesh, and it's going to spit out one of a variety of formats. So your tool has to have an extensible importer framework set up to handle it all (see the sketch below). Right now I have that set up, but I've only written the importer for Wavefront .obj. When I need another format it's easy to add.
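To give an idea of what I mean by an extensible importer framework, here's an illustrative sketch (the class and function names are hypothetical, not the tool's actual code): each supported format gets its own importer that converts into the tool's intermediate mesh representation, and the tool just asks each registered importer whether it can handle a given file.

 #include <string>
 #include <vector>

 // Hypothetical intermediate representation that the importers fill in.
 struct IntermediateMesh
 {
     std::vector<float>        positions;  // xyz triples
     std::vector<float>        normals;
     std::vector<float>        texCoords;
     std::vector<unsigned int> indices;
 };

 // One importer per source format; the tool picks the first one that accepts the file.
 class IMeshImporter
 {
 public:
     virtual ~IMeshImporter() {}
     virtual bool CanImport(const std::string& filename) const = 0;  // e.g. check the extension
     virtual bool Import(const std::string& filename, IntermediateMesh& out) const = 0;
 };

 class ObjImporter : public IMeshImporter
 {
 public:
     bool CanImport(const std::string& filename) const
     {
         return filename.size() >= 4 && filename.compare(filename.size() - 4, 4, ".obj") == 0;
     }
     bool Import(const std::string& filename, IntermediateMesh& out) const
     {
         // ... parse the Wavefront .obj file and fill 'out' ...
         return true;
     }
 };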

So, similarly to the Texture Compiler, the Static Mesh Compiler maintains an asset file, which contains imported static meshes in an intermediate format that allows for adding, removing, renaming, and re-positioning of the meshes. Then there is a publishing front end which takes the intermediate format database and outputs a platform optimized, compressed database that is optimal for the engine to load at runtime.

Level Of Detail
I need to branch away for a second and talk about level of detail and why it's important.
Level of detail is essentially a performance optimization. The key point is to not spend precious GPU time drawing something at its highest possible fidelity when it's so far away that you're not going to actually see and appreciate that detail; a simplified representation should suffice. Observe:


On the left we have your standard Stanford bunny representation; this one clocks in at 4968 triangles. On the right, however, you have a far simpler mesh at 620 triangles. There's definitely a noticeable difference in quality here. But what happens when the camera is sufficiently far from the model?


Hard to see the difference, right? At this point the object's contribution to the scene, in terms of pixels output, is too small to discern the detail. Voila, we don't need to care about drawing the detailed model any more.
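In practice, with a discrete chain of LOD meshes (more on the different LOD techniques below), the runtime side of this can be as simple as picking a mesh by distance. A rough sketch with made-up names, assuming the chain is sorted by increasing switch distance:

 #include <cstddef>
 #include <vector>
 #include <glm/glm.hpp>

 // Hypothetical discrete LOD chain entry: lods[0] is the full detail mesh with switchDistance 0,
 // each following entry is a simpler mesh that kicks in beyond its switch distance.
 struct StaticMeshLod
 {
     const void* mesh;           // stand-in for the engine's renderable mesh handle
     float       switchDistance;
 };

 const void* SelectLod(const std::vector<StaticMeshLod>& lods,
                       const glm::vec3& cameraPos, const glm::vec3& meshPos)
 {
     float distance = glm::length(meshPos - cameraPos);
     const void* chosen = lods.front().mesh;  // assumes at least one LOD
     for (std::size_t i = 0; i < lods.size(); ++i)
     {
         if (distance >= lods[i].switchDistance)
             chosen = lods[i].mesh;  // keep walking down the chain while we're far enough away
     }
     return chosen;
 }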

It's worth mentioning that there are several different level of detail techniques. The first and most common is discrete level of detail, whereby a chain of meshes is generated for every initial mesh, with each mesh in the chain being of a lower level of detail. This is quite common in games today and is characterized by a visible "pop" as a mesh transitions between detail levels. Secondly, continuous level of detail generates a progressive data structure that can be sampled in a continuous fashion at runtime. And thirdly, there exist view-dependent level of detail algorithms that take into account the angle the mesh makes with the camera in order to optimize detail. In addition, there are probably a few dozen variants I'm missing; notably, I'm curious whether there are any level of detail techniques that use the newer tessellation hardware available.

For all of that, however, it must be said that discrete level of detail is still the most popular, as the other techniques are either too slow at runtime or rely on the presence of special hardware in order to work and/or be performant. For instance, I know that one of the keystones of the game LAIR for PS3 was its progressive mesh technology, which required the Cell processor and (probably) the ability of the GPU to read from system memory to be feasible. For our uses, we'll stick with good old discrete levels of detail. The only drawbacks are the potentially noticeable pop between LOD levels... and the added artist time to create all of the different levels of detail. Which leads to the next topic.

Automated LOD Generation Library
Another thing that would be ideal for our tool is some form of automated level of detail generation.
In an ideal world, maybe you'd have your artists go ahead and lovingly hand craft every single LOD level to be the optimum for a given triangle budget. But for scenarios like ours, where you have one, maybe two artists available, you really want to take as much off their hands as possible. Luckily there are several libraries available that cater for automated level of detail generation. After some searching around, what I found most suitable for my immediate needs was GLOD (http://www.cs.jhu.edu/~graphics/GLOD/index.html), a powerful level of detail library developed by a team from Johns Hopkins University and the University of Virginia. The thing that makes GLOD desirable for our needs is that it's designed to be closely integrated with OpenGL; for instance, it uses OpenGL vertex arrays as its unified geometry interface, meaning that by design it is capable of working with whatever per-vertex attributes you care to use. If it works in OpenGL, GLOD can simplify/adapt it. In this respect, true to its design goals, it more closely resembles an extension to the OpenGL driver than a standalone LOD toolkit.
Also of benefit is that GLOD is separated into three modules, meaning you are free to use whatever subset of the provided functionality you need. We're just using it as a geometry simplification tool for our discrete level of detail code; however, it has additional level of detail features that I want to experiment with when I have the time.

The Tool Interface
Here's a screenshot of the main tool interface.


So, typical fare for a tool UI: you have your database list view and publishing controls on the top left, the gizmos for manipulating your mesh in space (good for view testing the LODs etc.), and on the bottom left you have the level of detail tool for manipulating the mesh's levels of detail. You can set each level's distance to the camera and its maximum allowed triangle budget (usually the tool spits out a mesh that's a few triangles under it, depending on circumstance). Of course, you can also move about the scene and set rendering options, like taking a peek into the GBuffer, or seeing the wireframe and bounding volume of your mesh. For most of the tools that have a 3D scene widget that uses the renderer, those options are always available.

I should just add here that most of the tools I'll write about are under active development, so they may get features added and/or removed and things may change. Which is great, really, because then I have new post material!