Sunday, 7 December 2014

Rendering Post: Stable Cascaded Shadow Maps

I stopped the development of the sun shadowing system after the basics were implemented, my thought process at the time being that I just wanted to get something up on screen that could be improved later. But there's only so long that you can look at an ugly, artifact-filled graphics feature before it gets too much, so I set aside some time to bring the system up to an acceptable standard.

The key issues that needed to be solved were stopping the shadow maps from shimmering, solving the 'shadow acne' problem, and getting some decent filtering on the shadow edges. I'll talk about each of them quickly, but if you're hungry for more, check the references section for all the details.

Shadow Shimmering
As the camera moves around the world, the frustum changes with it. And since the shadow cascades are built to enclose slices of the frustum, as it moves they can vary dramatically in size. This in turn means that the way a piece of geometry is rendered into the shadow map will vary from frame to frame. This manifests itself as a shimmering effect on the shadow edges as the camera changes its position or orientation. Increasing the resolution of the shadow map helps, but doesn't rid us of the problem. I believe the best known solution to this is Michal Valient's approach. He solves the two issues of orientation changes and position changes of the frustum. Firstly, to stop the size of the shadow cascade bounds from changing as the frustum rotates, he wraps the frustum slice in a sphere (a sphere being a rotationally invariant bounding volume). This sphere can then be projected into light space and the bounds can be calculated from its position and radius. Secondly, to stop position changes of the camera from causing shimmering, Valient proposed that you snap the projection bounds in light space to shadow map texel sized increments. This means that as the camera moves, and the cascade bounds in light space move with it, any geometry that's rendered will be offset in texel sized increments, meaning no sub-pixel shimmer will be encountered.

Here's the code I use to calculate the min/max bounds of the projection volume.
 float shadowMapResolution = LightManager::GetInstance().GetShadowMapsResolution();  
   
 // Size of one shadow map texel in light view space units.  
 float f = (frustumEnclosingSphereRadius * 2.0f) / (shadowMapResolution);  
   
 MathLib::vector4 centerViewSpace;  
 MathLib::matrix4x4_vectorMul(worldViewMatrix, frustumEnclosingSpherePosition, centerViewSpace);  
   
 float minX = centerViewSpace.extractX() - frustumEnclosingSphereRadius;  
 minX = floor(minX / f) * f;  
   
 float minY = centerViewSpace.extractY() - frustumEnclosingSphereRadius;  
 minY = floor(minY / f) * f;  
   
 float viewportExtent = floor((frustumEnclosingSphereRadius * 2.0f) / f) * f;    // Ensure view point extents are a texel multiple.  
   
 float maxX = minX + viewportExtent;  
 float maxY = minY + viewportExtent;  
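
To see why the snapping helps, here's a minimal standalone sketch of the same idea in plain C++. It is not the engine code: `CascadeBounds` and `SnapCascadeBounds` are made-up names, the sphere centre is assumed to already be in light space, and the shadow map is assumed to be square.

```cpp
#include <cmath>

// Hypothetical helper type: light-space XY bounds of one cascade.
struct CascadeBounds
{
    float minX, minY, maxX, maxY;
};

// centerX/centerY: enclosing sphere centre, already transformed into light space.
// radius:          radius of the sphere enclosing the frustum slice.
// resolution:      shadow map width/height in texels (assumed square).
CascadeBounds SnapCascadeBounds(float centerX, float centerY, float radius, float resolution)
{
    // Light-space units covered by a single shadow map texel.
    const float texelSize = (radius * 2.0f) / resolution;

    CascadeBounds b;
    // Snap the minimum corner down to the nearest texel boundary.
    b.minX = std::floor((centerX - radius) / texelSize) * texelSize;
    b.minY = std::floor((centerY - radius) / texelSize) * texelSize;
    // The extent never changes, so the maximum corner inherits the snapping.
    b.maxX = b.minX + radius * 2.0f;
    b.maxY = b.minY + radius * 2.0f;
    return b;
}
```

With a radius of 512 and a 1024 texel map, one texel covers exactly one unit. Moving the sphere centre from 100.25 to 100.75 leaves minX at -412, and it only jumps (by exactly one texel) once the centre crosses the next texel boundary. Because the minimum is snapped downwards, the far edge can fall up to a texel short of the sphere, so padding the radius slightly may be worth considering.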

If you read my previous post you'd see that in the diagram that I was using to explain the cascade idea, I was using several light bases. Essentially what I was doing was creating a new virtual shadow position offset from the midpoint of the cascade along the reverse of the sunlight direction.
This is blatantly WRONG! In order to have stable cascaded shadow maps, you need one fixed light position that does not change.

Shadow Acne
Shadow acne occurs because an entire area of the world can be assigned to one shadow map texel with one depth value. When you are rendering the world space points that map to that texel, any given point is just as likely to be in shadow as not. The diagram explains it better:


What's happening here is that when the scene is rendered from the light's perspective, it renders depth 0 into a particular texel of the shadow map. When it comes time to do the in-shadow test for points 1 and 2, you can see that both of them map to the same texel of the shadow map. Point 1 (and any points above it) is closer than point 0 so it is lit. Point 2 (and any points below it) is farther away than point 0 so it is shadowed. This is what results in the shadow acne artifacts that you encounter. Increasing the shadow map resolution won't help with this, unless you could increase resolution to be 1:1 or more for every pixel on the screen.

In addition to acne, precision issues that result from the nature of depth buffers themselves (non-linearity, finite precision) can result in incorrect shadowing results when you perform the shadow comparison directly.

The solution to these issues is a set of "bias" techniques that you can apply to the depth values of the shadow map, either during rasterization of it or during the test against it. There is no singular solution; rather, a multi-front attack has to be made. Firstly, a constant bias applied to the shadow test acts as an error margin in favour of the pixel being lit, which helps to compensate for precision issues. Essentially, don't shadow unless you're farther away than the stored depth + bias. Simple enough, but it doesn't help when the slope of the geometry is such that a larger bias is required, and setting the constant bias too large will result in peter panning of the shadows, whereby they become detached from their geometry in the scene. What we need is a slope aware calculation. This MSDN diagram shows what we need perfectly:



The more perpendicular the surface normal is to the vector from the point to the light (in our case as it's a directional sun light this is the negation of the light direction), the higher our bias needs to be. So if the two vectors are the same, no bias need be applied, but if they are at 90 degrees the bias needs to be essentially infinite as the light direction and the tangent plane of the surface are parallel in that case. In practice, we'll clamp this to some value however. The trig function that is perfect for this is, of course, tan() as it has asymptotes that extend to infinity at 90 degrees. This post by Ignacio Castano has a nice way of calculating this using the dot product and some basic trig identities:

 float GetSlopeScaledBias(vec3 N, vec3 L)  
 {  
     float cosAlpha = clamp(dot(N, L), 0.0, 1.0);  
     float sinAlpha = sqrt(1.0 - cosAlpha * cosAlpha);     // sin(acos(L*N))  
     float tanAlpha = sinAlpha / cosAlpha;            // tan(acos(L*N))  
     return tanAlpha;  
 }  
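
The function above returns the raw tangent, which heads off to infinity as the angle approaches 90 degrees. Here's a hedged sketch, in C++ rather than GLSL, of one way the constant and slope-scaled terms could be combined and clamped. `ComputeShadowBias` and all three tuning constants are assumptions picked for illustration, not values from the engine.

```cpp
#include <algorithm>
#include <cmath>

// nDotL:        dot product of the unit surface normal and the unit vector to the light.
// constantBias: fixed error margin (hypothetical default).
// slopeScale:   how strongly the slope term contributes (hypothetical default).
// maxBias:      upper clamp so grazing angles don't cause peter panning (hypothetical default).
float ComputeShadowBias(float nDotL,
                        float constantBias = 0.0005f,
                        float slopeScale = 0.001f,
                        float maxBias = 0.01f)
{
    const float cosAlpha = std::max(0.0f, std::min(nDotL, 1.0f));
    const float sinAlpha = std::sqrt(1.0f - cosAlpha * cosAlpha);
    // Guard the division so a light parallel to the surface doesn't produce inf/NaN.
    const float tanAlpha = sinAlpha / std::max(cosAlpha, 1e-4f);
    return std::min(constantBias + slopeScale * tanAlpha, maxBias);
}
```

A surface facing the light head on gets only the constant bias, and the bias grows with the slope until it hits the clamp.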
   

The third technique I read up on was Normal Offset Shadows, which involves offsetting the shadow receiver vertex along its normal to avoid acne problems. This is a smart idea and works really well, but I couldn't really use it because I don't render in a forward fashion. By the time I get to the shadowing stage all I have to work with are pixels, and the normals stored in the gbuffer are not geometric surface normals but normals that could have come from a normal map, so it wouldn't work.
This did give me the idea to offset geometry whilst rasterizing the shadow map, however.
In the vertex shader of the terrain nodes' shadowing pass, I offset slightly along the normal of the vertex but only use the y component of the normal. This is to avoid generating gaps in the shadow maps in between terrain nodes. It's hacky, but it works pretty damn well and saves me from having to ramp up my standard bias values to compensate for large terrain variations.

Shadow Filtering
There are dozens of different ways to approach shadow filtering, ranging from simple PCF box filters, Gaussian weighted filters, or rotated Poisson disk filters, to more advanced methods like Variance Shadow Maps and Exponential Variance Shadow Maps. For me right now, a simple adjustable n x n box PCF filter looks pretty good, but I'll revisit this at a later time I'm sure.
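
For reference, an n x n box PCF filter boils down to running the depth comparison on every texel in the kernel and averaging the pass/fail results. The sketch below does that on the CPU in C++ purely to show the logic; in practice this lives in the fragment shader. `ShadowMap` and `BoxPcf` are hypothetical names, and clamp-to-edge addressing is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side shadow map: square, row-major, depths in [0, 1].
struct ShadowMap
{
    int size;
    std::vector<float> depth;

    float At(int x, int y) const
    {
        // Clamp-to-edge addressing (an assumption for this sketch).
        x = std::max(0, std::min(x, size - 1));
        y = std::max(0, std::min(y, size - 1));
        return depth[static_cast<std::size_t>(y * size + x)];
    }
};

// Returns the lit fraction in [0, 1] for a receiver at receiverDepth,
// using an n x n kernel around (texelX, texelY).
float BoxPcf(const ShadowMap& map, int texelX, int texelY,
             float receiverDepth, float bias, int n)
{
    const int start = -(n / 2); // for even n the kernel leans towards negative offsets
    int lit = 0;
    for (int dy = 0; dy < n; ++dy)
    {
        for (int dx = 0; dx < n; ++dx)
        {
            // Lit if the receiver is not farther away than the stored occluder depth.
            if (receiverDepth - bias <= map.At(texelX + start + dx, texelY + start + dy))
            {
                ++lit;
            }
        }
    }
    return static_cast<float>(lit) / static_cast<float>(n * n);
}
```

A 1 x 1 kernel gives the usual hard edge, while larger kernels give fractional values along the shadow boundary at a cost of n * n comparisons per pixel.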

For cascaded shadow maps, you are provided with some nice flexibility in that you can adjust the complexity of the filtering method based on the cascade that the current pixel is in. This allows you to put the best looking but slowest filters close to the camera. You just have to be careful that the viewer doesn't notice the transition between cascades, and I know that several engines filter the boundaries of the cascades to hide any harsh transitions.

Demonstration Video
Of course, no post is complete without screenshots and/or a video!



Additional References

dice.se/wp-content/uploads/GDC09_ShadowAndDecals_Frostbite.ppt

ams-cms/publications/presentations/GDC09_Valient_Rendering_Technology_Of_Killzone_2.pptx

Valient, M., "Stable Rendering of Cascaded Shadow Maps", In: Engel, W. F., et al., "ShaderX6: Advanced Rendering Techniques", Charles River Media, 2008, ISBN 1-58450-544-3.

http://mynameismjp.wordpress.com/2013/09/10/shadow-maps/

http://msdn.microsoft.com/en-us/library/windows/desktop/ee416324%28v=vs.85%29.aspx

http://developer.amd.com/wordpress/media/2012/10/Isidoro-ShadowMapping.pdf






Sunday, 14 September 2014

Rendering Post: Terrain

This was actually the first rendering subsystem I worked on. You can't have the beginnings of a strategy game without a terrain to drive vehicles over, can you? Nope.
I was being lazy about the whole thing, thinking perhaps I could get away with using the brute force approach and rendering a 1km x 1km map as just a single huge mesh. While that could be feasible for a small game, it's not really practical for the kind of game I wanted to try to build. Also, doing it the brute force way just isn't impressive enough to write a blog about!

So there's a few sides to the terrain rendering system. The first is the geometry system, the second is the surface texturing and lighting system, and the third perhaps could be the level editor and terrain modification system.

Geometry System
When it comes down to rendering terrain geometry, there are a wealth of available LOD algorithms to use, but care needs to be taken when choosing one, as several of them were invented at a time when hardware characteristics were very different from what you can expect to find in a modern system. I'm not going to rewrite a summary of all of the other terrain geometry algorithms when such a good reference page exists here, but rather talk about what I felt was necessary for the system I built.

What I needed was an algorithm that:
1). Has minimal CPU overhead. Almost all of the work must be done on the GPU, and the GPU should never stall waiting for the CPU to hand feed geometry to it.

2). Requires no advanced preprocessing phase, and supports terrain deformation. It should also preferably not have any visible seams or cracks between nodes that need to be filled in, in other words the resulting geometry should be continuous.

3). Requires little or no per-frame memory allocations, and whose overall memory usage is minimal.
Overall memory usage is easy to understand, but the per-frame allocations are especially important as I haven't gotten around to writing a memory manager yet, so all of my allocations go through the new operator. That's bad enough as it can end up in a kernel mode call, but eventually you'll start running into fragmentation problems as well. So minimizing my heap memory usage right now is paramount.

4). Is (relatively) easy to implement. Duh, it's just me writing this thing, and I have a dozen other things to get around to as well ;)

The algorithm I eventually decided to use was the non-streaming version of Filip Strugar's excellent Continuous Distance-Dependent Level of Detail (CDLOD). What I like about this algorithm is that it's easy to implement, easy to extend, and, aside from the heightmap itself, requires no additional memory on the GPU but one small vertex buffer (just a grid mesh that gets reused over and over for every quadtree node). I'm not implementing the streaming version as, for an RTS, you can instantly hop to any part of the playable space and you really don't want to see low detail terrain there. Additionally, the entire playing field fits into a small enough memory footprint that it shouldn't be necessary.

Here you can see the algorithm in action, with the morph areas clearly visible.
Surface Texturing And Lighting
With the geometry system up and running, the next thing to do is make sure that it's textured nicely. Again there are many different kinds of texturing methods here, but for now simple is better (well, simple to start with, so that I can improve it later when I bump up against its limitations). All I did was implement basic texture splatting, but have it so that it can use up to eight different layers (so two RGBA splat textures are required), and each layer has a diffuse and normal component so we get nice normal mapping on the terrain.

Here you can see the different splatting layers.
Speaking of normal mapping, when first thinking about it, I thought that storing the whole tangent-bitangent-normal basis per vertex for every vertex on the terrain would be a huge memory drain (as in, impossible to do). But this presentation from the engine guys over at DICE pointed out the completely obvious fact that if you have access to the heightmap data you can just recalculate the basis in the vertex shader. Which drops that from being a concern completely and is plenty fast at runtime. So for the lighting you just calculate the basis in the vertex shader, interpolate it, and in the fragment shader transform from tangent space to view space and store the resulting normal in the gbuffer. Easy!
View space normals generated from the heightmap + splatted normal maps.
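
As a rough illustration of recalculating the basis from the heightmap, here's a small C++ sketch that uses central differences on neighbouring height samples. The engine does this in the vertex shader; this CPU version only shows the maths. `Heightmap`, `Basis` and `BasisAt` are hypothetical names, and y is assumed to be up with the terrain laid out in the xz plane.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct Basis { Vec3 tangent, bitangent, normal; };

// Hypothetical heightmap: square, row-major, heights in world units.
struct Heightmap
{
    int size;                  // samples per side
    float spacing;             // world units between neighbouring samples
    std::vector<float> height;

    float At(int x, int z) const
    {
        // Clamping at the border flattens the slope estimate there.
        x = std::max(0, std::min(x, size - 1));
        z = std::max(0, std::min(z, size - 1));
        return height[static_cast<std::size_t>(z * size + x)];
    }
};

static Vec3 Normalize(Vec3 v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

Basis BasisAt(const Heightmap& hm, int x, int z)
{
    // Central differences approximate the partial derivatives of the height.
    const float dhdx = (hm.At(x + 1, z) - hm.At(x - 1, z)) / (2.0f * hm.spacing);
    const float dhdz = (hm.At(x, z + 1) - hm.At(x, z - 1)) / (2.0f * hm.spacing);

    Basis b;
    b.tangent = Normalize({1.0f, dhdx, 0.0f});   // direction of travel along +x
    b.bitangent = Normalize({0.0f, dhdz, 1.0f}); // direction of travel along +z
    b.normal = Normalize({-dhdx, 1.0f, -dhdz});  // cross(bitangent, tangent)
    return b;
}
```

Flat terrain yields a normal of (0, 1, 0), and a 45 degree ramp rising along +x tilts it back to roughly (-0.707, 0.707, 0).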
A note on memory usage.
The actual heightmap texture that's used by the renderer is stored in 16 bit floating point. So the entire memory usage for a 4km x 4km play space with 1 meter resolution sits at a comfortable 32MB on the GPU. The textures used will vary widely in size of course.
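
The 32MB figure is easy to check. The snippet below assumes "4km" means 4096 samples per side and that a 16 bit float takes 2 bytes per sample; mipmaps and any driver padding are not counted. `HeightmapBytes` is just an illustrative helper.

```cpp
#include <cstdint>

// Bytes needed for a square, single-channel heightmap with no mipmaps.
constexpr std::uint64_t HeightmapBytes(std::uint64_t samplesPerSide,
                                       std::uint64_t bytesPerSample)
{
    return samplesPerSide * samplesPerSide * bytesPerSample;
}

// 4096 * 4096 samples * 2 bytes = 33,554,432 bytes = 32 MiB.
static_assert(HeightmapBytes(4096, 2) == 32ull * 1024ull * 1024ull,
              "a 4096 x 4096 half-float heightmap takes 32 MiB");
```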

Terrain Editor 
Most of my previous demos had me just loading up a heightmap and using that but that's no good when you're trying to actually build something that can be used for a game. The gulf between simple demo and game engine component... but I'm getting off topic.
What you need from a terrain editing tool is the ability to make terrain adjustments in real time, and see it in real time. It's completely unusable if you have to modify a heightmap image or something stupid like that, and then reload into the game. In fact, the actual heightmap should be abstracted away from the user. They're just modifying the terrain and shouldn't care about how it's stored or rendered.

The core piece of code that enables all of the editing in the level editor is ray-terrain intersection.
This is actually required for the game as well (select unit, tell him to move over there) so making it fast is important. What happens is that there's two components to the terrain system. One rendering quadtree, whose nodes subdivide to the terrain patch size (which is usually like 31x31 or 61x61 vertices in my engine, but can vary) and another system side quadtree which is subdivided to a much higher granularity for the ray-triangle intersections. As for what editing functionality is provided, it's the usual suspects, like addition, subtraction, set to level height, smoothing and all that. Best seen through screenshots:

Heightmap modification.


Texturing.
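
Going back to the ray-terrain intersection that drives all of this editing: once the quadtree has narrowed things down to a handful of triangles, each one needs a ray-triangle test. The post doesn't say which test the engine uses, so the sketch below swaps in the well known Möller-Trumbore algorithm as an example. All names are illustrative.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore ray-triangle intersection (two-sided).
// Returns true and writes the distance along the ray to t when the ray
// starting at origin and travelling along dir hits triangle (v0, v1, v2).
bool RayTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float kEpsilon = 1e-7f;
    const Vec3 e1 = Sub(v1, v0);
    const Vec3 e2 = Sub(v2, v0);
    const Vec3 p = Cross(dir, e2);
    const float det = Dot(e1, p);
    if (std::fabs(det) < kEpsilon)
    {
        return false; // ray is parallel to the triangle plane
    }
    const float invDet = 1.0f / det;
    const Vec3 s = Sub(origin, v0);
    const float u = Dot(s, p) * invDet; // first barycentric coordinate
    if (u < 0.0f || u > 1.0f)
    {
        return false;
    }
    const Vec3 q = Cross(s, e1);
    const float v = Dot(dir, q) * invDet; // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f)
    {
        return false;
    }
    t = Dot(e2, q) * invDet;
    return t >= 0.0f; // ignore hits behind the ray origin
}
```

For a mouse pick you'd keep the smallest t across all candidate triangles and evaluate origin + dir * t to get the point on the terrain.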
End Result
To end off with I might as well put a video here showing the actual level editor in action. Excuse the horrible programmer level design :)



That's it for the post, thanks for reading!
And a special thanks to all the folks who put their free assets online for me to test stuff with, it is much appreciated!

Sunday, 31 August 2014

Rendering Post: Cascaded Shadow Maps

I'm going to talk about the very initial implementation of shadows that I've done for the engine so far. To be honest I'm not even sure if this exact method is actually used in games because I just sat down and coded something up over a weekend. But here goes! If you have never heard of cascaded shadow maps before an excellent starting point is here.

Because the engine I'm building only needs to support wide outdoor environments (it's an RTS) the most important light type I need to cater for is the directional light from the sun. So no omni-directional or spotlights need to be done for now. 

Let's outline the basics of the algorithm.
Part 1. Generate the shadow maps.
Bind framebuffer with render to texture for the depth buffer. Switch GPU to double speed z only rendering.
for each cascade 
1). Calculate the truncated view frustum of the main camera in world space, use this to determine the position, bounds, and projection matrix of the light. 
2). Render all relevant geometry of the scene into the shadow map.

Part 2. During the directional light shading pass of the deferred lighting stage.
for each pixel
1). Read corresponding depth, linearize and calculate the world space position of each pixel.
2). Project world space position into the current cascade's light space, and run the depth comparison and filtering against the depth stored in the shadow map. 

That's literally it. The beautiful thing about shadow mapping, as opposed to something like shadow volumes is the conceptual simplicity of the technique. But it comes at the expense of grievances like having to manage shadow map offsets and deal with issues like duelling frusta etc. Another benefit of this way of doing it (during the deferred lighting stage) is that you get free self shadowing that's completely generic over the entire world. If there's a point in the world, it will get shadowed correctly (provided your shadow maps are generated correctly as well). 

Now let's get into the details.

Calculate the view volume of the light for each cascade slice.
The most important thing you need to do here is ensure that all points that are enclosed in the frustum slice of the main camera will be included in the light's viewing volume. Not only that, but the light's viewing volume should extend from the camera frustum to the light's position itself, so that any points that are outside the camera's viewing volume but potentially casting shadows onto points inside it are accounted for.

This simple diagram should explain.

How I defined the light's viewing volume was to define the light's camera space, and then transform all points of the cascade slice frustum into that camera space. Since we're dealing with an orthographic projection, that makes it easy to find the maximum and minimum extents along each axis (with the exception of the near z plane, which is always set to 1.0f). From that, you can create your light basis clipping planes and then, for each element in the world, if it's inside both pairs of clipping planes, it gets rendered into the shadow map.
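
The extents step can be sketched as follows. This is a minimal C++ illustration rather than the engine code: it assumes the eight corners of the frustum slice have already been transformed into the light's camera space, and that the light looks down its negative z axis (OpenGL style). `OrthoVolume` and `FitOrthoVolume` are made-up names.

```cpp
#include <algorithm>
#include <array>

struct Vec3 { float x, y, z; };

// Hypothetical description of the light's orthographic viewing volume.
struct OrthoVolume
{
    float left, right, bottom, top;
    float nearZ, farZ; // positive distances along the light's view direction
};

OrthoVolume FitOrthoVolume(const std::array<Vec3, 8>& cornersLightSpace)
{
    Vec3 lo = cornersLightSpace[0];
    Vec3 hi = cornersLightSpace[0];
    for (const Vec3& c : cornersLightSpace)
    {
        lo = {std::min(lo.x, c.x), std::min(lo.y, c.y), std::min(lo.z, c.z)};
        hi = {std::max(hi.x, c.x), std::max(hi.y, c.y), std::max(hi.z, c.z)};
    }

    OrthoVolume v;
    v.left = lo.x;
    v.right = hi.x;
    v.bottom = lo.y;
    v.top = hi.y;
    // The near plane is pinned close to the light so casters between the
    // light and the camera frustum are kept.
    v.nearZ = 1.0f;
    // The farthest corner has the most negative z when looking down -z.
    v.farZ = -lo.z;
    return v;
}
```

The resulting six values are what you'd feed into an orthographic projection matrix for that cascade.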

A note here on level of detail:
You want to make sure that the level of detail set for the mesh or terrain node you're rendering is based off of the main camera (not the light) to avoid artifacts. But depending on how you do things these may be tolerable and will certainly be faster.

Reconstructing the world space position.

If you read the previous article this would be an example of why you need to be able to correctly read the linear depth from the depth buffer. If you don't, your reconstructed world space positions will be swimming around and you'll notice horribly weird artifacts. A good debug view is to visualize the generated world space pixels normalized to the bounds of the current map (for me it was to calculate the world space position and divide by roughly 4000, since the map size is 4km squared).
You should get something like this:



And you should note that as the camera moves around the scene the colors in the scene should not change... at all. If they do, that's the first sign you're doing something wrong.

The way I reconstructed the world space position was to calculate the vectors from the camera frustum near corners to the far corners. Transform those vectors into world space and then pass them as attributes to be interpolated over the screen during the full screen pass. From there you get the interpolated view vector and multiply it by the linear depth read from the depth buffer and you've got your world space position.
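
In code form the reconstruction is a single multiply-add. The C++ sketch below mirrors what the fragment shader further down does; it assumes the interpolated vector runs from the camera position to the far plane and that the depth has been normalized by the far plane distance. `ReconstructWorldPosition` is an illustrative name.

```cpp
struct Vec3 { float x, y, z; };

// cameraPosition:  world space position of the main camera.
// frustumVector:   interpolated world space vector from the camera to the
//                  point on the far plane behind this pixel (assumption).
// normalizedDepth: linear view depth divided by the far plane distance, in [0, 1].
Vec3 ReconstructWorldPosition(Vec3 cameraPosition, Vec3 frustumVector, float normalizedDepth)
{
    return {cameraPosition.x + frustumVector.x * normalizedDepth,
            cameraPosition.y + frustumVector.y * normalizedDepth,
            cameraPosition.z + frustumVector.z * normalizedDepth};
}
```

A pixel a quarter of the way to a far plane 1000 units away, seen from a camera at (10, 5, 0) looking down -z, lands at (10, 5, -250).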

Doing the final lighting pass.
After you've got your shadow maps and you have your world space positions for each pixel, inside the final shader all you need to do is transform the pixel from world space into the light space and run your comparison. Here's the really basic fragment shader I used. There's a bunch of optimizations (and modernizations, since it's running off of the old 1.1 GLSL spec) to do, and the filtering is only the hardware PCF at the moment, but it should be a good reference point to just get something on screen.

 #version 110   
   
 //#define DEBUG   
   
 /// Texture units.  
 uniform sampler2D albedoBuffer;  
 uniform sampler2D normalBuffer;  
 uniform sampler2D depthBuffer;  
   
 uniform sampler2DShadow shadowCascade0;  
 uniform sampler2DShadow shadowCascade1;  
 uniform sampler2DShadow shadowCascade2;  
 uniform sampler2DShadow shadowCascade3;  
   
 varying vec2 vTexCoord0;  
   
 /// Contains the components A, B, n, f in that order.  
 /// Used for depth linearization.  
 uniform vec4 ABnf;  
   
 /// World space reconstruction.  
 varying vec3 vFrustumVector;  
 uniform vec3 cameraPosition;  
   
 /// Lighting values.  
 uniform vec3 viewspaceDirection;  
 uniform vec3 lightColor;  
   
 uniform mat4 cascade0WVP;  
 uniform mat4 cascade1WVP;  
 uniform mat4 cascade2WVP;  
 uniform mat4 cascade3WVP;  
   
 void ProcessCascade0(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(2.0, 0.5, 0.5);  
     #endif  
   
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.0005;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade0, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void ProcessCascade1(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(0.5, 2.0, 0.5);  
     #endif  
   
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.001;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade1, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void ProcessCascade2(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(0.5, 0.5, 2.0);  
     #endif  
       
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.002;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade2, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void ProcessCascade3(in vec4 cascadeClipSpacePosition)  
 {  
     #ifdef DEBUG  
     gl_FragColor.rgb *= vec3(2.0, 0.5, 0.5);  
     #endif  
   
     // Get depth in light space.  
     cascadeClipSpacePosition.xyz += 1.0;  
     cascadeClipSpacePosition.xyz *= 0.5;  
   
     cascadeClipSpacePosition.z -= 0.0025;  
     cascadeClipSpacePosition.w = 1.0;  
     float multiplier = shadow2DProj(shadowCascade3, cascadeClipSpacePosition).r;  
     gl_FragColor.rgb *= multiplier;      
 }  
   
 void StartCascadeSampling(in vec4 worldSpacePosition)  
 {  
     vec4 cascadeClipSpacePosition;  
     cascadeClipSpacePosition = cascade0WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade0(cascadeClipSpacePosition);  
         return;  
     }  
       
     cascadeClipSpacePosition = cascade1WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade1(cascadeClipSpacePosition);  
         return;  
     }  
     cascadeClipSpacePosition = cascade2WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade2(cascadeClipSpacePosition);  
         return;  
     }  
       
     cascadeClipSpacePosition = cascade3WVP * worldSpacePosition;  
     if (abs(cascadeClipSpacePosition.x) < 1.0 &&   
         abs(cascadeClipSpacePosition.y) < 1.0 &&   
         abs(cascadeClipSpacePosition.z) < 1.0)  
     {  
         ProcessCascade3(cascadeClipSpacePosition);  
         return;  
     }  
 }  
   
 void main()  
 {  
     float A = ABnf.x;  
     float B = ABnf.y;  
     float n = ABnf.z;  
     float f = ABnf.w;  
   
     // Get the initial z value of the pixel.  
     float z = texture2D(depthBuffer, vTexCoord0).x;  
     z = (2.0 * z) - 1.0;  
       
     // Transform into view space.      
     float zView = -B / (z + A);  
   
     // Normalize zView by the far plane distance.  
     zView /= -f;  
   
     // Scale the interpolated frustum vector by the normalized depth.  
     vec3 intermediate = (vFrustumVector * zView);  
       
     vec4 worldSpacePosition = vec4(intermediate + cameraPosition, 1.0);  
       
     vec3 texColor = texture2D(albedoBuffer, vTexCoord0).rgb;  
       
     // Do lighting calculation.  
     vec3 normal = texture2D(normalBuffer, vTexCoord0).rgb * 2.0 - 1.0;  
       
     float dotProduct = dot(viewspaceDirection, normal);  
   
     gl_FragColor.rgb = (max(texColor * dotProduct, 0.0));  
     gl_FragColor.rgb *= lightColor;  
       
     // Now we can transform the world space position of the pixel into the shadow map spaces and   
     // see if they're in shadow.  
     StartCascadeSampling(worldSpacePosition);  
       
     gl_FragColor.rgb += texColor.rgb * 0.2;  
 }  
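
The four near identical branches in StartCascadeSampling all do the same thing: transform into the cascade's clip space, check whether the point falls inside the [-1, 1] cube, and remap to [0, 1] for the shadow map lookup. As a sketch of how that could be folded into a loop, here's a CPU-side C++ version of the selection logic. It assumes orthographic, column-major cascade matrices (so w stays 1), and `Mat4`, `MakeScale` and `SelectCascade` are hypothetical names.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Column-major 4x4 matrix, as OpenGL expects.
struct Mat4 { float m[16]; };

// Transforms a point (w = 1) by an affine/orthographic matrix.
static Vec3 TransformPoint(const Mat4& M, Vec3 p)
{
    return {M.m[0] * p.x + M.m[4] * p.y + M.m[8] * p.z + M.m[12],
            M.m[1] * p.x + M.m[5] * p.y + M.m[9] * p.z + M.m[13],
            M.m[2] * p.x + M.m[6] * p.y + M.m[10] * p.z + M.m[14]};
}

// Helper used only by the usage example: uniform scale about the origin.
Mat4 MakeScale(float s)
{
    Mat4 r{}; // zero initialised
    r.m[0] = s;
    r.m[5] = s;
    r.m[10] = s;
    r.m[15] = 1.0f;
    return r;
}

// Returns the index of the first (finest) cascade containing worldPos, or -1.
// On success shadowCoord holds the [0, 1] lookup coordinates and depth.
int SelectCascade(const Mat4* cascades, int cascadeCount, Vec3 worldPos, Vec3& shadowCoord)
{
    for (int i = 0; i < cascadeCount; ++i)
    {
        const Vec3 clip = TransformPoint(cascades[i], worldPos);
        if (std::fabs(clip.x) < 1.0f && std::fabs(clip.y) < 1.0f && std::fabs(clip.z) < 1.0f)
        {
            shadowCoord = {clip.x * 0.5f + 0.5f, clip.y * 0.5f + 0.5f, clip.z * 0.5f + 0.5f};
            return i;
        }
    }
    return -1; // outside every cascade, so leave the pixel unshadowed
}
```

GLSL 1.10 can't index an array of samplers with a loop variable, which is one reason a shader of that vintage ends up with the unrolled version above; the per-cascade bias would become a small lookup table in the same way.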

The final result.

Here's a couple of screenshots:


Test scenario showing how the buildings cast shadow onto one another. 

Closer in details are still preserved with cascaded shadow maps. Self shadowing Tiger tank FTW!


Here you can see cascades 1-3 illustrated.


And here is cascade 0 included as well for the finer details close to the camera.
Just a thanks here to everyone from Turbosquid and the internet for the free models! :)

Friday, 29 August 2014

Rendering Post: Linearizing your depth buffer and simple fog.

Right so enough about tools for now, let's do some rendering work. I hadn't really devoted much time to the renderer for the project, since I've been focusing on the back end and tools. But, tools can be boring to talk about and not terribly useful to the reader, so I added a couple of features to the engine. Let's get started.

Linearizing Your Depth Buffer
An important fundamental step to a lot of graphics techniques is to be able to read the depth of the current pixel from your depth buffer. Well... no shit, but you need to be able to do it properly. When sampling from your depth buffer in the shader, you get a value back in the range [0, 1], and you might assume that it runs linearly from 0 at the near plane to 1 at the far plane... but nope, there's a characteristic here that we need to be aware of. What we need is to understand the transformation that the z component of a vertex will undergo and how and why it's interpolated across the surface of the triangle the way it is.

Now here is where it gets potentially confusing. When it comes to the handling of z values, there are two places we get the result 1/z and they are not related. Firstly we use it when it comes to interpolating vertex attributes successfully during rasterization. The second is a consequence of the perspective divide operation.

Let's start with the first (I'm going to assume that you guys are familiar with perspective, our perception of the world, and why we need projection to replicate it in computer graphics. If not, see here). You see, when we go from the abstract three-dimensional description of a triangle to the two-dimensional triangle that gets interpolated across the screen, we suffer from a loss of dimensionality that makes it hard to determine the correct values we need for rendering. We project the three-dimensional triangle onto a two-dimensional plane and then fill it in. But we need to be able to reconstruct the three-dimensional information from the interpolated two-dimensional information.

Here on the left is the standard similar triangles diagram that explains why you divide by z to shrink the x value the farther the point is from the camera. Easy stuff.


For this quick explanation let's assume that your near plane is set to 1.0, which means we can remove it from the equation. We have:

x' = x / z

which we can manipulate to get the equation

x = x' * z

which is great because it means that we can reconstruct the original view space x from the projected x point by multiplying by the original z value. So as we're interpolating x' during the rasterization process we can find z and recalculate the original x value. We'd be able to recover a 3D attribute from a 2D interpolation. But this assumes that you have z handy, which we don't. We need to find a way to interpolate z linearly across the screen. A linear equation is an equation of the form y = Ax + B

Ignoring what A and B's actual values are for the moment, and substituting for x and z we get:

x = Az + B

but x = x' * z, so

x' * z = Az + B

which, after some manipulation, gets us

z = B / (x' - A)

Which is hardly linear.  But now for the magic trick. Take the reciprocal of both sides.

(1 / z) = (1 / B) * x' - (A / B)

Hey, that's linear!

So, we can interpolate z's reciprocal in terms of x' (we get x' by interpolating across the screen during rasterization). From there, we can just take the interpolated value, take its reciprocal again, and we'll have z. And from that we can recalculate the interpolated x and y in 3D space. So we've overcome the dimension loss of 2D rasterization. This is applied to all attributes associated with a vertex, things like texture coordinates etc.

Secondly, when we read from the depth buffer after a perspective projection, we get an odd value back. We need to understand where it comes from and how it's reversed to recover the proper depth value. Look at the summary below:

We can see the journey of the vertex from model space to view space. Here it gets projected into a unit cube centered on the origin where clipping is performed (hence the name). As an aside, it's also here that the GPU will generate more triangles if it needs to from the clipping process.

I use the stock standard OpenGL transformation system, so right handed coordinate systems with the part of view space that will be seen in the negative z range. Looking at the symmetric perspective projection matrix definition for OpenGL we have the following:



Renaming the items in the third row to A and B and multiplying a 4D homogeneous vector
(x, y, z, 1.0) by this matrix, our vector would be

(don't care, don't care, Az + B, -z)

Two things to note: Az + B is still a linear transform of z (it's set up so that, once the w divide has happened, the near plane lands on -1 and the far plane on 1), but w has become -z. The negation is just because in our right handed view space the visible portion is on the negative z axis.
When the perspective w divide occurs we'll have a value of the form:

(1 / z) * C

Where C is -(Az + B).  I'd also add that somewhere here they sneakily transform the resulting value from the range [-1, 1] to [0, 1] so the final value is probably:

(C / z) * 0.5 + 0.5

And then, depending on your depth buffer format, that value might be converted to an integer value. But for modern day cards it should generally be floating point, and it's all hidden away by the API for you.

Using this information we can reconstruct the proper depth when we read from the depth buffer.
Firstly, we need to calculate some values and pass them into our shader as uniforms.
We'll calculate:

     float n = camera.GetNearClipPlaneDistance();  
     float f = camera.GetFarClipPlaneDistance();  
     float A = -(f + n) / (f - n);  
     float B = (-2.0f * f * n) / (f - n);  
     shader->SetUniform4f("ABnf", A, B, n, f);  

And in the shader, we firstly reverse the offset from [-1, 1] to [0, 1], so we get the value back in NDC space. Then we apply the inverse of the C / z equation, which can be best understood by this diagram:
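Spelled out algebraically, the inversion goes as follows. This is a reconstruction from the A and B definitions above, not the original figure; z_ndc is an assumed name for the depth value after the [0, 1] remap has been undone, and z_view is the view space depth.

```latex
\begin{aligned}
z_{ndc} &= \frac{A\,z_{view} + B}{-z_{view}} = -A - \frac{B}{z_{view}} \\
z_{ndc} + A &= -\frac{B}{z_{view}} \\
z_{view} &= \frac{-B}{z_{ndc} + A}
\end{aligned}
```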


So there's our reverse formula that we use to go from normalized device coordinates back to view space and we're done! In GLSL this is:
 #version 110  

uniform sampler2D textureUnit0;  

 /**  
 * Contains the components A, B, n, f in that order.  
 */  
 uniform vec4 ABnf;  

 varying vec2 vTexCoord0;  

 void main()  
 {  
     float A = ABnf.x;  
     float B = ABnf.y;  
     float n = ABnf.z;  
     float f = ABnf.w;  

     // Get the initial z value of the pixel.  
     float z = texture2D(textureUnit0, vTexCoord0).x;  
     z = (2.0 * z) - 1.0; 
 
     // Transform into view space.      
     float zView = -B / (z + A);  

     // Get normalized value.  
     zView /= -f;  

     gl_FragColor = vec4(zView, zView, zView, 1.0);  
 }  

I negate it because, remember, in OpenGL's view space, visible z values lie in the negative z axis.
Here's a screenshot of what this outputs in a test level:



To sum up the findings.
1) The odd nonlinear values you read from the depth buffer HAVE NOTHING TO DO WITH PERSPECTIVE CORRECT INTERPOLATION. They are caused entirely by the perspective divide that follows perspective projection.

2) There still needs to be an interpolation of 1/z somewhere in order to enable perspective correct interpolation of vertex attributes, we just don't have to care about it. Your vertex attributes go in on one end and come out perspective correct on the other; you don't need to divide by z again.
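To make point 1 concrete, here is a minimal CPU-side sketch of the round trip. It assumes the default depth range of [0, 1] and uses doubles for clarity; the struct and function names are made up for illustration and are not engine code.

```cpp
#include <cassert>
#include <cmath>

// Third-row terms of the symmetric OpenGL perspective matrix.
struct DepthTerms {
    double A;
    double B;
};

DepthTerms MakeDepthTerms(double n, double f) {
    assert(n > 0.0 && f > n);
    return { -(f + n) / (f - n), (-2.0 * f * n) / (f - n) };
}

// Forward path: view space z (negative in front of the camera) to the
// [0, 1] value that ends up in the depth buffer.
double ViewZToDepth(const DepthTerms& t, double zView) {
    double zClip = t.A * zView + t.B;  // still linear in z
    double wClip = -zView;             // w picks up -z
    double zNdc = zClip / wClip;       // the perspective divide, [-1, 1]
    return zNdc * 0.5 + 0.5;           // remap to [0, 1]
}

// Reverse path, mirroring the shader: undo the remap, then invert the divide.
double DepthToViewZ(const DepthTerms& t, double depth) {
    double zNdc = 2.0 * depth - 1.0;
    return -t.B / (zNdc + t.A);
}
```

With n = 1 and f = 100, a point halfway through the view range (z = -50.5) already stores a depth of roughly 0.99, which is the nonlinearity the perspective divide introduces all on its own.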

Bonus Feature: Simple exponential fog.
So what are the uses of linear depth? Well, as you'll see in future posts, it's really used all over the place, but for starters I just coded up a simple exponential fog function. All it really does is get the normalized linear distance (a value from [0, 1]) and square it, with some artist configurable parameters of course, like starting distance and a density multiplier. It's as simple a shader as you can get, and takes about 30 minutes to get integrated... ok maybe a bit more because I coded up the beginnings of the post processing framework at the same time. Anyways, the point is for a simple shader, the effects can be quite dramatic:
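The exact fog formula isn't given here, so what follows is only a sketch of one plausible shape: the classic squared-exponential falloff, written on the CPU side for readability. FogParams, its fields, and FogAmount are hypothetical names, and the remap of the start distance is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical artist-facing parameters.
struct FogParams {
    double start;    // normalized distance in [0, 1) where fog begins
    double density;  // multiplier controlling how quickly fog thickens
};

// Returns the fog amount in [0, 1]: 0 means no fog, 1 means fully fogged.
// linearDepth01 is the normalized linear depth described in the post.
double FogAmount(const FogParams& p, double linearDepth01) {
    assert(p.start >= 0.0 && p.start < 1.0 && p.density >= 0.0);
    // Distance past the start point, rescaled back into [0, 1].
    double d = (linearDepth01 - p.start) / (1.0 - p.start);
    d = std::max(0.0, std::min(1.0, d));
    // Square the scaled distance and feed it through an exponential.
    double x = d * p.density;
    return 1.0 - std::exp(-(x * x));
}
```

The final colour would then be a blend from the scene colour towards the fog colour by this amount.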


Useful resources:

http://www.songho.ca/opengl/gl_projectionmatrix.html
A beautifully comprehensive guide to the derivations of both orthographic and perspective projection matrices for OpenGL camera systems. Also where I grabbed the projection matrix image from.

http://chrishecker.com/Miscellaneous_Technical_Articles#Perspective_Texture_Mapping
All hail! The original and unsurpassed articles on perspective texture mapping. A little outdated now but if you're looking for the fundamentals of texture mapping and manage to survive them, they're still very informative. In fact, while you're at it, go read all of Chris Hecker's articles, he's up there with Michael Abrash in terms of ability to bring extremely technical topics down to a casual conversation level without dumbing it down.

http://www.amazon.com/Mathematics-Programming-Computer-Graphics-Edition/dp/1584502770
I actually have two copies of this book. One I used to keep at work and one for home.  Chapters 1-3 are what you need if you need an introduction to matrices, vectors, affine transformations, projections, quaternions, and so on. Integrates nicely with OpenGL engine development as all the conventions are based off of that API.




Saturday, 9 August 2014

Tool Post: Static Mesh Compiler

Ok another post, and another tool to cover :)

We've done textures, so let's look at another asset type that we'll need to provide to the game.
Defined in the terminology of my engine, static meshes are essentially non-deformable meshes i.e. their vertex attributes won't change, they're uploaded to the GPU once and never modified again. You see examples of these in almost any game out there today. A rock mesh or a cup on a table, for example.

Let's think on the tool pipeline here. An artist/modeler is going to use his preferred tool, Blender, 3ds, or whatever to generate his mesh and it's going to spit out one of a variety of formats. So your tool has to have an extensible importer framework set up to handle it all. Right now, I have that set up but I've only written the importer for Wavefront .obj. When I need another format it's easy to add it.
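One common way to get that kind of extensibility is a registry keyed by file extension. The sketch below is an assumption about the general shape, not the tool's actual code; IntermediateMesh, MeshImporter, and ImporterRegistry are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Placeholder for the tool's intermediate mesh representation.
struct IntermediateMesh {
    std::string name;
    std::string sourceFormat;
};

// An importer reads the file at 'path' into 'out' and reports success.
using MeshImporter =
    std::function<bool(const std::string& path, IntermediateMesh& out)>;

class ImporterRegistry {
public:
    // Associates a file extension (including the dot) with an importer.
    void Register(const std::string& extension, MeshImporter importer) {
        assert(!extension.empty() && importer != nullptr);
        importers_[extension] = std::move(importer);
    }

    // Dispatches on the file extension; fails for unknown formats.
    bool Import(const std::string& path, IntermediateMesh& out) const {
        std::size_t dot = path.find_last_of('.');
        if (dot == std::string::npos) {
            return false;
        }
        auto it = importers_.find(path.substr(dot));
        if (it == importers_.end()) {
            return false;
        }
        return it->second(path, out);
    }

private:
    std::map<std::string, MeshImporter> importers_;
};
```

Supporting a new format then comes down to one more Register call with the new extension and its importer function.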

So, similarly to the Texture Compiler, the Static Mesh Compiler maintains an asset file, which contains imported static meshes in an intermediate format that allows for adding, removing, renaming, and re-positioning of the meshes. Then there is a publishing front end which takes the intermediate format database and outputs a platform optimized, compressed database that is optimal for the engine to load at runtime.

Level Of Detail
I need to branch away for a second and talk about level of detail and why it's important.
Level of detail is essentially a performance optimization. The key point is to not spend precious GPU time drawing something at its highest possible fidelity when it's so far away you're not going to actually see and appreciate that detail; a simplified representation should suffice. Observe:


On the left we have your standard Stanford bunny representation, this one clocks in at 4968 triangles. On the right however you have a far simpler mesh, at 620 triangles. There's definitely a noticeable difference in quality here. But what happens when the camera is sufficiently far from the model?


Hard to see the difference, right? At this point the object's contribution to the scene in terms of pixels output is too small to discern the detail. Voila, we don't need to care about drawing the detailed model any more.

It's worth mentioning that there are several different level of detail techniques. The first and most common is discrete level of detail, whereby a chain of meshes is generated for every initial mesh, with each mesh in the chain being of a lower level of detail. This is quite common in games today and is characterized by being able to see a visible "pop" of a mesh as it transitions between detail levels. Secondly, continuous level of detail generates a progressive data structure that can be sampled in a continuous fashion at runtime. And thirdly, there exist view-dependent level of detail algorithms that take into account the angle of view that the mesh has with the camera in order to optimize detail. In addition, there are probably a few dozen different types that I'm missing out. Notably, I'm interested in whether there are any level of detail techniques using the newer tessellation hardware available.

For all of that, however, it must be said that discrete level of detail is still the most popular, as the other techniques are either too slow at runtime or rely on the presence of special hardware in order to work and/or be performant. For instance, I know that one of the keystones of the game LAIR for PS3 was its progressive mesh technology, which required the Cell Processor and (probably) the ability of the GPU to read from system memory to be feasible. For our uses, we'll stick with good old discrete levels of detail. The only drawbacks are the potentially noticeable pop between LOD levels... and the added artist time to create all of the different levels of detail. Which leads to the next topic.

Automated LOD Generation Library
Another thing that would be ideal for our tool is some form of automated level of detail generation.
In an ideal world, maybe you'd have your artists go ahead and lovingly hand craft every single LOD level to be the optimum for a given triangle budget. But for scenarios like ours where you have one, maybe two artists available, you really want to take as much off their hands as possible. Luckily there are several libraries available that cater for automated level of detail generation. After some searching around, what I found most suitable for my immediate needs was GLOD (http://www.cs.jhu.edu/~graphics/GLOD/index.html), a powerful level of detail library developed by a team from Johns Hopkins University and the University of Virginia. The thing that makes GLOD desirable for our needs is that it's designed to be closely integrated with OpenGL. For instance, it uses OpenGL vertex arrays as its unified geometry interface, meaning that it is by design capable of working with any per vertex attributes you care to use. If it works in OpenGL, GLOD can simplify/adapt it. In this respect, true to their design goals, it more closely resembles an extension to the OpenGL driver than a standalone LOD toolkit.
Also of benefit is that GLOD is separated into three modules, meaning you are free to use whatever subset of the provided functionality you need. We're just using it as a geometry simplification tool for our discrete level of detail code, however, it has additional level of detail features that I need to experiment with when I have the time.

The Tool Interface
Here's a screenshot of the main tool interface.


So, typical affair for UI. You have your list database view and publishing tool on the top left, the gizmos for manipulating your mesh in space (good for view testing for the LODs etc.), and on the bottom left the level of detail tool for manipulating the mesh levels of detail. There you can set each level's distance to camera and its maximum triangle budget (usually the tool spits out a mesh that's a few triangles under it, depending on circumstance). Of course, you can also move about the scene and set rendering options, like taking a peek into the GBuffer or seeing the wireframe and bounding volume of your mesh. For most of the tools that have a 3D scene widget that uses the renderer, those options are always available.
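At runtime, those per-level distances are what drive discrete LOD selection. A minimal sketch of how that lookup might work, assuming the levels are stored sorted by ascending distance (LodLevel and SelectLod are illustrative names, not the engine's):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry in a mesh's discrete LOD chain.
struct LodLevel {
    double minDistance;        // camera distance at which this level takes over
    std::size_t maxTriangles;  // triangle budget the level was simplified to
};

// Returns the index of the level to draw for the given camera distance.
// Assumes 'levels' is sorted by ascending minDistance and that level 0
// (the full detail mesh) starts at distance 0.
std::size_t SelectLod(const std::vector<LodLevel>& levels, double distance) {
    assert(!levels.empty());
    std::size_t chosen = 0;
    for (std::size_t i = 0; i < levels.size(); ++i) {
        if (distance >= levels[i].minDistance) {
            chosen = i;
        }
    }
    return chosen;
}
```

The hard switch between indices is exactly where the visible "pop" of discrete LOD comes from.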

I should just add here that most of the tools I'll write about are under active development so they may get features added and/or removed and things may change. Which is great really because then I have new post material!


Thursday, 17 July 2014

Tool Post: Texture Compiler

All engines run off of generated assets. The most advanced renderer on the planet is meaningless if all you can do is draw a programmer defined cube with it. These assets are created by artist tools, such as Maya or 3ds max, but aren't necessarily loaded into the game in those formats. Try parsing a Wavefront .obj model every time you want to load an asset and you'll see what I mean, it's damn slow. Engines tend to run off their own optimized formats that are created from source material by a resource pipeline, a set of tools that converts artist created models, audio clips, textures etc, into a format that is optimal for the engine to locate, load and work with. In addition, the resource pipeline may bundle engine and game specific data into these formats for use later on.

The first tool I created was a texture compiler. Now loading in raw .png files and using them as textures isn't the most horrible thing that could be done. But it does have problems as you'll see later on in this post. It appears trivial at first, but there's a bit of work that needs to be done with source textures before you're ready to render with them properly. Chief among the concerns is the issue of Gamma Correction.

Gamma Correction
There are TONS of resources on this subject now, but I'll include the briefest explanation. 
So, from what I can gather, the story goes like this. Once upon a time we had CRT monitors, and it was noted that the physical output of the beam varied non-linearly with the input voltage applied. What this means is that if you wanted to display the middle grey between pitch black and pure white, and you input the RGB signal (0.5, 0.5, 0.5), you wouldn't get the middle grey as you would expect. If you measured the output brightness, you got something along the lines of (0.22, 0.22, 0.22). Worse still with this phenomenon, you actually get colour shifting(!), observe... I enter (0.9, 0.5, 0.1) and I get (0.79, 0.22, 0.006), the red becomes far more dominant in the result.

When plotted on a graph, the relationship could be viewed thus:


Note the blue line, this is the monitor's natural gamma curve. Also note that I've used the power factor 2.2 to represent the exponent that the monitors have. The exponent actually varies; however, 2.2 is generally close enough to correct that it can be used in most cases.

Nowadays, most displays aren't CRT. But, in the interest of backwards compatibility, modern monitors emulate this gamma curve.

But how come all the images you see aren't darker than they should be?
Well that's because a lot of image formats these days are pre-gamma-corrected (jpeg and png are two). That means that the internal image values are mapped to the green line in the graph, basically raised to the power of 1 / 2.2. This has the effect of cancelling out the monitor's gamma when displayed to the user, so at the end you see the image values as they were intended. That's great when all you're doing is viewing images, but it causes some serious (and subtle) issues when rendering, because all of the operations that occur during rendering assume linear relationships. Obvious examples are texture filtering, mipmap generation, alpha-blending, and lighting.

Why didn't they keep the image formats linear and just pre-gamma correct before outputting to the display? What's with the inverse gamma curve on the images? The answer is another history lesson, it turns out by lucky coincidence (which was actually purposeful engineering) that raising the image values to the reciprocal gamma exponent had the side effect of allocating more bits to represent darker values. This makes sense as humans are more adept at seeing differences between dark tones than differences between light tones. In a way, it makes sense to still have it around.

What this all comes down to is that we have to deal with these non-linear inputs and outputs on either side of our linear rendering pipeline.
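As a rough illustration of the two conversions on either side of the pipeline, here is a sketch using the plain 2.2 power curve discussed above (an approximation; it is not the exact piecewise sRGB transfer function). The function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

const double kDisplayGamma = 2.2;

// Input side: bring a pre-gamma-corrected image value into linear space.
double DecodeToLinear(double encoded) {
    assert(encoded >= 0.0 && encoded <= 1.0);
    return std::pow(encoded, kDisplayGamma);
}

// Output side: apply the inverse curve so the display's gamma cancels out.
double EncodeForDisplay(double linear) {
    assert(linear >= 0.0 && linear <= 1.0);
    return std::pow(linear, 1.0 / kDisplayGamma);
}

// Averages two encoded texels the way a filter or mipmap step should:
// decode both, average in linear space, then re-encode.
double AverageInLinearSpace(double a, double b) {
    return EncodeForDisplay(0.5 * (DecodeToLinear(a) + DecodeToLinear(b)));
}
```

Averaging a black and a white texel this way gives an encoded value of about 0.73, not the 0.5 you would get by averaging the stored values directly, which is why filtering gamma-encoded textures comes out too dark.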

The Texture Compiler



Wow, a seriously complicated tool yeah? It's about as basic an interface as you can get for working with textures. Most of what's interesting happens in the background. It basically works as follows.

The tool maintains a list of textures in an intermediate format, which can be saved out to a texture asset file (*.taf). This enables you to load up an existing asset file, add images, remove images, rename, change a parameter and so on, then save again. Then, when you want to export to the format the engine consumes, you select which platforms you want to publish to (right now it's just PC OpenGL) and hit the Publish button. This then generates a very simple database: basically a two part file, with an index part and a data part.

When the engine loads up, it only loads the texture database's index. Then, when an asset is encountered that requests a texture, the following process occurs. The engine queries the resident texture set, if the texture has been loaded onto the GPU already, it's returned immediately. If it hasn't been loaded yet, then the texture database index is queried for the offset into the texture database file of the desired texture. The raw database entry is loaded and, if it was successfully compressed by the publishing tool, it's decompressed into the raw image data. Then that raw image data is compressed to whatever GPU friendly compression format is supported and sent off to the GPU. If a texture is requested that isn't inside the texture database, then a blank white texture is returned instead.
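The lookup order described above can be sketched roughly like this. It is a simplified stand-in, not the engine's code: TextureCache, TextureHandle, and LoadAndUpload are hypothetical names, and the actual file read, decompression, and GPU upload are stubbed out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using TextureHandle = std::uint32_t;
const TextureHandle kBlankWhiteTexture = 0;  // fallback for unknown textures

class TextureCache {
public:
    // 'index' maps a texture name to its byte offset in the database file.
    explicit TextureCache(std::map<std::string, std::uint64_t> index)
        : index_(std::move(index)) {}

    TextureHandle Request(const std::string& name) {
        // 1. Already resident on the GPU? Return it immediately.
        auto resident = resident_.find(name);
        if (resident != resident_.end()) {
            return resident->second;
        }
        // 2. Not in the database at all? Hand back the blank white texture.
        auto entry = index_.find(name);
        if (entry == index_.end()) {
            return kBlankWhiteTexture;
        }
        // 3. Load the entry at its offset, upload it, and remember it.
        TextureHandle handle = LoadAndUpload(entry->second);
        resident_[name] = handle;
        return handle;
    }

    std::size_t UploadCount() const { return uploadCount_; }

private:
    // Stub for: read the raw entry, decompress it, recompress to a GPU
    // friendly format, and upload. Here it only hands out a fresh handle.
    TextureHandle LoadAndUpload(std::uint64_t offset) {
        (void)offset;
        ++uploadCount_;
        TextureHandle handle = nextHandle_++;
        assert(handle != kBlankWhiteTexture);
        return handle;
    }

    std::map<std::string, std::uint64_t> index_;
    std::map<std::string, TextureHandle> resident_;
    TextureHandle nextHandle_ = 1;
    std::size_t uploadCount_ = 0;
};
```

Requesting the same texture twice only triggers one upload, and a name missing from the index never touches the database file at all.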




It should be noted that the textures inside the texture database file are already in linear space. If you look at the tool's screenshot, you'll see that there's a "Gamma correct on publish." option. That simply tells the tool to raise the texture values to the desired power on publish (in this case 2.2) to bring the values back into linear space. Then all of the automatic mipmap generation and texture filtering in the API and on the GPU will be correct from the get go. It's an option specifically because for some textures you don't want to gamma correct. Normal maps, for instance, tend to not be pre-corrected, i.e. they are already linear. Because our inputs are now linear, and our internal operations are linear, all that's required at the end of the rendering pipeline is to apply the inverse gamma correction to the framebuffer and... that's a bingo!

Just as an addendum on the whole linear pipeline topic, note that the alpha channel of gamma corrected (sRGB) textures will also be linear and therefore needs no correction. Aaaaand also that while storing linear textures has its advantages, you won't be allocating as many bits of precision to the lower intensity light values. There are a few ways to go about fixing this (such as moving from 8 bits per channel to 16). Having said that, I haven't really noticed any glaring artifacts as the textures we're using for our game are all bright and colorful, so it's alright :)