Deferred Rendering
G-buffers and efficient multi-light rendering
The Forward Rendering Problem
In traditional forward rendering, each object is rendered once, and all lighting calculations happen in its fragment shader. For a scene with 100 lights, each fragment must loop through all 100 lights—even fragments that are occluded and will never appear on screen.
// Forward rendering: every fragment evaluates every light
@fragment
fn fragmentMain(input: VertexOutput) -> @location(0) vec4f {
  var finalColor = vec3f(0.0);
  for (var i = 0u; i < numLights; i++) {
    finalColor += calculateLight(input.position, input.normal, lights[i]);
  }
  return vec4f(finalColor, 1.0);
}

This approach has two problems. First, occluded fragments waste work. If object A is behind object B, both objects' fragments run their full lighting calculations, but only B's are visible. Second, the cost scales as O(objects × lights). With many lights, this becomes prohibitively expensive.
Forward rendering works well for simple scenes. But games with hundreds of lights—streetlamps, explosions, flashlights, magic effects—need a different approach.
Deferred Rendering: Separate Geometry from Lighting
Deferred rendering splits the work into two distinct phases:
Geometry pass: Render all objects, but instead of computing final colors, write intermediate data to multiple textures. These textures—the G-buffer—store everything needed for lighting: positions, normals, albedo colors, material properties.
Lighting pass: Render a full-screen quad. For each pixel, read from the G-buffer and compute lighting. Only visible pixels are processed—occluded fragments never reach this stage.
[Interactive: Deferred Pipeline. Step 1, Geometry Pass: render all objects to G-buffer (no lighting)]
The critical insight is that lighting calculations happen after visibility is resolved. In forward rendering, you compute lighting for fragments that might be overwritten. In deferred rendering, you only light the final visible surface.
The G-Buffer
The G-buffer (geometry buffer) is a set of textures that store all the geometric and material information needed for lighting. A typical G-buffer includes:
Position buffer: World-space or view-space position of each pixel. Alternatively, reconstruct from depth.
Normal buffer: Surface normal at each pixel, typically normalized and stored in a format like RGB10A2 for precision.
Albedo buffer: Base surface color before lighting. Often RGB with alpha for transparency hints.
Material buffer: Roughness, metalness, ambient occlusion—whatever PBR parameters your lighting model requires.
// Create G-buffer textures
const gBufferPosition = device.createTexture({
  size: [width, height],
  format: 'rgba16float',
  usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING,
});
const gBufferNormal = device.createTexture({
  size: [width, height],
  format: 'rgba16float',
  usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING,
});
const gBufferAlbedo = device.createTexture({
  size: [width, height],
  format: 'rgba8unorm',
  usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING,
});

[Interactive: G-Buffer Contents. The G-buffer stores geometry data in separate textures; during the lighting pass, these are combined to compute final illumination.]
Each buffer captures a different aspect of the scene's geometry. When combined during the lighting pass, they provide complete information for computing illumination.
The Geometry Pass
The geometry pass renders all scene objects to the G-buffer. The fragment shader outputs to multiple render targets simultaneously:
struct GBufferOutput {
  @location(0) position: vec4f,
  @location(1) normal: vec4f,
  @location(2) albedo: vec4f,
}

@fragment
fn geometryPass(input: VertexOutput) -> GBufferOutput {
  var output: GBufferOutput;
  output.position = vec4f(input.worldPosition, 1.0);
  output.normal = vec4f(normalize(input.normal), 0.0);
  output.albedo = vec4f(input.color, 1.0);
  return output;
}

The render pass configuration specifies multiple color attachments:
const geometryPassDescriptor = {
  colorAttachments: [
    {
      view: gBufferPosition.createView(),
      clearValue: { r: 0, g: 0, b: 0, a: 0 },
      loadOp: 'clear',
      storeOp: 'store',
    },
    {
      view: gBufferNormal.createView(),
      clearValue: { r: 0, g: 0, b: 0, a: 0 },
      loadOp: 'clear',
      storeOp: 'store',
    },
    {
      view: gBufferAlbedo.createView(),
      clearValue: { r: 0, g: 0, b: 0, a: 0 },
      loadOp: 'clear',
      storeOp: 'store',
    },
  ],
  depthStencilAttachment: {
    view: depthTexture.createView(),
    depthClearValue: 1.0,
    depthLoadOp: 'clear',
    depthStoreOp: 'store',
  },
};

Notice that the geometry pass does no lighting. It simply captures geometric data. This is fast—the cost is essentially the same as rendering the scene once with basic shaders.
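On the pipeline side, the fragment stage must declare one color target per G-buffer attachment, in the same order as the pass descriptor's color attachments. A minimal sketch of that targets list (the `device.createRenderPipeline` call itself is elided here, since it also needs a shader module and layout):

```javascript
// Color targets for the geometry pipeline: one entry per G-buffer
// attachment, in the same order as the render pass color attachments.
const gBufferTargets = [
  { format: 'rgba16float' }, // @location(0) position
  { format: 'rgba16float' }, // @location(1) normal
  { format: 'rgba8unorm' },  // @location(2) albedo
];
```

These formats must exactly match the formats used in `createTexture`; WebGPU validation rejects a draw if the pipeline's target formats disagree with the pass's attachments.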
The Lighting Pass
The lighting pass demonstrates deferred rendering's core advantage. A full-screen quad is rendered, and the fragment shader samples the G-buffer to compute lighting for each pixel:
@group(0) @binding(0) var gPosition: texture_2d<f32>;
@group(0) @binding(1) var gNormal: texture_2d<f32>;
@group(0) @binding(2) var gAlbedo: texture_2d<f32>;
@group(0) @binding(3) var<storage, read> lights: array<Light>;

@fragment
fn lightingPass(@builtin(position) fragCoord: vec4f) -> @location(0) vec4f {
  let texel = vec2i(fragCoord.xy);
  let position = textureLoad(gPosition, texel, 0).xyz;
  let normal = normalize(textureLoad(gNormal, texel, 0).xyz);
  let albedo = textureLoad(gAlbedo, texel, 0).rgb;

  var finalColor = vec3f(0.0);

  // Ambient
  finalColor += albedo * 0.1;

  // Accumulate light contributions
  for (var i = 0u; i < arrayLength(&lights); i++) {
    finalColor += calculatePointLight(position, normal, albedo, lights[i]);
  }
  return vec4f(finalColor, 1.0);
}

The key difference from forward rendering: this loop runs exactly once per visible pixel. No matter how many objects were rendered, no matter how much overdraw occurred during the geometry pass—only the final visible surface is lit.
Why Deferred Helps with Many Lights
[Interactive: Many Lights Performance. With light volumes, each pixel only evaluates lights within range, not all 16 lights.]
Consider a scene with 1000 objects and 100 lights.
Forward rendering cost: Each object's fragments evaluate 100 lights. With overdraw (multiple objects at the same pixel), some pixels evaluate lights multiple times. Total: O(visible pixels × overdraw × lights).
Deferred rendering cost: Geometry pass renders 1000 objects (O(objects)). Lighting pass processes each pixel exactly once, evaluating 100 lights (O(pixels × lights)). Total: O(objects + pixels × lights).
Deferred rendering removes overdraw from the lighting equation. When many objects overlap, forward rendering wastes significant work; deferred rendering does not.
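The asymptotic argument can be made concrete with a toy cost model. The numbers below are illustrative assumptions (a 1920×1080 screen, an assumed average overdraw of 4, 100 lights), counting per-light shading evaluations:

```javascript
// Toy cost model: per-light shading evaluations for forward vs deferred.
// Assumes every shaded fragment evaluates every light (no culling).
function forwardCost(pixels, overdraw, lights) {
  return pixels * overdraw * lights;
}
function deferredCost(pixels, lights) {
  // Lighting runs once per visible pixel, regardless of overdraw.
  return pixels * lights;
}

const pixels = 1920 * 1080; // ~2.07 million
const overdraw = 4;         // assumed average depth complexity
const lights = 100;

console.log(forwardCost(pixels, overdraw, lights) / deferredCost(pixels, lights)); // 4
```

Under these assumptions, deferred shading does a quarter of the lighting work; the ratio is exactly the average overdraw, since that is the factor deferred rendering eliminates.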
Optimization: Light Volumes
Evaluating all lights for every pixel is wasteful. A light on the left side of the scene does not affect pixels on the right. Light volumes restrict which pixels evaluate which lights.
For point lights, render a sphere centered on the light. Only pixels inside the sphere run the light calculation:
// For each light, render its bounding sphere
// (assumes the light-volume pipeline and shared bind groups are already set)
for (const light of lights) {
  pass.setVertexBuffer(0, sphereMesh);
  pass.setBindGroup(1, light.bindGroup);
  pass.draw(sphereVertexCount);
}

The fragment shader then computes only this light's contribution, which is accumulated into the output with additive blending. This approach converts the problem from O(pixels × lights) to O(pixels × average_lights_per_pixel)—a substantial win when lights have limited range.
Some engines take this further with tiled or clustered deferred rendering, dividing the screen into tiles and building lists of which lights affect each tile.
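The tiling idea can be sketched on the CPU. The helper below is a hypothetical illustration: it bins lights by a projected bounding circle in pixel coordinates, whereas real engines typically bin on the GPU and project 3D bounding spheres (or frustum-test clusters) first:

```javascript
// Bin lights into screen tiles by their projected bounding circle.
// Each light is { x, y, radius } in pixel coordinates (an assumption;
// real engines project 3D bounding spheres to the screen first).
function binLights(lights, width, height, tileSize) {
  const cols = Math.ceil(width / tileSize);
  const rows = Math.ceil(height / tileSize);
  const tiles = Array.from({ length: cols * rows }, () => []);
  for (let i = 0; i < lights.length; i++) {
    const l = lights[i];
    // Clamp the light's tile footprint to the screen.
    const minX = Math.max(0, Math.floor((l.x - l.radius) / tileSize));
    const maxX = Math.min(cols - 1, Math.floor((l.x + l.radius) / tileSize));
    const minY = Math.max(0, Math.floor((l.y - l.radius) / tileSize));
    const maxY = Math.min(rows - 1, Math.floor((l.y + l.radius) / tileSize));
    for (let ty = minY; ty <= maxY; ty++) {
      for (let tx = minX; tx <= maxX; tx++) {
        tiles[ty * cols + tx].push(i);
      }
    }
  }
  return tiles; // tiles[t] = indices of lights affecting tile t
}
```

The lighting shader for tile t then loops only over tiles[t] instead of the full light array.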
Forward vs Deferred: The Tradeoff
[Interactive: Architecture Comparison]
Deferred rendering is not universally superior. Consider the tradeoffs:
Deferred advantages:
- Decouples geometry complexity from light count
- Only visible surfaces are lit (no wasted work on occluded fragments)
- Easy to add post-process effects (you have depth, normals, positions readily available)
Deferred disadvantages:
- High memory bandwidth (G-buffer can be 64+ bytes per pixel)
- No hardware MSAA (anti-aliasing must be done differently)
- Transparency is difficult (transparent objects must use forward rendering)
- Multiple material models require G-buffer format changes
Games often use a hybrid: deferred for opaque geometry and many lights, forward for transparent objects and simple scenes.
The Transparency Problem
Transparent objects are deferred rendering's Achilles' heel. The G-buffer stores data for one surface per pixel. But transparency requires blending multiple surfaces—the G-buffer cannot represent that.
The standard solution: render opaque objects with deferred shading, then render transparent objects with forward shading on top. This requires sorting transparent objects back-to-front and adds complexity to the rendering pipeline.
// Typical hybrid approach
renderDeferredGeometryPass(opaqueObjects);
renderDeferredLightingPass();
renderForwardPass(transparentObjects, sortedBackToFront);

More advanced techniques like order-independent transparency (OIT) can integrate with deferred rendering, but they add complexity.
Memory Bandwidth Considerations
The G-buffer consumes significant memory bandwidth. A typical setup might use:
- Position: RGBA16F (8 bytes per pixel)
- Normal: RGBA16F (8 bytes per pixel)
- Albedo: RGBA8 (4 bytes per pixel)
- Material: RGBA8 (4 bytes per pixel)
- Depth: D32F (4 bytes per pixel)
At 1920×1080, that is roughly 55 MB of G-buffer data (28 bytes per pixel across about 2.07 million pixels). Every pixel read during the lighting pass touches all these textures.
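The arithmetic is easy to check, using the per-pixel byte costs listed above:

```javascript
// G-buffer size at 1080p: sum of per-pixel byte costs times pixel count.
const bytesPerPixel = 8 + 8 + 4 + 4 + 4; // position + normal + albedo + material + depth
const pixelCount = 1920 * 1080;
const totalBytes = bytesPerPixel * pixelCount;
console.log(totalBytes);                              // 58060800
console.log((totalBytes / (1024 * 1024)).toFixed(1)); // 55.4 (MiB)
```

And this is read at least once per lit pixel per frame, which is why bandwidth, not ALU work, is usually the limiting factor.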
Optimizations include:
Reconstruct position from depth: Store only depth, compute world position from depth and camera matrices. Trades compute for bandwidth.
Pack normals: Use octahedral encoding or other compression to fit normals in 2 channels instead of 3.
Minimize G-buffer size: Store only what you need. If you do not use roughness, do not store it.
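Octahedral encoding, mentioned above, maps a unit vector to two values in [-1, 1] and back with small error. A reference sketch in JavaScript (the shader version is a direct WGSL translation):

```javascript
// Octahedral normal encoding: unit vec3 -> two values in [-1, 1].
function octEncode([x, y, z]) {
  const a = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let u = x / a, v = y / a;
  if (z < 0) {
    // Fold the lower hemisphere over the diagonals.
    const fu = (1 - Math.abs(v)) * Math.sign(u || 1);
    const fv = (1 - Math.abs(u)) * Math.sign(v || 1);
    u = fu; v = fv;
  }
  return [u, v];
}

// Decode two channels back to a unit vec3.
function octDecode([u, v]) {
  let z = 1 - Math.abs(u) - Math.abs(v);
  let x = u, y = v;
  if (z < 0) {
    // Unfold the lower hemisphere.
    x = (1 - Math.abs(v)) * Math.sign(u || 1);
    y = (1 - Math.abs(u)) * Math.sign(v || 1);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The two encoded values are then quantized into two 16-bit (or even 8-bit) channels, freeing the third normal channel for other data.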
// Reconstruct position from depth
// Note: WebGPU UV space has y pointing down while NDC y points up,
// so y must be flipped when converting uv to clip space.
fn reconstructPosition(uv: vec2f, depth: f32) -> vec3f {
  let clipPos = vec4f(uv.x * 2.0 - 1.0, 1.0 - uv.y * 2.0, depth, 1.0);
  let viewPos = inverseProjection * clipPos;
  let worldPos = inverseView * vec4f(viewPos.xyz / viewPos.w, 1.0);
  return worldPos.xyz;
}

Practical Implementation Steps
Implementing deferred rendering from scratch:
First, create the G-buffer textures. Decide what data you need—position, normal, albedo, material properties. Choose formats that balance precision and bandwidth.
Second, modify your geometry shader to output to multiple render targets. The vertex shader stays mostly the same; the fragment shader writes to each G-buffer channel.
Third, create the lighting pass pipeline. It reads from the G-buffer and writes to the final framebuffer. The shader should sample all G-buffer textures and accumulate light contributions.
Fourth, handle transparency separately. Either use forward rendering for transparent objects or implement an OIT technique.
Fifth, optimize. Profile memory bandwidth. Consider light volumes. Experiment with G-buffer formats.
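The five steps above compose into a frame loop. A structural sketch with stub passes (the pass functions here are hypothetical placeholders standing in for real WebGPU command encoding):

```javascript
// Structural sketch of a hybrid deferred frame. Each pass is a plain
// function here; in a real renderer each would encode GPU commands.
function renderFrame(passes, scene) {
  passes.geometry(scene.opaque);     // 1. fill the G-buffer (no lighting)
  passes.lighting(scene.lights);     // 2. full-screen (or light-volume) lighting
  passes.forward(scene.transparent); // 3. sorted transparent objects, forward-lit
}

// Recording stubs show the required ordering:
const order = [];
renderFrame(
  {
    geometry: () => order.push('geometry'),
    lighting: () => order.push('lighting'),
    forward: () => order.push('forward'),
  },
  { opaque: [], lights: [], transparent: [] }
);
console.log(order.join(' -> ')); // geometry -> lighting -> forward
```

The ordering is the invariant that matters: lighting must run after the G-buffer is complete, and transparent forward rendering must run last so it blends over the lit opaque result.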
Key Takeaways
- Forward rendering computes lighting per-fragment during object rendering; cost scales with objects × lights × overdraw
- Deferred rendering separates geometry capture (G-buffer) from lighting computation (lighting pass)
- The G-buffer stores position, normal, albedo, and material properties for each visible pixel
- Geometry pass renders objects to the G-buffer with no lighting; lighting pass computes illumination from G-buffer data
- Deferred rendering excels with many lights because lighting runs only on visible pixels
- Transparency is problematic—typically handled with hybrid forward/deferred
- Memory bandwidth is the main cost; minimize G-buffer size and consider position reconstruction
- Modern games often use hybrid approaches, combining deferred for opaque objects with forward for transparency