Building a Voxel Engine with WebGPU

From data structures to greedy meshing to a rendered procedural world

What is a Voxel?

A voxel is a volume element—the 3D equivalent of a pixel. Where a pixel represents a point on a 2D grid, a voxel represents a point in a 3D grid. The term comes from "volume" and "pixel," coined in the 1970s for medical imaging.

In games, voxels are most famously associated with Minecraft, but the concept predates it by decades. Voxel engines power destructible terrain, procedural world generation, and any application where the world is fundamentally volumetric rather than surface-based.

The key difference from traditional mesh-based rendering: instead of defining surfaces explicitly with triangles, you define which cells in a 3D grid are filled. The rendering system then figures out what surfaces to draw. This inversion has profound implications for both flexibility and performance.

Voxels excel at representing worlds that change. Digging a tunnel through a mountain is trivial—you just mark those cells as empty. With traditional meshes, you would need complex boolean operations on geometry. But this flexibility comes at a cost: a naive voxel renderer is hopelessly inefficient. A 256x256x256 world contains over 16 million cells. You cannot draw 16 million cubes at 60fps.

This page walks through building an efficient voxel engine from scratch, using WebGPU. By the end, you will understand the core techniques that make voxel games possible: chunking, greedy meshing, ambient occlusion, and lighting.


Part I: Data Representation

The Naive Approach

The simplest voxel storage is a flat 3D array:

const world = new Uint8Array(width * height * depth);
 
function getVoxel(x: number, y: number, z: number): number {
  return world[x + y * width + z * width * height];
}
 
function setVoxel(x: number, y: number, z: number, value: number): void {
  world[x + y * width + z * width * height] = value;
}
typescript

Each cell stores a block type: 0 for air, 1 for stone, 2 for dirt, and so on. With a single byte per voxel, a 256x256x256 world uses 16MB. That sounds manageable until you realize games often have much larger worlds—Minecraft's world is 60 million blocks wide.

Chunk-Based Architecture

The solution is to divide the world into chunks—fixed-size regions that can be loaded and unloaded independently. A typical chunk size is 16x16x16 or 32x32x32 voxels.

Chunks provide several benefits. First, you only keep nearby chunks in memory. Second, when a voxel changes, you only need to regenerate the mesh for that chunk, not the entire world. Third, chunks are the natural unit for frustum culling—if a chunk is outside the camera's view, skip it entirely.

Interactive: Chunk Memory Calculator

Voxels per chunk
4,096
Bytes per chunk
4.0 KB
Chunks loaded
4,913
Total memory
19.2 MB

Adjust the parameters to see how chunk size, render distance, and voxel data size affect memory usage.

The memory calculation reveals an interesting tradeoff. Smaller chunks mean more chunks to manage and more draw calls, but faster mesh regeneration when blocks change. Larger chunks reduce overhead but make updates slower. Most engines settle on 16x16x16 or 32x32x32 as a sweet spot.

Sparse Representations

For worlds with lots of empty space (caves, floating islands), even chunk-based storage wastes memory storing air. Advanced engines use sparse representations.

Run-length encoding compresses consecutive identical voxels. A column of 64 air blocks becomes a single entry: "64 air." This works well for terrain with clear layers.

Octrees recursively subdivide space. An octree node either contains a single block type (if uniform) or eight child nodes (if mixed). A chunk of solid stone is a single node. A chunk with a single stone block in a sea of air uses about 20 nodes instead of 4096 entries.

These techniques matter for large worlds but add complexity. For learning, stick with flat arrays per chunk.


Part II: From Voxels to Triangles

A voxel world is data. To render it, you need triangles. The challenge: generating those triangles efficiently.

The Naive Approach

The simplest method: for each non-air voxel, emit a cube (6 faces, 12 triangles). A chunk with 4096 solid voxels produces 49,152 triangles. That is already problematic, and you have not even considered that most of those faces are hidden.

Face Culling

A cube has six faces: +X, -X, +Y, -Y, +Z, -Z. A face is only visible if the adjacent voxel in that direction is air (or transparent). A voxel buried underground has zero visible faces.

function isVisible(x: number, y: number, z: number, face: Face): boolean {
  const neighbor = getNeighbor(x, y, z, face);
  return neighbor === AIR;
}
typescript

This optimization alone often reduces triangle count by 80-90% for typical terrain. But we can do much better.

Greedy Meshing

Consider a flat stone floor—a 16x16 grid of stone voxels. With face culling, you emit 256 upward-facing quads. But visually, it is one surface. Why not emit a single quad covering the entire floor?

Greedy meshing merges adjacent coplanar faces into larger quads. The algorithm scans across each slice of the chunk, finding rectangular regions of identical faces and merging them.

Interactive: Greedy Meshing Step-Through

Step 1 of 140 quads / 32 cells

Start scanning from top-left

Step through the greedy meshing algorithm. Green outlines show merged quads.

The algorithm works face by face. For each of the six face directions, you slice the chunk perpendicular to that direction and process each slice as a 2D grid.

function greedyMesh(chunk: Chunk, face: Face): Quad[] {
  const quads: Quad[] = [];
  
  for (let depth = 0; depth < CHUNK_SIZE; depth++) {
    const mask = buildMask(chunk, face, depth);
    
    for (let j = 0; j < CHUNK_SIZE; j++) {
      for (let i = 0; i < CHUNK_SIZE; ) {
        if (mask[i + j * CHUNK_SIZE] === 0) {
          i++;
          continue;
        }
        
        const blockType = mask[i + j * CHUNK_SIZE];
        
        let width = 1;
        while (i + width < CHUNK_SIZE && 
               mask[i + width + j * CHUNK_SIZE] === blockType) {
          width++;
        }
        
        let height = 1;
        let done = false;
        while (j + height < CHUNK_SIZE && !done) {
          for (let k = 0; k < width; k++) {
            if (mask[i + k + (j + height) * CHUNK_SIZE] !== blockType) {
              done = true;
              break;
            }
          }
          if (!done) height++;
        }
        
        quads.push({ x: i, y: j, z: depth, w: width, h: height, block: blockType });
        
        for (let dy = 0; dy < height; dy++) {
          for (let dx = 0; dx < width; dx++) {
            mask[i + dx + (j + dy) * CHUNK_SIZE] = 0;
          }
        }
        
        i += width;
      }
    }
  }
  
  return quads;
}
typescript

The mask is a 2D array for the current slice. Each cell contains the block type if that face should be drawn, or 0 if already processed or not visible. The algorithm finds the widest row of matching blocks starting at each position, then extends downward as far as possible while maintaining that width.

Naive vs Greedy Meshing Comparison

Naive: 128 triangles
Greedy: 12 triangles
91% reduction

The triangle savings are dramatic. A flat floor goes from 512 triangles (256 quads) to 2 triangles (1 quad). Real terrain with varied surfaces still sees 70-90% reduction.


Part III: Vertex Data

Each quad becomes four vertices and six indices (two triangles). What data does each vertex need?

Position

Three floats for world position. With chunk-local coordinates and a chunk offset uniform, you can use smaller integers:

const x = localX + chunkOffsetX * CHUNK_SIZE;
const y = localY + chunkOffsetY * CHUNK_SIZE;
const z = localZ + chunkOffsetZ * CHUNK_SIZE;
typescript

Normal

The face direction. Since voxel faces are always axis-aligned, you can encode the normal as a single integer (0-5) and decode in the shader:

const NORMALS = array<vec3f, 6>(
  vec3f(1.0, 0.0, 0.0),   // +X
  vec3f(-1.0, 0.0, 0.0),  // -X
  vec3f(0.0, 1.0, 0.0),   // +Y
  vec3f(0.0, -1.0, 0.0),  // -Y
  vec3f(0.0, 0.0, 1.0),   // +Z
  vec3f(0.0, 0.0, -1.0),  // -Z
);
wgsl

Block Type and Ambient Occlusion

The block type determines the color (or texture). AO is a lighting value we will compute shortly. Both can be packed into a single byte or two.

Packed Format

A minimal vertex format might be:

// 8 bytes per vertex
struct Vertex {
  x: u8,        // local X (0-31)
  y: u8,        // local Y (0-31)
  z: u8,        // local Z (0-31)
  normal: u8,   // face direction (0-5)
  blockType: u8,// which block
  ao: u8,       // ambient occlusion (0-3)
  u: u8,        // texture U
  v: u8,        // texture V
}
typescript

Eight bytes per vertex. A heavily detailed chunk might have 5000 vertices—40KB. Manageable.


Part IV: The Render Pipeline

Chunk Management

The engine maintains a set of active chunks around the player. As the player moves, distant chunks are unloaded and new ones are loaded. This happens on a worker thread to avoid stuttering.

class ChunkManager {
  private chunks = new Map<string, Chunk>();
  private meshes = new Map<string, GPUBuffer>();
  
  update(playerPosition: Vec3): void {
    const playerChunk = worldToChunk(playerPosition);
    
    for (let dx = -RENDER_DISTANCE; dx <= RENDER_DISTANCE; dx++) {
      for (let dy = -RENDER_DISTANCE; dy <= RENDER_DISTANCE; dy++) {
        for (let dz = -RENDER_DISTANCE; dz <= RENDER_DISTANCE; dz++) {
          const key = chunkKey(playerChunk.x + dx, playerChunk.y + dy, playerChunk.z + dz);
          if (!this.chunks.has(key)) {
            this.loadChunk(key);
          }
        }
      }
    }
    
    for (const key of this.chunks.keys()) {
      if (distanceToPlayer(key, playerChunk) > RENDER_DISTANCE + 1) {
        this.unloadChunk(key);
      }
    }
  }
}
typescript

Frustum Culling

Before drawing, test each chunk against the camera frustum. A chunk outside the view contributes nothing to the image—skip it entirely.

Interactive: Frustum Culling

Chunks inside the camera frustum (blue) are rendered. Chunks outside are culled.

The frustum is defined by six planes (near, far, left, right, top, bottom). A chunk is visible if its bounding box intersects the frustum. The test is conservative—some chunks that pass the test may still be fully occluded—but it is fast and eliminates most invisible geometry.

function isChunkVisible(chunk: Chunk, frustum: Frustum): boolean {
  const min = chunk.worldPosition;
  const max = vec3Add(min, vec3(CHUNK_SIZE, CHUNK_SIZE, CHUNK_SIZE));
  
  for (const plane of frustum.planes) {
    const px = plane.normal.x > 0 ? max.x : min.x;
    const py = plane.normal.y > 0 ? max.y : min.y;
    const pz = plane.normal.z > 0 ? max.z : min.z;
    
    if (vec3Dot(plane.normal, vec3(px, py, pz)) + plane.d < 0) {
      return false;
    }
  }
  
  return true;
}
typescript

Draw Calls

Each visible chunk with geometry becomes a draw call. With greedy meshing, vertex counts are reasonable, but draw call overhead can still hurt. Techniques to reduce draw calls include:

Batching: Combine multiple chunk meshes into a single buffer and draw with offsets.

Indirect Drawing: Use drawIndirect with a buffer populated by compute shaders. The GPU decides what to draw without CPU round-trips.

Instancing: For identical geometry (trees, decorations), draw many copies with a single call.

For a basic engine, one draw call per chunk is fine. Optimize when profiling reveals bottlenecks.


Part V: Ambient Occlusion

Ambient occlusion darkens corners and crevices where light would naturally be blocked. For voxels, we compute AO per vertex based on neighboring block occupancy.

The Eight Neighbors

Each vertex of a face is shared by up to eight voxels. The AO value depends on how many of those neighbors are solid:

  +---+---+
  | 2 | 1 |
  +---V---+   V = vertex in question
  | 3 | 0 |   Numbers = neighboring voxels
  +---+---+
plaintext

For a vertex on the top face of a block, the four relevant neighbors are the blocks to the side and diagonally above. If all four are empty, AO = 3 (brightest). If all four are solid, AO = 0 (darkest).

function computeAO(side1: boolean, side2: boolean, corner: boolean): number {
  if (side1 && side2) return 0;
  return 3 - (side1 ? 1 : 0) - (side2 ? 1 : 0) - (corner ? 1 : 0);
}
typescript

The formula handles a subtle case: if both side neighbors are solid, the corner cannot contribute light regardless of whether it is filled. This prevents light leaking through diagonal cracks.

Interactive: Ambient Occlusion

Toggle ambient occlusion to see how it adds depth at corners and edges.

AO values are stored per vertex and interpolated across the face. The GPU's rasterizer handles the interpolation automatically—you just output the AO value from the vertex shader.

@fragment
fn fragmentMain(@location(0) ao: f32, @location(1) blockType: u32) -> @location(0) vec4f {
  let baseColor = getBlockColor(blockType);
  let aoFactor = 0.5 + ao * 0.5 / 3.0;  // Range: 0.5 to 1.0
  return vec4f(baseColor.rgb * aoFactor, 1.0);
}
wgsl

The visual impact is substantial. Without AO, voxel scenes look flat and artificial. With AO, depth and structure emerge.


Part VI: Lighting

Beyond ambient occlusion, voxel engines typically implement two kinds of light: sunlight and block light.

Sunlight

Sunlight comes from above and attenuates as it passes through blocks. The simplest model: each column starts at maximum light, decreasing by one for each block of air below a solid block.

function computeSunlight(chunk: Chunk): void {
  for (let x = 0; x < CHUNK_SIZE; x++) {
    for (let z = 0; z < CHUNK_SIZE; z++) {
      let light = MAX_LIGHT;
      for (let y = CHUNK_SIZE - 1; y >= 0; y--) {
        if (chunk.getBlock(x, y, z) !== AIR) {
          light = 0;
        }
        chunk.setSunlight(x, y, z, light);
      }
    }
  }
}
typescript

Block Lights

Torches, lava, and glowing blocks emit light that spreads through air. This is a flood-fill problem: light propagates from sources, decreasing by one at each step, blocked by solid blocks.

Interactive: Light Propagation

8
Step 1 / 60

Light spreads from the source via BFS, decreasing by 1 each step. Solid blocks (gray) stop propagation.

The standard algorithm uses breadth-first search:

function propagateLight(chunk: Chunk, sources: Vec3[]): void {
  const queue: Array<{ pos: Vec3; light: number }> = [];
  
  for (const source of sources) {
    const light = getEmission(chunk.getBlock(source.x, source.y, source.z));
    chunk.setBlockLight(source.x, source.y, source.z, light);
    queue.push({ pos: source, light });
  }
  
  while (queue.length > 0) {
    const { pos, light } = queue.shift()!;
    if (light <= 1) continue;
    
    for (const dir of DIRECTIONS) {
      const neighbor = vec3Add(pos, dir);
      if (!inBounds(neighbor)) continue;
      if (chunk.getBlock(neighbor.x, neighbor.y, neighbor.z) !== AIR) continue;
      
      const newLight = light - 1;
      if (chunk.getBlockLight(neighbor.x, neighbor.y, neighbor.z) < newLight) {
        chunk.setBlockLight(neighbor.x, neighbor.y, neighbor.z, newLight);
        queue.push({ pos: neighbor, light: newLight });
      }
    }
  }
}
typescript

Light values are combined in the shader: finalLight = max(sunlight, blockLight). More sophisticated engines blend them or use colored light.

Light in the Mesh

Light values are baked into vertices during mesh generation. When light changes (a torch is placed or removed), affected chunks regenerate their meshes. This is why fast mesh generation matters.


Part VII: Optimization

A basic engine following the techniques above runs well for small worlds. Scaling up requires additional optimizations.

Vertex Compression

The 8-byte vertex format can shrink further. Position can pack into 15 bits (5 bits per axis for 0-31). Normal into 3 bits. Block type into 8-12 bits depending on block variety. AO into 2 bits. A 32-bit vertex is possible, halving memory bandwidth.

Level of Detail

Distant chunks do not need full detail. LOD systems render far chunks with simplified meshes—larger voxels, fewer triangles. The transition between LOD levels requires care to avoid visible seams.

Compute Shader Meshing

Moving mesh generation to the GPU eliminates the CPU bottleneck. A compute shader reads voxel data and writes vertices directly to GPU buffers. This is complex to implement but enables much larger view distances.

Occlusion Culling

Frustum culling misses chunks that are in view but hidden behind terrain. Hierarchical Z-buffer or software rasterization can identify occluded chunks, but the cost of the culling itself must be weighed against the savings.


Putting It Together

Here is a procedural voxel terrain rendered with the techniques from this page. Use WASD to explore the infinite world—new chunks generate as you move. Drag to look around, scroll to zoom.

Interactive: Voxel Terrain

Generating terrain...
0 triangles0 chunks grass stone snow sand waterWebGPU

WASD to move, drag to look around, scroll to zoom. New chunks generate as you explore.

The terrain combines multiple noise functions at different scales. A low-frequency continental layer provides broad elevation changes, a medium-frequency mountain layer adds peaks and valleys, and high-frequency detail adds local variation. Trees are placed using a separate noise function with distance constraints to prevent clustering.

Caves form where 3D Perlin noise exceeds a threshold, carving tunnels through solid rock. Water fills any air below a fixed sea level with semi-transparent rendering and subtle wave animation. Beaches appear where land meets water.

The chunk system demonstrates the core principle of voxel engines: as the camera moves, the engine generates terrain for nearby chunks and caches them for reuse. This allows infinite worlds without loading everything into memory at once. Terrain generation runs in a Web Worker to keep the main thread responsive while chunks load in the background. The mesh uses greedy meshing to reduce triangle count, and ambient occlusion combined with sunlight propagation adds depth to the scene.


Bonus: Meshing Playground

Try it yourself. Click cells below to toggle voxels on and off. Watch how greedy meshing merges adjacent faces into larger quads, and compare the triangle count between naive (one quad per face) and greedy approaches.

Interactive: Greedy Meshing Playground

Naive: 608 tris (152 quads)Greedy: 56 tris (14 quads)Savings: 91%WebGPU

Toggle between naive and greedy meshing to see the difference. Each color represents a merged quad. Drag to rotate, scroll to zoom.

The algorithm scans each slice of the volume, looking for runs of identical faces that can merge. A 4x4 wall that would need 32 triangles with naive meshing becomes just 2 triangles with greedy meshing. The savings compound quickly in real terrain.


Key Takeaways

  • Voxels represent volume, not surface. The rendering system derives surfaces from the volume data.
  • Chunk-based storage enables streaming large worlds and localized updates.
  • Greedy meshing dramatically reduces triangle count by merging coplanar faces.
  • Per-vertex ambient occlusion adds depth with minimal cost.
  • Light propagates via flood-fill, baked into mesh vertices.
  • Frustum culling skips invisible chunks; further optimizations target draw calls and memory bandwidth.

From here, you might explore octree compression for sparse worlds, GPU-driven rendering with indirect draw calls, or multiplayer synchronization. Each direction has its own set of techniques, but the foundation remains the same: efficient data structures and intelligent mesh generation.