3D Transformations
Model, view, and projection matrices
The Journey of a Vertex
Every vertex in a 3D scene takes a journey. It starts as a position in model space, relative to an object's own origin. By the time it appears on your screen, it has been transformed through multiple coordinate systems, each serving a different purpose.
Understanding this pipeline is fundamental to 3D graphics. When something looks wrong—an object appears in the wrong place, the perspective is off, the scene is inside out—the problem almost always traces back to a transformation matrix. Knowing how the pipeline works lets you debug visually and reason about what each matrix does.
The Transformation Pipeline
A vertex passes through five distinct coordinate spaces:
Model Space (or Object Space) is where vertices are defined relative to the object's center. A cube's vertices might range from -1 to 1 on each axis, centered at the origin.
World Space is where all objects exist together. The model matrix positions each object in the world—translating, rotating, and scaling it.
View Space (or Camera Space) is the world from the camera's perspective. The view matrix repositions everything so the camera sits at the origin, looking down the negative Z axis.
Clip Space is where the projection matrix maps the 3D world into a normalized volume. Anything outside this volume gets clipped away.
NDC (Normalized Device Coordinates) comes after perspective division. In WebGPU, x and y now range from -1 to 1 and depth from 0 to 1, ready for the final mapping to screen pixels.
Interactive: The Transformation Pipeline
Watch a vertex travel through each space. In model space, it is just a point on a cube. The model matrix places that cube in the world. The view matrix shifts the world so the camera is at the origin. The projection matrix warps space to create depth perception. Finally, perspective division normalizes everything into the canonical viewing volume.
The Model Matrix
The model matrix transforms vertices from model space to world space. It encodes an object's position, orientation, and scale in the world.
Building a model matrix typically combines three transformations:
fn modelMatrix(translation: vec3f, rotation: vec3f, scale: vec3f) -> mat4x4f {
    let t = translationMatrix(translation);
    let r = rotationMatrix(rotation);
    let s = scaleMatrix(scale);
    return t * r * s; // Order matters: scale, then rotate, then translate
}

Order matters because matrix multiplication is not commutative. The rightmost matrix applies first. So t * r * s scales the object, rotates it, then translates it into position. Reversing the order would produce a completely different result.
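To see the non-commutativity concretely, here is a small CPU-side sketch in TypeScript (the helpers are ours for illustration, not part of the shader code): rotating then translating lands a point somewhere different from translating then rotating.

```typescript
type Vec3 = [number, number, number];

// Rotate 90 degrees about the Z axis: (x, y) -> (-y, x).
function rotateZ90(p: Vec3): Vec3 {
  return [-p[1], p[0], p[2]];
}

function translate(p: Vec3, t: Vec3): Vec3 {
  return [p[0] + t[0], p[1] + t[1], p[2] + t[2]];
}

const p: Vec3 = [1, 0, 0];

// t * r applied to p: rotate first, then translate.
const rotateThenTranslate = translate(rotateZ90(p), [5, 0, 0]); // [5, 1, 0]

// r * t applied to p: translate first, then rotate.
const translateThenRotate = rotateZ90(translate(p, [5, 0, 0])); // [0, 6, 0]
```

The same two transformations, in opposite orders, put the point in two very different places.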
A translation matrix is simply an identity matrix with the offset (tx, ty, tz) placed in the fourth column. Multiplying a position (x, y, z, 1) by this matrix yields (x + tx, y + ty, z + tz, 1). The fourth column slides the point through space.
Rotation and scale matrices work similarly, encoding their operations in the upper-left 3×3 block of the matrix.
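As a concrete illustration, here is a TypeScript sketch of a translation matrix in column-major layout (the storage order WGSL and WebGPU use); the helper names are ours, not a library API.

```typescript
type Mat4 = number[]; // 16 entries, column-major
type Vec4 = [number, number, number, number];

function translationMatrix(tx: number, ty: number, tz: number): Mat4 {
  return [
    1, 0, 0, 0,    // column 0
    0, 1, 0, 0,    // column 1
    0, 0, 1, 0,    // column 2
    tx, ty, tz, 1, // column 3: the offset
  ];
}

// out = m * v, treating v as a column vector.
function transform(m: Mat4, v: Vec4): Vec4 {
  const out: Vec4 = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    out[row] = m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3];
  }
  return out;
}

// A position (x, y, z, 1) picks up the offset from the fourth column.
const moved = transform(translationMatrix(10, 20, 30), [1, 2, 3, 1]);
// moved = [11, 22, 33, 1]
```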
The View Matrix
The view matrix transforms from world space to camera space. Think of it as positioning the entire world relative to the camera, rather than positioning the camera in the world.
The classic construction is the lookAt function. Given the camera's position (eye), what it is looking at (target), and which way is up, it builds the view matrix:
fn lookAt(eye: vec3f, target: vec3f, up: vec3f) -> mat4x4f {
    let forward = normalize(target - eye);
    let right = normalize(cross(forward, up));
    let camUp = cross(right, forward);
    return mat4x4f(
        vec4f(right.x, camUp.x, -forward.x, 0.0),
        vec4f(right.y, camUp.y, -forward.y, 0.0),
        vec4f(right.z, camUp.z, -forward.z, 0.0),
        vec4f(-dot(right, eye), -dot(camUp, eye), dot(forward, eye), 1.0)
    );
}

The view matrix does two things: it rotates the world so the camera's forward direction aligns with the negative Z axis, then translates so the camera sits at the origin. In camera space, the camera always looks down -Z. Objects in front of the camera have negative Z coordinates.
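To check the construction numerically, this CPU-side TypeScript sketch applies the same basis rows and translation as the WGSL lookAt above to a single point. With the camera at z = 5 looking at the origin, the origin should land five units down -Z.

```typescript
type V3 = [number, number, number];

const sub = (a: V3, b: V3): V3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: V3, b: V3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: V3, b: V3): V3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a: V3): V3 => {
  const len = Math.hypot(a[0], a[1], a[2]);
  return [a[0] / len, a[1] / len, a[2] / len];
};

// Same math as the WGSL lookAt: project (p - eye) onto the camera basis,
// negating forward so the camera looks down -Z.
function viewTransform(eye: V3, target: V3, up: V3, p: V3): V3 {
  const forward = normalize(sub(target, eye));
  const right = normalize(cross(forward, up));
  const camUp = cross(right, forward);
  const rel = sub(p, eye);
  return [dot(right, rel), dot(camUp, rel), -dot(forward, rel)];
}

// Camera at z = 5 looking at the origin: the origin ends up at (0, 0, -5).
const viewed = viewTransform([0, 0, 5], [0, 0, 0], [0, 1, 0], [0, 0, 0]);
// viewed = [0, 0, -5]
```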
The Projection Matrix
The projection matrix transforms from camera space to clip space. This is where the 3D world gets mapped onto a 2D surface, creating the illusion of depth.
Perspective projection mimics how human vision works. Distant objects appear smaller. Parallel lines converge toward vanishing points. The field of view determines how wide the camera sees.
fn perspectiveMatrix(fov: f32, aspect: f32, near: f32, far: f32) -> mat4x4f {
    let f = 1.0 / tan(fov * 0.5);
    return mat4x4f(
        vec4f(f / aspect, 0.0, 0.0, 0.0),
        vec4f(0.0, f, 0.0, 0.0),
        vec4f(0.0, 0.0, far / (near - far), -1.0),
        vec4f(0.0, 0.0, near * far / (near - far), 0.0)
    );
}

The field of view (fov) is the vertical angle the camera sees, in radians. Aspect ratio is width divided by height. Near and far define the clipping planes—anything closer than near or farther than far gets clipped away.
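The depth mapping can be verified in isolation. Reading off the matrix above: clip.z = z · far/(near - far) + near · far/(near - far), and clip.w = -z. This TypeScript sketch confirms that a point on the near plane maps to depth 0 and one on the far plane to depth 1, matching WebGPU's [0, 1] depth range.

```typescript
// NDC depth for a view-space point at viewZ (negative in front of the camera),
// using the same third and fourth columns as the WGSL perspectiveMatrix above.
function ndcDepth(viewZ: number, near: number, far: number): number {
  const clipZ = viewZ * (far / (near - far)) + (near * far) / (near - far);
  const clipW = -viewZ;
  return clipZ / clipW;
}

const atNear = ndcDepth(-0.1, 0.1, 100); // ≈ 0
const atFar = ndcDepth(-100, 0.1, 100);  // ≈ 1
```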
Orthographic projection preserves parallel lines. Objects stay the same size regardless of distance. This is useful for 2D games, UI rendering, CAD applications, and any situation where you want to show true proportions.
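For comparison, here is a sketch of an orthographic matrix in TypeScript, written for WebGPU conventions (x and y map to [-1, 1], depth to [0, 1]). This is an illustrative construction, not taken from the text above; note that the bottom row is (0, 0, 0, 1), so w stays 1 and perspective division changes nothing.

```typescript
type Mat4 = number[]; // 16 entries, column-major
type Vec4 = [number, number, number, number];

// left/right/bottom/top bound the visible box; near/far are positive distances.
function orthographicMatrix(l: number, r: number, b: number, t: number,
                            near: number, far: number): Mat4 {
  return [
    2 / (r - l), 0, 0, 0,
    0, 2 / (t - b), 0, 0,
    0, 0, 1 / (near - far), 0, // view-space z is negative: -near -> 0, -far -> 1
    -(r + l) / (r - l), -(t + b) / (t - b), near / (near - far), 1,
  ];
}

function transform(m: Mat4, v: Vec4): Vec4 {
  const out: Vec4 = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    out[row] = m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3];
  }
  return out;
}

// A corner of the box maps to a corner of NDC; w stays 1, so no foreshortening.
const corner = transform(orthographicMatrix(-1, 1, -1, 1, 0.1, 100), [1, 1, -100, 1]);
// corner ≈ [1, 1, 1, 1]
```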
Interactive: Perspective vs Orthographic
Toggle between projection types and notice how the cubes behave. With perspective, distant cubes shrink and parallel edges converge. With orthographic, everything maintains its actual size and parallel lines stay parallel.
The MVP Matrix
Rather than transforming vertices through multiple matrices separately, we combine them into a single MVP matrix (Model-View-Projection):
let mvp = projection * view * model;
let clipPos = mvp * vec4f(vertexPosition, 1.0);

Multiplying the three matrices produces one matrix that does all transformations in a single step. This is more efficient—one matrix multiply per vertex instead of three.
The order is critical. Reading right to left: the model matrix applies first (transforming to world space), then the view matrix (transforming to camera space), then the projection matrix (transforming to clip space).
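The equivalence is easy to demonstrate on the CPU. This TypeScript sketch uses translation matrices as simple stand-ins for the real model, view, and projection matrices, and shows that applying them one at a time matches applying the premultiplied MVP.

```typescript
type Mat4 = number[]; // 16 entries, column-major
type Vec4 = [number, number, number, number];

// a * b in column-major storage: out(r, c) = sum over k of a(r, k) * b(k, c).
function mul(a: Mat4, b: Mat4): Mat4 {
  const out = new Array(16).fill(0);
  for (let c = 0; c < 4; c++)
    for (let r = 0; r < 4; r++)
      for (let k = 0; k < 4; k++)
        out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
  return out;
}

function apply(m: Mat4, v: Vec4): Vec4 {
  const out: Vec4 = [0, 0, 0, 0];
  for (let r = 0; r < 4; r++)
    out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
  return out;
}

function translation(tx: number, ty: number, tz: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, tx, ty, tz, 1];
}

// Translations stand in for the real M, V, and P.
const model = translation(1, 0, 0);
const view = translation(0, 2, 0);
const projection = translation(0, 0, 3);
const vertex: Vec4 = [0, 0, 0, 1];

const stepByStep = apply(projection, apply(view, apply(model, vertex)));
const combined = apply(mul(mul(projection, view), model), vertex);
// Both paths give [1, 2, 3, 1].
```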
Interactive: Matrix Stack
Adjust each transformation and watch how M, V, and P combine. The sliders control model position and rotation, camera position, and projection parameters. The final MVP matrix shown at the bottom encapsulates all these transformations.
Homogeneous Coordinates
You may have noticed we use vec4 for positions and mat4x4 for transformations, even though we are working in 3D. The fourth component, called w, enables something ordinary 3D vectors cannot do: translation via matrix multiplication.
In standard 3D, a 3×3 matrix can rotate and scale, but not translate. Translation requires adding an offset. Homogeneous coordinates solve this by adding a fourth dimension.
A point (x, y, z) becomes (x, y, z, 1) in homogeneous coordinates. A direction (x, y, z) becomes (x, y, z, 0). The w component distinguishes positions from directions:
When w = 1, the 4×4 matrix can translate the point using its fourth column. When w = 0, translation has no effect—directions do not have positions to translate.
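A short TypeScript sketch makes the distinction visible: the same translation matrix moves a point (w = 1) but leaves a direction (w = 0) untouched. The helpers are illustrative, not a library API.

```typescript
type Mat4 = number[]; // 16 entries, column-major
type Vec4 = [number, number, number, number];

function translationMatrix(tx: number, ty: number, tz: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, tx, ty, tz, 1];
}

function transform(m: Mat4, v: Vec4): Vec4 {
  const out: Vec4 = [0, 0, 0, 0];
  for (let r = 0; r < 4; r++)
    out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
  return out;
}

const move = translationMatrix(5, 5, 5);

const point: Vec4 = [1, 0, 0, 1];     // w = 1: a position
const direction: Vec4 = [1, 0, 0, 0]; // w = 0: a direction

const movedPoint = transform(move, point);         // [6, 5, 5, 1]
const movedDirection = transform(move, direction); // [1, 0, 0, 0] — unchanged
```

This is why normals and light directions are multiplied through with w = 0: they should rotate with an object but never pick up its position.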
After projection, the w component is no longer 1. Perspective projection places depth information in w. To get back to 3D coordinates, we perform perspective division, dividing x, y, and z by w:
// After projection, clipPos.w contains depth info
let ndc = clipPos.xyz / clipPos.w;

This division is what makes distant objects appear smaller. Points farther from the camera have larger w values, so dividing by w shrinks their x and y coordinates.
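The shrinking effect is easy to compute by hand. From the perspective matrix earlier, clip.x = (f / aspect) · x and clip.w = -z, so ndc.x = (f / aspect) · x / -z. This sketch compares two points with the same x offset at different depths.

```typescript
// NDC x for a view-space point, using the first and fourth columns of the
// perspective matrix from earlier (f = 1 / tan(fov / 2), clip.w = -viewZ).
function ndcX(x: number, viewZ: number, fov: number, aspect: number): number {
  const f = 1 / Math.tan(fov / 2);
  const clipX = (f / aspect) * x;
  const clipW = -viewZ;
  return clipX / clipW;
}

const fov = Math.PI / 2; // 90 degrees, so f = 1

const nearPoint = ndcX(1, -2, fov, 1);  // ≈ 0.5
const farPoint = ndcX(1, -10, fov, 1);  // ≈ 0.1
// Same x offset, but the farther point lands much closer to the center.
```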
Interactive: Homogeneous Coordinates
The visualization shows how w affects the final position. When w = 1, the homogeneous point maps directly to its 3D position. When w differs from 1, perspective division scales the point—larger w makes it appear closer to the center, smaller w pushes it outward.
Implementing in WGSL
Here is a minimal vertex shader using the full transformation pipeline:
struct Uniforms {
    model: mat4x4f,
    view: mat4x4f,
    projection: mat4x4f,
}

@group(0) @binding(0) var<uniform> uniforms: Uniforms;

struct VertexInput {
    @location(0) position: vec3f,
}

struct VertexOutput {
    @builtin(position) clipPosition: vec4f,
}

@vertex
fn main(input: VertexInput) -> VertexOutput {
    var output: VertexOutput;
    let worldPos = uniforms.model * vec4f(input.position, 1.0);
    let viewPos = uniforms.view * worldPos;
    output.clipPosition = uniforms.projection * viewPos;
    return output;
}

Alternatively, precompute the MVP matrix on the CPU and send just that:
struct Uniforms {
    mvp: mat4x4f,
}

@group(0) @binding(0) var<uniform> uniforms: Uniforms;

@vertex
fn main(@location(0) position: vec3f) -> @builtin(position) vec4f {
    return uniforms.mvp * vec4f(position, 1.0);
}

The second approach is more efficient when you only need the final clip position. The first approach is useful when you need intermediate values, like world position for lighting calculations.
Key Takeaways
- Vertices transform through model → world → view → clip → NDC → screen
- The model matrix positions, rotates, and scales objects in the world
- The view matrix positions the world relative to the camera using lookAt
- Perspective projection creates depth; orthographic preserves proportions
- Combine matrices into MVP (projection × view × model) for efficiency
- Homogeneous coordinates (vec4, mat4x4) enable translation via matrix multiplication
- Perspective division (divide by w) maps from clip space to NDC