Singular Value Decomposition

The universal matrix factorization

What Every Matrix Does

The Singular Value Decomposition (SVD) reveals a simple truth: every matrix, no matter its shape or entries, can be written as a rotation (or reflection), followed by a scaling along perpendicular axes, followed by another rotation.

A = UΣV^T

Here V^T rotates the input, Σ scales along coordinate axes, and U rotates the output. The diagonal entries of Σ are the singular values.
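
To make this concrete, here is a minimal NumPy sketch (the 2×2 matrix is an arbitrary example; note that np.linalg.svd returns V^T directly rather than V):

```python
import numpy as np

# An arbitrary example matrix; any real matrix would work here.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

# NumPy returns U, the singular values, and V^T (already transposed).
U, s, Vt = np.linalg.svd(A)

print("singular values:", s)                  # sorted largest to smallest
print(np.allclose(U.T @ U, np.eye(2)))        # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))      # V is orthogonal
print(np.allclose(U @ np.diag(s) @ Vt, A))    # rotate, scale, rotate rebuilds A
```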

Interactive: Any matrix transforms the unit circle into an ellipse. The singular values are the semi-axis lengths; the singular vectors point along the axes.

The singular values are the lengths of the ellipse's semi-axes, and the singular vectors point along those axes. Every matrix, even a non-square one, has this geometric interpretation.

The Three Steps

Let's trace through what SVD does step by step. Starting from the unit circle:

  1. V^T rotates to align with the "input directions" that will be stretched
  2. Σ scales each axis by its singular value
  3. U rotates to the final "output directions"

Interactive: Watch the three steps of A = UΣV^T, starting from the unit circle.

The matrices U and V are orthogonal: their columns form orthonormal bases. Only Σ changes the shape; the rotations preserve it.
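
As a quick numerical check, the sketch below (assuming NumPy and the same arbitrary 2×2 matrix as above) applies the three factors one at a time to points on the unit circle and confirms that their composition is exactly A:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])           # arbitrary example matrix
U, s, Vt = np.linalg.svd(A)

# Points on the unit circle, stored as the columns of a 2 x N array.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vt @ circle            # 1. rotate/reflect the inputs
step2 = np.diag(s) @ step1     # 2. scale each axis by a singular value
step3 = U @ step2              # 3. rotate into the output directions

print(np.allclose(step3, A @ circle))                    # the three steps compose to A
print(np.allclose(np.linalg.norm(step1, axis=0), 1.0))   # the rotation alone keeps points on the unit circle
```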

Low-Rank Approximation

One of SVD's most powerful applications is approximation. By keeping only the largest singular values and setting the rest to zero, we get the best rank-k approximation of the original matrix (in the least-squares sense).

A ≈ σ₁u₁v₁^T + σ₂u₂v₂^T + ⋯ + σₖuₖvₖ^T

The singular values tell you how "important" each component is. Often, the first few singular values capture most of the matrix's structure, while the smaller ones represent noise or fine details.
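
A small sketch of this truncated sum, using a synthetic matrix built to be nearly rank 2 (the sizes and noise level are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 50 x 30 data: a rank-2 signal plus a little noise.
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30)) + 0.01 * rng.normal(size=(50, 30))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k(k):
    """Sum of the first k rank-one terms sigma_i * u_i * v_i^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 2, 5):
    err = np.linalg.norm(A - rank_k(k))     # Frobenius-norm error of the rank-k approximation
    print(f"rank {k}: error = {err:.4f}")
# The error drops sharply at k = 2, the rank of the underlying signal;
# beyond that, extra terms only reproduce the small noise component.
```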

Interactive: Data approximation by rank. Gray: original data; blue: low-rank approximation.

Image Compression

Images are matrices of pixel values. SVD compression stores only the top singular values and their associated vectors instead of every pixel. For a rank-k approximation of an m×n image, you store k(m + n + 1) numbers instead of mn.
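
The sketch below uses a synthetic 64×64 "image" (exactly rank 2 by construction, so the effect is easy to see) to compare the k(m + n + 1) storage count against mn:

```python
import numpy as np

def compress(image, k):
    """Rank-k approximation of a 2D array, plus its relative storage cost."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    m, n = image.shape
    stored = k * (m + n + 1)          # k columns of U, k rows of V^T, k singular values
    return approx, stored / (m * n)

# A made-up smooth "image"; real photographs also tend to have fast-decaying singular values.
x = np.linspace(0.0, 1.0, 64)
image = np.outer(np.sin(3 * x), np.cos(2 * x)) + 0.5 * np.outer(x, x)

for k in (1, 2, 8):
    approx, ratio = compress(image, k)
    rel_err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(f"rank {k}: storage = {ratio:.1%} of original, relative error = {rel_err:.2e}")
```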

Interactive: Image compression via low-rank approximation (original vs. compressed, with effective-rank and storage readouts).

SVD enables lossy compression by keeping only the largest singular values. Lower rank means smaller storage but more approximation error.

This is the principle behind various compression and dimensionality reduction techniques. When singular values drop off rapidly, dramatic compression is possible with minimal quality loss.

Connection to Eigenvalues

SVD is intimately connected to eigendecomposition. The singular values of A are the square roots of the eigenvalues of A^T A (or equivalently, of AA^T).

σᵢ = √λᵢ(A^T A)

Interactive: Singular values from eigenvalues of A^T A

A = [ 2  1 ]
    [ 0  1 ]
Singular values: σ₁ = 2.29, σ₂ = 0.87

The singular values of A are the square roots of the eigenvalues of A^T A. The eigenvectors of A^T A are the right singular vectors (V).

The right singular vectors (columns of V) are eigenvectors of A^T A. The left singular vectors (columns of U) are eigenvectors of AA^T.
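
A quick check of this relationship with NumPy, using the same matrix as the interactive above (np.linalg.eigh is appropriate here because A^T A is symmetric):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

# Singular values straight from the SVD.
sv = np.linalg.svd(A, compute_uv=False)

# Square roots of the eigenvalues of A^T A (eigh returns them in ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sv_from_eig = np.sqrt(eigvals)[::-1]      # flip to match the descending SVD order

print(sv)              # approximately [2.288, 0.874], i.e. sigma_1 = 2.29, sigma_2 = 0.87
print(sv_from_eig)     # the same values, computed the long way
print(np.allclose(sv, sv_from_eig))
```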

Why SVD Matters

SVD is everywhere in applied mathematics:

  • Principal Component Analysis (PCA): The principal components are the right singular vectors of the centered data matrix
  • Least Squares: The pseudoinverse used for overdetermined systems comes from SVD (sketched below)
  • Recommender Systems: Matrix factorization for Netflix-style recommendations uses truncated SVD
  • Numerical Stability: The condition number (ratio of largest to smallest singular value) measures sensitivity to errors
  • Latent Semantic Analysis: Finding hidden structure in text data

Unlike eigendecomposition, SVD exists for any matrix—not just square ones, and without requiring special properties like symmetry.
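
As an illustration of the least-squares and condition-number points above, here is a sketch on a made-up overdetermined system (the sizes and random data are chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))    # 6 equations, 3 unknowns: overdetermined
b = rng.normal(size=6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Moore-Penrose pseudoinverse: invert the (nonzero) singular values and swap the rotations.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
x = A_pinv @ b                 # least-squares solution of A x ≈ b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches NumPy's least-squares solver
print(np.allclose(A_pinv, np.linalg.pinv(A)))                 # matches NumPy's pinv

# Condition number: ratio of largest to smallest singular value.
print(s[0] / s[-1], np.linalg.cond(A))                        # the two agree
```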

Key Takeaways

  • Every matrix decomposes as A = UΣV^T: rotate, scale along axes, rotate again
  • Singular values (diagonal of Σ) measure the importance of each direction
  • U and V are orthogonal matrices; their columns are orthonormal
  • Low-rank approximation keeps only the top singular values, enabling compression
  • Singular values are square roots of eigenvalues of A^T A
  • SVD works for any matrix, including non-square ones
  • Applications: PCA, least squares, compression, recommender systems, and more