Singular Value Decomposition

The universal matrix factorization

What Every Matrix Does

The Singular Value Decomposition (SVD) reveals a simple truth: every matrix, no matter its shape or entries, can be written as a rotation (or reflection), followed by a scaling along perpendicular axes, followed by another rotation.

A = UΣV^T

Here V^T rotates the input, Σ scales along coordinate axes, and U rotates the output. The diagonal entries of Σ are the singular values.
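
To make this concrete, here is a minimal NumPy sketch (the 2×2 matrix is an arbitrary example; note that np.linalg.svd returns V^T directly rather than V):

```python
import numpy as np

# An arbitrary example matrix; any real matrix would work here.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

# NumPy returns U, the singular values, and V^T (already transposed).
U, s, Vt = np.linalg.svd(A)

print("singular values:", s)                  # sorted largest to smallest
print(np.allclose(U.T @ U, np.eye(2)))        # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))      # V is orthogonal
print(np.allclose(U @ np.diag(s) @ Vt, A))    # rotate, scale, rotate rebuilds A
```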

Interactive: Any matrix transforms the unit circle into an ellipse. The singular values are the semi-axis lengths; the singular vectors point along the axes.

The singular values are the lengths of the ellipse's semi-axes, and the singular vectors point along those axes. Every matrix, even a non-square one, has this geometric interpretation.

The Three Steps

Let's trace through what SVD does step by step. Starting from the unit circle:

  1. V^T rotates to align with the "input directions" that will be stretched
  2. Σ scales each axis by its singular value
  3. U rotates to the final "output directions"

Interactive: Watch the three steps of A = UΣV^T, starting from the unit circle.

The matrices U and V are orthogonal: their columns form orthonormal bases. Only Σ changes the shape; the rotations preserve it.
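
As a quick numerical check, the sketch below (assuming NumPy and the same arbitrary 2×2 matrix as above) applies the three factors one at a time to points on the unit circle and confirms that their composition is exactly A:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])           # arbitrary example matrix
U, s, Vt = np.linalg.svd(A)

# Points on the unit circle, stored as the columns of a 2 x N array.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vt @ circle            # 1. rotate/reflect the inputs
step2 = np.diag(s) @ step1     # 2. scale each axis by a singular value
step3 = U @ step2              # 3. rotate into the output directions

print(np.allclose(step3, A @ circle))                    # the three steps compose to A
print(np.allclose(np.linalg.norm(step1, axis=0), 1.0))   # the rotation alone keeps points on the unit circle
```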

Low-Rank Approximation

One of SVD's most powerful applications is approximation. By keeping only the largest singular values and setting the rest to zero, we get the best rank-k approximation of the original matrix (in the least-squares sense).

A ≈ σ₁u₁v₁^T + σ₂u₂v₂^T + ⋯ + σₖuₖvₖ^T

The singular values tell you how "important" each component is. Often, the first few singular values capture most of the matrix's structure, while the smaller ones represent noise or fine details.
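
A small sketch of this truncated sum, using a synthetic matrix built to be nearly rank 2 (the sizes and noise level are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 50 x 30 data: a rank-2 signal plus a little noise.
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30)) + 0.01 * rng.normal(size=(50, 30))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k(k):
    """Sum of the first k rank-one terms sigma_i * u_i * v_i^T."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 2, 5):
    err = np.linalg.norm(A - rank_k(k))     # Frobenius-norm error of the rank-k approximation
    print(f"rank {k}: error = {err:.4f}")
# The error drops sharply at k = 2, the rank of the underlying signal;
# beyond that, extra terms only reproduce the small noise component.
```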

Interactive: Data approximation by rank. Gray: original data; blue: low-rank approximation.

Image Compression

Images are matrices of pixel values. SVD compression stores only the top singular values and their associated vectors instead of every pixel. For a rank-k approximation of an m×n image, you store k(m + n + 1) numbers instead of mn.
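
The sketch below uses a synthetic 64×64 "image" (exactly rank 2 by construction, so the effect is easy to see) to compare the k(m + n + 1) storage count against mn:

```python
import numpy as np

def compress(image, k):
    """Rank-k approximation of a 2D array, plus its relative storage cost."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    m, n = image.shape
    stored = k * (m + n + 1)          # k columns of U, k rows of V^T, k singular values
    return approx, stored / (m * n)

# A made-up smooth "image"; real photographs also tend to have fast-decaying singular values.
x = np.linspace(0.0, 1.0, 64)
image = np.outer(np.sin(3 * x), np.cos(2 * x)) + 0.5 * np.outer(x, x)

for k in (1, 2, 8):
    approx, ratio = compress(image, k)
    rel_err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(f"rank {k}: storage = {ratio:.1%} of original, relative error = {rel_err:.2e}")
```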

Interactive: Image compression via low-rank approximation (original vs. compressed, with effective-rank and storage readouts).

SVD enables lossy compression by keeping only the largest singular values. Lower rank means smaller storage but more approximation error.

This is the principle behind various compression and dimensionality reduction techniques. When singular values drop off rapidly, dramatic compression is possible with minimal quality loss.

Connection to Eigenvalues

SVD is intimately connected to eigendecomposition. The singular values of A are the square roots of the eigenvalues of A^T A (or equivalently, of AA^T).

σᵢ = √λᵢ(A^T A)

Interactive: Singular values from eigenvalues of A^T A

A = [ 2  1 ]
    [ 0  1 ]
Singular values: σ₁ = 2.29, σ₂ = 0.87

The singular values of A are the square roots of the eigenvalues of A^T A. The eigenvectors of A^T A are the right singular vectors (V).

The right singular vectors (columns of V) are eigenvectors of A^T A. The left singular vectors (columns of U) are eigenvectors of AA^T.
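
A quick check of this relationship with NumPy, using the same matrix as the interactive above (np.linalg.eigh is appropriate here because A^T A is symmetric):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

# Singular values straight from the SVD.
sv = np.linalg.svd(A, compute_uv=False)

# Square roots of the eigenvalues of A^T A (eigh returns them in ascending order).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
sv_from_eig = np.sqrt(eigvals)[::-1]      # flip to match the descending SVD order

print(sv)              # approximately [2.288, 0.874], i.e. sigma_1 = 2.29, sigma_2 = 0.87
print(sv_from_eig)     # the same values, computed the long way
print(np.allclose(sv, sv_from_eig))
```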

Why SVD Matters

SVD is everywhere in applied mathematics:

  • Principal Component Analysis (PCA): The principal components are the right singular vectors of the centered data matrix
  • Least Squares: The pseudoinverse used for overdetermined systems comes from SVD (sketched below)
  • Recommender Systems: Matrix factorization for Netflix-style recommendations uses truncated SVD
  • Numerical Stability: The condition number (ratio of largest to smallest singular value) measures sensitivity to errors
  • Latent Semantic Analysis: Finding hidden structure in text data

Unlike eigendecomposition, SVD exists for any matrix—not just square ones, and without requiring special properties like symmetry.
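
As an illustration of the least-squares and condition-number points above, here is a sketch on a made-up overdetermined system (the sizes and random data are chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))    # 6 equations, 3 unknowns: overdetermined
b = rng.normal(size=6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Moore-Penrose pseudoinverse: invert the (nonzero) singular values and swap the rotations.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
x = A_pinv @ b                 # least-squares solution of A x ≈ b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches NumPy's least-squares solver
print(np.allclose(A_pinv, np.linalg.pinv(A)))                 # matches NumPy's pinv

# Condition number: ratio of largest to smallest singular value.
print(s[0] / s[-1], np.linalg.cond(A))                        # the two agree
```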

Key Takeaways

  • Every matrix decomposes as A = UΣV^T: rotate, scale along axes, rotate again
  • Singular values (diagonal of Σ) measure the importance of each direction
  • U and V are orthogonal matrices; their columns are orthonormal
  • Low-rank approximation keeps only the top singular values, enabling compression
  • Singular values are square roots of eigenvalues of A^T A
  • SVD works for any matrix, including non-square ones
  • Applications: PCA, least squares, compression, recommender systems, and more