Singular Value Decomposition
The universal matrix factorization
What Every Matrix Does
The Singular Value Decomposition (SVD) reveals a simple truth: every matrix A, no matter its shape or entries, can be decomposed into three operations: a rotation (or reflection), a scaling along perpendicular axes, and another rotation:

A = U Σ V^T

Here V^T rotates the input, Σ scales along the coordinate axes, and U rotates the output. The diagonal entries of Σ are the singular values.
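The decomposition is easy to check numerically. Here is a minimal NumPy sketch (the matrix A is an arbitrary example) that computes the three factors and verifies they multiply back to A:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])          # a 3x2 matrix: SVD works for any shape

U, s, Vt = np.linalg.svd(A)          # s holds the singular values, largest first
Sigma = np.zeros_like(A)             # build the 3x2 "diagonal" matrix Sigma
np.fill_diagonal(Sigma, s)

print(np.allclose(A, U @ Sigma @ Vt))   # True: A = U Sigma V^T
print(s)                                # singular values, sorted descending
```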
Interactive: Any matrix transforms the unit circle into an ellipse
Any matrix transforms the unit circle into an ellipse. The singular values are the semi-axis lengths; the singular vectors point along the axes.
Every matrix, even a non-square one, has this geometric interpretation: the singular values are the lengths of the ellipse's semi-axes, and the singular vectors point along those axes.
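To see the ellipse claim concretely, the following sketch (with an arbitrarily chosen 2x2 matrix) maps points on the unit circle through A and compares the longest and shortest resulting radii to the singular values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.5]])

theta = np.linspace(0, 2 * np.pi, 1000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
ellipse = A @ circle                                  # their images under A

radii = np.linalg.norm(ellipse, axis=0)
sigma = np.linalg.svd(A, compute_uv=False)
print(radii.max(), sigma[0])   # both ~ the largest singular value
print(radii.min(), sigma[1])   # both ~ the smallest singular value
```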
The Three Steps
Let's trace through what SVD does step by step. Starting from the unit circle:
- V^T rotates the circle to align with the "input directions" that will be stretched
- Σ scales each axis by its singular value
- U rotates the result to the final "output directions"
Interactive: Watch the three steps of SVD
Start with the unit circle.
The matrices U and V are orthogonal: their columns form orthonormal bases. Only Σ changes the shape; the rotations preserve it.
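The three steps can be replayed in code. This sketch, again with an arbitrary example matrix, applies V^T, Σ, and U one at a time and checks that their composition matches A:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.5]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vt @ circle              # rotate/reflect: still the unit circle
step2 = np.diag(s) @ step1       # scale the axes: now an axis-aligned ellipse
step3 = U @ step2                # rotate to the output directions

print(np.allclose(step3, A @ circle))   # True: the three steps reproduce A
```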
Low-Rank Approximation
One of SVD's most powerful applications is approximation. By keeping only the k largest singular values and setting the rest to zero, we get the best rank-k approximation of the original matrix (in the least-squares sense).
The singular values tell you how "important" each component is. Often, the first few singular values capture most of the matrix's structure, while the smaller ones represent noise or fine details.
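As a sketch of how this works in practice, the helper below (the name low_rank is mine, not a library function) truncates the SVD to the top k components and reports the approximation error for a random matrix:

```python
import numpy as np

def low_rank(A, k):
    """Best rank-k approximation of A in the least-squares (Frobenius) sense."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
for k in (1, 5, 10, 30):
    err = np.linalg.norm(A - low_rank(A, k))
    print(f"rank {k:2d}: approximation error {err:.3f}")   # error shrinks as k grows
```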
Interactive: Data approximation by rank
Gray: original data. Blue: same data (full rank).
Image Compression
Images are matrices of pixel values. SVD compression stores only the top singular values and their associated vectors instead of every pixel. For a rank-k approximation of an m×n image, you store k(m + n + 1) numbers instead of mn.
Interactive: Image compression via low-rank approximation
SVD enables lossy compression by keeping only the largest singular values. Lower rank means smaller storage but more approximation error.
This is the principle behind various compression and dimensionality reduction techniques. When singular values drop off rapidly, dramatic compression is possible with minimal quality loss.
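Here is a rough sketch of that storage trade-off using a synthetic grayscale "image" (a smooth gradient standing in for real pixel data):

```python
import numpy as np

m, n, k = 512, 768, 40
image = np.add.outer(np.linspace(0, 1, m), np.linspace(0, 1, n))  # smooth stand-in for a photo

U, s, Vt = np.linalg.svd(image, full_matrices=False)
compressed = (U[:, :k], s[:k], Vt[:k, :])                 # store only these pieces
reconstructed = compressed[0] @ np.diag(compressed[1]) @ compressed[2]

stored = k * (m + n + 1)
print(f"stored {stored} numbers instead of {m * n}")      # k(m + n + 1) vs. mn
rel_err = np.linalg.norm(image - reconstructed) / np.linalg.norm(image)
print(f"relative error: {rel_err:.2e}")
```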
Connection to Eigenvalues
SVD is intimately connected to eigendecomposition. The singular values of A are the square roots of the eigenvalues of A^T A (or equivalently, of A A^T).
Interactive: Singular values from eigenvalues of A^T A
The singular values of A are the square roots of the eigenvalues of A^T A. The eigenvectors of A^T A are the right singular vectors (V).
The right singular vectors (columns of V) are eigenvectors of A^T A. The left singular vectors (columns of U) are eigenvectors of A A^T.
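This connection is easy to verify numerically; the sketch below compares the singular values of an example matrix with the square roots of the eigenvalues of A^T A:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])

sigma = np.linalg.svd(A, compute_uv=False)           # singular values, descending
eigvals = np.linalg.eigvalsh(A.T @ A)                # eigenvalues of A^T A, ascending
print(np.allclose(sigma, np.sqrt(eigvals[::-1])))    # True: sigma_i = sqrt(lambda_i)
```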
Why SVD Matters
SVD is everywhere in applied mathematics:
- Principal Component Analysis (PCA): The principal components are the right singular vectors of the centered data matrix
- Least Squares: The Moore-Penrose pseudoinverse used to solve overdetermined systems comes from SVD (see the sketch after this section)
- Recommender Systems: Matrix factorization for Netflix-style recommendations uses truncated SVD
- Numerical Stability: The condition number (ratio of largest to smallest singular value) measures sensitivity to errors
- Latent Semantic Analysis: Finding hidden structure in text data
Unlike eigendecomposition, SVD exists for any matrix—not just square ones, and without requiring special properties like symmetry.
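As a concrete taste of one of these applications, the sketch below (with synthetic data) solves an overdetermined system with the SVD-based pseudoinverse and compares the result to NumPy's least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 3))             # overdetermined: 100 equations, 3 unknowns
x_true = np.array([2.0, -1.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(100)

x_pinv = np.linalg.pinv(A) @ b                # pinv is computed from the SVD internally
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_pinv, x_lstsq))           # True: both give the least-squares solution
print(x_pinv)                                 # close to x_true
```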
Key Takeaways
- Every matrix decomposes as A = U Σ V^T: rotate, scale along axes, rotate again
- Singular values (the diagonal of Σ) measure the importance of each direction
- U and V are orthogonal matrices; their columns are orthonormal
- Low-rank approximation keeps only the top singular values, enabling compression
- Singular values are the square roots of the eigenvalues of A^T A
- SVD works for any matrix, including non-square ones
- Applications: PCA, least squares, compression, recommender systems, and more