Least Squares
When there is no exact solution, find the best approximation
When Ax = b Has No Solution
Throughout linear algebra, we have been solving Ax = b. But what if no solution exists? This happens when b does not lie in the column space of A.
The most common case is an overdetermined system: more equations than unknowns. You cannot draw a straight line through three random points—you have 3 equations but only 2 parameters (slope and intercept).
Overdetermined: Three points, two parameters
Three points, but we want a line y = mx + b (only 2 parameters). No single line passes through all three—the system is overdetermined.
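To make the mismatch explicit, here is the same setup written in matrix form for three generic points (x₁, y₁), (x₂, y₂), (x₃, y₃); unless the points happen to be collinear, no choice of m and b satisfies all three rows at once:

```latex
% Three point equations y_i = m x_i + b, stacked into Ax = b form
\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \end{bmatrix}
\begin{bmatrix} m \\ b \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
```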
When no exact solution exists, we settle for the next best thing: find the x̂ that makes Ax̂ as close as possible to b.
Minimizing the Error
The least squares solution x̂ minimizes the squared length of the residual vector ||Ax - b||².
Why squared? Because the squared norm is differentiable and has a unique minimum. It also heavily penalizes large errors, which is often desirable.
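For readers who like the calculus route, the gradient of this objective is a one-line computation (a standard identity, stated here as a sketch); setting it to zero is exactly the condition the normal equations below will express:

```latex
% Gradient of the squared residual; the minimizer satisfies A^T A x = A^T b
\nabla_x \, \lVert Ax - b \rVert^2 = 2 A^T (Ax - b)
```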
Interactive: Drag points to see the best-fit line
Drag the points to see how the best-fit line adjusts. The red dashed lines show the residuals: the vertical distances from the points to the line. Least squares finds the line that minimizes the sum of squared residuals.
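Written out for the line fit (assuming n data points (x_i, y_i)), the quantity being minimized is:

```latex
% Sum of squared vertical residuals for the line y = m x + b
E(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
```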
The Geometric View: Projection
Here is the key insight: since Ax̂ must lie in the column space of A, the closest we can get to b is its orthogonal projection onto that column space.
The residual b - Ax̂ must be perpendicular to the column space. This means it is perpendicular to every column of A:

A^T (b - Ax̂) = 0
3D view: Project b onto the column space
The column space (blue plane) contains all possible Ax. The least squares solution finds the point in this plane closest to b—its orthogonal projection.
The green point is the projection of b onto the column space (the blue plane). The orange vector is the residual b - Ax̂, which is perpendicular to the plane.
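In symbols, the picture says that b splits into two perpendicular pieces, one inside the column space and one orthogonal to it:

```latex
% Orthogonal decomposition of b relative to the column space C(A)
b = A\hat{x} + (b - A\hat{x}), \qquad A\hat{x} \in C(A), \qquad (b - A\hat{x}) \perp C(A)
```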
The Normal Equations
Rearranging A^T (b - Ax̂) = 0 gives the normal equations:

A^T A x̂ = A^T b
If A^T A is invertible (which happens when A has full column rank), the solution is:

x̂ = (A^T A)⁻¹ A^T b
The matrix (A^T A)⁻¹ A^T is called the pseudoinverse of A. It generalizes the inverse to non-square matrices.
Interactive: Solve the normal equations
The solution x̂ minimizes ||Ax - b||. The residual (red dashed line) is perpendicular to the column space.
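As a minimal numerical sketch (the three points are invented for illustration, not taken from the interactive above), here is how the normal equations solve the line-fitting problem in NumPy:

```python
import numpy as np

# Three sample points (invented for illustration); no single line passes through all of them.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 1.5])

# Design matrix for y = m*x + b: one row per point, columns [x_i, 1].
A = np.column_stack([x, np.ones_like(x)])

# Normal equations: A^T A x_hat = A^T y.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
m, b = x_hat
print(f"best-fit line: y = {m:.3f} x + {b:.3f}")

# The residual is perpendicular to every column of A, so A^T r is (numerically) zero.
r = y - A @ x_hat
print("A^T r =", A.T @ r)
```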
Connection to Orthogonality
The normal equations encode the orthogonality condition. Multiplying the residual b - Ax̂ by A^T computes its dot product with every column of A; when all of those dot products are zero, the residual is perpendicular to the column space.
This connects back to our study of orthogonal projections. The projection matrix onto the column space of A is:

P = A (A^T A)⁻¹ A^T
Applying P to b gives the projected point p = Ax̂, the closest point in the column space.
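To check the projection picture numerically, here is a small sketch (same invented three-point data as above) that builds P explicitly and confirms that Pb equals Ax̂:

```python
import numpy as np

# Same invented three-point data as the sketch above.
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 1.5])

# Projection matrix onto the column space of A: P = A (A^T A)^{-1} A^T.
# (Forming P explicitly is for illustration only; it is never needed in practice.)
P = A @ np.linalg.inv(A.T @ A) @ A.T

# P b lands on the same point as A x_hat from the normal equations ...
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(P @ b, A @ x_hat))   # True

# ... and projecting twice changes nothing, as a projection matrix should.
print(np.allclose(P @ P, P))           # True
```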
Applications
Least squares is one of the most widely used techniques in applied mathematics:
Applications of least squares
Linear Regression
Fit a line (or hyperplane) to noisy data. Each data point gives one equation, but we have fewer parameters than points.
Predicting house prices from square footage, bedrooms, etc.
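In practice, a regression like the house-price example is handed to a library least squares routine. Here is a hedged NumPy sketch with made-up numbers (the feature values and prices are invented for illustration):

```python
import numpy as np

# Invented toy data in the spirit of the house-price example (prices in thousands).
sqft  = np.array([850.0, 1200.0, 1500.0, 2000.0, 2400.0])
beds  = np.array([2.0, 3.0, 3.0, 4.0, 4.0])
price = np.array([200.0, 280.0, 330.0, 420.0, 470.0])

# Design matrix with a constant column for the intercept: 5 equations, 3 parameters.
A = np.column_stack([sqft, beds, np.ones_like(sqft)])

# np.linalg.lstsq solves the least squares problem without forming A^T A explicitly.
coef, residuals, rank, sing_vals = np.linalg.lstsq(A, price, rcond=None)
print("coefficients (sqft, beds, intercept):", coef)
```

Library routines like lstsq rely on an orthogonal factorization rather than the raw normal equations, which is exactly the point of the numerical discussion below.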
Whenever you have more measurements than parameters and want to find the best fit, least squares provides the answer. It is the mathematical foundation of regression, filtering, and estimation.
Numerical Considerations
While the normal equations are elegant, directly forming and inverting A^T A can be numerically unstable, since it squares the condition number of A. In practice, better methods exist:
- QR decomposition: Decompose A = QR, then solve Rx̂ = Q^T b
- SVD: Use the pseudoinverse from the singular value decomposition
These methods are more stable when A is nearly rank-deficient or ill-conditioned.
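Here is a sketch (random data, for illustration only) of the QR and SVD routes in NumPy, compared against the normal equations on a well-conditioned problem:

```python
import numpy as np

# Random overdetermined system, for illustration only: 50 equations, 3 unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
b = rng.normal(size=50)

# QR route: factor A = QR, then solve the small triangular system R x_hat = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# SVD route: apply the pseudoinverse A^+ built from the singular value decomposition.
x_svd = np.linalg.pinv(A) @ b

# On a well-conditioned problem all three routes agree; they diverge when A is nearly
# rank-deficient, where the normal equations lose the most accuracy.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x_qr, x_svd), np.allclose(x_qr, x_ne))
```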
Key Takeaways
- When Ax = b has no solution, we minimize ||Ax - b||²
- The least squares solution x̂ makes Ax̂ the projection of b onto the column space of A
- The residual b - Ax̂ is perpendicular to the column space
- The normal equations: A^T A x̂ = A^T b
- Solution: x̂ = (A^T A)⁻¹ A^T b (when A has full column rank)
- Applications: regression, signal processing, GPS, computer vision, and countless other fields