Chapter 51. Orthogonal Projections
An orthogonal projection is the operation of replacing a vector by its closest vector in a chosen subspace. It is the precise linear algebra version of dropping a perpendicular from a point to a line, plane, or higher-dimensional subspace.
If (W) is a subspace of an inner product space (V), we would like to decompose each vector (v) into two parts:
$$ v = w + r, $$
where
$$ w \in W, \qquad r \in W^\perp. $$
The vector (w) is the orthogonal projection of (v) onto (W). The vector (r) is the residual. In finite-dimensional inner product spaces, this decomposition exists and is unique for every subspace (W). The projected vector is the closest vector in (W) to the original vector.
51.1 Projection onto a Line
Let (u) be a nonzero vector in an inner product space (V). The line generated by (u) is
$$ L = \operatorname{span}\{u\}. $$
The projection of (v) onto (L) is the vector on (L) closest to (v). Every vector on (L) is a multiple of (u), so the projection has the form
$$ p = cu $$
for some scalar (c).
The residual is
$$ r = v - cu. $$
For (cu) to be the orthogonal projection, the residual must be orthogonal to the line. Since the line is spanned by (u), it is enough to require
$$ \langle v-cu,u\rangle = 0. $$
Using linearity,
$$ \langle v,u\rangle - c\langle u,u\rangle = 0. $$
Therefore
$$ c = \frac{\langle v,u\rangle}{\langle u,u\rangle}. $$
So the projection is
$$ \operatorname{proj}_u(v) = \frac{\langle v,u\rangle}{\langle u,u\rangle}u. $$
This formula is valid whenever (u\ne 0).
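The line formula translates directly into a few lines of Python. This is a minimal sketch that assumes real vectors stored as plain lists and the standard dot product as the inner product; the helper names `dot` and `proj_line` are illustrative, not taken from any library.

```python
def dot(x, y):
    """Standard inner product on R^n (assumed here as the inner product)."""
    return sum(a * b for a, b in zip(x, y))

def proj_line(v, u):
    """Orthogonal projection of v onto span{u}; u must be nonzero."""
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("u must be nonzero")
    c = dot(v, u) / uu          # the scalar c = <v,u>/<u,u> from the derivation
    return [c * a for a in u]

v, u = [3.0, 1.0], [1.0, 2.0]
p = proj_line(v, u)
r = [a - b for a, b in zip(v, p)]   # residual v - p
print(p, dot(r, u))                  # [1.0, 2.0] 0.0
```

The zero check mirrors the condition (u\ne 0): without it the division by (\langle u,u\rangle) would fail.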
51.2 Projection onto a Unit Vector
If (q) is a unit vector, then
$$ \langle q,q\rangle = 1. $$
The projection formula simplifies to
$$ \operatorname{proj}_q(v) = \langle v,q\rangle q. $$
This is the simplest projection formula. The scalar
$$ \langle v,q\rangle $$
is the coordinate of (v) in the direction (q). The vector
$$ \langle v,q\rangle q $$
is the component of (v) along (q).
For example, let
$$ v = \begin{bmatrix} 3\\ 4 \end{bmatrix}, \qquad q = \begin{bmatrix} 1\\ 0 \end{bmatrix}. $$
Then
$$ \langle v,q\rangle = 3, $$
so
$$ \operatorname{proj}_q(v) = 3 \begin{bmatrix} 1\\ 0 \end{bmatrix} = \begin{bmatrix} 3\\ 0 \end{bmatrix}. $$
The projection keeps the horizontal component and removes the vertical component.
51.3 The Residual
The residual of (v) after projection onto a subspace (W) is
$$ r = v - \operatorname{proj}_W(v). $$
The defining property of orthogonal projection is
$$ r \in W^\perp. $$
Equivalently,
$$ \langle r,w\rangle = 0 $$
for every (w\in W).
Thus projection separates a vector into an explained part and an unexplained part:
$$ v = \operatorname{proj}_W(v) + r, $$
where
$$ \operatorname{proj}_W(v)\in W, \qquad r\in W^\perp. $$
This is the orthogonal decomposition of (v) with respect to (W).
51.4 Projection onto an Orthonormal Basis
Let (W) be a subspace with orthonormal basis
$$ q_1,q_2,\ldots,q_k. $$
The projection of (v) onto (W) is
$$ \operatorname{proj}_W(v) = \sum_{j=1}^k \langle v,q_j\rangle q_j. $$
This formula follows from the coordinate formula for orthonormal bases. The projected vector lies in (W), and the residual is orthogonal to every (q_j).
Indeed, let
$$ p = \sum_{j=1}^k \langle v,q_j\rangle q_j. $$
Then for each (i),
$$ \langle v-p,q_i\rangle = \langle v,q_i\rangle - \left\langle \sum_{j=1}^k \langle v,q_j\rangle q_j, q_i \right\rangle. $$
Using orthonormality,
$$ \left\langle \sum_{j=1}^k \langle v,q_j\rangle q_j, q_i \right\rangle = \langle v,q_i\rangle. $$
Hence
$$ \langle v-p,q_i\rangle = 0. $$
Since the (q_i) span (W), the residual is orthogonal to all of (W).
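The sum formula can be checked numerically. The sketch below assumes real vectors as lists, the standard dot product, and a basis the caller guarantees is orthonormal (the function does not verify this); `proj_orthonormal` and the sample basis are hypothetical choices for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def proj_orthonormal(v, basis):
    """Project v onto span(basis), assuming basis is orthonormal (not checked)."""
    p = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)                          # coordinate <v, q_j>
        p = [pi + c * qi for pi, qi in zip(p, q)]
    return p

s = 1 / math.sqrt(2)
basis = [[s, s, 0.0], [0.0, 0.0, 1.0]]         # hypothetical orthonormal pair in R^3
v = [2.0, 4.0, 5.0]
p = proj_orthonormal(v, basis)
r = [a - b for a, b in zip(v, p)]
# p is approximately [3, 3, 5]; each <r, q_j> is zero up to rounding.
print(p, [dot(r, q) for q in basis])
```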
51.5 Projection Matrix for an Orthonormal Basis
Let (Q) be the matrix whose columns are the orthonormal vectors
$$ q_1,\ldots,q_k. $$
Then
$$ Q^TQ=I_k. $$
The projection of (v) onto (\operatorname{Col}(Q)) is
$$ p = QQ^T v. $$
Thus the projection matrix is
$$ P = QQ^T. $$
This formula is important because it expresses projection as matrix multiplication.
The matrix (P) satisfies
$$ P^2=P. $$
Indeed,
$$ P^2 = (QQ^T)(QQ^T) = Q(Q^TQ)Q^T = QIQ^T = QQ^T = P. $$
It is also symmetric, since ((QQ^T)^T = (Q^T)^TQ^T = QQ^T):
$$ P^T=P. $$
Therefore (P) is symmetric and idempotent. A real matrix with these two properties is an orthogonal projection matrix.
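Both properties can be confirmed numerically. This sketch assumes NumPy and builds an orthonormal basis from the reduced QR factorization of a random matrix; the random seed and the sizes are arbitrary illustrative choices, not part of the method.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))      # random 4x2 matrix, full column rank almost surely
Q, _ = np.linalg.qr(A)               # reduced QR: Q is 4x2 with orthonormal columns

P = Q @ Q.T                          # projection matrix onto Col(Q)

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q^T Q = I_k
print(np.allclose(P @ P, P))             # True: idempotent
print(np.allclose(P.T, P))               # True: symmetric
```

Comparisons use `np.allclose` because floating-point products satisfy these identities only up to rounding.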
51.6 Idempotence
A projection is idempotent. This means that applying it twice gives the same result as applying it once:
$$ P^2=P. $$
The reason is geometric. Once a vector has been projected onto a subspace, projecting it onto the same subspace again changes nothing.
If
$$ p = P v $$
and (p\in W), then
$$ Pp=p. $$
Therefore
$$ P(Pv)=Pv. $$
In matrix form,
$$ P^2v=Pv $$
for every vector (v), so
$$ P^2=P. $$
Idempotence is the algebraic signature of projection. General projections are idempotent linear maps; orthogonal projections additionally make the residual (v-Pv) orthogonal to the range of (P).
51.7 Symmetry
A real square matrix (P) is an orthogonal projection matrix precisely when it is both idempotent and symmetric:
$$ P^2=P, \qquad P^T=P. $$
The idempotent condition says that (P) is a projection. The symmetry condition says that the projection is orthogonal rather than oblique.
For complex matrices, symmetry is replaced by self-adjointness:
$$ P^2=P, \qquad P^*=P. $$
Here (P^*) denotes the conjugate transpose.
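For an orthogonal projection matrix, the residual is perpendicular to the range. If (p=Pv), then
$$ v-p \in \operatorname{Range}(P)^\perp. $$
Indeed, every vector in the range has the form (Py), and
$$ (Py)^T(v-Pv) = y^TP^T(I-P)v = y^T(P-P^2)v = 0, $$
using (P^T=P) and then (P^2=P). This is the geometric content of the symmetry condition.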
51.8 Projection onto a Column Space
Let (A) be an (m\times n) real matrix with linearly independent columns. We want the projection of (b\in\mathbb{R}^m) onto
$$ \operatorname{Col}(A). $$
The projected vector has the form
$$ p=A\hat{x} $$
for some (\hat{x}\in\mathbb{R}^n).
The residual is
$$ r=b-A\hat{x}. $$
For (p) to be the orthogonal projection, the residual must be orthogonal to every column of (A). This condition is
$$ A^T(b-A\hat{x})=0. $$
Rearranging gives the normal equations:
$$ A^TA\hat{x}=A^Tb. $$
Since the columns of (A) are linearly independent, (A^TA) is invertible. Thus
$$ \hat{x}=(A^TA)^{-1}A^Tb. $$
Therefore
$$ p=A(A^TA)^{-1}A^Tb. $$
The projection matrix onto (\operatorname{Col}(A)) is
$$ P=A(A^TA)^{-1}A^T. $$
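In floating-point practice it is common to solve the normal equations rather than form ((A^TA)^{-1}) explicitly. The sketch below assumes NumPy and a full-column-rank (A); `project_onto_col` is a hypothetical helper name. The sample data are the matrix and vector used in the worked example of Section 51.16.

```python
import numpy as np

def project_onto_col(A, b):
    """Return (x_hat, p) where p = A x_hat is the orthogonal projection of b onto Col(A).

    Assumes A has linearly independent columns, so A^T A is invertible.
    Solves A^T A x = A^T b instead of inverting A^T A.
    """
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    return x_hat, A @ x_hat

A = np.array([[1.0], [1.0], [0.0]])
b = np.array([2.0, 4.0, 5.0])
x_hat, p = project_onto_col(A, b)
print(x_hat, p)                      # [3.] [3. 3. 0.]
```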
51.9 Why (A^TA) Is Invertible
Assume the columns of (A) are linearly independent. Then (A^TA) is invertible.
To see this, suppose
$$ A^TAx=0. $$
Multiply on the left by (x^T):
$$ x^TA^TAx=0. $$
But
$$ x^TA^TAx = (Ax)^T(Ax)=\|Ax\|^2. $$
Hence
$$ \|Ax\|^2=0. $$
Therefore
$$ Ax=0. $$
Since the columns of (A) are linearly independent, the null space of (A) is trivial. Thus
$$ x=0. $$
So the null space of (A^TA) is trivial, and (A^TA) is invertible.
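The argument above shows more generally that (A^TA) and (A) have the same null space, so they have the same rank. A small NumPy sketch, with two hypothetical example matrices, contrasts independent and dependent columns:

```python
import numpy as np

# Independent columns: A^T A should have full rank 2.
A_indep = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
# Dependent columns (second = 2 * first): A^T A should be singular.
A_dep = np.array([[1.0, 2.0], [1.0, 2.0], [0.0, 0.0]])

for A in (A_indep, A_dep):
    G = A.T @ A                                        # the matrix A^T A
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(G))
# prints "2 2", then "1 1"
```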
51.10 Projection Matrix onto a Column Space
For a full-column-rank matrix (A), the projection matrix
$$ P=A(A^TA)^{-1}A^T $$
has two key properties.
First,
$$ P^2=P. $$
Indeed,
$$ P^2 = A(A^TA)^{-1}A^T A(A^TA)^{-1}A^T. $$
Since
$$ (A^TA)^{-1}A^TA(A^TA)^{-1} = (A^TA)^{-1}, $$
we get
$$ P^2 = A(A^TA)^{-1}A^T = P. $$
Second,
$$ P^T=P. $$
This follows because (A^TA) is symmetric, so its inverse is symmetric:
$$ P^T = \left(A(A^TA)^{-1}A^T\right)^T = A(A^TA)^{-1}A^T = P. $$
Thus (P) is an orthogonal projection matrix.
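A numerical check of both properties, assuming NumPy and a random matrix that has full column rank with probability 1. The helper `col_space_projector` is a hypothetical name; it computes ((A^TA)^{-1}A^T) with a linear solve rather than an explicit inverse.

```python
import numpy as np

def col_space_projector(A):
    """P = A (A^T A)^{-1} A^T for A with linearly independent columns."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
P = col_space_projector(A)

# Idempotent, symmetric, and it fixes every column of A (those lie in Col(A)).
print(np.allclose(P @ P, P), np.allclose(P.T, P), np.allclose(P @ A, A))
```

Since the trace of a projection equals the dimension of its range, `np.trace(P)` should also come out close to 3 here.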
51.11 Closest Vector Property
Orthogonal projection gives the closest vector in a subspace.
Let (W) be a finite-dimensional subspace of an inner product space (V). Let
$$ p=\operatorname{proj}_W(v), \qquad r=v-p. $$
Then
$$ p\in W, \qquad r\in W^\perp. $$
For any other vector (w\in W),
$$ v-w = (v-p)+(p-w). $$
Here
$$ v-p \in W^\perp, $$
and
$$ p-w\in W. $$
Therefore the two vectors (v-p) and (p-w) are orthogonal. By the Pythagorean theorem,
$$ \|v-w\|^2 = \|v-p\|^2+\|p-w\|^2. $$
Since
$$ \|p-w\|^2\ge 0, $$
we have
$$ \|v-w\|^2\ge \|v-p\|^2. $$
Thus
$$ \|v-w\|\ge \|v-p\|. $$
So (p) is the closest vector in (W) to (v). Equality occurs only when (w=p). This is the best approximation property of orthogonal projection.
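The best approximation property can be tested empirically by comparing the projection against many other points of (W). This NumPy sketch assumes (W=\operatorname{Col}(A)) for a random (A); the sample count of 1000 and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2))           # W = Col(A), a plane in R^6
v = rng.standard_normal(6)

x_hat = np.linalg.solve(A.T @ A, A.T @ v)
p = A @ x_hat                             # proj_W(v)
best = np.linalg.norm(v - p)              # dist(v, W)

ws = A @ rng.standard_normal((2, 1000))   # each column is some vector w in W
dists = np.linalg.norm(v[:, None] - ws, axis=0)
print(best <= dists.min())                # True

# Pythagorean identity ||v-w||^2 = ||v-p||^2 + ||p-w||^2 for one sample w.
w = ws[:, 0]
print(np.isclose(np.linalg.norm(v - w) ** 2, best ** 2 + np.linalg.norm(p - w) ** 2))  # True
```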
51.12 Distance to a Subspace
The distance from (v) to a subspace (W) is
$$ \operatorname{dist}(v,W) = \inf_{w\in W}\|v-w\|. $$
When (W) is finite-dimensional, this infimum is attained by the orthogonal projection (p=\operatorname{proj}_W(v)):
$$ \operatorname{dist}(v,W) = \|v-p\|. $$
The residual is therefore the shortest error vector. It measures exactly how far (v) lies from the subspace.
51.13 Least Squares
Orthogonal projection is the geometric core of least squares.
Consider an inconsistent system
$$ Ax=b. $$
If (b\notin \operatorname{Col}(A)), there is no exact solution. Instead, we seek (\hat{x}) such that
$$ A\hat{x} $$
is as close as possible to (b). This means minimizing
$$ \|b-Ax\|_2. $$
The closest vector in (\operatorname{Col}(A)) is the orthogonal projection of (b) onto (\operatorname{Col}(A)). Therefore
$$ A\hat{x} = \operatorname{proj}_{\operatorname{Col}(A)}(b). $$
The residual
$$ r=b-A\hat{x} $$
must be orthogonal to (\operatorname{Col}(A)). Hence
$$ A^Tr=0. $$
Substituting (r=b-A\hat{x}) gives
$$ A^T(b-A\hat{x})=0, $$
or
$$ A^TA\hat{x}=A^Tb. $$
These are the normal equations.
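As a sanity check, the normal-equations solution can be compared with NumPy's `np.linalg.lstsq`, which uses an SVD-based LAPACK routine. That solver is often preferred in practice because forming (A^TA) squares the condition number. The data below (fitting a line (y=c_0+c_1t) to four points) are made up for illustration.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(t), t])          # 4x2 with independent columns

x_normal = np.linalg.solve(A.T @ A, A.T @ y)       # solve A^T A x = A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # NumPy's least-squares solver

r = y - A @ x_normal                               # least-squares residual
print(x_normal, np.allclose(x_normal, x_lstsq), np.allclose(A.T @ r, 0.0))
```

By hand, (A^TA=\begin{bmatrix}4&6\\6&14\end{bmatrix}) and (A^Ty=(9,18)^T), so (\hat{x}=(0.9,\,0.9)^T); the printed coefficients should match this up to rounding.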
51.14 Example: Projection onto a Line in (\mathbb{R}^2)
Let
$$ u= \begin{bmatrix} 1\\ 2 \end{bmatrix}, \qquad v= \begin{bmatrix} 3\\ 1 \end{bmatrix}. $$
The projection of (v) onto (\operatorname{span}\{u\}) is
$$ p= \frac{v^Tu}{u^Tu}u. $$
Compute
$$ v^Tu = 3\cdot 1 + 1\cdot 2 = 5, $$
and
$$ u^Tu = 1^2+2^2=5. $$
Thus
$$ p= \frac{5}{5} \begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix}. $$
The residual is
$$ r=v-p = \begin{bmatrix} 3\\ 1 \end{bmatrix} - \begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 2\\ -1 \end{bmatrix}. $$
Check orthogonality:
$$ r^Tu = 2\cdot 1 + (-1)\cdot 2 = 0. $$
Thus the decomposition is
$$ \begin{bmatrix} 3\\ 1 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix} + \begin{bmatrix} 2\\ -1 \end{bmatrix}, $$
with the first vector on the line and the second vector perpendicular to the line.
51.15 Example: Projection onto a Plane
Let (W\subseteq \mathbb{R}^3) be the (xy)-plane:
$$ W= \left\{ \begin{bmatrix} x\\ y\\ 0 \end{bmatrix} :x,y\in\mathbb{R} \right\}. $$
For
$$ v= \begin{bmatrix} a\\ b\\ c \end{bmatrix}, $$
the projection onto (W) is
$$ p= \begin{bmatrix} a\\ b\\ 0 \end{bmatrix}. $$
The residual is
$$ r= \begin{bmatrix} 0\\ 0\\ c \end{bmatrix}. $$
The projection matrix is
$$ P= \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{bmatrix}. $$
Then
$$ Pv= \begin{bmatrix} a\\ b\\ 0 \end{bmatrix}. $$
This matrix satisfies
$$ P^2=P, \qquad P^T=P. $$
So it is an orthogonal projection matrix.
51.16 Example: Projection Using a Matrix
Let
$$ A= \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}. $$
The column space of (A) is the line in (\mathbb{R}^3) spanned by
$$ u= \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}. $$
The projection matrix is
$$ P=A(A^TA)^{-1}A^T. $$
Compute
$$ A^TA = \begin{bmatrix} 1&1&0 \end{bmatrix} \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} = 2. $$
Thus
$$ P= \frac12 \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} \begin{bmatrix} 1&1&0 \end{bmatrix} = \frac12 \begin{bmatrix} 1&1&0\\ 1&1&0\\ 0&0&0 \end{bmatrix}. $$
For
$$ b= \begin{bmatrix} 2\\ 4\\ 5 \end{bmatrix}, $$
the projection is
$$ p=Pb = \frac12 \begin{bmatrix} 1&1&0\\ 1&1&0\\ 0&0&0 \end{bmatrix} \begin{bmatrix} 2\\ 4\\ 5 \end{bmatrix} = \begin{bmatrix} 3\\ 3\\ 0 \end{bmatrix}. $$
The residual is
$$ r=b-p = \begin{bmatrix} -1\\ 1\\ 5 \end{bmatrix}. $$
Check orthogonality to the column of (A):
$$ A^Tr = \begin{bmatrix} 1&1&0 \end{bmatrix} \begin{bmatrix} -1\\ 1\\ 5 \end{bmatrix} = 0. $$
Thus (p) is the orthogonal projection of (b) onto (\operatorname{Col}(A)).
51.17 Orthogonal Projection and Coordinates
If (W) has an orthonormal basis (q_1,\ldots,q_k), then the projection coefficients are
$$ c_j=\langle v,q_j\rangle. $$
Thus
$$ \operatorname{proj}_W(v) = c_1q_1+\cdots+c_kq_k. $$
These coefficients are the coordinates of the projected vector in the orthonormal basis of (W).
The projection discards all components of (v) in (W^\perp). If (V=W\oplus W^\perp), and
$$ v=w+z, \qquad w\in W, \qquad z\in W^\perp, $$
then
$$ \operatorname{proj}_W(v)=w. $$
Projection is therefore a coordinate-selection operation relative to an orthogonal decomposition.
51.18 Orthogonal Projection and Energy
Let (p=\operatorname{proj}_W(v)) and (r=v-p). Since
$$ p\perp r, $$
the Pythagorean theorem gives
$$ \|v\|^2=\|p\|^2+\|r\|^2. $$
Thus projection splits the squared norm into two parts:
| Term | Meaning |
|---|---|
| (\lVert p\rVert^2) | Energy captured by the subspace |
| (\lVert r\rVert^2) | Energy left outside the subspace |
If (W) has orthonormal basis (q_1,\ldots,q_k), then
$$ \|p\|^2 = \sum_{j=1}^k |\langle v,q_j\rangle|^2. $$
Therefore
$$ \|r\|^2 = \|v\|^2 - \sum_{j=1}^k |\langle v,q_j\rangle|^2. $$
This form appears in approximation theory, signal processing, Fourier analysis, statistics, and numerical linear algebra.
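The energy split is easy to verify for real vectors, where (|\langle v,q_j\rangle|^2) is just a square. This NumPy sketch assumes an orthonormal basis obtained from the reduced QR factorization of a random (5\times 2) matrix; the sizes and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # orthonormal q_1, q_2 as columns
v = rng.standard_normal(5)

coeffs = Q.T @ v           # the coordinates <v, q_j>
p = Q @ coeffs             # proj_W(v)
r = v - p                  # residual

print(np.isclose(v @ v, p @ p + r @ r))        # True: ||v||^2 = ||p||^2 + ||r||^2
print(np.isclose(p @ p, np.sum(coeffs ** 2)))  # True: ||p||^2 = sum of squared coordinates
```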
51.19 Oblique Projections
A projection need not be orthogonal.
A linear map (P:V\to V) is a projection if
$$ P^2=P. $$
This only means that applying (P) twice is the same as applying it once. It does not require the residual to be orthogonal to the range.
An oblique projection is a projection whose range and null space are complementary but not orthogonal.
For example,
$$ P= \begin{bmatrix} 1&1\\ 0&0 \end{bmatrix} $$
satisfies
$$ P^2=P, $$
so it is a projection. But
$$ P^T\ne P, $$
so it is not an orthogonal projection.
Orthogonal projections are usually preferred when distance minimization matters. Oblique projections appear in other settings where the decomposition directions are prescribed by constraints rather than perpendicularity.
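The difference shows up directly in the residual. The oblique matrix from the example has range the (x)-axis and null space spanned by ((1,-1)^T); for comparison, the sketch below also uses the orthogonal projection onto the same range. NumPy and the test vector (v=(0,1)^T) are illustrative choices.

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 0.0]])          # oblique projection from the example above
P_orth = np.array([[1.0, 0.0],
                   [0.0, 0.0]])     # orthogonal projection onto the same range (x-axis)

v = np.array([0.0, 1.0])
e1 = np.array([1.0, 0.0])           # spans the common range

for M in (P, P_orth):
    r = v - M @ v                   # residual
    # idempotent?, symmetric?, <r, e1>, length of the residual
    print(np.array_equal(M @ M, M), np.array_equal(M.T, M), r @ e1, np.linalg.norm(r))
```

Both matrices are idempotent, but only the symmetric one gives a residual orthogonal to the range. The oblique residual ((-1,1)^T) has length (\sqrt{2}), while the orthogonal residual ((0,1)^T) has length (1), the true distance from (v) to the (x)-axis.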
51.20 Projection Theorem
In finite-dimensional inner product spaces, every subspace (W) has an orthogonal projection. For each (v\in V), there is a unique vector (p\in W) such that
$$ v-p\in W^\perp. $$
This vector (p) is the unique closest vector in (W) to (v).
In Hilbert spaces, the analogous result requires (W) to be closed. If (M) is a closed subspace of a Hilbert space (H), then every (x\in H) has a unique best approximation (\hat{x}\in M), and the error (x-\hat{x}) lies in (M^\perp).
This result is called the projection theorem. It is the abstract form of the closest vector property.
51.21 Summary
Orthogonal projection decomposes a vector into a component inside a subspace and a residual perpendicular to that subspace:
$$ v=\operatorname{proj}_W(v)+r, \qquad r\in W^\perp. $$
For a line spanned by a nonzero vector (u),
$$ \operatorname{proj}_u(v) = \frac{\langle v,u\rangle}{\langle u,u\rangle}u. $$
For a subspace with orthonormal basis (q_1,\ldots,q_k),
$$ \operatorname{proj}_W(v) = \sum_{j=1}^k \langle v,q_j\rangle q_j. $$
For a full-column-rank matrix (A), the projection matrix onto (\operatorname{Col}(A)) is
$$ P=A(A^TA)^{-1}A^T. $$
Orthogonal projection gives the closest vector in a subspace, produces the residual used in least squares, and gives the geometric meaning of many matrix formulas.