Chapter 126. Numerical Linear Algebra

Chapter 126. Numerical Linear Algebra

Numerical linear algebra studies algorithms for computing with vectors and matrices using finite-precision arithmetic.

Exact symbolic formulas are often impractical for large systems. Instead, computations are performed approximately on digital hardware. Numerical linear algebra analyzes the accuracy, stability, efficiency, and scalability of these computations.

The subject includes solving linear systems, least squares problems, eigenvalue problems, singular value decompositions, matrix factorizations, iterative methods, sparse matrices, conditioning, and floating-point error analysis.

Modern scientific computing, machine learning, optimization, graphics, signal processing, simulation, and data analysis all depend on numerical linear algebra. Matrix factorizations such as LU, QR, and SVD are central tools throughout the field.

126.1 Floating-Point Arithmetic

Computers do not represent real numbers exactly.

Instead, they use floating-point numbers of the form

$$ \pm m \times b^e, $$

where:

Symbol Meaning
(m) Mantissa or significand
(b) Base
(e) Exponent

Most modern systems use binary floating-point arithmetic.

A floating-point number approximates a real number. Arithmetic operations are rounded to nearby representable values.

Thus

$$ (a+b)+c $$

may differ from

$$ a+(b+c). $$

Finite precision changes algebraic behavior.

126.2 Machine Precision

Machine precision measures the relative spacing of floating-point numbers.

The machine epsilon (\varepsilon_{\text{mach}}) is approximately the smallest number such that

$$ 1+\varepsilon_{\text{mach}} > 1 $$

in floating-point arithmetic.

For IEEE double precision,

$$ \varepsilon_{\text{mach}}\approx 2^{-53}\approx 10^{-16}. $$

Machine precision limits attainable accuracy.

No stable algorithm can usually compute significantly beyond machine precision relative to the conditioning of the problem.

126.3 Rounding Error

Each arithmetic operation introduces small rounding error.

A common model is

$$ \operatorname{fl}(x\circ y) = (x\circ y)(1+\delta), $$

where:

Symbol Meaning
(\circ) Arithmetic operation
(\operatorname{fl}) Floating-point result
(\delta) Small relative error

Typically,

$$ |\delta|\leq \varepsilon_{\text{mach}}. $$

Although each error is tiny, many operations may accumulate substantial error.

Numerical analysis studies how these local errors propagate through algorithms.

126.4 Catastrophic Cancellation

Cancellation occurs when subtracting nearly equal numbers.

For example,

$$ 1.000000000001 - 1.000000000000 $$

may lose many significant digits.

The relative error after subtraction can become large even if the original numbers were accurate.

This is called catastrophic cancellation.

Algorithms should avoid unnecessary cancellation when possible.

For example, the quadratic formula

$$ x=\frac{-b\pm\sqrt{b^2-4ac}}{2a} $$

can suffer cancellation for one root. Stable rearrangements are preferred in numerical computation.

126.5 Backward Error

Backward error asks:

What nearby problem did the algorithm solve exactly?

Suppose an algorithm computes (\hat{x}) for the system

$$ Ax=b. $$

If

$$ (A+\Delta A)\hat{x}=b+\Delta b, $$

then (\Delta A) and (\Delta b) describe the backward error.

An algorithm is backward stable if it solves a nearby problem with small perturbations.

Backward stability is one of the most important concepts in numerical linear algebra. Cornell notes emphasize backward stability as solving a nearby problem exactly.

126.6 Conditioning

Conditioning measures sensitivity of the solution to perturbations in the data.

For a nonsingular matrix (A), the condition number is

$$ \kappa(A)=|A||A^{-1}|. $$

If (\kappa(A)) is large, small perturbations in the data may produce large changes in the solution.

If (\kappa(A)) is near (1), the problem is well-conditioned.

Conditioning depends on the mathematical problem itself, not on the algorithm.

Algorithms may be stable or unstable, but even a stable algorithm cannot overcome severe ill-conditioning.

126.7 Linear Systems

A central problem is solving

$$ Ax=b. $$

For dense matrices, direct factorization methods are common.

For sparse or very large matrices, iterative methods are often preferred.

The computational strategy depends on:

Matrix property Preferred methods
Dense LU, QR
Symmetric positive definite Cholesky, conjugate gradient
Sparse Sparse direct or iterative methods
Structured Specialized algorithms
Huge Krylov methods, multigrid

Linear system solution is one of the main workloads in scientific computing.

126.8 Gaussian Elimination

Gaussian elimination transforms a matrix into upper triangular form using row operations.

For

$$ Ax=b, $$

the process computes

$$ A=LU, $$

where:

Matrix Meaning
(L) Lower triangular
(U) Upper triangular

Then solve:

$$ Ly=b, $$

followed by

$$ Ux=y. $$

Triangular systems are solved efficiently by forward and backward substitution.

Gaussian elimination is one of the most fundamental algorithms in numerical linear algebra.

126.9 Pivoting

Without pivoting, Gaussian elimination may become numerically unstable.

Pivoting swaps rows or columns to avoid dividing by small numbers.

Partial pivoting chooses the largest available pivot in magnitude within the current column.

The factorization becomes

$$ PA=LU, $$

where (P) is a permutation matrix.

Pivoting greatly improves stability in practice.

Most dense linear algebra libraries use pivoted LU factorization by default.

126.10 Complexity of Dense Elimination

For an (n\times n) dense matrix, Gaussian elimination requires approximately

$$ \frac{2}{3}n^3 $$

floating-point operations.

Storage requires approximately

$$ n^2 $$

numbers.

Thus direct dense methods become expensive for large (n).

For example:

(n) Approximate operations
(10^3) (10^9)
(10^4) (10^{12})
(10^5) (10^{15})

This motivates sparse and iterative methods for large problems.

126.11 Sparse Matrices

A sparse matrix has mostly zero entries.

Examples arise naturally in:

Area Source of sparsity
Finite elements Local basis support
Graphs Limited adjacency
PDE discretization Local coupling
Networks Sparse connections
Markov chains Few transitions

Sparse matrices are stored using compressed formats such as CSR or CSC.

Algorithms try to avoid computations involving zeros.

Sparsity changes both computational complexity and memory usage dramatically.

126.12 Fill-In

During elimination, zero entries may become nonzero.

This phenomenon is called fill-in.

For sparse matrices, fill-in can destroy sparsity and increase cost.

Matrix ordering strategies attempt to reduce fill-in.

Examples include:

Ordering Purpose
Minimum degree Reduce local fill
Nested dissection Exploit graph separators
Reverse Cuthill-McKee Reduce bandwidth

Sparse matrix computation is strongly connected to graph structure.

126.13 Cholesky Factorization

If (A) is symmetric positive definite, then

$$ A=LL^T, $$

where (L) is lower triangular.

This is the Cholesky factorization.

Compared with LU factorization, Cholesky:

Property Advantage
Uses symmetry Half storage
Uses definiteness No pivoting needed
Fewer operations About half the work of LU

Cholesky is widely used in optimization, statistics, finite elements, and Gaussian process models.

126.14 QR Factorization

The QR factorization expresses a matrix as

$$ A=QR, $$

where:

Matrix Property
(Q) Orthogonal
(R) Upper triangular

QR factorization is fundamental for least squares problems.

Given

$$ Ax\approx b, $$

compute

$$ QRx\approx b. $$

Since orthogonal transformations preserve norms,

$$ Rx\approx Q^Tb. $$

This triangular system gives the least squares solution.

QR factorization is more stable than solving normal equations directly.

126.15 Householder Reflections

A Householder reflection has the form

$$ H=I-2vv^T, $$

where (v) is a unit vector.

It reflects vectors across a hyperplane.

Householder transformations are orthogonal and numerically stable.

They are widely used for QR factorization because they systematically eliminate subdiagonal entries.

A sequence of Householder reflections transforms a matrix into triangular form.

126.16 Givens Rotations

A Givens rotation acts on two coordinates at a time.

It has the form

$$ G= \begin{bmatrix} c & s \ -s & c \end{bmatrix}, $$

embedded into a larger identity matrix.

Givens rotations are useful when matrices are sparse because they modify only selected rows or columns.

They are commonly used in updating factorizations and in some eigenvalue algorithms.

126.17 Least Squares Problems

The least squares problem seeks

$$ x = \arg\min_x |Ax-b|^2. $$

The normal equations are

$$ A^TAx=A^Tb. $$

If (A) has full column rank, the solution is unique.

However, solving normal equations may square the condition number:

$$ \kappa(A^TA)=\kappa(A)^2. $$

Therefore QR factorization or SVD is often preferred.

Least squares problems appear throughout data fitting, regression, inverse problems, and machine learning.

126.18 Singular Value Decomposition

The singular value decomposition is

$$ A=U\Sigma V^T. $$

Here:

Matrix Property
(U) Orthogonal
(V) Orthogonal
(\Sigma) Diagonal with nonnegative entries

The diagonal entries

$$ \sigma_1\geq \sigma_2\geq \cdots $$

are the singular values.

The SVD reveals:

Quantity From SVD
Rank Number of nonzero singular values
Condition number (\sigma_1/\sigma_r)
Best low-rank approximation Truncated SVD
Null space Small singular vectors
Principal directions Singular vectors

The SVD is one of the most important matrix decompositions in numerical computation.

126.19 Eigenvalue Problems

The eigenvalue problem seeks

$$ Av=\lambda v. $$

Numerical eigenvalue algorithms include:

Method Main idea
Power iteration Dominant eigenvector
Inverse iteration Targeted eigenvalues
QR algorithm Orthogonal similarity iteration
Lanczos Sparse symmetric matrices
Arnoldi General sparse matrices

Eigenvalue problems arise in vibration analysis, quantum mechanics, PCA, graph analysis, stability analysis, and differential equations.

126.20 Power Iteration

The power method repeatedly multiplies by (A):

$$ x_{k+1}=\frac{Ax_k}{|Ax_k|}. $$

Under suitable conditions, the vectors converge toward the dominant eigenvector.

The convergence rate depends on the spectral gap:

$$ \left|\frac{\lambda_2}{\lambda_1}\right|. $$

Power iteration is simple and scalable for large sparse matrices.

It underlies PageRank and many large-scale spectral algorithms.

126.21 Krylov Subspaces

Iterative methods often work in Krylov subspaces.

The (m)-th Krylov subspace generated by (A) and (b) is

$$ \mathcal{K}_m(A,b) = \operatorname{span} {b,Ab,A^2b,\ldots,A^{m-1}b}. $$

Krylov methods build approximate solutions from these spaces.

Examples include:

Method Matrix type
Conjugate gradient Symmetric positive definite
GMRES General nonsymmetric
MINRES Symmetric indefinite
Lanczos Eigenvalue problems

These methods rely mainly on matrix-vector multiplication.

126.22 Conjugate Gradient Method

For symmetric positive definite matrices, the conjugate gradient method solves

$$ Ax=b $$

iteratively.

The method minimizes the quadratic energy

$$ \frac12 x^TAx-b^Tx. $$

The search directions are (A)-orthogonal:

$$ p_i^TAp_j=0 \quad \text{for } i\neq j. $$

Conjugate gradient often converges much faster than worst-case bounds suggest.

Its efficiency depends strongly on the spectrum of (A).

126.23 Preconditioning

Preconditioning improves iterative convergence.

Instead of solving

$$ Ax=b, $$

solve an equivalent transformed system

$$ M^{-1}Ax=M^{-1}b. $$

The matrix (M) approximates (A) but is easier to invert.

A good preconditioner reduces the effective condition number.

Common preconditioners include:

Preconditioner Idea
Jacobi Diagonal scaling
Incomplete LU Approximate factorization
Multigrid Hierarchical correction
Domain decomposition Subdomain solves

Preconditioning is often the key to practical large-scale performance.

126.24 Stability of Orthogonal Transformations

Orthogonal matrices preserve norms:

$$ |Qx|=|x|. $$

This makes orthogonal transformations numerically stable.

Algorithms based on orthogonal transformations, such as QR factorization, tend to have excellent stability properties.

This is one reason QR methods are preferred over explicitly forming normal equations.

Orthogonality is extremely valuable numerically.

126.25 Matrix Norms

Matrix norms measure matrix size.

Common norms include:

Norm Definition
Spectral norm Largest singular value
Frobenius norm Square root of sum of squares
1-norm Maximum column sum
Infinity norm Maximum row sum

Norms are used in error bounds, conditioning, perturbation analysis, and convergence analysis.

Different norms emphasize different aspects of matrix behavior.

126.26 Perturbation Theory

Perturbation theory studies how small changes affect solutions.

Questions include:

Problem Perturbation question
Linear systems How does (x) change if (A) changes?
Eigenvalues How stable are eigenvalues?
Singular values How sensitive are low-rank structures?
Least squares How sensitive is the fit?

Perturbation theory combines linear algebra with analysis and numerical stability.

It explains why some computations are inherently difficult.

126.27 Parallel and High-Performance Computation

Modern numerical linear algebra must exploit hardware efficiently.

Performance depends on:

Factor Importance
Cache locality Memory efficiency
Parallelism Multiple processors
GPU acceleration Massive throughput
Communication cost Distributed systems
Sparse access patterns Memory bandwidth

Large-scale computation often becomes limited by data movement rather than arithmetic itself.

This changes algorithm design.

126.28 Numerical Linear Algebra and Applications

Numerical linear algebra is foundational because many applications reduce to matrix problems.

Area Typical matrix problem
Machine learning Least squares, SVD
Finite elements Sparse linear systems
Graphics Geometric transformations
Optimization Newton systems
Signal processing Spectral analysis
Networks Eigenvectors
Statistics Covariance factorizations
Quantum mechanics Eigenvalue problems
PDEs Sparse operators

Efficient computation with matrices is therefore central across scientific and engineering disciplines.

126.29 Numerical Linear Algebra and Linear Algebra

The dictionary is direct.

Numerical concept Linear algebra object
Floating-point vector Approximate vector
Stability Error growth under operations
Conditioning Sensitivity of matrix problems
LU factorization Triangular decomposition
QR factorization Orthogonal-triangular decomposition
Cholesky Symmetric positive definite factorization
Iterative solver Krylov subspace method
Sparse matrix Graph-structured operator
SVD Orthogonal spectral factorization
Preconditioner Approximate inverse operator
Eigenvalue algorithm Spectral computation

Numerical linear algebra studies how linear algebra behaves under finite computation.

126.30 Summary

Numerical linear algebra develops practical algorithms for matrix computation under finite precision.

Floating-point arithmetic introduces rounding error, cancellation, and stability concerns. Conditioning measures sensitivity of problems. Stable algorithms aim to control error growth.

Linear systems are solved using LU, Cholesky, or iterative Krylov methods. Least squares problems use QR or SVD methods. Sparse matrices require specialized storage and algorithms. Eigenvalue problems use iterative spectral methods.

The central principle is approximation under finite precision. Numerical linear algebra studies how to compute meaningful answers efficiently and reliably even though arithmetic is inexact and matrices may be extremely large.