Eigenvalue Problems

Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle...

Eigenvalue Problems

Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle operations in automatic differentiation.

The main difficulty is that eigenvectors are not stable objects. Small perturbations in the matrix can produce large changes in eigenvectors when eigenvalues are close or repeated. The derivative formulas reveal this instability directly.

Eigenvalues and Eigenvectors

For a square matrix

$$ A \in \mathbb{R}^{n \times n}, $$

an eigenpair satisfies

$$ Av = \lambda v, $$

where

$$ v \ne 0. $$

The scalar $\lambda$ is an eigenvalue and $v$ is an eigenvector.

For symmetric matrices,

$$ A = A^T, $$

all eigenvalues are real, and there exists an orthonormal basis of eigenvectors:

$$ A = Q\Lambda Q^T, $$

where

$$ Q^TQ = I, $$

and

$$ \Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_n). $$

Symmetric eigenproblems are much more stable and better behaved for AD than general nonsymmetric eigenproblems.

Why Eigenproblems Matter

Eigenvalue computations appear in:

Application Role
PCA Principal directions
Spectral clustering Graph Laplacian eigenvectors
Stability analysis System modes
Differential equations Modal decomposition
Quantum mechanics Hamiltonian spectra
Covariance analysis Principal components
Optimization Hessian curvature
Graph neural networks Spectral filters

Many machine learning systems differentiate through eigendecompositions implicitly or explicitly.

Differential of Eigenvalues

Assume $A$ is symmetric with normalized eigenvectors:

$$ q_i^T q_i = 1. $$

The eigenvalue equation is

$$ Aq_i = \lambda_i q_i. $$

Differentiate:

$$ dA,q_i + A,dq_i = d\lambda_i,q_i + \lambda_i,dq_i. $$

Left-multiply by $q_i^T$:

$$ q_i^T dA,q_i + q_i^T A,dq_i = d\lambda_i,q_i^T q_i + \lambda_i q_i^T dq_i. $$

Since

$$ Aq_i = \lambda_i q_i, $$

we obtain

$$ q_i^T A,dq_i = \lambda_i q_i^T dq_i. $$

The terms cancel:

$$ d\lambda_i = q_i^T dA,q_i. $$

This is the first-order perturbation formula for eigenvalues.

The gradient of the eigenvalue with respect to the matrix is

$$ \nabla_A \lambda_i = q_i q_i^T. $$

This formula is stable even when eigenvectors are not.

Reverse Rule for Eigenvalues

Suppose a scalar loss depends only on eigenvalues:

$$ L = f(\lambda_1,\ldots,\lambda_n). $$

Let

$$ \bar{\lambda}_i = \frac{\partial L}{\partial \lambda_i}. $$

Then

$$ dL = \sum_i \bar{\lambda}_i d\lambda_i. $$

Substitute the perturbation formula:

$$ dL = \sum_i \bar{\lambda}_i q_i^T dA q_i. $$

Rewrite using trace:

$$ dL = \operatorname{tr} \left( Q,\operatorname{diag}(\bar{\lambda}),Q^T dA \right). $$

Therefore,

$$ \bar{A} = Q,\operatorname{diag}(\bar{\lambda}),Q^T. $$

This rule is clean and stable for symmetric matrices.

Differential of Eigenvectors

Eigenvectors are more difficult.

Assume distinct eigenvalues. Differentiate:

$$ Aq_i = \lambda_i q_i. $$

Rearrange:

$$ (A - \lambda_i I)dq_i = (d\lambda_i I - dA)q_i. $$

Expand $dq_i$ in the eigenbasis:

$$ dq_i = \sum_{j \ne i} c_{ji} q_j. $$

Project onto $q_j$:

$$ q_j^T(A-\lambda_i I)dq_i = q_j^T(d\lambda_i I - dA)q_i. $$

Since

$$ Aq_j = \lambda_j q_j, $$

we obtain

$$ (\lambda_j - \lambda_i)c_{ji} = -q_j^T dA q_i. $$

Thus

$$ c_{ji} = \frac{q_j^T dA q_i}{\lambda_i - \lambda_j}. $$

Therefore

$$ dq_i = \sum_{j \ne i} q_j \frac{q_j^T dA q_i}{\lambda_i - \lambda_j}. $$

This formula is extremely important.

It shows that eigenvector derivatives contain terms:

$$ \frac{1}{\lambda_i - \lambda_j}. $$

If eigenvalues are close, derivatives become large. If eigenvalues are equal, the derivative becomes undefined.

Geometric Meaning

Eigenvectors are unstable because only the eigenspace is intrinsic.

Suppose

$$ \lambda_i = \lambda_j. $$

Then any orthonormal basis of the corresponding eigenspace is valid. The individual vectors are arbitrary.

Thus the mapping

$$ A \mapsto q_i $$

is not continuous at repeated eigenvalues.

But the projector onto the eigenspace:

$$ P = QQ^T $$

is stable and differentiable under broader conditions.

This distinction matters in AD. Stable objectives should depend on invariant subspaces rather than arbitrary basis vectors.

Symmetric vs Nonsymmetric Problems

For symmetric matrices:

Property Result
Eigenvalues Real
Eigenvectors Orthogonal
Decomposition Stable
Gradient formulas Relatively clean

For general matrices:

Property Result
Eigenvalues May be complex
Eigenvectors May be nonorthogonal
Jordan blocks Possible
Perturbation theory Much harder

Most ML and scientific computing systems restrict differentiable eigendecompositions to symmetric or Hermitian matrices.

Spectral Functions

A spectral function depends only on eigenvalues:

$$ f(A) = Q f(\Lambda) Q^T. $$

Examples:

Function Spectral Form
Matrix exponential $e^A = Q e^\Lambda Q^T$
Matrix logarithm $\log A = Q \log \Lambda Q^T$
Matrix square root $A^{1/2} = Q \Lambda^{1/2} Q^T$
Matrix inverse $A^{-1} = Q \Lambda^{-1} Q^T$

Differentiating spectral functions involves differentiating eigenvalues and eigenvectors simultaneously.

The resulting formulas often contain divided differences:

$$ \frac{f(\lambda_i)-f(\lambda_j)} {\lambda_i-\lambda_j}. $$

These expressions remain finite when eigenvalues coincide if handled carefully.

Example: Largest Eigenvalue

Consider

$$ \lambda_{\max}(A). $$

If the largest eigenvalue is simple, then

$$ d\lambda_{\max} = q^T dA q. $$

Thus

$$ \nabla_A \lambda_{\max} = qq^T. $$

This appears in:

Application Use
Spectral norm regularization Largest singular/eigenvalue
Stability constraints Spectral radius
Graph methods Leading eigenvector
Optimization Convex spectral penalties

If the largest eigenvalue is repeated, the gradient becomes set-valued.

Rayleigh Quotient

The Rayleigh quotient is

$$ R(x) = \frac{x^T A x}{x^T x}. $$

For symmetric $A$, stationary points are eigenvectors.

Differentiate numerator and denominator:

$$ dR = \frac{ 2x^T A,dx + x^T dA x }{x^T x} - \frac{ x^T A x }{ (x^T x)^2 } 2x^T dx. $$

When $x$ is normalized:

$$ x^T x = 1, $$

this simplifies to

$$ dR = x^T dA x + 2(Ax - R(x)x)^T dx. $$

At an eigenvector,

$$ Ax = \lambda x, $$

the second term vanishes.

Then

$$ dR = x^T dA x. $$

The Rayleigh quotient connects optimization and eigensystems directly.

Orthogonality Constraints

Eigenvector matrices satisfy

$$ Q^TQ = I. $$

Differentiate:

$$ dQ^TQ + Q^TdQ = 0. $$

Therefore

$$ Q^TdQ $$

is skew-symmetric.

This constraint appears in all orthogonal matrix derivatives: QR, eigendecomposition, SVD, Stiefel manifolds, and orthogonal parameterizations.

A backward rule must preserve this geometry.

Reverse Mode for Symmetric Eigendecomposition

Suppose

$$ A = Q\Lambda Q^T, $$

and a loss depends on both $Q$ and $\Lambda$.

Let:

$$ \bar{Q} = \frac{\partial L}{\partial Q}, \qquad \bar{\Lambda} = \frac{\partial L}{\partial \Lambda}. $$

Define matrix $F$:

$$ F_{ij} = \begin{cases} \frac{1}{\lambda_i - \lambda_j}, & i \ne j, \ 0, & i=j. \end{cases} $$

Then the symmetric reverse rule is:

$$ \bar{A} = Q \left( \bar{\Lambda} + F \odot (Q^T \bar{Q}) \right) Q^T. $$

Symmetrization is usually required:

$$ \bar{A} \leftarrow \frac{1}{2}(\bar{A}+\bar{A}^T). $$

The term involving $F$ is precisely where instability appears when eigenvalues collide.

Eigenvalue Crossing

Suppose two eigenvalues approach each other:

$$ \lambda_i \to \lambda_j. $$

Then

$$ \frac{1}{\lambda_i - \lambda_j} \to \infty. $$

The gradient can explode.

This is not a bug in AD. The mathematical derivative itself becomes singular.

Near eigenvalue crossings:

Quantity Stability
Eigenvalues Stable
Eigenspaces Usually stable
Individual eigenvectors Unstable

This distinction is critical in spectral machine learning methods.

Numerical Stability

Practical eigendecomposition differentiation often requires:

Technique Purpose
Eigenvalue regularization Avoid collisions
Symmetrization Remove asymmetric noise
Projector objectives Avoid unstable basis gradients
Truncated spectra Ignore small unstable modes
Stable spectral gaps Improve conditioning

Spectral methods are fundamentally conditioning-sensitive.

Power Iteration and Implicit Differentiation

Some systems compute dominant eigenvectors iteratively:

$$ x_{k+1} = \frac{Ax_k}{|Ax_k|}. $$

AD can:

  1. Differentiate through all iterations.
  2. Use implicit differentiation on the fixed-point equation.

Implicit methods are often cheaper and more stable for long iterative processes.

The fixed-point condition is:

$$ x = \frac{Ax}{|Ax|}. $$

Differentiating the fixed-point equation yields a linear system for the derivative.

This approach generalizes to many iterative eigensolvers.

Graph Laplacians

Graph learning frequently uses eigendecompositions of Laplacians:

$$ L = D - A. $$

The smallest eigenvectors encode graph geometry.

Spectral clustering, diffusion maps, graph embeddings, and manifold learning all rely on these spectra.

Repeated or nearly repeated eigenvalues are common in graphs with weakly connected components. This can make eigenvector gradients unstable.

Using subspace projectors instead of raw eigenvectors often improves robustness.

Spectral Radius

The spectral radius is

$$ \rho(A) = \max_i |\lambda_i|. $$

It controls:

Domain Meaning
Dynamical systems Stability
Recurrent networks Gradient growth
Numerical methods Convergence
Control theory System response

Differentiating spectral radius is difficult when multiple eigenvalues attain the maximum magnitude.

Implementation Considerations

A production AD implementation for eigendecomposition should track:

matrix symmetry
eigenvalue ordering
degeneracy handling
batch dimensions
dtype
complex vs real arithmetic
orthogonality conventions

The backward rule should explicitly define behavior when eigenvalues are repeated.

Many frameworks either:

Strategy Behavior
Return NaNs Signal undefined derivative
Use approximate stabilization Numerically smoother
Document undefined region User responsibility

No implementation can make fundamentally undefined derivatives well-defined.

Practical Guidance

Prefer:

Stable Objective Avoid
Eigenvalues Raw eigenvectors
Subspace projectors Arbitrary basis vectors
Spectral traces Unstable spectral coordinates
Orthogonally invariant losses Basis-dependent losses
Symmetric problems General nonsymmetric spectra

When possible, formulate objectives using:

$$ QQ^T $$

rather than $Q$ itself.

Summary

Eigenvalue problems reveal the geometric limits of automatic differentiation. Eigenvalues have stable first-order perturbations. Eigenvectors do not. Their derivatives contain inverse spectral gaps, making gradients unstable or undefined near repeated eigenvalues.

Differentiating eigendecompositions correctly requires respecting orthogonality constraints, spectral degeneracy, and basis ambiguity. Stable formulations should depend on invariant subspaces or spectral quantities rather than arbitrary eigenvector coordinates.