Hyper-Dual Numbers
Dual numbers compute first derivatives exactly. Truncated polynomial algebras extend this to higher-order derivatives, but practical higher-order differentiation introduces an important problem: extracting second derivatives accurately without symbolic expansion or numerical cancellation.
Hyper-dual numbers solve this problem by introducing multiple nilpotent infinitesimal directions whose mixed products survive.
They provide an exact algebraic mechanism for computing:
- second derivatives
- mixed partial derivatives
- Hessians
without finite differences and without truncation error.
Motivation
Ordinary dual numbers satisfy:
$$ \varepsilon^2 = 0. $$
Evaluating
$$ f(x+\varepsilon) $$
produces:
$$ f(x)+f'(x)\varepsilon. $$
Only first-order information survives.
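As a point of comparison, first-order dual arithmetic can be sketched in a few lines of Go. The `Dual` type and the `MulDual` and `Cube` helpers below are illustrative names, not part of any library; the sketch assumes only the rule $\varepsilon^2 = 0$ stated above.

```go
package main

import "fmt"

// Dual is a hypothetical first-order dual number Val + Eps*eps with eps^2 = 0.
type Dual struct {
	Val float64 // primal value
	Eps float64 // coefficient of eps (first derivative)
}

// MulDual multiplies two dual numbers; the eps^2 term is dropped.
func MulDual(x, y Dual) Dual {
	return Dual{
		Val: x.Val * y.Val,
		Eps: x.Val*y.Eps + x.Eps*y.Val,
	}
}

// Cube evaluates f(x) = x^3 on dual numbers.
func Cube(x Dual) Dual {
	return MulDual(x, MulDual(x, x))
}

func main() {
	// Seed the input as x + eps at x = 2.
	r := Cube(Dual{Val: 2, Eps: 1})
	fmt.Println(r.Val, r.Eps) // f(2) = 8 and f'(2) = 12
}
```

There is no slot in `Dual` where $f''(2)$ could be stored, which is the limitation the rest of this section addresses.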
To recover second derivatives, one can use nested dual numbers or truncated polynomial algebras. However, those approaches may:
- increase implementation complexity
- require managing higher polynomial coefficients
- introduce perturbation confusion in nested systems
Hyper-dual numbers provide a cleaner construction for exact second-order differentiation.
The Hyper-Dual Algebra
Introduce two independent infinitesimal generators:
$$ \varepsilon_1,\varepsilon_2. $$
Require:
$$ \varepsilon_1^2 = 0 $$
$$ \varepsilon_2^2 = 0. $$
But preserve the mixed product:
$$ \varepsilon_1\varepsilon_2 \neq 0. $$
Also:
$$ (\varepsilon_1\varepsilon_2)^2 = 0. $$
A hyper-dual number has the form:
$$ a + b\varepsilon_1 + c\varepsilon_2 + d\varepsilon_1\varepsilon_2. $$
This algebra stores:
| Component | Meaning |
|---|---|
| $a$ | primal value |
| $b$ | first derivative in direction 1 |
| $c$ | first derivative in direction 2 |
| $d$ | mixed second derivative |
Why Mixed Products Matter
The key idea is that:
$$ (\varepsilon_1+\varepsilon_2)^2 = 2\varepsilon_1\varepsilon_2. $$
The square does not vanish completely because cross terms survive.
This allows second-order information to appear algebraically.
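Written out term by term, and assuming the generators commute ($\varepsilon_1\varepsilon_2 = \varepsilon_2\varepsilon_1$, as in the quotient algebra given later in this section), the square expands as:
$$ (\varepsilon_1+\varepsilon_2)^2 = \varepsilon_1^2 + 2\varepsilon_1\varepsilon_2 + \varepsilon_2^2 = 2\varepsilon_1\varepsilon_2, $$
while the cube vanishes:
$$ (\varepsilon_1+\varepsilon_2)^3 = 2\varepsilon_1\varepsilon_2(\varepsilon_1+\varepsilon_2) = 2\varepsilon_1^2\varepsilon_2 + 2\varepsilon_1\varepsilon_2^2 = 0. $$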
Taylor Expansion
For a smooth scalar function:
$$ f(x+h), $$
the second-order Taylor expansion is:
$$ f(x+h) = f(x) + f'(x)h + \frac12 f''(x)h^2 + O(h^3). $$
Now substitute:
$$ h = \alpha\varepsilon_1 + \beta\varepsilon_2. $$
Since:
$$ \varepsilon_1^2 = \varepsilon_2^2 = 0, $$
the square becomes:
$$ h^2 = 2\alpha\beta\varepsilon_1\varepsilon_2, $$
and every higher power vanishes ($h^3 = 0$), so the expansion terminates exactly:
$$ f(x+h) = f(x) + f'(x)(\alpha\varepsilon_1+\beta\varepsilon_2) + f''(x)\alpha\beta\varepsilon_1\varepsilon_2. $$
The coefficient of:
$$ \varepsilon_1\varepsilon_2 $$
is $f''(x)\alpha\beta$, which is exactly the second derivative when $\alpha=\beta=1$.
Example
Let:
$$ f(x)=x^3. $$
Use the hyper-dual input:
$$ x+\varepsilon_1+\varepsilon_2. $$
Expand:
$$ (x+\varepsilon_1+\varepsilon_2)^3. $$
First compute:
$$ (x+h)^3 = x^3 + 3x^2h + 3xh^2 + h^3. $$
Since:
$$ h=\varepsilon_1+\varepsilon_2, $$
and:
$$ h^2 = 2\varepsilon_1\varepsilon_2, $$
while:
$$ h^3=0, $$
we obtain:
$$ x^3 + 3x^2(\varepsilon_1+\varepsilon_2) + 6x\varepsilon_1\varepsilon_2. $$
Thus:
| Coefficient | Value |
|---|---|
| $1$ | $x^3$ |
| $\varepsilon_1$ | $3x^2$ |
| $\varepsilon_2$ | $3x^2$ |
| $\varepsilon_1\varepsilon_2$ | $6x$ |
Since:
$$ f''(x)=6x, $$
the mixed coefficient gives the exact second derivative.
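The same expansion can be checked numerically. The following is a minimal sketch at $x = 2$, using the four-component representation and product rule that this section introduces later; the `Cube` helper is an illustrative name.

```go
package main

import "fmt"

// HyperDual holds Val + D1*eps1 + D2*eps2 + D12*eps1*eps2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

// Mul multiplies two hyper-dual numbers, dropping eps1^2 and eps2^2 terms.
func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// Cube evaluates f(x) = x^3 on hyper-dual numbers.
func Cube(x HyperDual) HyperDual {
	return Mul(x, Mul(x, x))
}

func main() {
	// Input x + eps1 + eps2 at x = 2.
	r := Cube(HyperDual{Val: 2, D1: 1, D2: 1})
	fmt.Println(r.Val, r.D1, r.D2, r.D12) // x^3, 3x^2, 3x^2, 6x at x = 2
}
```

At $x = 2$ the four components are $8$, $12$, $12$, $12$, matching the table above.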
Multivariable Functions
Hyper-dual numbers naturally extend to multivariate functions.
Suppose:
$$ f : \mathbb{R}^n \to \mathbb{R}. $$
Choose two perturbation directions:
$$ u,v \in \mathbb{R}^n. $$
Evaluate:
$$ x + u\varepsilon_1 + v\varepsilon_2. $$
Then:
$$ f(x+u\varepsilon_1+v\varepsilon_2) $$
expands to:
$$ f(x) + Df_x(u)\varepsilon_1 + Df_x(v)\varepsilon_2 + u^T H_x v \, \varepsilon_1\varepsilon_2. $$
The mixed coefficient gives the Hessian bilinear form:
$$ u^T H_x v. $$
This computes exact second-order directional derivatives.
Hessian Extraction
To compute a Hessian entry:
$$ \frac{\partial^2 f}{\partial x_i \partial x_j}, $$
seed:
$$ u=e_i, \quad v=e_j. $$
Then the coefficient of:
$$ \varepsilon_1\varepsilon_2 $$
is exactly:
$$ H_{ij}. $$
Repeated evaluation recovers the full Hessian matrix.
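One way to organize the repeated evaluations is sketched below. The `Hessian` driver and the test function `poly` ($f(x,y) = x^2y + y^3$) are hypothetical names chosen for illustration, and the sketch assumes $f$ is smooth so that the Hessian is symmetric and only the upper triangle needs to be evaluated.

```go
package main

import "fmt"

// HyperDual holds Val + D1*eps1 + D2*eps2 + D12*eps1*eps2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

// Add adds two hyper-dual numbers componentwise.
func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

// Mul multiplies two hyper-dual numbers, dropping eps1^2 and eps2^2 terms.
func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// Hessian seeds u = e_i and v = e_j for every pair i <= j and reads D12.
func Hessian(f func([]HyperDual) HyperDual, x []float64) [][]float64 {
	n := len(x)
	h := make([][]float64, n)
	for i := range h {
		h[i] = make([]float64, n)
	}
	for i := 0; i < n; i++ {
		for j := i; j < n; j++ {
			in := make([]HyperDual, n)
			for k := range in {
				in[k].Val = x[k]
			}
			in[i].D1 = 1 // u = e_i on eps1
			in[j].D2 = 1 // v = e_j on eps2
			h[i][j] = f(in).D12
			h[j][i] = h[i][j] // symmetry, assumed for smooth f
		}
	}
	return h
}

// poly is an illustrative test function: f(x, y) = x^2*y + y^3.
func poly(v []HyperDual) HyperDual {
	x, y := v[0], v[1]
	return Add(Mul(Mul(x, x), y), Mul(y, Mul(y, y)))
}

func main() {
	// Analytic Hessian is [[2y, 2x], [2x, 6y]]; evaluated here at (1, 2).
	fmt.Println(Hessian(poly, []float64{1, 2}))
}
```

For this test function the analytic Hessian at $(1,2)$ is $\begin{pmatrix} 4 & 2 \\ 2 & 12 \end{pmatrix}$, which the sketch reproduces with three evaluations of `poly`.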
Example: Two Variables
Let:
$$ f(x,y)=x^2y+\sin(xy). $$
Choose perturbations:
$$ x \mapsto x+\varepsilon_1 $$
$$ y \mapsto y+\varepsilon_2. $$
Then the product becomes:
$$ (x+\varepsilon_1)(y+\varepsilon_2) = xy + y\varepsilon_1 + x\varepsilon_2 + \varepsilon_1\varepsilon_2. $$
Mixed terms appear automatically.
Expanding the entire function produces a single coefficient on:
$$ \varepsilon_1\varepsilon_2, $$
which equals:
$$ \frac{\partial^2 f}{\partial x\partial y}. $$
No symbolic differentiation is needed.
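For reference, the expansion can be carried out by hand to see what that coefficient should be. The polynomial term gives:
$$ (x+\varepsilon_1)^2(y+\varepsilon_2) = x^2y + 2xy\,\varepsilon_1 + x^2\varepsilon_2 + 2x\,\varepsilon_1\varepsilon_2. $$
For the sine term, write $h = y\varepsilon_1 + x\varepsilon_2 + \varepsilon_1\varepsilon_2$, so that $h^2 = 2xy\,\varepsilon_1\varepsilon_2$ and $h^3 = 0$. The Taylor expansion then terminates:
$$ \sin(xy + h) = \sin(xy) + \cos(xy)\,h - \tfrac12\sin(xy)\,h^2. $$
Collecting the $\varepsilon_1\varepsilon_2$ terms from both parts gives:
$$ 2x + \cos(xy) - xy\sin(xy), $$
which agrees with differentiating $f$ first in $x$ and then in $y$.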
Exactness
Hyper-dual differentiation is exact up to floating-point arithmetic.
The main difficulty of each approach differs:
| Method | Main difficulty |
|---|---|
| Finite differences | truncation + cancellation error |
| Symbolic differentiation | expression explosion |
| Hyper-dual numbers | floating-point rounding only |
No step size is required.
No subtractive cancellation between nearby function values occurs.
The derivative structure emerges algebraically.
Algebraic Structure
The hyper-dual algebra can be written:
$$ \mathbb{R}[\varepsilon_1,\varepsilon_2] / (\varepsilon_1^2,\varepsilon_2^2). $$
Basis elements are:
$$ 1, \varepsilon_1, \varepsilon_2, \varepsilon_1\varepsilon_2. $$
Dimension is four.
Multiplication rules:
| Product | Result |
|---|---|
| $\varepsilon_1^2$ | $0$ |
| $\varepsilon_2^2$ | $0$ |
| $\varepsilon_1\varepsilon_2$ | survives |
| $(\varepsilon_1\varepsilon_2)^2$ | $0$ |
This carefully chosen nilpotent structure isolates second-order interactions.
Computational Interpretation
A hyper-dual number may be represented as:
```go
// HyperDual stores Val + D1*eps1 + D2*eps2 + D12*eps1*eps2.
type HyperDual struct {
	Val float64
	D1  float64
	D2  float64
	D12 float64
}
```
Components represent:
| Field | Meaning |
|---|---|
| `Val` | primal value |
| `D1` | first derivative along direction 1 |
| `D2` | first derivative along direction 2 |
| `D12` | mixed second derivative |
Multiplication Rule
Suppose:
$$ x=(a,b,c,d) $$
and
$$ y=(p,q,r,s). $$
Then multiplication becomes:
$$ xy= ( ap, aq+bp, ar+cp, as+br+cq+dp ). $$
The mixed term obeys the second-order product rule automatically.
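To see where each component comes from, expand the product and drop every term containing $\varepsilon_1^2$ or $\varepsilon_2^2$:
$$ (a + b\varepsilon_1 + c\varepsilon_2 + d\varepsilon_1\varepsilon_2)(p + q\varepsilon_1 + r\varepsilon_2 + s\varepsilon_1\varepsilon_2) = ap + (aq+bp)\varepsilon_1 + (ar+cp)\varepsilon_2 + (as+br+cq+dp)\varepsilon_1\varepsilon_2. $$
The four terms in the last coefficient mirror the mixed-derivative product rule:
$$ \partial_1\partial_2(xy) = x\,\partial_1\partial_2 y + \partial_1 x\,\partial_2 y + \partial_2 x\,\partial_1 y + y\,\partial_1\partial_2 x. $$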
Example Implementation
```go
// Mul multiplies two hyper-dual numbers, dropping eps1^2 and eps2^2 terms.
func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}
```
The D12 component contains all mixed second-order interactions.
Relation to Hessian-Vector Products
Hyper-dual numbers compute second-order directional derivatives naturally.
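Given directions $u$ and $v$, the scalar:
$$ u^T H v $$
is obtained by evaluating $f$ at:
$$ x + u\varepsilon_1 + v\varepsilon_2. $$
The coefficient of:
$$ \varepsilon_1\varepsilon_2 $$
is the result.
Choosing $u=e_i$ yields the $i$-th component of the Hessian-vector product $Hv$, so the full product takes $n$ evaluations and never forms $H$ explicitly.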
For large systems, Hessian-vector products are often preferable to dense Hessians.
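A Hessian-vector product can be assembled one component at a time by seeding $u = e_i$ and keeping $v$ fixed. The sketch below is a minimal illustration under that scheme; `HessVec` and the test function `poly` ($f(x,y) = x^2y + y^3$) are hypothetical names.

```go
package main

import "fmt"

// HyperDual holds Val + D1*eps1 + D2*eps2 + D12*eps1*eps2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

// Add adds two hyper-dual numbers componentwise.
func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

// Mul multiplies two hyper-dual numbers, dropping eps1^2 and eps2^2 terms.
func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// HessVec returns H(x) * v using n hyper-dual evaluations of f.
// Component i is e_i^T H v, read from the eps1*eps2 coefficient.
func HessVec(f func([]HyperDual) HyperDual, x, v []float64) []float64 {
	n := len(x)
	out := make([]float64, n)
	for i := 0; i < n; i++ {
		in := make([]HyperDual, n)
		for k := range in {
			in[k] = HyperDual{Val: x[k], D2: v[k]} // v on eps2
		}
		in[i].D1 = 1 // u = e_i on eps1
		out[i] = f(in).D12
	}
	return out
}

// poly is an illustrative test function: f(x, y) = x^2*y + y^3.
func poly(p []HyperDual) HyperDual {
	x, y := p[0], p[1]
	return Add(Mul(Mul(x, x), y), Mul(y, Mul(y, y)))
}

func main() {
	// At (1, 2) the Hessian is [[4, 2], [2, 12]]; with v = (1, 1), Hv = (6, 14).
	fmt.Println(HessVec(poly, []float64{1, 2}, []float64{1, 1}))
}
```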
Perturbation Confusion
Nested dual-number systems may accidentally mix perturbation symbols.
Hyper-dual numbers avoid this by explicitly separating infinitesimal generators:
$$ \varepsilon_1, \varepsilon_2. $$
Each perturbation direction remains algebraically distinct.
This improves correctness in higher-order implementations.
Relation to Truncated Polynomial Algebras
Hyper-dual numbers differ from ordinary truncated polynomial algebras.
Truncated polynomial algebra:
$$ \mathbb{R}[\varepsilon]/(\varepsilon^3) $$
keeps powers:
$$ 1,\varepsilon,\varepsilon^2. $$
Hyper-dual algebra instead keeps:
$$ 1, \varepsilon_1, \varepsilon_2, \varepsilon_1\varepsilon_2. $$
This distinction matters:
| Structure | Stores |
|---|---|
| Truncated polynomial | repeated derivatives |
| Hyper-dual | mixed derivatives |
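The two algebras are nonetheless related. As a short worked check that follows from the rules above, sending $\varepsilon \mapsto \varepsilon_1+\varepsilon_2$ respects $\varepsilon^3 = 0$ because $(\varepsilon_1+\varepsilon_2)^3 = 0$, and it maps:
$$ a + b\varepsilon + c\varepsilon^2 \;\mapsto\; a + b\varepsilon_1 + b\varepsilon_2 + 2c\,\varepsilon_1\varepsilon_2. $$
For $f(x+\varepsilon)$ the truncated polynomial coefficient is $c = \tfrac12 f''(x)$, so the image carries $f''(x)$ directly on $\varepsilon_1\varepsilon_2$, with no factorial to undo.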
Hyper-dual systems are particularly effective for Hessian computation.
Complexity
For $n$ variables:
- one forward dual pass computes one directional derivative
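- one hyper-dual pass computes one second-order directional interaction $u^T H v$
Dense Hessian construction still requires multiple evaluations: $n(n+1)/2$ hyper-dual passes if symmetry is exploited, each costing a small constant multiple of one ordinary evaluation of $f$.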
However, the method remains exact and compositional.
Geometric Interpretation
Dual numbers represent tangent vectors.
Hyper-dual numbers represent interacting tangent directions.
The mixed product:
$$ \varepsilon_1\varepsilon_2 $$
captures curvature.
First-order infinitesimals describe local linear geometry.
Second-order mixed infinitesimals describe local quadratic geometry.
Hyper-dual numbers therefore encode second-order local structure.
Summary
Hyper-dual numbers extend dual numbers by introducing multiple independent nilpotent directions whose mixed products survive.
The algebra:
$$ \mathbb{R}[\varepsilon_1,\varepsilon_2] / (\varepsilon_1^2,\varepsilon_2^2) $$
produces exact second derivatives through ordinary program evaluation.
Key properties:
| Feature | Result |
|---|---|
| Independent infinitesimals | separate derivative directions |
| Mixed products survive | second-order information |
| No finite differences | exact differentiation |
| Local algebraic propagation | automatic Hessian computation |
| Structured nilpotency | stable second-order AD |
Hyper-dual numbers provide one of the cleanest exact formulations of second-order automatic differentiation.