Chapter 7. Dual Numbers and Algebraic Structures

Algebra of Dual Numbers

Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of carrying only a value, a dual number carries a value and its first-order variation.

A dual number has the form

$$ a + b\varepsilon $$

where $a,b \in \mathbb{R}$, and $\varepsilon$ is a formal element satisfying

$$ \varepsilon^2 = 0 $$

but

$$ \varepsilon \neq 0. $$

The element $\varepsilon$ behaves like an infinitesimal direction. It is not a small real number. It is an algebraic marker that records first-order change and automatically deletes all second-order terms.

The Basic Algebra

Let

$$ x = a + b\varepsilon $$

and

$$ y = c + d\varepsilon. $$

Addition is componentwise:

$$ x + y = (a+c) + (b+d)\varepsilon. $$

Multiplication follows ordinary distributivity, together with the rule $\varepsilon^2 = 0$:

$$ xy = (a+b\varepsilon)(c+d\varepsilon) $$

$$ = ac + ad\varepsilon + bc\varepsilon + bd\varepsilon^2 $$

$$ = ac + (ad+bc)\varepsilon. $$

So the product rule is built into the multiplication law:

$$ (a,b)(c,d) = (ac, ad+bc). $$

This is already the core of automatic differentiation. The first component stores the primal value. The second component stores the derivative information.
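
As a quick consistency check, the pair form reproduces the defining relation. Writing $\varepsilon$ as the pair $(0,1)$ and a real number $a$ as $(a,0)$, the multiplication law gives

$$ (0,1)(0,1) = (0\cdot 0,\; 0\cdot 1 + 1\cdot 0) = (0,0), $$

$$ (a,0)(c,d) = (ac,\; ad). $$

The first line is $\varepsilon^2 = 0$; the second shows that multiplying by a real constant scales both the value and the tangent.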

Dual Numbers as Value-Derivative Pairs

In forward mode AD, we evaluate a function on a dual input

$$ x = a + \varepsilon. $$

More generally, if the input is seeded with tangent $b$, we write

$$ x = a + b\varepsilon. $$

For a smooth scalar function $f$, Taylor expansion gives

$$ f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon + \frac{1}{2}f''(a)b^2\varepsilon^2 + \cdots. $$

Since $\varepsilon^2 = 0$, all higher-order terms vanish:

$$ f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon. $$

Thus a single evaluation over dual numbers computes both the value and the directional derivative.

In pair notation, the lifted function acts as

$$ f : (a, b) \mapsto (f(a), f'(a)b). $$

For one input, choosing $b=1$ gives the ordinary derivative:

$$ f(a+\varepsilon) = f(a) + f'(a)\varepsilon. $$

Example

Let

$$ f(x) = x^3 + 2x. $$

Evaluate it at $x = 5 + \varepsilon$:

$$ (5+\varepsilon)^3 + 2(5+\varepsilon) $$

$$ = (125 + 75\varepsilon + 15\varepsilon^2 + \varepsilon^3) + (10 + 2\varepsilon) $$

$$ = 125 + 75\varepsilon + 10 + 2\varepsilon $$

$$ = 135 + 77\varepsilon. $$

So

$$ f(5) = 135 $$

and

$$ f'(5) = 77. $$

Checking directly:

$$ f'(x) = 3x^2 + 2 $$

$$ f'(5) = 3 \cdot 25 + 2 = 77. $$

The derivative appears as the coefficient of $\varepsilon$.

Why $\varepsilon^2 = 0$ Matters

The rule $\varepsilon^2 = 0$ is what makes dual numbers represent first-order calculus. When multiplying perturbations, any second-order term disappears.

For example:

$$ (a+b\varepsilon)^2 = a^2 + 2ab\varepsilon + b^2\varepsilon^2 = a^2 + 2ab\varepsilon. $$

The coefficient of $\varepsilon$ is exactly the derivative of $x^2$ at $a$, applied to direction $b$:

$$ D(x^2)_a[b] = 2ab. $$

This pattern holds for every smooth elementary operation used in a program. Dual arithmetic forces each operation to carry both its value and its local linearization.

Division and Inverses

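A dual number $a+b\varepsilon$ has a multiplicative inverse exactly when $a \neq 0$. If $a = 0$, every product $b\varepsilon(c+d\varepsilon) = bc\varepsilon$ has real part $0$, so it can never equal $1$.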

We seek

$$ (a+b\varepsilon)^{-1} = c+d\varepsilon. $$

The product must equal $1$:

$$ (a+b\varepsilon)(c+d\varepsilon) = ac + (ad+bc)\varepsilon = 1. $$

So

$$ ac = 1 $$

and

$$ ad + bc = 0. $$

Hence

$$ c = \frac{1}{a} $$

and

$$ d = -\frac{b}{a^2}. $$

Therefore

$$ (a+b\varepsilon)^{-1} = \frac{1}{a} - \frac{b}{a^2}\varepsilon. $$

This corresponds to the derivative rule

$$ \frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2}. $$

Division follows from multiplication by the inverse:

$$ \frac{a+b\varepsilon}{c+d\varepsilon} = (a+b\varepsilon) \left( \frac{1}{c} - \frac{d}{c^2}\varepsilon \right) $$

$$= \frac{a}{c} + \frac{bc-ad}{c^2}\varepsilon.$$
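
The inverse and quotient formulas above translate directly into code. The following is a minimal sketch, assuming a `Dual` pair type with fields `Val` and `Dot`; the function names `Inv` and `Div` are illustrative choices, not a fixed API.

```go
package main

import "fmt"

// Dual represents a + b·ε as a (Val, Dot) pair.
type Dual struct {
    Val float64 // real part a
    Dot float64 // coefficient b of ε
}

// Inv returns 1/(a + bε) = 1/a - (b/a²)ε.
// When the real part is 0 no inverse exists; float64 division then
// yields ±Inf or NaN instead of panicking.
func Inv(x Dual) Dual {
    return Dual{Val: 1 / x.Val, Dot: -x.Dot / (x.Val * x.Val)}
}

// Div returns x/y = a/c + ((bc - ad)/c²)ε for x = a + bε and y = c + dε.
func Div(x, y Dual) Dual {
    return Dual{
        Val: x.Val / y.Val,
        Dot: (x.Dot*y.Val - x.Val*y.Dot) / (y.Val * y.Val),
    }
}

func main() {
    y := Dual{Val: 2, Dot: 1}
    fmt.Println(Inv(y))                      // {0.5 -0.25}
    fmt.Println(Div(Dual{Val: 1, Dot: 0}, y)) // {0.5 -0.25}
}
```

Both calls agree because dividing the constant $1$ by $y$ is the same as inverting $y$. The tangent $-0.25$ matches $-b/a^2$ with $a=2$, $b=1$.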

Elementary Functions

Elementary functions extend naturally to dual numbers. For a smooth function $f$,

$$ f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon. $$

This gives direct evaluation rules.

For sine:

$$ \sin(a+b\varepsilon) = \sin a + b\cos a \varepsilon. $$

For cosine:

$$ \cos(a+b\varepsilon) = \cos a - b\sin a \varepsilon. $$

For exponential:

$$ \exp(a+b\varepsilon) = \exp(a) + b\exp(a)\varepsilon. $$

For logarithm, assuming $a>0$:

$$ \log(a+b\varepsilon) = \log a + \frac{b}{a}\varepsilon. $$

For powers with a positive integer exponent $n$ (or any real exponent when $a>0$):

$$ (a+b\varepsilon)^n = a^n + n a^{n-1}b\varepsilon. $$

Each rule has the same shape: compute the primal value, then multiply the local derivative by the incoming tangent.
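
Because every rule has this shape, one helper can generate them all. The sketch below assumes a `Dual` pair type; `Lift` is a hypothetical helper name introduced here for illustration, and the derivative of each function is supplied by hand.

```go
package main

import (
    "fmt"
    "math"
)

// Dual represents a + b·ε as a (Val, Dot) pair.
type Dual struct {
    Val float64
    Dot float64
}

// Lift builds the dual-number version of f from f and its derivative df:
// (a, b) -> (f(a), df(a)*b). Lift is a hypothetical helper name.
func Lift(f, df func(float64) float64) func(Dual) Dual {
    return func(x Dual) Dual {
        return Dual{Val: f(x.Val), Dot: df(x.Val) * x.Dot}
    }
}

var (
    Cos = Lift(math.Cos, func(a float64) float64 { return -math.Sin(a) })
    Exp = Lift(math.Exp, math.Exp)
    // Log assumes a > 0, as in the text.
    Log = Lift(math.Log, func(a float64) float64 { return 1 / a })
)

// Pow raises x to the real power n; for non-integer n it assumes x.Val > 0.
func Pow(x Dual, n float64) Dual {
    return Dual{
        Val: math.Pow(x.Val, n),
        Dot: n * math.Pow(x.Val, n-1) * x.Dot,
    }
}

func main() {
    fmt.Println(Exp(Dual{Val: 0, Dot: 1}))    // {1 1}
    fmt.Println(Log(Dual{Val: 1, Dot: 2}))    // {0 2}
    fmt.Println(Pow(Dual{Val: 2, Dot: 1}, 3)) // {8 12}
}
```

The design choice here is that `Lift` contains the only tangent logic; adding a new elementary function means supplying the pair $(f, f')$ and nothing else.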

Dual Numbers and the Chain Rule

The chain rule is not added as an external algorithm. It follows from function composition over dual numbers.

Let

$$ h(x) = f(g(x)). $$

Evaluate at

$$ x = a + b\varepsilon. $$

First apply $g$:

$$ g(a+b\varepsilon) = g(a) + g'(a)b\varepsilon. $$

Then apply $f$:

$$ f(g(a) + g'(a)b\varepsilon) = f(g(a)) + f'(g(a))g'(a)b\varepsilon. $$

Therefore

$$ h'(a)b = f'(g(a))g'(a)b. $$

This is exactly the chain rule.

Dual numbers turn the chain rule into ordinary evaluation. A program written over real numbers can often be lifted to dual numbers by replacing each primitive operation with its dual-number version.

Computational Interpretation

In an implementation, a dual number is usually represented as a pair:

```go
type Dual struct {
    Val float64
    Dot float64
}
```

Here Val is the primal value, and Dot is the tangent.

Addition:

```go
func Add(x, y Dual) Dual {
    return Dual{
        Val: x.Val + y.Val,
        Dot: x.Dot + y.Dot,
    }
}
```

Multiplication:

```go
func Mul(x, y Dual) Dual {
    return Dual{
        Val: x.Val * y.Val,
        Dot: x.Dot*y.Val + x.Val*y.Dot,
    }
}
```

Sine:

```go
// Requires: import "math"
func Sin(x Dual) Dual {
    return Dual{
        Val: math.Sin(x.Val),
        Dot: math.Cos(x.Val) * x.Dot,
    }
}
```

A function written against these operations computes derivatives automatically.

For example, the function $f(x) = x^3 + 2x$ from the earlier example becomes the following, where Const is assumed to lift a real constant to a dual number with zero tangent:

```go
// Const lifts a real constant to a dual number with zero tangent.
func Const(c float64) Dual { return Dual{Val: c} }

func F(x Dual) Dual {
    return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}
```

With input

```go
x := Dual{Val: 5, Dot: 1}
```

the result of F(x) is

```go
Dual{Val: 135, Dot: 77}
```

The same execution computes the primal value and the derivative.
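
The chain rule derived earlier can be observed in the same way. The sketch below repeats the `Dual`, `Mul`, and `Sin` definitions so that it runs on its own; the composite function `H` and the evaluation point `1.5` are illustrative choices. It compares the tangent produced by plain composition with the hand-derived derivative $h'(a) = 2a\cos(a^2)$ for $h(x) = \sin(x^2)$.

```go
package main

import (
    "fmt"
    "math"
)

// Dual represents a + b·ε as a (Val, Dot) pair.
type Dual struct {
    Val float64
    Dot float64
}

func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

func Sin(x Dual) Dual {
    return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// H computes h(x) = sin(x*x) by plain composition.
// No chain-rule code appears: Sin simply receives the tangent that Mul produced.
func H(x Dual) Dual {
    return Sin(Mul(x, x))
}

func main() {
    a := 1.5
    got := H(Dual{Val: a, Dot: 1})
    want := 2 * a * math.Cos(a*a) // hand-derived h'(a) = 2a·cos(a²)
    fmt.Println("dual tangent:", got.Dot)
    fmt.Println("closed form: ", want)
}
```

The two printed numbers should agree up to floating-point rounding, since `Mul` contributes the factor $g'(a) = 2a$ and `Sin` multiplies it by $f'(g(a)) = \cos(a^2)$.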

Multiple Inputs

For a function

$$ f : \mathbb{R}^n \to \mathbb{R}, $$

a dual number can propagate one directional derivative at a time. Each input receives a primal value and a tangent seed.

For example, let

$$ f(x,y) = xy + \sin x. $$

To compute the derivative in direction

$$ (v_x, v_y), $$

evaluate

$$ x = a + v_x\varepsilon $$

and

$$ y = b + v_y\varepsilon. $$

Then

$$ xy = ab + (av_y + bv_x)\varepsilon $$

and

$$ \sin x = \sin a + v_x\cos a\varepsilon. $$

So

$$ f(x,y) = ab + \sin a + (av_y + bv_x + v_x\cos a)\varepsilon. $$

The coefficient of $\varepsilon$ is

$$ Df_{(a,b)}[v_x,v_y] = b v_x + a v_y + v_x\cos a. $$

Equivalently,

$$ Df_{(a,b)}[v] = \nabla f(a,b) \cdot v. $$

Forward mode naturally computes Jacobian-vector products.
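
The worked example can be sketched in code as follows. The `Dual` type and its operations are assumed to be the usual value-tangent pair; `Directional` and `Gradient` are hypothetical helper names introduced for illustration. Recovering the full gradient takes one forward pass per input, each with a different seed.

```go
package main

import (
    "fmt"
    "math"
)

// Dual represents a + b·ε as a (Val, Dot) pair.
type Dual struct {
    Val float64
    Dot float64
}

func Add(x, y Dual) Dual { return Dual{Val: x.Val + y.Val, Dot: x.Dot + y.Dot} }

func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

func Sin(x Dual) Dual {
    return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// F computes f(x, y) = x*y + sin(x) over dual numbers.
func F(x, y Dual) Dual {
    return Add(Mul(x, y), Sin(x))
}

// Directional returns the derivative of f at (a, b) in direction (vx, vy).
func Directional(a, b, vx, vy float64) float64 {
    return F(Dual{Val: a, Dot: vx}, Dual{Val: b, Dot: vy}).Dot
}

// Gradient runs one forward pass per input, with seeds (1, 0) and (0, 1).
func Gradient(a, b float64) (dfdx, dfdy float64) {
    return Directional(a, b, 1, 0), Directional(a, b, 0, 1)
}

func main() {
    dfdx, dfdy := Gradient(0, 3)
    fmt.Println(dfdx, dfdy) // 4 0, since df/dx = b + cos(a) and df/dy = a
}
```

For $n$ inputs this approach needs $n$ passes to assemble the gradient, which is why a single pass is described as a Jacobian-vector product and not as a full gradient computation.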

Relation to Forward Mode AD

Forward mode AD is dual-number evaluation generalized to programs.

Each program variable carries two components:

$$ \text{variable} = (\text{value}, \text{tangent}). $$

Each primitive instruction updates both components.

For a program statement

$$ z = x \cdot y, $$

the lifted statement is

$$ z_{\text{val}} = x_{\text{val}} y_{\text{val}} $$

$$ z_{\text{dot}} = x_{\text{dot}} y_{\text{val}} + x_{\text{val}} y_{\text{dot}}. $$

For

$$ z = \sin x, $$

the lifted statement is

$$ z_{\text{val}} = \sin(x_{\text{val}}) $$

$$ z_{\text{dot}} = \cos(x_{\text{val}})x_{\text{dot}}. $$

This local transformation is enough. The global derivative emerges from executing the transformed program.
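
To make the statement-by-statement lifting concrete, here is a small sketch of a two-statement program and its lifted form, written with explicit value and tangent variables instead of a pair type. The function names `program` and `programLifted` are illustrative.

```go
package main

import (
    "fmt"
    "math"
)

// program is the original computation over reals: w = sin(x * y).
func program(x, y float64) float64 {
    z := x * y
    w := math.Sin(z)
    return w
}

// programLifted is the same program with a tangent update next to
// every original statement.
func programLifted(xVal, xDot, yVal, yDot float64) (wVal, wDot float64) {
    zVal := xVal * yVal
    zDot := xDot*yVal + xVal*yDot // lifted rule for z = x * y

    wVal = math.Sin(zVal)
    wDot = math.Cos(zVal) * zDot // lifted rule for w = sin(z)
    return wVal, wDot
}

func main() {
    fmt.Println(program(0, 5))             // 0
    fmt.Println(programLifted(0, 1, 5, 0)) // 0 5
}
```

With seed $(x_{\text{dot}}, y_{\text{dot}}) = (1, 0)$ the tangent output is $\partial w / \partial x = y\cos(xy)$, which equals $5$ at $(0, 5)$. No statement in the lifted program refers to the derivative of the whole computation; each line only applies its own local rule.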

Algebraic Summary

The dual numbers form a commutative algebra over the real numbers:

$$ \mathbb{D} = \mathbb{R}[\varepsilon]/(\varepsilon^2). $$

This notation means: take polynomials in $\varepsilon$, but identify every term containing $\varepsilon^2$ or higher powers with zero.

Every dual number has a unique form:

$$ a + b\varepsilon. $$

The real part $a$ is the value. The dual part $b$ is the first-order coefficient.

This small algebra is powerful because it encodes first-order differential calculus directly into arithmetic. In forward mode AD, differentiation becomes evaluation in the algebra of dual numbers.