Taylor Mode AD

Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.

Taylor Mode AD

Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.

Instead of tracking only values and first derivatives, Taylor mode tracks entire local polynomial expansions.

For a scalar function

$$ f(x), $$

expanded around point $x_0$,

$$ f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{f''(x_0)}{2!}h^2 + \frac{f^{(3)}(x_0)}{3!}h^3 + \cdots $$

Taylor mode computes these coefficients automatically.

This gives direct access to higher-order derivatives.

Basic Idea

Forward mode propagates first-order tangent information:

$$ x + \epsilon v. $$

Taylor mode propagates higher-order polynomial structure:

$$ x(t) = x_0 + x_1 t + x_2 t^2 + \cdots + x_p t^p. $$

Each coefficient represents derivative information.

Typically:

$$ x_k = \frac{1}{k!} \frac{d^k x}{dt^k}(0). $$

The program is evaluated symbolically over truncated power series rather than over ordinary numbers.

Truncated Polynomial Algebra

Taylor mode works in the algebra:

$$ \mathbb{R}[t]/(t^{p+1}), $$

the ring of polynomials truncated at degree $p$.

A Taylor object has form:

$$ a_0 + a_1 t + a_2 t^2 + \cdots + a_p t^p. $$

Addition is coefficient-wise:

$$ (a+b)_k = a_k + b_k. $$

Multiplication uses convolution:

$$ (ab)k = \sum{i=0}^k a_i b_{k-i}. $$

Higher-order derivatives emerge from polynomial coefficient propagation.

Example: Polynomial Function

Let

$$ f(x) = x^3. $$

Suppose:

$$ x(t) = x_0 + x_1 t + x_2 t^2. $$

Expand:

$$ f(x(t)) = (x_0 + x_1 t + x_2 t^2)^3. $$

Keeping terms up to degree 2:

$$= x_0^3 + 3x_0^2x_1 t + (3x_0^2x_2 + 3x_0x_1^2)t^2.$$

The coefficients are:

Degree Coefficient
$t^0$ $x_0^3$
$t^1$ $3x_0^2x_1$
$t^2$ $3x_0^2x_2 + 3x_0x_1^2$

The second-order term contains the quadratic interaction naturally.

No explicit Hessian rules are required. The polynomial algebra produces the correct structure automatically.

Recovering Derivatives

If we choose:

$$ x(t) = x_0 + t, $$

then:

$$ f(x_0+t) = \sum_{k=0}^p \frac{f^{(k)}(x_0)}{k!} t^k. $$

Therefore:

Coefficient Meaning
$t^0$ function value
$t^1$ first derivative
$t^2$ second derivative divided by $2!$
$t^3$ third derivative divided by $3!$

and so on.

Taylor mode computes all derivatives up to order $p$ simultaneously.

Comparison with Nested Forward Mode

Nested forward mode computes higher derivatives by nesting dual numbers repeatedly.

Taylor mode computes all orders together in one algebraic object.

Method Representation
nested duals repeated first-order structure
Taylor mode truncated polynomial
hyper-duals structured second-order dual algebra
symbolic differentiation explicit algebraic manipulation

Taylor mode avoids some nesting complexity and perturbation tagging issues.

Elementary Functions

Taylor mode requires propagation rules for elementary functions.

Suppose:

$$ y(t) = \exp(x(t)). $$

If

$$ x(t) = \sum_{k=0}^p x_k t^k, $$

then

$$ y(t) = \sum_{k=0}^p y_k t^k $$

must satisfy:

$$ y'(t) = y(t)x'(t). $$

Matching coefficients gives recurrence relations for the Taylor coefficients.

Similarly:

Function Defining relation
$\exp x$ $y' = yx'$
$\sin x$ $y' = \cos(x)x'$
$\cos x$ $y' = -\sin(x)x'$
$\log x$ $y'x = x'$

Taylor mode implementations derive coefficient recurrences from these identities.

Recurrence Computation

Taylor coefficients are usually computed recursively.

Suppose:

$$ x(t) = \sum_{k=0}^p x_k t^k. $$

For multiplication:

$$ z(t) = x(t)y(t), $$

the coefficient recurrence is:

$$ z_k = \sum_{i=0}^k x_i y_{k-i}. $$

This is discrete convolution.

For division:

$$ z(t) = \frac{x(t)}{y(t)}, $$

coefficients satisfy:

$$ x_k = \sum_{i=0}^k y_i z_{k-i}. $$

Solving recursively gives $z_k$.

Taylor mode therefore becomes a coefficient propagation system.

Directional Higher Derivatives

Taylor mode naturally computes directional higher derivatives.

Suppose:

$$ x(t) = x_0 + tv. $$

Then:

$$ f(x_0 + tv) = \sum_{k=0}^p \frac{1}{k!} D^k f(x_0)[v,\ldots,v] t^k. $$

The coefficients correspond to repeated directional derivatives.

This is valuable because full higher-order derivative tensors are usually too large to materialize.

Taylor mode computes directional expansions efficiently.

Example: Second Directional Derivative

Let

$$ f(x,y)=x^2y. $$

Choose direction:

$$ v=(v_1,v_2). $$

Define:

$$ x(t)=x+tv_1, \quad y(t)=y+tv_2. $$

Then:

$$ f(x(t),y(t)) = (x+tv_1)^2(y+tv_2). $$

Expand:

$$= x^2y + (2xyv_1 + x^2v_2)t + (yv_1^2 + 2xv_1v_2)t^2 + v_1^2v_2 t^3.$$

The coefficient of $t^2$ gives:

$$ \frac{1}{2} D^2f(x,y)[v,v]. $$

Taylor mode extracts this automatically.

Complexity

Suppose:

Symbol Meaning
$C$ cost of evaluating original function
$p$ Taylor order

Then Taylor mode often costs roughly:

$$ O(p^2 C) $$

because coefficient propagation involves convolution-like operations across orders.

More optimized methods can reduce some operations using FFT-like techniques or specialized recurrences.

Still, high-order derivatives become expensive quickly.

Memory Structure

Taylor mode stores coefficients for every intermediate variable.

If the original program has $N$ intermediates and Taylor order $p$, storage is roughly:

$$ O(pN). $$

This is usually better than explicitly materializing high-order derivative tensors.

The representation is compact because it stores directional polynomial structure rather than full multidimensional derivative arrays.

Taylor Mode vs Reverse Mode

Taylor mode and reverse mode have different strengths.

Property Taylor mode Reverse mode
gradients moderate excellent
high-order directional derivatives excellent difficult
full Hessians possible possible
high-order tensors structured memory-heavy
implementation complexity algebraic tape-heavy
nested differentiation simpler harder
scalar-output optimization weaker strong

Taylor mode is particularly attractive for moderate-order derivatives and local series expansions.

Taylor Models

Taylor mode connects naturally to Taylor models used in validated numerics.

A Taylor model represents a function as:

$$ P(x) + R(x), $$

where:

Component Meaning
$P(x)$ polynomial approximation
$R(x)$ rigorous remainder bound

This allows:

Capability Purpose
interval arithmetic rigorous bounds
verified ODE solving guaranteed solutions
uncertainty propagation bounded error
validated optimization correctness guarantees

Taylor-mode AD provides the polynomial component automatically.

Differential Equation Solvers

Taylor mode is especially useful for ODE solvers.

Suppose:

$$ \frac{dx}{dt} = f(x). $$

A Taylor-series integrator computes:

$$ x(t+h) = x(t) + hx'(t) + \frac{h^2}{2!}x''(t) + \cdots $$

Automatic differentiation computes the required derivatives recursively.

This yields high-order integrators with strong local accuracy.

Taylor integrators are widely used in celestial mechanics, validated numerics, and stiff dynamical systems.

Sparse Higher-Order Structure

Full higher-order derivative tensors are enormous.

For dimension $n$:

Order Tensor size
1 $n$
2 $n^2$
3 $n^3$
4 $n^4$

Taylor mode avoids constructing these directly.

Instead it propagates compressed directional structure through coefficient series.

This is often the only practical way to obtain high-order derivative information.

Compiler Representation

A compiler-based Taylor-mode system may lower each variable into coefficient arrays:

x[0] = primal
x[1] = first-order coefficient
x[2] = second-order coefficient
...
x[p] = pth-order coefficient

Operations become coefficient recurrences.

For multiplication:

for k in 0..p:
    z[k] = sum(x[i] * y[k-i] for i in 0..k)

The transformed program resembles polynomial arithmetic.

Numerical Stability

High-order Taylor coefficients can become unstable.

Common problems include:

Problem Cause
coefficient explosion factorial growth
cancellation alternating signs
truncation instability insufficient order
floating point amplification repeated convolutions
stiff dynamics rapidly varying derivatives

Scaling and normalization techniques are often required.

Practical Uses

Taylor mode appears in:

Domain Usage
celestial mechanics high-order integration
validated numerics rigorous bounds
dynamical systems local flow expansion
chaos analysis sensitivity growth
PDE solvers local expansions
perturbation methods asymptotic analysis
scientific computing repeated directional derivatives

It is less common in large neural network training because reverse-mode gradients dominate that workload.

Conceptual Interpretation

Forward mode propagates tangent vectors.

Taylor mode propagates local polynomial geometry.

Instead of asking:

$$ \text{How does the function change infinitesimally?} $$

Taylor mode asks:

$$ \text{What local power series does the function induce?} $$

The program becomes a transformation over truncated polynomial algebras rather than over ordinary scalar values.