Verified Differentiation

Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of...

Verified Differentiation

Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of elementary operations.

However, practical AD systems are large compiler and runtime systems containing:

Component Complexity
graph transformations compiler correctness
tensor kernels numerical precision
mutation handling state consistency
checkpointing recomputation logic
custom gradients user-defined semantics
distributed execution synchronization correctness

A small mistake may silently produce incorrect gradients.

Verified differentiation studies how to prove that derivative computations are correct with respect to precise mathematical and program semantics.

This area connects automatic differentiation with:

Field Role
formal verification proof of correctness
theorem proving machine-checked reasoning
programming language semantics rigorous program meaning
numerical analysis floating-point guarantees
compiler verification transformation correctness

The goal is stronger guarantees than empirical testing alone.

What Does Correctness Mean?

A derivative system may appear correct while violating important properties.

There are several possible notions of correctness.

Mathematical correctness

The produced derivative matches the mathematical derivative of the represented function.

Program correctness

The transformed program correctly computes the derivative of the original program.

Numerical correctness

Floating-point execution approximates the exact derivative within known bounds.

Semantic correctness

Compiler optimizations preserve derivative meaning.

These notions are related but distinct.

The Central Property

Suppose a program computes

$$ f : X \to Y. $$

An AD transform produces another program

$$ Df : X \times X \to Y \times Y $$

for forward mode.

Correctness informally means:

$$ Df(x,\dot{x}) = (f(x), J_f(x)\dot{x}). $$

For reverse mode, correctness means the generated adjoint computes the transpose Jacobian action:

$$ \bar{x} = J_f(x)^T \bar{y}. $$

A verified AD system attempts to prove these properties formally.

Why Testing Is Not Enough

AD systems are difficult to validate purely through testing.

Finite-difference checks are common:

$$ \frac{ f(x+h)-f(x-h) }{ 2h }. $$

But finite differences have limitations:

Problem Consequence
truncation error inaccurate reference
floating-point cancellation instability
non-smooth functions undefined comparison
high dimensions expensive checking

A system may pass numerical tests while still violating formal derivative semantics.

Formal verification aims to eliminate such ambiguity.

Formal Semantics of Programs

Verification requires precise semantics for the source language.

A language semantics defines:

Construct Meaning
arithmetic numerical interpretation
control flow execution behavior
recursion fixed-point meaning
mutation state transition
exceptions error propagation

Without rigorous semantics, it is impossible to prove that differentiation preserves meaning.

Differentiable programming therefore depends heavily on programming language theory.

Forward Mode Verification

Forward mode is conceptually simpler.

A forward transform augments each value with a tangent:

$$ x \mapsto (x,\dot{x}). $$

Elementary operations are lifted.

Example:

$$ (x,\dot{x}) + (y,\dot{y}) = (x+y,\dot{x}+\dot{y}). $$

Multiplication becomes:

$$ (x,\dot{x})(y,\dot{y}) = (xy,x\dot{y}+y\dot{x}). $$

Verification proves that the transformed operations preserve the derivative interpretation.

The proof usually proceeds structurally over program syntax.

Dual Number Semantics

Forward mode is often verified through dual numbers.

Define:

$$ a+b\varepsilon, \qquad \varepsilon^2=0. $$

Evaluating a function on dual numbers gives:

$$ f(x+\dot{x}\varepsilon) = f(x)+f'(x)\dot{x}\varepsilon. $$

Thus the tangent component corresponds exactly to the derivative.

Formal verification may prove:

Property Meaning
algebra preservation operations respect dual structure
compositionality chain rule correctness
soundness extracted tangent equals derivative

Dual-number semantics provide a clean mathematical model for forward AD.

Reverse Mode Verification

Reverse mode is harder.

The backward pass depends on:

Issue Difficulty
execution trace runtime dependency
mutation overwritten values
control flow dynamic structure
memory reuse aliasing
checkpointing recomputation correctness

A reverse transform effectively constructs a second program that computes cotangent propagation.

Verification must prove that this program computes the transpose Jacobian action correctly.

Pullback Semantics

Reverse mode is often formalized through pullbacks.

Given:

$$ f : X \to Y, $$

reverse mode computes:

$$ f^* : Y^* \to X^*. $$

This maps output cotangents back to input cotangents.

The correctness condition is:

$$ f^*(\bar{y}) = J_f(x)^T\bar{y}. $$

Verification then becomes proof that generated pullback programs implement the correct linear maps.

SSA Verification

Many compiler AD systems operate on SSA intermediate representations.

Example:

x1 = ...
x2 = ...
x3 = mul(x1, x2)

The reverse transform generates:

x1_bar += x2 * x3_bar
x2_bar += x1 * x3_bar

Verification proves:

Property Meaning
data dependency preservation correct adjoint flow
linearity cotangent accumulation validity
optimization safety transformed IR preserves semantics

Compiler verification becomes part of differentiation verification.

Verified Compiler Passes

An optimizing compiler may transform code:

x * 2

into:

x + x

Ordinary semantics remain equivalent.

Derivative semantics must also remain equivalent.

A verified differentiable compiler must prove that optimizations preserve both:

Semantics Requirement
primal semantics same outputs
derivative semantics same derivatives

This significantly increases compiler complexity.

Floating-Point Verification

Real AD systems use floating-point arithmetic.

Floating-point derivatives differ from exact real derivatives because of:

Issue Example
rounding representation error
cancellation subtractive instability
overflow infinite gradients
underflow zero gradients

Formal verification may include error bounds:

$$ |\hat{g}-g| \leq \epsilon. $$

Here:

Symbol Meaning
g exact derivative
ĝ computed derivative

Verified numerical analysis attempts to bound such errors rigorously.

Differentiating Non-Smooth Programs

Many programs contain non-differentiable operations:

Operation Problem
max kink
abs undefined derivative at zero
sort permutation discontinuity
branching path discontinuity

AD systems usually adopt conventions:

Convention Example
subgradient ReLU derivative at zero
one-sided derivative branch convention
arbitrary selection implementation-defined

Verification must specify which generalized derivative notion is intended.

Otherwise correctness claims are ambiguous.

Verified Higher-Order Differentiation

Nested differentiation introduces subtle problems.

Example:

grad(grad(f))

Potential failures include:

Failure Cause
perturbation confusion tangent aliasing
tape reuse incorrect adjoints
mixed-mode mismatch semantic inconsistency

Verified higher-order systems formally separate derivative levels and tangent spaces.

Proof Assistants

Formal verification often uses theorem provers.

Examples include:

System Purpose
Coq constructive proofs
Lean dependent type verification
Isabelle higher-order logic
Agda type-theoretic proofs

These systems allow machine-checked correctness proofs for differentiation transformations.

A proof assistant can verify:

Target Example
derivative rule chain rule implementation
compiler pass SSA adjoint generation
algebraic law linearity
numerical property error bound

Machine-checked proofs reduce reliance on informal reasoning.

Categorical Semantics

Category theory provides abstract semantics for differentiation.

A differentiable computation may be modeled as:

Concept Interpretation
morphism program
tangent functor forward derivative
cotangent structure reverse derivative
composition chain rule

Differential categories formalize differentiation algebraically.

Verification then becomes proof that transformations satisfy categorical laws.

This gives very general semantic guarantees.

Verified Tensor Systems

Tensor compilers introduce additional correctness issues.

A tensor transformation may involve:

Transformation Risk
fusion changed evaluation order
parallelization race conditions
layout optimization indexing errors
distributed reduction synchronization bugs

Verified differentiation must account for tensor algebra semantics as well as scalar calculus.

Probabilistic Verification

Probabilistic differentiation adds stochastic semantics.

Verification questions include:

Question Meaning
estimator unbiasedness expected gradient correctness
variance bounds optimization stability
expectation interchange legality of differentiation
Monte Carlo convergence asymptotic validity

The derivative itself becomes a random variable.

Formal reasoning now combines probability theory with calculus.

Verified Scientific Computing

Scientific simulation increasingly depends on differentiable solvers.

Verification targets include:

System Concern
PDE solver adjoint consistency
ODE integrator discretization correctness
optimization layer KKT derivative validity
differentiable physics conservation laws

A small gradient error may invalidate optimization results in scientific applications.

This makes verification especially important in high-stakes domains.

Certified Gradients

One goal is certified gradients.

Instead of returning only:

$$ \nabla f(x), $$

the system may return:

$$ (\hat{g},\epsilon), $$

where:

Quantity Meaning
ĝ computed gradient
ε proven error bound

This resembles interval arithmetic and verified numerical computation.

Security and Reliability

Incorrect gradients may create security and reliability risks.

Examples include:

Domain Risk
autonomous systems unsafe optimization
medical systems unstable training
financial optimization incorrect sensitivity
scientific inference invalid conclusions

Verification therefore has practical as well as theoretical importance.

Differentiation as a Certified Transformation

A verified differentiable compiler treats AD as a semantics-preserving program transformation.

The compiler must prove:

$$ \text{meaning}(Df) = D(\text{meaning}(f)). $$

This equation captures the central goal.

Differentiation of the program should equal differentiation of the mathematical function represented by the program.

Failure Modes

Verification itself has limitations.

Incorrect specifications

A proof may verify the wrong property.

Incomplete semantics

The language meaning may omit runtime behavior.

Floating-point mismatch

Real execution may differ from mathematical models.

Trusted kernel assumptions

Proof assistants rely on trusted foundations.

Scalability

Large tensor systems are difficult to verify completely.

Thus verification improves confidence but does not eliminate all risk.

Conceptual Shift

Classical AD assumes derivative rules are implemented correctly.

Verified differentiation treats derivative computation itself as an object requiring proof.

The derivative becomes a formally certified computational artifact.

This shifts AD from:

Classical view Verified view
practical algorithm formally verified transformation
empirical testing mathematical proof
heuristic confidence machine-checked correctness

The goal is not only fast gradients, but trustworthy gradients.

Summary

Verified differentiation studies formal correctness of automatic differentiation systems.

This includes proof of derivative semantics, compiler transformations, adjoint generation, floating-point behavior, higher-order differentiation, and stochastic gradient estimators.

The field connects automatic differentiation with theorem proving, formal semantics, compiler verification, and numerical analysis.

As differentiable systems become infrastructure for science, engineering, and autonomous decision systems, verified differentiation becomes increasingly important for reliability, safety, and mathematical trustworthiness.