Verified Differentiation

Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of elementary operations.

However, practical AD systems are large compiler and runtime systems containing:

Component	Complexity
graph transformations	compiler correctness
tensor kernels	numerical precision
mutation handling	state consistency
checkpointing	recomputation logic
custom gradients	user-defined semantics
distributed execution	synchronization correctness

A small mistake may silently produce incorrect gradients.

Verified differentiation studies how to prove that derivative computations are correct with respect to precise mathematical and program semantics.

This area connects automatic differentiation with:

Field	Role
formal verification	proof of correctness
theorem proving	machine-checked reasoning
programming language semantics	rigorous program meaning
numerical analysis	floating-point guarantees
compiler verification	transformation correctness

The goal is stronger guarantees than empirical testing alone.

What Does Correctness Mean?

A derivative system may appear correct while violating important properties.

There are several possible notions of correctness.

Mathematical correctness

The produced derivative matches the mathematical derivative of the represented function.

Program correctness

The transformed program correctly computes the derivative of the original program.

Numerical correctness

Floating-point execution approximates the exact derivative within known bounds.

Semantic correctness

Compiler optimizations preserve derivative meaning.

These notions are related but distinct.

The Central Property

Suppose a program computes

$$ f : X \to Y. $$

An AD transform produces another program

$$ Df : X \times X \to Y \times Y $$

for forward mode.

Correctness informally means:

$$ Df(x,\dot{x}) = (f(x), J_f(x)\dot{x}). $$

For reverse mode, correctness means the generated adjoint computes the transpose Jacobian action:

$$ \bar{x} = J_f(x)^T \bar{y}. $$

A verified AD system attempts to prove these properties formally.

Why Testing Is Not Enough

AD systems are difficult to validate purely through testing.

Finite-difference checks are common:

$$ \frac{ f(x+h)-f(x-h) }{ 2h }. $$

But finite differences have limitations:

Problem	Consequence
truncation error	inaccurate reference
floating-point cancellation	instability
non-smooth functions	undefined comparison
high dimensions	expensive checking

A system may pass numerical tests while still violating formal derivative semantics.

Formal verification aims to eliminate such ambiguity.

Formal Semantics of Programs

Verification requires precise semantics for the source language.

A language semantics defines:

Construct	Meaning
arithmetic	numerical interpretation
control flow	execution behavior
recursion	fixed-point meaning
mutation	state transition
exceptions	error propagation

Without rigorous semantics, it is impossible to prove that differentiation preserves meaning.

Differentiable programming therefore depends heavily on programming language theory.

Forward Mode Verification

Forward mode is conceptually simpler.

A forward transform augments each value with a tangent:

$$ x \mapsto (x,\dot{x}). $$

Elementary operations are lifted.

Example:

$$ (x,\dot{x}) + (y,\dot{y}) = (x+y,\dot{x}+\dot{y}). $$

Multiplication becomes:

$$ (x,\dot{x})(y,\dot{y}) = (xy,x\dot{y}+y\dot{x}). $$

Verification proves that the transformed operations preserve the derivative interpretation.

The proof usually proceeds structurally over program syntax.

Dual Number Semantics

Forward mode is often verified through dual numbers.

Define:

$$ a+b\varepsilon, \qquad \varepsilon^2=0. $$

Evaluating a function on dual numbers gives:

$$ f(x+\dot{x}\varepsilon) = f(x)+f'(x)\dot{x}\varepsilon. $$

Thus the tangent component corresponds exactly to the derivative.

Formal verification may prove:

Property	Meaning
algebra preservation	operations respect dual structure
compositionality	chain rule correctness
soundness	extracted tangent equals derivative

Dual-number semantics provide a clean mathematical model for forward AD.

Reverse Mode Verification

Reverse mode is harder.

The backward pass depends on:

Issue	Difficulty
execution trace	runtime dependency
mutation	overwritten values
control flow	dynamic structure
memory reuse	aliasing
checkpointing	recomputation correctness

A reverse transform effectively constructs a second program that computes cotangent propagation.

Verification must prove that this program computes the transpose Jacobian action correctly.

Pullback Semantics

Reverse mode is often formalized through pullbacks.

Given:

$$ f : X \to Y, $$

reverse mode computes:

$$ f^* : Y^* \to X^*. $$

This maps output cotangents back to input cotangents.

The correctness condition is:

$$ f^*(\bar{y}) = J_f(x)^T\bar{y}. $$

Verification then becomes proof that generated pullback programs implement the correct linear maps.

SSA Verification

Many compiler AD systems operate on SSA intermediate representations.

Example:

x1 = ...
x2 = ...
x3 = mul(x1, x2)

The reverse transform generates:

x1_bar += x2 * x3_bar
x2_bar += x1 * x3_bar

Verification proves:

Property	Meaning
data dependency preservation	correct adjoint flow
linearity	cotangent accumulation validity
optimization safety	transformed IR preserves semantics

Compiler verification becomes part of differentiation verification.

Verified Compiler Passes

An optimizing compiler may transform code:

x * 2

into:

x + x

Ordinary semantics remain equivalent.

Derivative semantics must also remain equivalent.

A verified differentiable compiler must prove that optimizations preserve both:

Semantics	Requirement
primal semantics	same outputs
derivative semantics	same derivatives

This significantly increases compiler complexity.

Floating-Point Verification

Real AD systems use floating-point arithmetic.

Floating-point derivatives differ from exact real derivatives because of:

Issue	Example
rounding	representation error
cancellation	subtractive instability
overflow	infinite gradients
underflow	zero gradients

Formal verification may include error bounds:

$$ |\hat{g}-g| \leq \epsilon. $$

Here:

Symbol	Meaning
`g`	exact derivative
`ĝ`	computed derivative

Verified numerical analysis attempts to bound such errors rigorously.

Differentiating Non-Smooth Programs

Many programs contain non-differentiable operations:

Operation	Problem
max	kink
abs	undefined derivative at zero
sort	permutation discontinuity
branching	path discontinuity

AD systems usually adopt conventions:

Convention	Example
subgradient	ReLU derivative at zero
one-sided derivative	branch convention
arbitrary selection	implementation-defined

Verification must specify which generalized derivative notion is intended.

Otherwise correctness claims are ambiguous.

Verified Higher-Order Differentiation

Nested differentiation introduces subtle problems.

Example:

grad(grad(f))

Potential failures include:

Failure	Cause
perturbation confusion	tangent aliasing
tape reuse	incorrect adjoints
mixed-mode mismatch	semantic inconsistency

Verified higher-order systems formally separate derivative levels and tangent spaces.

Proof Assistants

Formal verification often uses theorem provers.

Examples include:

System	Purpose
Coq	constructive proofs
Lean	dependent type verification
Isabelle	higher-order logic
Agda	type-theoretic proofs

These systems allow machine-checked correctness proofs for differentiation transformations.

A proof assistant can verify:

Target	Example
derivative rule	chain rule implementation
compiler pass	SSA adjoint generation
algebraic law	linearity
numerical property	error bound

Machine-checked proofs reduce reliance on informal reasoning.

Categorical Semantics

Category theory provides abstract semantics for differentiation.

A differentiable computation may be modeled as:

Concept	Interpretation
morphism	program
tangent functor	forward derivative
cotangent structure	reverse derivative
composition	chain rule

Differential categories formalize differentiation algebraically.

Verification then becomes proof that transformations satisfy categorical laws.

This gives very general semantic guarantees.

Verified Tensor Systems

Tensor compilers introduce additional correctness issues.

A tensor transformation may involve:

Transformation	Risk
fusion	changed evaluation order
parallelization	race conditions
layout optimization	indexing errors
distributed reduction	synchronization bugs

Verified differentiation must account for tensor algebra semantics as well as scalar calculus.

Probabilistic Verification

Probabilistic differentiation adds stochastic semantics.

Verification questions include:

Question	Meaning
estimator unbiasedness	expected gradient correctness
variance bounds	optimization stability
expectation interchange	legality of differentiation
Monte Carlo convergence	asymptotic validity

The derivative itself becomes a random variable.

Formal reasoning now combines probability theory with calculus.

Verified Scientific Computing

Scientific simulation increasingly depends on differentiable solvers.

Verification targets include:

System	Concern
PDE solver	adjoint consistency
ODE integrator	discretization correctness
optimization layer	KKT derivative validity
differentiable physics	conservation laws

A small gradient error may invalidate optimization results in scientific applications.

This makes verification especially important in high-stakes domains.

Certified Gradients

One goal is certified gradients.

Instead of returning only:

$$ \nabla f(x), $$

the system may return:

$$ (\hat{g},\epsilon), $$

where:

Quantity	Meaning
`ĝ`	computed gradient
`ε`	proven error bound

This resembles interval arithmetic and verified numerical computation.

Security and Reliability

Incorrect gradients may create security and reliability risks.

Examples include:

Domain	Risk
autonomous systems	unsafe optimization
medical systems	unstable training
financial optimization	incorrect sensitivity
scientific inference	invalid conclusions

Verification therefore has practical as well as theoretical importance.

Differentiation as a Certified Transformation

A verified differentiable compiler treats AD as a semantics-preserving program transformation.

The compiler must prove:

$$ \text{meaning}(Df) = D(\text{meaning}(f)). $$

This equation captures the central goal.

Differentiation of the program should equal differentiation of the mathematical function represented by the program.

Failure Modes

Verification itself has limitations.

Incorrect specifications

A proof may verify the wrong property.

Incomplete semantics

The language meaning may omit runtime behavior.

Floating-point mismatch

Real execution may differ from mathematical models.

Trusted kernel assumptions

Proof assistants rely on trusted foundations.

Scalability

Large tensor systems are difficult to verify completely.

Thus verification improves confidence but does not eliminate all risk.

Conceptual Shift

Classical AD assumes derivative rules are implemented correctly.

Verified differentiation treats derivative computation itself as an object requiring proof.

The derivative becomes a formally certified computational artifact.

This shifts AD from:

Classical view	Verified view
practical algorithm	formally verified transformation
empirical testing	mathematical proof
heuristic confidence	machine-checked correctness

The goal is not only fast gradients, but trustworthy gradients.

Summary

Verified differentiation studies formal correctness of automatic differentiation systems.

This includes proof of derivative semantics, compiler transformations, adjoint generation, floating-point behavior, higher-order differentiation, and stochastic gradient estimators.

The field connects automatic differentiation with theorem proving, formal semantics, compiler verification, and numerical analysis.

As differentiable systems become infrastructure for science, engineering, and autonomous decision systems, verified differentiation becomes increasingly important for reliability, safety, and mathematical trustworthiness.