Signal Processing

Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an image, an audio waveform, a radar return, a sensor stream, or a multidimensional field.

Automatic differentiation (AD) is useful here because many signal processing systems can be written as differentiable programs:

$$ x \to T_\theta(x) \to y \to L(y) $$

where $x$ is an input signal, $T_\theta$ is a transform or filter with parameters $\theta$, and $L$ is an objective. AD gives derivatives with respect to filter coefficients, model parameters, input samples, reconstruction variables, or calibration constants.

Linear Filtering

A discrete linear time-invariant filter has the form

$$ y[n] = \sum_{k} h[k]x[n-k], $$

where $h$ is the impulse response.

This is convolution:

$$ y = h * x. $$

If a loss depends on the filtered output,

$$ L = \ell(y), $$

then AD computes derivatives with respect to both the signal and the filter:

$$ \frac{\partial L}{\partial x}, \qquad \frac{\partial L}{\partial h}. $$

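The reverse pass through convolution is itself a correlation: the gradient with respect to $x$ is the cross-correlation of the upstream gradient $\partial L/\partial y$ with $h$, and the gradient with respect to $h$ is the cross-correlation of $\partial L/\partial y$ with $x$. This is one reason convolutional neural networks fit naturally into AD systems: backpropagation through convolution is a standard signal processing operation.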
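As a minimal sketch of this adjoint relationship, the following JAX snippet compares reverse-mode gradients of a convolution with hand-written cross-correlations. JAX is chosen here only for illustration; the signal, the filter taps, and the quadratic loss are assumptions and do not come from the text.

```python
import jax
import jax.numpy as jnp

def loss(x, h):
    # Full linear convolution y = h * x, followed by a simple quadratic loss.
    y = jnp.convolve(x, h, mode="full")
    return 0.5 * jnp.sum(y ** 2)

x = jnp.array([1.0, -2.0, 0.5, 3.0])   # illustrative signal
h = jnp.array([0.25, 0.5, 0.25])       # illustrative impulse response

# Reverse-mode gradients with respect to the signal and the filter.
grad_x, grad_h = jax.grad(loss, argnums=(0, 1))(x, h)

# Hand-derived adjoints: for this loss dL/dy = y, and each gradient is a
# "valid" cross-correlation of dL/dy with the other operand.
g = jnp.convolve(x, h, mode="full")
manual_grad_x = jnp.correlate(g, h, mode="valid")
manual_grad_h = jnp.correlate(g, x, mode="valid")
```

The AD results and the manual correlations should agree to floating-point precision, which is why libraries can implement the backward pass of convolution with the same optimized kernels as the forward pass.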

Frequency-Domain Methods

Many signal processing algorithms use the Fourier transform.

$$ X[k] = \sum_{n=0}^{N-1} x[n]e^{-2\pi i kn/N}. $$

The inverse transform reconstructs the signal:

$$ x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]e^{2\pi i kn/N}. $$

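Because the discrete Fourier transform is linear, its Jacobian is the DFT matrix itself and does not depend on $x$. Reverse-mode differentiation through an FFT applies the adjoint (conjugate-transpose) transform, which for the unnormalized convention above equals $N$ times the inverse DFT.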

This makes frequency-domain objectives easy to differentiate, for example a loss that matches a target magnitude spectrum $a$:

$$ L(x) = \left\| \, |\operatorname{FFT}(x)| - a \, \right\|^2. $$

Such objectives appear in phase retrieval, spectral matching, audio synthesis, diffraction imaging, and inverse scattering.
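A spectral matching objective of this kind can be sketched in JAX as follows. The target spectrum, signal length, starting point, and step size are illustrative assumptions, and plain gradient descent stands in for whatever optimizer a real system would use.

```python
import jax
import jax.numpy as jnp

def spectral_loss(x, target_mag):
    # Squared distance between the magnitude spectrum of x and a target.
    mag = jnp.abs(jnp.fft.fft(x))
    return jnp.sum((mag - target_mag) ** 2)

N = 16
n = jnp.arange(N)
# Target magnitudes taken from a reference sinusoid (illustrative data).
target_mag = jnp.abs(jnp.fft.fft(jnp.sin(2.0 * jnp.pi * 3.0 * n / N)))

# Random starting signal, so that no DFT bin starts at exactly zero.
x0 = jax.random.normal(jax.random.PRNGKey(0), (N,))

grad_fn = jax.jit(jax.grad(spectral_loss))
x = x0
for _ in range(200):
    x = x - 0.01 * grad_fn(x, target_mag)   # step size chosen below 1 / (2N)
```

The recovered `x` matches the target magnitudes but not necessarily the reference waveform, since the loss ignores phase.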

Complex-Valued Differentiation

Signal processing often uses complex numbers. AD systems must define how derivatives behave for complex-valued programs.

A complex signal can be treated as a pair of real signals:

$$ z = x + iy. $$

For real-valued losses,

$$ L : \mathbb{C}^n \to \mathbb{R}, $$

gradients are usually interpreted with respect to the real and imaginary parts. This avoids assuming that every operation is holomorphic. Many common operations, such as magnitude,

$$ |z| = \sqrt{x^2+y^2}, $$

are not complex analytic, but they are differentiable as real functions except at singular points.

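Practical AD systems for signal processing should state whether gradients are reported as real-imaginary derivatives or in Wirtinger-style notation, and which conjugation convention applies, since for real-valued losses these conventions can differ by a complex conjugate and a factor of two.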
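The effect of such conventions can be checked directly. The sketch below uses JAX as one example system and compares the gradient of a real-valued loss taken with respect to a complex input against the gradients taken with respect to separate real and imaginary parts. The function names and test values are illustrative; the relationship noted in the comments follows JAX's documented convention and may differ in other frameworks.

```python
import jax
import jax.numpy as jnp

def power(z):
    # Real-valued loss of a complex signal: sum of squared magnitudes.
    return jnp.sum(jnp.abs(z) ** 2)

def power_real(parts):
    # The same loss written on the (real, imaginary) pair.
    x, y = parts
    return jnp.sum(x ** 2 + y ** 2)

z = jnp.array([3.0 + 4.0j, 1.0 - 2.0j])

g_complex = jax.grad(power)(z)
g_x, g_y = jax.grad(power_real)((jnp.real(z), jnp.imag(z)))

# Under JAX's convention the complex result equals dL/dx - 1j * dL/dy,
# so the steepest-descent direction is the negated conjugate.
combined = g_x - 1j * g_y
descent_direction = -jnp.conj(g_complex)
```

Code that ports an optimizer between frameworks should verify this relationship on a small example like the one above before trusting complex-valued updates.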

Adaptive Filters

An adaptive filter updates its coefficients from data. A simple finite impulse response filter computes

$$ \hat y[n] = \sum_{k=0}^{K-1} w_k x[n-k]. $$

Given target signal $d[n]$, the error is

$$ e[n]=d[n]-\hat y[n]. $$

The least-squares objective is

$$ L(w) = \frac{1}{2} \sum_n e[n]^2. $$

AD gives

$$ \nabla_w L, $$

which can be used in gradient descent, stochastic gradient descent, or more specialized adaptive algorithms.

Traditional algorithms such as LMS can be viewed as hand-derived gradient methods: the LMS update is a stochastic gradient step on the instantaneous squared error. AD generalizes the same idea to more elaborate filter structures where deriving the gradient by hand becomes tedious.
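The connection to LMS can be sketched as follows: differentiating the instantaneous squared error with AD gives the same update as the classical rule. The filter length, step size, and synthetic system below are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

K, N, MU = 3, 512, 0.05   # filter length, sample count, step size (illustrative)

def instantaneous_loss(w, window, d_n):
    # window holds [x[n], x[n-1], ..., x[n-K+1]]; error e[n] = d[n] - w . window
    e_n = d_n - jnp.dot(w, window)
    return 0.5 * e_n ** 2

# Synthetic system identification problem: d is x filtered by w_true.
x = jax.random.normal(jax.random.PRNGKey(0), (N,))
w_true = jnp.array([0.5, -0.3, 0.1])
padded = jnp.concatenate([jnp.zeros(K - 1), x])
# Row n of windows is [x[n], x[n-1], ..., x[n-K+1]], with zeros before the start.
windows = jnp.stack([padded[K - 1 - k : K - 1 - k + N] for k in range(K)], axis=1)
d = windows @ w_true

grad_fn = jax.jit(jax.grad(instantaneous_loss))
w = jnp.zeros(K)
for n in range(N):
    # The AD gradient is -e[n] * window, so this is the classical LMS update.
    w = w - MU * grad_fn(w, windows[n], d[n])
```

Replacing the dot product in `instantaneous_loss` with a more elaborate differentiable filter leaves the update loop unchanged, which is the practical benefit over deriving each rule by hand.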

Inverse Problems in Signal Processing

Many signal processing tasks are inverse problems. We observe

$$ y = A x + \epsilon, $$

and want to recover $x$. Here $A$ may represent blur, downsampling, masking, compression, room acoustics, sensor geometry, or a Fourier sampling operator.

A common reconstruction objective is

$$ L(x) = \frac{1}{2}\|Ax-y\|^2 + \lambda R(x), $$

where $R(x)$ is a prior or regularizer.

AD supports reconstruction by providing gradients with respect to $x$. This allows the use of generic optimization methods even when $A$, $R$, or both are implemented as programs rather than matrices.

Examples include:

Task	Forward operator
Deblurring	Convolution with point-spread function
Super-resolution	Blur then downsample
Compressed sensing	Random projection or masked Fourier samples
MRI reconstruction	Fourier sampling on k-space
Audio dereverberation	Room impulse response convolution
Tomographic reconstruction	Projection operator
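A reconstruction of this form, with the forward operator written as a program rather than a matrix, might look like the sketch below. It uses the deblurring case; the point-spread function, the smoothness regularizer, the value of $\lambda$, and the step size are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

psf = jnp.array([0.25, 0.5, 0.25])   # illustrative point-spread function

def forward(x):
    # A x: blur implemented as a program, not as an explicit matrix.
    return jnp.convolve(x, psf, mode="same")

def regularizer(x):
    # R(x): squared first differences, a smoothness prior chosen for illustration.
    return jnp.sum(jnp.diff(x) ** 2)

def objective(x, y, lam):
    return 0.5 * jnp.sum((forward(x) - y) ** 2) + lam * regularizer(x)

# Synthetic ground truth (a box) and its noiseless blurred observation.
x_true = jnp.zeros(64).at[20:40].set(1.0)
y = forward(x_true)

grad_fn = jax.jit(jax.grad(objective))
x = jnp.zeros(64)
for _ in range(300):
    x = x - 0.5 * grad_fn(x, y, 0.01)   # plain gradient descent
```

Swapping `forward` for a downsampling, masking, or projection routine changes the task without changing the optimization loop.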

Differentiable Transforms

Many transforms are differentiable or piecewise differentiable:

Transform	AD treatment
FFT	Linear adjoint transform
DCT	Linear adjoint transform
Wavelet transform	Filter-bank derivatives
Short-time Fourier transform	Windowed linear transform
Mel filterbank	Matrix multiplication and nonlinear scaling
Cepstrum	FFT, log magnitude, inverse FFT
Convolution	Cross-correlation in reverse pass

This makes AD useful for end-to-end systems that combine classical signal processing with learned models.

For example, an audio model may compute:

waveform
    -> STFT
    -> magnitude
    -> mel filterbank
    -> log compression
    -> neural model
    -> loss

The whole pipeline can be differentiated, subject to care around zeros, logarithms, and magnitude operations.
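A reduced version of such a front end can be sketched in JAX as follows. This is a sketch under stated assumptions rather than a reference implementation: the frame sizes are arbitrary, the filterbank is a random nonnegative stand-in for a real mel filterbank, the neural model is omitted, and the small constant `eps` is one possible way to guard the magnitude and logarithm.

```python
import jax
import jax.numpy as jnp

FRAME, HOP, BANDS = 64, 32, 8   # illustrative sizes

def frame_signal(wave):
    # Slice the waveform into overlapping frames of length FRAME.
    starts = jnp.arange(0, wave.shape[0] - FRAME + 1, HOP)
    index = starts[:, None] + jnp.arange(FRAME)[None, :]
    return wave[index]

def log_band_features(wave, bank, eps=1e-6):
    windowed = frame_signal(wave) * jnp.hanning(FRAME)
    spec = jnp.fft.rfft(windowed, axis=-1)
    # sqrt(re^2 + im^2 + eps) keeps the gradient finite when a bin is exactly 0.
    mag = jnp.sqrt(jnp.real(spec) ** 2 + jnp.imag(spec) ** 2 + eps)
    # eps inside the log guards against log(0).
    return jnp.log(mag @ bank + eps)

def loss(wave, bank, target):
    return jnp.mean((log_band_features(wave, bank) - target) ** 2)

key_wave, key_bank = jax.random.split(jax.random.PRNGKey(0))
wave = jax.random.normal(key_wave, (256,))
# Stand-in for a mel filterbank: any nonnegative (FRAME // 2 + 1, BANDS) matrix.
bank = jnp.abs(jax.random.normal(key_bank, (FRAME // 2 + 1, BANDS)))
target = jnp.zeros((7, BANDS))   # 7 frames for 256 samples with these sizes

grad_wave = jax.grad(loss)(wave, bank, target)
grad_silence = jax.grad(loss)(jnp.zeros(256), bank, target)   # all-zero input
```

The all-zero input is the case the guards are meant for: every spectral bin is exactly zero, yet the gradient stays finite.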

Sparse and Structured Signals

Many signals have sparse structure. Compressed sensing uses the assumption that $x$ is sparse in some basis:

$$ x = \Psi \alpha, $$

where most entries of $\alpha$ are zero or near zero.

A typical objective is

$$ L(\alpha) = \frac{1}{2}\|A\Psi\alpha-y\|^2 + \lambda \|\alpha\|_1. $$

The $\ell_1$ norm is non-smooth at zero. AD systems typically return one fixed subgradient value there, with the exact choice depending on the implementation, so gradient-based optimization of this objective requires care.

Smooth approximations are often used:

$$ |\alpha| \approx \sqrt{\alpha^2+\epsilon}. $$

This gives stable gradients while preserving sparsity pressure.
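The smoothed penalty and its derivative can be sketched directly. The value of `eps` and the test vector are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def smooth_l1(alpha, eps=1e-6):
    # Smooth surrogate for the l1 norm: sum_i sqrt(alpha_i^2 + eps).
    return jnp.sum(jnp.sqrt(alpha ** 2 + eps))

alpha = jnp.array([-2.0, 0.0, 1e-4, 0.5])
grad_smooth = jax.grad(smooth_l1)(alpha)

# Closed form of the same derivative: alpha / sqrt(alpha^2 + eps).
closed_form = alpha / jnp.sqrt(alpha ** 2 + 1e-6)
```

Away from zero the gradient is close to the sign of each entry, while near zero it varies smoothly on a scale of roughly $\sqrt{\epsilon}$. A smaller $\epsilon$ tracks the $\ell_1$ norm more closely but makes the gradient change more abruptly.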

Differentiable Sampling

Sampling and resampling are central in signal processing. For a continuous signal approximation $x(t)$, resampling evaluates

$$ y[n]=x(\tau_n), $$

where $\tau_n$ may depend on parameters.

If interpolation is differentiable, then AD can compute derivatives with respect to both samples and sampling locations.

This is useful in:

image registration,
time warping,
differentiable rendering,
beamforming,
sensor calibration,
spatial transformer networks.

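Nearest-neighbor sampling is piecewise constant in the coordinates, so its derivative with respect to sampling location is zero almost everywhere and undefined at the jumps. Linear, cubic, or spline interpolation gives more useful derivatives.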
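As a minimal sketch, linear interpolation on an integer grid yields derivatives with respect to both the stored samples and the sampling locations. The helper name `sample_linear` and the test data are hypothetical.

```python
import jax
import jax.numpy as jnp

def sample_linear(x, tau):
    # x: samples on the integer grid 0..len(x)-1; tau: fractional positions.
    i0 = jnp.clip(jnp.floor(tau).astype(jnp.int32), 0, x.shape[0] - 2)
    frac = tau - i0
    return (1.0 - frac) * x[i0] + frac * x[i0 + 1]

x = jnp.array([0.0, 1.0, 4.0, 9.0, 16.0])
tau = jnp.array([0.5, 2.25])

# Derivative with respect to sampling locations: the local slope x[i0+1] - x[i0].
d_tau = jax.grad(lambda t: jnp.sum(sample_linear(x, t)))(tau)

# Derivative with respect to the samples: the interpolation weights.
d_x = jax.grad(lambda s: jnp.sum(sample_linear(s, tau)))(x)
```

For this input the location gradient is the pair of local slopes `[1, 5]` and the sample gradient is `[0.5, 0.5, 0.75, 0.25, 0]`. The location gradient is piecewise constant, which is one reason cubic or spline interpolation is preferred when smooth derivatives matter.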

Beamforming and Array Processing

Array processing combines measurements from multiple sensors. A beamformer output may be

$$ y(t) = \sum_m w_m x_m(t-\tau_m(\theta)), $$

where $w_m$ are weights and $\tau_m(\theta)$ are direction-dependent delays.

AD can compute derivatives with respect to:

Variable	Meaning
$w_m$	Beamforming weights
$\theta$	Source direction
Sensor positions	Array calibration
Signal samples	Input sensitivity

Differentiable beamforming appears in radar, sonar, microphone arrays, radio astronomy, and wireless communication.
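A delay-and-sum beamformer of this form can be sketched with fractional delays applied in the frequency domain. Everything specific here is an assumption made for illustration: a uniform linear array with delays $\tau_m(\theta) = m \cdot \text{SPACING} \cdot \sin\theta$ measured in samples, circular (periodic) delays, and a synthetic band-limited source.

```python
import jax
import jax.numpy as jnp

M, N = 4, 128     # sensors and samples (illustrative)
SPACING = 2.0     # hypothetical inter-sensor travel time, in samples

def delay_channels(signals, theta):
    # Delay channel m by tau_m(theta) = m * SPACING * sin(theta) samples using
    # the Fourier shift theorem: x(t - tau) <-> X(f) * exp(-2j*pi*f*tau).
    freqs = jnp.fft.rfftfreq(N)                     # cycles per sample
    tau = jnp.arange(M) * SPACING * jnp.sin(theta)
    phase = jnp.exp(-2j * jnp.pi * freqs[None, :] * tau[:, None])
    return jnp.fft.irfft(jnp.fft.rfft(signals, axis=1) * phase, n=N, axis=1)

def output_power(theta, weights, signals):
    # y(t) = sum_m w_m x_m(t - tau_m(theta)), then mean output power.
    y = jnp.sum(weights[:, None] * delay_channels(signals, theta), axis=0)
    return jnp.mean(y ** 2)

# Synthetic band-limited source arriving from direction theta_true.
t = jnp.arange(N)
source = jnp.sin(2 * jnp.pi * 5 * t / N) + 0.5 * jnp.cos(2 * jnp.pi * 12 * t / N)
theta_true = 0.4
signals = delay_channels(jnp.tile(source, (M, 1)), -theta_true)
weights = jnp.ones(M) / M

# Gradients with respect to steering direction and beamforming weights.
power_grad = jax.grad(output_power, argnums=(0, 1))
d_theta, d_weights = power_grad(theta_true + 0.1, weights, signals)
```

Output power peaks when the steering direction matches the source direction, so the sign of `d_theta` indicates which way to steer. Differentiating with respect to sensor positions would follow the same pattern once `tau` is computed from position variables.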

State-Space Models and Kalman Filters

Signal processing often represents systems with state-space models:

$$ x_{t+1}=A x_t + B u_t + \eta_t, $$

$$ y_t=C x_t + \epsilon_t. $$

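Kalman filtering estimates hidden states from noisy measurements. Its recursion is differentiable, provided matrix inversions and covariance updates are handled carefully, for example by solving linear systems rather than forming explicit inverses and by keeping covariance matrices symmetric and positive definite.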

AD can differentiate a Kalman filter with respect to:

transition matrices,
observation matrices,
noise covariance parameters,
initial state,
control parameters.

For long sequences, reverse-mode AD faces the same memory issue as recurrent neural networks. Checkpointing or custom smoother adjoints may be needed.
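The scalar case is enough to sketch how a Kalman recursion can be differentiated. The model below assumes $C = 1$ and no control input, parameterizes the noise variances through their logarithms to keep them positive, and uses the mean negative log-likelihood of the innovations as the objective. These choices and the synthetic data are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def kalman_nll(params, ys):
    # Scalar model: x[t+1] = a x[t] + eta, y[t] = x[t] + eps (C = 1, no input).
    a = params["a"]
    q = jnp.exp(params["log_q"])   # process noise variance, kept positive
    r = jnp.exp(params["log_r"])   # observation noise variance, kept positive

    def step(carry, y):
        mean, var = carry
        mean_pred = a * mean
        var_pred = a * a * var + q
        innov = y - mean_pred
        s = var_pred + r               # innovation variance
        gain = var_pred / s
        mean_new = mean_pred + gain * innov
        var_new = (1.0 - gain) * var_pred
        nll_t = 0.5 * (jnp.log(2.0 * jnp.pi * s) + innov ** 2 / s)
        return (mean_new, var_new), nll_t

    init = (jnp.zeros(()), jnp.ones(()))   # prior mean 0, prior variance 1
    _, nll_terms = jax.lax.scan(step, init, ys)
    return jnp.mean(nll_terms)

# Synthetic observations (illustrative): a slow sinusoid plus white noise.
t = jnp.arange(200)
ys = jnp.sin(0.1 * t) + 0.3 * jax.random.normal(jax.random.PRNGKey(0), (200,))

params = {"a": jnp.array(0.9), "log_q": jnp.array(-2.0), "log_r": jnp.array(-1.0)}
grads = jax.grad(kalman_nll)(params, ys)   # same dict structure as params
```

Reverse mode stores the intermediate values of every scan step here, so memory grows with sequence length. For long sequences, wrapping `step` in `jax.checkpoint` is one way to trade recomputation for memory.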

Phase Retrieval

Phase retrieval reconstructs a signal from magnitude-only measurements:

$$ y = |Ax|^2. $$

The phase is missing, so the inverse problem is nonconvex.

A loss may be

$$ L(x) = \frac{1}{2} \left\| \, |Ax|^2 - y \, \right\|^2. $$

AD computes gradients through the linear transform, magnitude, and squared magnitude. This allows gradient-based reconstruction methods, though initialization and nonconvexity remain major issues.
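The loss and its gradient can be sketched as follows. The random complex measurement matrix is a hypothetical stand-in for a physical operator such as a masked Fourier transform, and the problem sizes are arbitrary.

```python
import jax
import jax.numpy as jnp

M, N = 32, 8   # measurements and unknowns (illustrative sizes)
k_re, k_im, k_x, k_0 = jax.random.split(jax.random.PRNGKey(0), 4)
# Hypothetical complex measurement matrix.
A = (jax.random.normal(k_re, (M, N)) + 1j * jax.random.normal(k_im, (M, N))) / jnp.sqrt(2.0)

def measure(x):
    return jnp.abs(A @ x) ** 2      # magnitude-only measurements

def loss(x, y):
    return 0.5 * jnp.sum((measure(x) - y) ** 2)

x_true = jax.random.normal(k_x, (N,))
y = measure(x_true)

x0 = jax.random.normal(k_0, (N,))
grad_x0 = jax.grad(loss)(x0, y)     # gradient through A, |.|, and the square

# The global sign of x cannot be recovered: both candidates fit the data.
loss_plus, loss_minus = loss(x_true, y), loss(-x_true, y)
```

Both `x_true` and `-x_true` give zero loss, a small instance of the ambiguity that makes the problem nonconvex. The gradient alone does not select between them, which is why initialization matters.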

Learned Signal Processing

Modern systems often combine hand-designed transforms with learned components. Examples include:

learned denoisers,
neural vocoders,
differentiable codecs,
learned image reconstruction,
neural beamformers,
learned compression,
differentiable equalizers.

A useful design pattern is to keep classical signal-processing operators explicit and differentiable, then insert learned modules where the model lacks structure.

This gives a hybrid pipeline:

structured physical transform
    -> learned correction
    -> differentiable objective
    -> gradient-based training or inference

Numerical Issues

Signal processing pipelines contain operations that can produce unstable gradients.

Operation	Issue
$\log x$	Singular near zero
$|z|$	Non-differentiable at zero
Phase angle	Discontinuous modulo $2\pi$
Hard thresholding	Zero or undefined gradients
Quantization	Discontinuous
Clipping	Saturated gradients
Sorting peaks	Non-smooth selection

Practical differentiable systems often replace hard operations with smooth approximations during training or optimization.

Examples include soft thresholding, smooth clipping, differentiable peak picking, and noise-aware objectives.
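The difference between a hard operation and a smooth replacement shows up directly in the gradients. The sketch below compares hard and soft thresholding with respect to the threshold level; the function names and test values are illustrative.

```python
import jax
import jax.numpy as jnp

def hard_threshold(x, t):
    # Keeps entries with |x| > t and zeros the rest; piecewise constant in t.
    return jnp.where(jnp.abs(x) > t, x, 0.0)

def soft_threshold(x, t):
    # Shrinks every entry toward zero by t; continuous in both x and t.
    return jnp.sign(x) * jnp.maximum(jnp.abs(x) - t, 0.0)

x = jnp.array([2.0, 0.5, 3.0])
t = 1.0

# Sensitivity of the summed output to the threshold level t.
d_hard = jax.grad(lambda level: jnp.sum(hard_threshold(x, level)))(t)
d_soft = jax.grad(lambda level: jnp.sum(soft_threshold(x, level)))(t)
```

For this input `d_hard` is zero, so a gradient-based optimizer receives no signal for tuning the threshold, while `d_soft` is `-2` because two entries lie above the threshold and each shrinks as `t` grows.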

Custom Derivative Rules

Generic AD can differentiate most signal processing code, but hand-written derivative rules are often faster, use less memory, or behave better numerically.

Component	Better derivative rule
FFT	Use known adjoint transform
Convolution	Use optimized correlation kernels
Linear solve	Use transpose solve
Interpolation	Define boundary derivative explicitly
Quantization	Use surrogate gradient if needed
STFT/ISTFT	Preserve window and overlap-add structure

Custom rules improve performance and make derivative semantics explicit.
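As one example, the surrogate-gradient entry for quantization can be sketched with `jax.custom_vjp`. The straight-through rule used here, which passes the upstream gradient unchanged, is one common surrogate and is an assumption of this sketch, as are the unit step size and the toy loss.

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def quantize(x):
    # Forward pass: round to the nearest integer level.
    return jnp.round(x)

def quantize_fwd(x):
    return jnp.round(x), None        # no residuals needed for the backward pass

def quantize_bwd(residuals, g):
    # Surrogate rule (straight-through): pass the upstream gradient unchanged.
    return (g,)

quantize.defvjp(quantize_fwd, quantize_bwd)

def loss(gain, x, target, quantizer):
    return jnp.sum((quantizer(gain * x) - target) ** 2)

x = jnp.array([0.2, 1.4, -0.7])
target = jnp.array([1.0, 2.0, -1.0])

# Plain rounding has zero derivative, so the gain receives no gradient.
g_plain = jax.grad(lambda g: loss(g, x, target, jnp.round))(1.5)
# With the custom rule the gradient flows as if quantization were the identity.
g_surrogate = jax.grad(lambda g: loss(g, x, target, quantize))(1.5)
```

The surrogate gradient is not the true derivative of the quantized loss, so writing it as a custom rule keeps that modeling choice visible in the code.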

Summary

Signal processing is a natural domain for automatic differentiation because many operators are linear, structured, and compositional. AD provides gradients through filters, transforms, reconstruction objectives, adaptive systems, beamformers, and state-space estimators.

The main difficulties come from complex-valued computations, non-smooth operations, sampling decisions, quantization, phase ambiguity, and long recurrent filters. Effective differentiable signal processing combines AD with known adjoint operators, stable numerical design, and carefully chosen smooth approximations.