Chapter 138. Tensor Decompositions

Chapter 138. Tensor Decompositions

138.1 Introduction

Tensor decompositions generalize matrix decompositions to multidimensional arrays.

A matrix has two modes: rows and columns. A tensor may have three, four, or more modes. For example, an image dataset may have modes for height, width, color channel, and image index. A recommender system may have modes for user, item, and time. A knowledge graph may have modes for subject, relation, and object.

Tensor decompositions express a large tensor using smaller factors. They are used for compression, denoising, latent factor modeling, feature extraction, and multidimensional data analysis. The two classical models are CP decomposition and Tucker decomposition, often viewed as higher-order analogues of matrix factorization methods such as SVD and PCA.

138.2 Tensors

A tensor is a multidimensional array.

A vector is a first-order tensor:

$$ x_i. $$

A matrix is a second-order tensor:

$$ A_{ij}. $$

A third-order tensor has entries

$$ \mathcal{X}_{ijk}. $$

More generally, an (N)-th order tensor has entries

$$ \mathcal{X}_{i_1 i_2 \cdots i_N}. $$

The number of indices is called the order of the tensor. Each index direction is called a mode.

For example, if

$$ \mathcal{X}\in \mathbb{R}^{I\times J\times K}, $$

then (\mathcal{X}) is a third-order tensor with three modes of sizes (I), (J), and (K).

138.3 Why Tensors Are Different from Matrices

Matrices already support many powerful decompositions:

Matrix decomposition Purpose
LU Solving linear systems
QR Orthogonalization and least squares
Eigendecomposition Spectral analysis
SVD Low-rank approximation

Tensors do not inherit all matrix properties directly.

For matrices, rank is well behaved. Every matrix has a best rank-(r) approximation under the Frobenius norm, given by truncated SVD.

For tensors, rank is more complicated. A best low-rank approximation may fail to exist in some models. Rank can be hard to compute. Decompositions may be unique in some cases and ill-conditioned in others.

Thus tensor decompositions are not merely matrix decompositions with more indices. They form a distinct theory.

138.4 Fibers and Slices

A fiber is obtained by fixing all tensor indices except one.

For a third-order tensor

$$ \mathcal{X}_{ijk}, $$

examples include:

Fiber type Fixed indices Varying index
Mode-1 fiber (j,k) fixed (i)
Mode-2 fiber (i,k) fixed (j)
Mode-3 fiber (i,j) fixed (k)

A slice is obtained by fixing one index.

For a third-order tensor:

Slice type Fixed index Result
Frontal slice (k) fixed Matrix in (i,j)
Lateral slice (j) fixed Matrix in (i,k)
Horizontal slice (i) fixed Matrix in (j,k)

Fibers generalize rows and columns. Slices generalize matrix subarrays.

138.5 Tensor Unfolding

Tensor unfolding, also called matricization or flattening, converts a tensor into a matrix.

For a third-order tensor

$$ \mathcal{X}\in\mathbb{R}^{I\times J\times K}, $$

the mode-1 unfolding is a matrix

$$ X_{(1)}\in\mathbb{R}^{I\times JK}. $$

It arranges all mode-1 fibers as columns.

Similarly:

$$ X_{(2)}\in\mathbb{R}^{J\times IK}, $$

and

$$ X_{(3)}\in\mathbb{R}^{K\times IJ}. $$

Unfolding is important because many tensor algorithms reduce parts of the problem to matrix operations. It allows SVD, least squares, and rank computations to be applied mode by mode.

138.6 Mode Products

The mode-(n) product multiplies a tensor by a matrix along one mode.

Let

$$ \mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N} $$

and let

$$ U\in\mathbb{R}^{J\times I_n}. $$

The mode-(n) product is written

$$ \mathcal{Y} = \mathcal{X}\times_n U. $$

The resulting tensor has size

$$ I_1\times\cdots\times I_{n-1}\times J\times I_{n+1}\times\cdots\times I_N. $$

In coordinates,

$$ \mathcal{Y}{i_1\cdots i{n-1}j i_{n+1}\cdots i_N} = \sum_{i_n=1}^{I_n} \mathcal{X}{i_1\cdots i_n\cdots i_N}U{j i_n}. $$

Mode products are the basic algebraic operation behind Tucker decompositions and multilinear transformations.

138.7 Rank-One Tensors

A rank-one tensor is an outer product of vectors.

For vectors

$$ a\in\mathbb{R}^I,\qquad b\in\mathbb{R}^J,\qquad c\in\mathbb{R}^K, $$

their outer product is the tensor

$$ a\circ b\circ c $$

with entries

$$ (a\circ b\circ c)_{ijk}=a_i b_j c_k. $$

This generalizes the rank-one matrix

$$ ab^T. $$

Rank-one tensors are the building blocks of CP decomposition.

138.8 CP Decomposition

The CP decomposition writes a tensor as a sum of rank-one tensors.

For a third-order tensor,

$$ \mathcal{X} \approx \sum_{r=1}^{R} \lambda_r a_r\circ b_r\circ c_r. $$

Here:

Symbol Meaning
(R) Number of components
(\lambda_r) Component weight
(a_r,b_r,c_r) Factor vectors

The CP model is also called CANDECOMP/PARAFAC.

In coordinates,

$$ \mathcal{X}{ijk} \approx \sum{r=1}^{R} \lambda_r a_{ir}b_{jr}c_{kr}. $$

The factor vectors are often collected into factor matrices:

$$ A=[a_1\ \cdots\ a_R], \quad B=[b_1\ \cdots\ b_R], \quad C=[c_1\ \cdots\ c_R]. $$

CP decomposition is useful when one wants a small number of latent components that jointly explain all modes.

138.9 Tensor Rank

The CP rank of a tensor is the smallest number (R) such that

$$ \mathcal{X} = \sum_{r=1}^{R} a_r\circ b_r\circ c_r. $$

This generalizes matrix rank.

However, tensor rank is much more difficult than matrix rank.

For matrices, rank can be computed by Gaussian elimination or SVD. For tensors, determining rank is generally hard. Tensor rank also depends more subtly on the scalar field. A tensor may have different rank over (\mathbb{R}) and over (\mathbb{C}).

The complexity of tensor rank is one reason tensor decomposition theory is technically difficult.

138.10 CP Uniqueness

CP decomposition can be unique under conditions where matrix factorization would not be.

This is one of the attractive features of CP models. A matrix factorization

$$ X\approx AB^T $$

has a rotational ambiguity:

$$ AB^T=(AQ)(BQ^{-T})^T $$

for any invertible matrix (Q).

CP decompositions can avoid this ambiguity because three or more modes constrain the components more strongly.

Uniqueness is usually meant up to:

  1. Permutation of components,
  2. Rescaling of factors within each component.

For example,

$$ a_r\circ b_r\circ c_r = (\alpha a_r)\circ(\beta b_r)\circ(\gamma c_r) $$

whenever

$$ \alpha\beta\gamma=1. $$

138.11 Tucker Decomposition

The Tucker decomposition writes a tensor as a smaller core tensor multiplied by factor matrices along each mode.

For a third-order tensor,

$$ \mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C. $$

Here:

Symbol Meaning
(\mathcal{G}) Core tensor
(A,B,C) Factor matrices
(\times_n) Mode-(n) product

In coordinates,

$$ \mathcal{X}{ijk} \approx \sum{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} \mathcal{G}{pqr} A{ip}B_{jq}C_{kr}. $$

The Tucker model separates mode-specific bases from the interactions among components. The core tensor records how components from different modes interact. Tucker decomposition is commonly described as a multilinear generalization of PCA, with a core tensor and factor matrices for the modes.

138.12 Tucker Rank

The Tucker rank of a tensor is the tuple of ranks of its unfoldings.

For a third-order tensor,

$$ \operatorname{rank}_{\mathrm{Tucker}}(\mathcal{X}) = (r_1,r_2,r_3), $$

where

$$ r_n=\operatorname{rank}(X_{(n)}). $$

Unlike CP rank, Tucker rank is a tuple rather than a single number.

This reflects the fact that each mode may have a different intrinsic dimension.

For example, an image dataset may require many spatial components but fewer lighting or viewpoint components.

138.13 CP versus Tucker

CP and Tucker decompositions serve different purposes.

Feature CP decomposition Tucker decomposition
Basic form Sum of rank-one tensors Core tensor times factor matrices
Rank parameter Single integer (R) Tuple ((r_1,\ldots,r_N))
Core tensor Diagonal or implicit Full smaller tensor
Interpretability Components often directly interpretable Factors plus interactions
Flexibility More constrained More flexible
Compression Strong when CP rank is small Strong when multilinear rank is small

CP is often preferred when one wants identifiable latent components. Tucker is often preferred when one wants compression, dimensional reduction, or mode-wise bases.

138.14 Higher-Order SVD

The higher-order singular value decomposition, or HOSVD, is a Tucker-type decomposition computed from SVDs of tensor unfoldings.

For each mode (n), compute an SVD of

$$ X_{(n)}. $$

Use the leading left singular vectors as the columns of the factor matrix

$$ U^{(n)}. $$

Then define the core by projecting the tensor onto these mode bases:

$$ \mathcal{G} = \mathcal{X} \times_1 (U^{(1)})^T \times_2 (U^{(2)})^T \cdots \times_N (U^{(N)})^T. $$

The approximation is then

$$ \mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}. $$

HOSVD is a direct and stable way to compute Tucker approximations.

138.15 Higher-Order Orthogonal Iteration

Higher-order orthogonal iteration, or HOOI, improves a Tucker approximation iteratively.

The method alternates over modes. For each mode, it fixes all other factor matrices and updates the current factor matrix using an SVD of a projected unfolding.

This is analogous to alternating least squares.

HOOI usually starts from HOSVD and then refines the factor matrices.

It is often more accurate than plain HOSVD for a fixed multilinear rank, but it is iterative and may converge to a local optimum.

138.16 Alternating Least Squares

Alternating least squares, or ALS, is a standard method for computing tensor decompositions.

For CP decomposition, ALS fixes all factor matrices except one, then solves a least squares problem for the remaining factor matrix. It cycles through the modes until convergence.

For a third-order CP model:

  1. Fix (B) and (C), solve for (A),
  2. Fix (A) and (C), solve for (B),
  3. Fix (A) and (B), solve for (C),
  4. Repeat.

ALS is simple and widely used, but it can be slow, sensitive to initialization, and affected by degeneracy.

Recent algorithms also use randomized sketching to reduce the cost of least squares subproblems for large tensors.

138.17 Tensor Train Decomposition

Tensor train decomposition represents a high-order tensor as a chain of smaller core tensors.

For an (N)-th order tensor,

$$ \mathcal{X}{i_1\cdots i_N} = G^{(1)}{i_1} G^{(2)}{i_2} \cdots G^{(N)}{i_N}, $$

where the product is matrix multiplication over hidden rank indices.

More explicitly, each core has size

$$ r_{n-1}\times I_n\times r_n, $$

with boundary ranks

$$ r_0=r_N=1. $$

Tensor trains are useful for very high-order tensors, where CP and Tucker may become too large.

Tensor train methods are implemented in modern tensor libraries, including SVD-based and cross-approximation variants.

138.18 Hierarchical Tucker Decomposition

Hierarchical Tucker decomposition organizes modes in a tree.

Instead of using one large core tensor as in Tucker decomposition, it recursively groups modes and represents interactions through smaller transfer tensors.

This avoids the exponential growth of the Tucker core in high order.

Hierarchical tensor formats are important in scientific computing, numerical PDEs, quantum many-body simulation, and high-dimensional approximation.

They trade a single global core for a hierarchy of smaller cores.

138.19 Nonnegative Tensor Factorization

Nonnegative tensor factorization imposes nonnegativity constraints on the factors.

If

$$ \mathcal{X}_{ijk}\ge0, $$

one may seek a decomposition with

$$ A\ge0,\qquad B\ge0,\qquad C\ge0. $$

This is useful when the data represents counts, intensities, probabilities, or other nonnegative quantities.

Examples include:

Data type Interpretation
Documents by words by time Topics over time
Users by items by contexts Recommender patterns
Images by pixels by conditions Parts-based features
Brain signals Nonnegative activations

Nonnegativity often improves interpretability.

138.20 Symmetric Tensor Decomposition

A symmetric tensor is invariant under permutations of its indices.

For example,

$$ \mathcal{X}{ijk} = \mathcal{X}{jik} = \mathcal{X}_{kij} $$

for all index permutations.

A symmetric CP decomposition has the form

$$ \mathcal{X} \approx \sum_{r=1}^R \lambda_r a_r\circ a_r\circ a_r. $$

Symmetric tensors arise from homogeneous polynomials, moments, cumulants, and higher-order derivatives.

For example, a cubic form

$$ p(x)=\sum_{ijk}\mathcal{X}_{ijk}x_i x_j x_k $$

corresponds to a symmetric third-order tensor.

138.21 Tensor Completion

Tensor completion estimates missing entries of a tensor.

Suppose only some entries of

$$ \mathcal{X} $$

are observed. The goal is to recover the full tensor by assuming low-rank structure.

This generalizes matrix completion.

Applications include:

Application Missing entries
Recommender systems User-item-time ratings
Sensor networks Missing measurements
Image and video Damaged pixels or frames
Medical data Incomplete patient measurements
Knowledge graphs Missing relations

Tensor completion uses the idea that multidimensional structure can constrain missing values more strongly than matrix structure alone.

138.22 Applications

Tensor decompositions appear in many areas.

Area Tensor modes
Chemometrics Sample, wavelength, time
Recommender systems User, item, context
Computer vision Image, illumination, pose, identity
Neuroscience Channel, time, trial
Natural language processing Word, context, document
Knowledge graphs Subject, relation, object
Scientific computing Spatial and parameter dimensions

The key advantage is that tensors preserve multiway structure. Flattening a tensor into a matrix can destroy relationships among modes.

138.23 Numerical Issues

Tensor decomposition algorithms face several numerical issues.

Issue Description
Initialization Different starts can give different solutions
Local minima Objective functions are nonconvex
Degeneracy Components may diverge while approximation improves
Scaling ambiguity Factors can be rescaled without changing the tensor
Permutation ambiguity Components can be reordered
Rank selection Correct rank may be unknown
Missing data Observed entries may be sparse or biased

These issues make diagnostics important. One usually checks reconstruction error, stability across runs, factor interpretability, and sensitivity to rank.

138.24 Relation to Linear Algebra

Tensor decompositions extend matrix decompositions but do not simply copy them.

Matrix concept Tensor analogue
Matrix rank CP rank, Tucker rank
SVD HOSVD, Tucker decomposition
Rank-one matrix Rank-one tensor
Low-rank approximation CP, Tucker, tensor train approximation
Matrix completion Tensor completion
PCA Multilinear PCA
Matrix factorization Multiway factorization

The central new feature is interaction among more than two modes. This creates richer models and more difficult mathematics.

138.25 Summary

Tensor decompositions express multidimensional arrays using smaller structured factors.

The main concepts are:

Concept Meaning
Tensor Multidimensional array
Mode Index direction
Fiber One-dimensional slice
Slice Lower-dimensional section
Unfolding Matrix view of a tensor
Mode product Multiplication along one mode
Rank-one tensor Outer product of vectors
CP decomposition Sum of rank-one tensors
Tucker decomposition Core tensor times factor matrices
Tucker rank Rank tuple of tensor unfoldings
HOSVD SVD-based Tucker decomposition
ALS Alternating least squares fitting
Tensor train Chain of small tensor cores
Tensor completion Recovery from missing entries

Tensor decompositions provide the multilinear extension of matrix factorization. They preserve multiway structure and give tools for compression, latent modeling, denoising, completion, and high-dimensional numerical computation.