Chapter 20

Writes › Book › Deep Learning with PyTorch › Part V › Chapter 20 ›

Motivation for Attention

Sequence models often need to decide which parts of an input are relevant to a particular output.

Additive Attention

Additive attention was one of the first successful neural attention mechanisms. It was introduced for neural machine translation to allow a decoder to selectively focus on different encoder states during generation.
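
As a rough illustration of how a decoder state can score encoder states, here is a minimal sketch assuming the commonly used additive scoring form, score_i = v^T tanh(W_q q + W_k k_i). The weight names `W_q`, `W_k`, and `v`, the dimensions, and the random inputs are all hypothetical choices made for this example.

```python
import torch

torch.manual_seed(0)


def additive_attention(query, keys, values, W_q, W_k, v):
    """Additive scoring sketch (assumed form: v^T tanh(W_q q + W_k k_i)).

    query:  (d_q,)    a single decoder state
    keys:   (n, d_k)  encoder states used for scoring
    values: (n, d_v)  encoder states returned as content
    """
    # Project query and keys into a shared hidden space, then combine them.
    hidden = torch.tanh(query @ W_q + keys @ W_k)  # (n, d_h)
    scores = hidden @ v                            # (n,), one score per encoder state
    weights = torch.softmax(scores, dim=-1)        # (n,), non-negative, sums to 1
    context = weights @ values                     # (d_v,), weighted average of values
    return context, weights


# Hypothetical sizes, chosen only for illustration.
d_q, d_k, d_h, n = 4, 6, 8, 5
W_q = torch.randn(d_q, d_h)
W_k = torch.randn(d_k, d_h)
v = torch.randn(d_h)

decoder_state = torch.randn(d_q)
encoder_states = torch.randn(n, d_k)

context, weights = additive_attention(
    decoder_state, encoder_states, encoder_states, W_q, W_k, v
)
print(context.shape, weights.shape)
```

In a trained model the three weight tensors would be learned parameters rather than random draws; the sketch only shows the data flow.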

Dot-Product Attention

Dot-product attention uses an inner product to measure how well a query matches a key.
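
The matching step can be sketched in a few lines. This is a minimal example, not a reference implementation: the function name `dot_product_attention` is hypothetical, and the optional division by the square root of the key dimension is an assumption (the widely used scaled variant) that this text does not itself specify.

```python
import math

import torch


def dot_product_attention(q, k, v, scale=True):
    """q: (n_q, d), k: (n_k, d), v: (n_k, d_v)."""
    # Inner product between every query and every key.
    scores = q @ k.transpose(-2, -1)               # (n_q, n_k)
    if scale:
        # Assumed scaled variant: divide by sqrt(d) to keep scores moderate.
        scores = scores / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)        # each row sums to 1
    return weights @ v, weights


# One query that points in the same direction as the first key.
q = torch.tensor([[1.0, 0.0]])
k = torch.tensor([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
v = torch.tensor([[10.0], [20.0], [30.0]])

out, w = dot_product_attention(q, k, v)
print(w)    # the first key receives the largest weight
print(out)  # weighted average of the values
```

The query aligns with the first key, is orthogonal to the second, and opposes the third, so the weights decrease in that order.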

Self-Attention

Self-attention is attention applied within a single sequence.
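
One way to picture this: queries, keys, and values are all computed from the same input tensor. The sketch below assumes learned linear projections and scaled dot-product scoring; the class name `SelfAttention` and the tensor sizes are hypothetical.

```python
import math

import torch
from torch import nn


class SelfAttention(nn.Module):
    """Single-head self-attention sketch: q, k, v all come from the same x."""

    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)    # (batch, seq_len, seq_len)
        return weights @ v, weights


torch.manual_seed(0)
x = torch.randn(2, 5, 16)          # 2 sequences, 5 positions, 16 features
attn = SelfAttention(d_model=16)
out, weights = attn(x)
print(out.shape, weights.shape)
```

The weight matrix is square because every position in the sequence scores every position in that same sequence.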

Cross-Attention

Cross-attention is attention between two different sequences or sources of information. The queries come from one sequence, while the keys and values come from another.
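
A minimal sketch of this arrangement using PyTorch's built-in `nn.MultiheadAttention`, where the query argument comes from one sequence and the key and value arguments come from another. The names `decoder_states` and `encoder_states`, and all sizes, are hypothetical and chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

d_model, num_heads = 16, 4
cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

# Two different sequences with different lengths.
decoder_states = torch.randn(2, 3, d_model)   # source of the queries
encoder_states = torch.randn(2, 7, d_model)   # source of the keys and values

out, weights = cross_attn(
    query=decoder_states, key=encoder_states, value=encoder_states
)

# One output vector per query position; one weight per (query, key) pair.
print(out.shape)      # (batch, query_len, d_model)
print(weights.shape)  # (batch, query_len, key_len), averaged over heads
```

Passing the same tensor for all three arguments would turn the same module into self-attention; only the source of the inputs differs.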

Multi-Head Attention

Multi-head attention runs several attention operations in parallel. Each head has its own query, key, and value projections. The outputs of the heads are concatenated and projected back to the model dimension.
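
The three steps described above (per-head projections, parallel attention, concatenation followed by an output projection) can be sketched as follows. This is an illustrative implementation under stated assumptions: the class name `MultiHeadAttention` is hypothetical, the per-head projections are packed into single `nn.Linear` layers and split by reshaping, and scaled dot-product scoring is assumed.

```python
import math

import torch
from torch import nn


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Each Linear holds the projections for all heads side by side.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def _split_heads(self, t):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, n, _ = t.shape
        return t.view(b, n, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, query, key, value):
        q = self._split_heads(self.q_proj(query))
        k = self._split_heads(self.k_proj(key))
        v = self._split_heads(self.v_proj(value))
        # All heads are computed in parallel as one batched matrix product.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = torch.softmax(scores, dim=-1)    # (batch, heads, n_q, n_k)
        heads = weights @ v                        # (batch, heads, n_q, d_head)
        # Concatenate the heads, then project back to the model dimension.
        b, _, n_q, _ = heads.shape
        concat = heads.transpose(1, 2).reshape(b, n_q, self.num_heads * self.d_head)
        return self.out_proj(concat), weights


torch.manual_seed(0)
x = torch.randn(2, 5, 32)
mha = MultiHeadAttention(d_model=32, num_heads=4)
out, weights = mha(x, x, x)
print(out.shape, weights.shape)
```

In practice `torch.nn.MultiheadAttention` provides a built-in version of this; the hand-written module is only meant to make the reshaping visible.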

Attention Complexity

Attention gives each position in a sequence direct access to every other position.
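
One consequence of this direct access is that standard full attention computes a score for every pair of positions, so the score matrix has one entry per pair. The short sketch below counts those entries for a few sequence lengths; the helper name `attention_score_count` and the feature size are hypothetical, and the example assumes unmasked, non-sparse attention.

```python
import torch


def attention_score_count(seq_len, d_model=8):
    """Number of pairwise scores full attention computes for one sequence."""
    x = torch.zeros(seq_len, d_model)
    scores = x @ x.T                 # (seq_len, seq_len)
    return scores.numel()


# Doubling the sequence length quadruples the number of scores.
for n in (128, 256, 512):
    print(n, attention_score_count(n))
```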

Summary and Further Reading

Attention is a differentiable retrieval mechanism. A query asks for information, keys define where information can be found, and values carry the content returned to the model.
