Chapter 15

Writes › Book › Deep Learning with PyTorch › Part IV › Chapter 15 ›

Attention Mechanisms

Attention is a method for letting a model choose which parts of an input are most relevant when producing an output.
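
One common way to realize this idea is scaled dot-product attention; the text above does not commit to a particular form, so treat the following as an illustrative sketch. The function name and the tensor sizes are assumptions chosen for the example.

```python
import math
import torch

def scaled_dot_product_attention(query, key, value):
    """query: (T_q, d), key: (T_k, d), value: (T_k, d_v)."""
    d = query.size(-1)
    # Relevance score of every input position (key) for every output position (query).
    scores = query @ key.transpose(-2, -1) / math.sqrt(d)
    # Normalize the scores into weights that sum to 1 over the input positions.
    weights = torch.softmax(scores, dim=-1)
    # Each output is a weighted average of the values.
    return weights @ value, weights

torch.manual_seed(0)
q = torch.randn(2, 8)    # 2 output positions
k = torch.randn(5, 8)    # 5 input positions
v = torch.randn(5, 16)
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)         # torch.Size([2, 16])
print(w.shape)           # torch.Size([2, 5])
```

Each row of `w` shows how strongly one output position draws on each of the five input positions.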

Self-Attention

Self-attention is attention applied within a single sequence. The same input supplies the queries, keys, and values. Each position builds a new representation by reading from every position in the same sequence, including its own.
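
A minimal sketch of a single-head self-attention layer, assuming learned linear projections for the queries, keys, and values and a scaled dot-product score. The class name `SelfAttention` and the sizes are hypothetical, chosen only for illustration.

```python
import math
import torch
from torch import nn

class SelfAttention(nn.Module):
    """Single-head self-attention: queries, keys, and values all come from x."""

    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, T, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, T, T)
        weights = torch.softmax(scores, dim=-1)
        return weights @ v                                        # (batch, T, d_model)

torch.manual_seed(0)
layer = SelfAttention(d_model=16)
x = torch.randn(4, 10, 16)
y = layer(x)
print(y.shape)  # torch.Size([4, 10, 16])
```

The output has the same shape as the input: one new vector per position, each a mixture of information from the whole sequence.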

Multi-Head Attention

Multi-head attention runs several attention operations in parallel, each with its own learned projections, and combines their outputs.
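
As a hedged illustration, PyTorch's built-in `nn.MultiheadAttention` module can be used for self-attention by passing the same tensor as query, key, and value. The sizes below are arbitrary, and the `average_attn_weights=False` argument assumes a reasonably recent PyTorch release.

```python
import torch
from torch import nn

torch.manual_seed(0)
# 4 heads share a 32-dimensional model width (8 dimensions per head).
mha = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

x = torch.randn(2, 6, 32)  # (batch, T, embed_dim)
# Passing x three times makes this self-attention.
out, weights = mha(x, x, x, average_attn_weights=False)

print(out.shape)      # torch.Size([2, 6, 32])
print(weights.shape)  # torch.Size([2, 4, 6, 6]): one T x T map per head
```

Keeping the per-head weights separate makes it possible to inspect how the heads attend differently to the same input.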

Positional Encoding

Self-attention compares tokens by content. By itself, it has no built-in notion of token order.
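
One widely used way to inject order is a fixed sinusoidal table added to the token embeddings. The text above does not name a specific scheme, so this is only a sketch of one option; the function name is hypothetical and `d_model` is assumed to be even.

```python
import math
import torch

def sinusoidal_positions(seq_len, d_model):
    """Return a (seq_len, d_model) table of position vectors; d_model must be even."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (T, 1)
    # Geometrically spaced frequencies, one per pair of dimensions.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(seq_len, d_model)
    table[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    table[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return table

torch.manual_seed(0)
tokens = torch.randn(10, 16)           # stand-in token embeddings
pe = sinusoidal_positions(10, 16)
x = tokens + pe                        # order-aware input to attention
print(x.shape)                         # torch.Size([10, 16])
```

Because every position receives a distinct vector, two identical tokens at different positions no longer look identical to the attention layer.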

Transformer Encoders

A transformer encoder is a stack of layers that maps an input sequence to a contextual sequence representation.
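
A short sketch using PyTorch's `nn.TransformerEncoderLayer` and `nn.TransformerEncoder` modules. The layer count and sizes are arbitrary choices for illustration, not values taken from the chapter.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(
    d_model=32, nhead=4, dim_feedforward=64, batch_first=True
)
encoder = nn.TransformerEncoder(layer, num_layers=2)  # stack of 2 layers
encoder.eval()  # disable dropout for a deterministic forward pass

x = torch.randn(3, 7, 32)  # (batch, T, d_model)
with torch.no_grad():
    h = encoder(x)

print(h.shape)  # torch.Size([3, 7, 32])
```

The encoder returns one vector per input position, so the sequence length is preserved while each vector now depends on the whole sequence.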

Transformer Decoders

A transformer decoder maps a partial output sequence to predictions for the next token or next output step.
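
The sketch below uses PyTorch's `nn.TransformerDecoder` with a causal mask so that each position can only read earlier positions. The vocabulary size, the random `memory` tensor standing in for an encoder output, and the greedy choice of the next token are all assumptions made for the example; positional encodings are omitted for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, d_model, T = 50, 32, 5

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerDecoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=64, batch_first=True
)
decoder = nn.TransformerDecoder(layer, num_layers=2)
to_logits = nn.Linear(d_model, vocab_size)
decoder.eval()  # disable dropout

prefix = torch.randint(0, vocab_size, (1, T))  # partial output so far
memory = torch.randn(1, 9, d_model)            # stand-in encoder output
# -inf above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)

with torch.no_grad():
    h = decoder(embed(prefix), memory, tgt_mask=causal_mask)
    logits = to_logits(h)                      # (1, T, vocab_size)

# The scores at the last position rank candidates for the next token.
next_token = logits[:, -1].argmax(dim=-1)
print(logits.shape)  # torch.Size([1, 5, 50])
```

At generation time the chosen token would be appended to `prefix` and the step repeated.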

Efficient Attention Methods

Standard self-attention compares every token with every other token. For a sequence of length $T$, this produces a $T \times T$ attention matrix. The cost grows quadratically with sequence length.
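
To make the quadratic growth concrete, the following sketch counts the memory needed to store one $T \times T$ attention matrix. It assumes 4-byte (float32) entries and a single head and sequence; the helper name is hypothetical.

```python
def attention_matrix_bytes(seq_len, bytes_per_element=4):
    """Memory for one T x T attention matrix (one head, one sequence)."""
    return seq_len * seq_len * bytes_per_element

for T in (1_024, 2_048, 4_096):
    mib = attention_matrix_bytes(T) / 2**20
    print(f"T={T}: {mib:.0f} MiB")
# T=1024: 4 MiB
# T=2048: 16 MiB
# T=4096: 64 MiB
```

Doubling the sequence length quadruples the matrix, which is the cost that efficient attention methods try to reduce.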

Sections

Attention Mechanisms

Self-Attention

Multi-Head Attention

Positional Encoding

Transformer Encoders

Transformer Decoders

Efficient Attention Methods