Chapter 22

Writes › Book › Deep Learning with PyTorch › Part VI › Chapter 22 ›

Statistical Language Models

A language model assigns probabilities to sequences of tokens. The tokens may be words, subwords, characters, bytes, or other discrete symbols. In the classical setting, a sentence is represented as a finite sequence

Writes › Book › Deep Learning with PyTorch › Part VI › Chapter 22 ›

Neural Language Models

Statistical language models estimate probabilities from discrete counts.

Writes › Book › Deep Learning with PyTorch › Part VI › Chapter 22 ›

Autoregressive Modeling

Autoregressive modeling is the dominant formulation for modern language generation. The model predicts the next token from previous tokens. Repeating this prediction step produces a sequence.

Writes › Book › Deep Learning with PyTorch › Part VI › Chapter 22 ›

Masked Language Modeling

Masked language modeling trains a model to recover missing tokens from their surrounding context.

Writes › Book › Deep Learning with PyTorch › Part VI › Chapter 22 ›

Tokenization Systems

A language model does not read raw text directly. It reads tokens. Tokenization is the process that maps a string of text into a sequence of discrete symbols, and later maps generated symbols back into text.