Chapter 30

Writes › Book › Deep Learning with PyTorch › Part IX › Chapter 30 ›

Adversarial Examples

An adversarial example is an input that has been deliberately modified so that a model makes a wrong prediction, while the modification is small enough that a human observer still sees the original object.
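One simple way to construct such an input is a single gradient-sign step, often called the fast gradient sign method. The sketch below is a minimal illustration under stated assumptions: the two-class linear model, its hand-set weights, the input, and the budget `eps` are all hypothetical values chosen so the effect is easy to see.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One gradient-sign step: move each input coordinate by eps in the
    direction that increases the loss for the true label y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).detach()

# Hypothetical toy classifier: the logits are simply the two input features.
model = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[1.0, 0.0], [0.0, 1.0]]))

x = torch.tensor([[0.6, 0.4]])   # classified as class 0
y = torch.tensor([0])            # true label
x_adv = fgsm(model, x, y, eps=0.2)

clean_pred = model(x).argmax(dim=1).item()
adv_pred = model(x_adv).argmax(dim=1).item()
```

Each coordinate moves by at most `eps`, yet the predicted class changes from 0 to 1 for this toy model. Real attacks on image classifiers usually also clamp the result to the valid pixel range.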

Distribution Shift

A distribution shift occurs when the data seen at deployment differs from the data used during training.
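A crude way to notice such a difference is to compare summary statistics of training and deployment inputs. The sketch below is only a heuristic, not a formal statistical test; the function name `mean_shift_score` and the synthetic data are assumptions made for illustration.

```python
import torch

def mean_shift_score(train, deploy):
    """Per-feature absolute difference in means, in units of training std."""
    mu, sigma = train.mean(dim=0), train.std(dim=0)
    return (deploy.mean(dim=0) - mu).abs() / sigma

torch.manual_seed(0)
train = torch.randn(1000, 3)     # synthetic training features
deploy = torch.randn(1000, 3)    # synthetic deployment features
deploy[:, 1] += 2.0              # simulate a shift in feature 1 only

score = mean_shift_score(train, deploy)
shifted_feature = score.argmax().item()
```

Here only feature 1 receives a large score. A score like this can flag a change in the input distribution, but it cannot detect shifts that leave per-feature means unchanged.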

Saliency Maps

A saliency map is a visualization that assigns an importance score to each part of an input.
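One common way to obtain such scores is to take the gradient of a class score with respect to the input. The following is a minimal sketch of that idea; the linear model and its weights are hypothetical, chosen so the expected scores can be read off directly.

```python
import torch
import torch.nn as nn

def gradient_saliency(model, x, target):
    """Absolute gradient of the target class score with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target]
    grad, = torch.autograd.grad(score, x)
    return grad.abs()

# Hypothetical model: class 0 depends mostly on input feature 1.
model = nn.Linear(4, 2, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[0.0, 3.0, -1.0, 0.5],
                                     [1.0, 0.0, 0.0, 0.0]]))

x = torch.ones(1, 4)
saliency = gradient_saliency(model, x, target=0)
```

For a linear model the saliency equals the absolute weights of the target class, so feature 1 receives the highest score. For an image model, the same tensor would be reshaped and displayed as a heat map over pixels.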

Attribution Methods

Attribution methods assign credit or blame to parts of an input, hidden representation, neuron, feature, or training example for a model output.
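As one concrete instance for input features, integrated gradients averages the gradient along a straight path from a baseline to the input and scales it by the input difference. The sketch below assumes a scalar-valued function `f`, a zero baseline, and a midpoint approximation of the path integral; these choices are illustrative, not prescribed by the text.

```python
import torch

def integrated_gradients(f, x, baseline, steps=50):
    """Approximate integrated gradients for a scalar function f using
    the midpoint rule along the straight line from baseline to x."""
    alphas = (torch.arange(steps) + 0.5) / steps
    total = torch.zeros_like(x)
    for a in alphas:
        point = (baseline + a * (x - baseline)).detach().requires_grad_(True)
        grad, = torch.autograd.grad(f(point), point)
        total += grad
    return (x - baseline) * total / steps

# Hypothetical scalar "model output": sum of squared features.
f = lambda z: (z ** 2).sum()

x = torch.tensor([1.0, 2.0, 3.0])
baseline = torch.zeros(3)
attributions = integrated_gradients(f, x, baseline)
```

The attributions sum to approximately `f(x) - f(baseline)`, a property usually called completeness, which makes the scores interpretable as a division of the output among features.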

Mechanistic Interpretability

Mechanistic interpretability studies neural networks by treating them as learned computational systems and trying to reverse-engineer the computations their weights and activations implement.
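A basic tool for this kind of study is intervening on an internal activation and observing how the output changes. The sketch below uses a PyTorch forward hook to zero out one hidden unit; the tiny network, its weights, and the helper `ablate_unit` are hypothetical and serve only to make the arithmetic checkable by hand.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer network with hand-set weights.
model = nn.Sequential(
    nn.Linear(2, 2, bias=False),
    nn.ReLU(),
    nn.Linear(2, 1, bias=False),
)
with torch.no_grad():
    model[0].weight.copy_(torch.tensor([[1.0, 0.0], [0.0, 1.0]]))
    model[2].weight.copy_(torch.tensor([[2.0, -1.0]]))

def ablate_unit(index):
    """Return a forward hook that zeroes one unit of a module's output."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, index] = 0.0
        return output          # a returned value replaces the module output
    return hook

x = torch.tensor([[3.0, 4.0]])
baseline_out = model(x).item()                     # 2*3 - 1*4 = 2

handle = model[1].register_forward_hook(ablate_unit(0))
ablated_out = model(x).item()                      # 2*0 - 1*4 = -4
handle.remove()                                    # restore normal behavior
```

The change in output under ablation indicates how much the computation relies on that unit for this input. Removing the hook afterwards leaves the model unchanged.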

Model Editing

Model editing modifies a trained model so that it changes a specific behavior while preserving most other behaviors.
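To make the idea concrete, the sketch below edits a single linear layer with a rank-one weight update so that one chosen input maps to a new output, while inputs orthogonal to it are unaffected. This is a toy illustration rather than any particular published editing method; the function `rank_one_edit`, the layer, and the key and value vectors are assumptions.

```python
import torch
import torch.nn as nn

def rank_one_edit(layer, key, new_value):
    """Update layer.weight in place so that layer(key) equals new_value.
    Inputs orthogonal to key produce the same output as before."""
    with torch.no_grad():
        residual = new_value - layer(key)          # what the output is missing
        layer.weight += torch.outer(residual, key) / key.dot(key)

torch.manual_seed(0)
layer = nn.Linear(3, 2, bias=False)

key = torch.tensor([1.0, 0.0, 0.0])      # input whose behavior we change
other = torch.tensor([0.0, 1.0, 0.0])    # orthogonal input to preserve
new_value = torch.tensor([5.0, -5.0])

before_other = layer(other).detach().clone()
rank_one_edit(layer, key, new_value)
after_key = layer(key).detach()
after_other = layer(other).detach()
```

After the edit, `after_key` matches `new_value` and `after_other` matches `before_other`. In a real network, inputs are rarely exactly orthogonal, so an edit of this kind changes nearby behaviors to some degree.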

Sections

Adversarial Examples

Distribution Shift

Saliency Maps

Attribution Methods

Mechanistic Interpretability

Model Editing