Chapter 30 sections from Deep Learning with PyTorch.
An adversarial example is an input that has been deliberately modified so that a model makes a wrong prediction, while the modification is small enough that a human observer still sees the original object.
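One common way to build such an input is the fast gradient sign method (FGSM), which moves every feature by a fixed amount eps in the direction that increases the loss. The sketch below is a minimal illustration, not the chapter's own code: it assumes a logistic regression model with made-up weights so that the gradient can be written by hand in plain Python. In PyTorch the same gradient would normally come from autograd.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """Shift each feature by eps in the direction that increases the loss."""
    p = predict(w, b, x)
    # For cross-entropy loss on this model, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]

# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.2)
p_clean = predict(w, b, x)      # above 0.5: predicted class 1 (correct)
p_adv = predict(w, b, x_adv)    # below 0.5: predicted class 0 (wrong)
```

No feature moves by more than eps, yet the predicted class flips. On real images eps is usually far smaller relative to the pixel range, which is why the change is hard for a human to see.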
A distribution shift occurs when the data seen at deployment differs from the data used during training.
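A crude way to notice a shift is to compare summary statistics of a feature between the training data and the data arriving at deployment. The sketch below assumes a single numeric feature and synthetic Gaussian samples; the function name shift_score and all of the numbers are hypothetical. It only detects a change in the mean of an input feature, so it would miss other kinds of shift, such as a change in the label distribution.

```python
import random
import statistics

random.seed(0)

# Synthetic data: deployment inputs are centred at 1.5 instead of 0.0.
train = [random.gauss(0.0, 1.0) for _ in range(2000)]
held_out = [random.gauss(0.0, 1.0) for _ in range(2000)]
deploy = [random.gauss(1.5, 1.0) for _ in range(2000)]

def shift_score(reference, current):
    """Gap between the two means, in units of the reference standard deviation."""
    gap = statistics.mean(current) - statistics.mean(reference)
    return abs(gap) / statistics.stdev(reference)

score_same = shift_score(train, held_out)    # close to 0: same distribution
score_shifted = shift_score(train, deploy)   # well above 1: clear shift
```

In practice a score like this would be tracked per feature over time, with an alert threshold chosen from held-out data drawn from the training distribution.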
A saliency map is a visualization that assigns each part of an input (for example, each pixel of an image) a score for how strongly it influences the model's output.
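A simple form of saliency is the absolute gradient of a class score with respect to each pixel. The sketch below assumes a hypothetical score function on a 3x3 image and estimates the gradient with central differences, so that it runs in plain Python. In PyTorch the gradient would normally be obtained with a backward pass instead.

```python
def score(image):
    """Hypothetical class score that depends mostly on the centre pixel."""
    total = sum(sum(row) for row in image)
    return 3.0 * image[1][1] ** 2 + 0.5 * image[0][0] + 0.1 * total

def saliency_map(f, image, h=1e-5):
    """Absolute central-difference gradient of f with respect to each pixel."""
    rows, cols = len(image), len(image[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            up = [row[:] for row in image]
            down = [row[:] for row in image]
            up[r][c] += h
            down[r][c] -= h
            sal[r][c] = abs(f(up) - f(down)) / (2 * h)
    return sal

image = [[0.2, 0.4, 0.1],
         [0.3, 0.9, 0.5],
         [0.0, 0.6, 0.7]]
sal = saliency_map(score, image)
# The centre pixel gets by far the largest score, the top-left pixel a
# moderate one, and the remaining pixels a small one.
```

Finite differences need two model evaluations per pixel, which is why gradient-based saliency on real images relies on automatic differentiation.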
Attribution methods assign credit or blame to parts of an input, hidden representation, neuron, feature, or training example for a model output.
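Occlusion is one of the simplest attribution methods for input features: replace one feature at a time with a baseline value and record how much the output changes. The sketch below assumes a hypothetical four-feature model and an all-zero baseline. It covers only input features, not neurons or training examples.

```python
def model(x):
    """Hypothetical model output for a 4-feature input."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] * x[3]

def occlusion_attribution(f, x, baseline):
    """Credit for feature i is the drop in output when i is set to its baseline."""
    full = f(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]
        scores.append(full - f(occluded))
    return scores

x = [1.0, 2.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0, 0.0]
scores = occlusion_attribution(model, x, baseline)
# scores == [2.0, -2.0, 6.0, 6.0]
```

A positive score is credit and a negative score is blame. Features 2 and 3 interact, so each receives the full effect of their product and the scores sum to 12 rather than to the output of 6. This kind of double counting is one reason different attribution methods can disagree.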
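Mechanistic interpretability studies neural networks as learned computational systems, aiming to reverse-engineer how their internal components (weights, neurons, and the circuits they form) carry out a computation.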
Model editing modifies a trained model so that it changes a specific behavior while preserving most other behaviors.
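The idea of a targeted change that leaves other behavior intact can be illustrated on a single linear layer. The sketch below is a generic linear-algebra illustration under simplifying assumptions, not a specific published editing method: it applies a rank-one update so that one chosen input (the key) maps to a new target output, while inputs orthogonal to the key produce exactly the same output as before. The names rank_one_edit, key, and target are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def rank_one_edit(W, key, target):
    """Return W' with W' @ key == target, changing W only along the key direction."""
    current = matvec(W, key)
    norm_sq = sum(k * k for k in key)
    # W' = W + (target - W @ key) key^T / (key^T key)
    return [
        [wij + (t - c) * kj / norm_sq for wij, kj in zip(row, key)]
        for row, t, c in zip(W, target, current)
    ]

W = [[1.0, 2.0],
     [0.0, 1.0]]
key = [1.0, 0.0]       # input whose output should change
target = [5.0, -1.0]   # desired new output for the key
other = [0.0, 1.0]     # an input orthogonal to the key

W_new = rank_one_edit(W, key, target)
edited = matvec(W_new, key)        # [5.0, -1.0]
preserved = matvec(W_new, other)   # [2.0, 1.0], same as matvec(W, other)
```

In a real network, inputs are rarely exactly orthogonal to the edited key, so some side effects on other behaviors are expected. That is why the definition says "most" rather than "all".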