Chapter 10 sections from Deep Learning with PyTorch.
7 items
A neural network begins training with parameters that have not yet been learned from data.
Deep networks train by sending information in two directions.
Batch normalization is a layer that normalizes activations using statistics computed from a mini-batch.
Layer normalization is a normalization method that normalizes features within each individual example.
Batch normalization and layer normalization are the two most common normalization layers, but they do not cover every setting well.
Residual connections allow a layer or block to add its input directly to its output. Instead of forcing a block to learn a complete transformation from scratch, the block learns a correction to the input.
Stable training means that a model can make steady progress without numerical collapse, uncontrolled gradients, or large oscillations in the loss.