Chapter 27 sections from Deep Learning with PyTorch.
8 items
Diffusion models are generative models built around a simple idea: learn to reverse a gradual corruption process.
The forward diffusion process gradually transforms data into noise.
Diffusion models can be understood from multiple mathematical viewpoints.
A diffusion model needs a rule for how noise increases during the forward process.
Early diffusion models operated directly in pixel space. A model generated images by iteratively denoising tensors such as
Text-to-image generation aims to synthesize images from natural language descriptions. A model receives a prompt such as:
Video diffusion extends image diffusion from still images to moving sequences. Instead of generating one image, the model generates a sequence of frames that should remain visually coherent over time.
Early diffusion systems used convolutional U-Nets as denoising networks. U-Nets worked well because images contain strong local structure, and convolutions efficiently model nearby spatial relationships.
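
Two of the sections listed above, the forward process that turns data into noise and the rule for how noise increases, can be illustrated together. The following is a minimal sketch, not the chapter's own code. It assumes a linear noise schedule with the commonly used DDPM values (1000 steps, variances from 1e-4 to 0.02) and uses the standard closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The function names linear_beta_schedule, cumulative_signal, and add_noise are hypothetical, and plain Python lists stand in for tensors so the example runs without dependencies.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances, linearly spaced (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in one shot; t is a 0-based step index."""
    signal = math.sqrt(alpha_bars[t])             # how much of x0 survives
    noise_scale = math.sqrt(1.0 - alpha_bars[t])  # how much noise is mixed in
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)

rng = random.Random(0)            # seeded so the sketch is reproducible
x0 = [1.0, -1.0, 0.5, 0.0]        # toy "data" standing in for pixel values
x_early, _ = add_noise(x0, 9, alpha_bars, rng)    # still close to x0
x_late, _ = add_noise(x0, 999, alpha_bars, rng)   # nearly pure noise
```

Because alpha_bar_t shrinks toward zero as t grows, early steps keep most of the original signal while the final step is almost entirely Gaussian noise. A denoising network would then be trained to predict the returned noise from x_t and t. In PyTorch the same arithmetic would normally be written with tensor operations rather than lists.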