Big update to "The Illustrated Stable Diffusion" post
https://jalammar.github.io/illustrated-stable-diffusion/…
14 new and updated visuals.
The biggest update is that forward diffusion is now explained more precisely: not as a sequence of steps (which are easy to confuse with the de-noising steps), but as a way of making training examples.
-1-
Forward Diffusion is the process of making training examples: sample an image, a noise pattern, and an amount of noise, then mix them into a single training example.
-2-
Do this with lots of images and lots of noise samples & amounts, and there's a training dataset for your model -- the noise prediction Unet.
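A minimal sketch of that recipe in NumPy, to make it concrete. Everything named here is an assumption for illustration, not code from the post: the DDPM-style linear noise schedule, the square-root mixing formula, the 1000 noise levels, and the helper names `make_alpha_bar`, `make_training_example`, and `make_dataset`. Random arrays stand in for real images.

```python
import numpy as np

def make_alpha_bar(num_levels=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed DDPM-style linear schedule: alpha_bar[t] is the fraction of
    # signal variance kept at noise level t (it shrinks as t grows).
    betas = np.linspace(beta_start, beta_end, num_levels)
    return np.cumprod(1.0 - betas)

def make_training_example(image, alpha_bar, rng):
    # Sample an amount of noise (a level index) and a noise pattern.
    t = int(rng.integers(0, len(alpha_bar)))
    noise = rng.standard_normal(image.shape)
    # Mix image and noise according to the sampled amount.
    noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * noise
    # Inputs are (noisy image, noise amount); the label is the noise itself.
    return noisy, t, noise

def make_dataset(images, alpha_bar, rng, examples_per_image=4):
    # Many images x many noise samples and amounts = a training dataset.
    return [
        make_training_example(img, alpha_bar, rng)
        for img in images
        for _ in range(examples_per_image)
    ]

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
# Stand-in "images": random 8x8 RGB arrays scaled to [-1, 1].
images = [rng.uniform(-1.0, 1.0, size=(8, 8, 3)) for _ in range(3)]
dataset = make_dataset(images, alpha_bar, rng)
```

Note there is no loop over consecutive steps here: each example jumps straight to its sampled noise amount, which is the point of describing forward diffusion this way.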
-3-
Your training steps for the Unet then follow the familiar supervised learning recipe:
1- Make prediction
2- Compare to label, calculate loss
3- Update model so it does better the next time
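The three steps above can be sketched as a toy loop. This is a hedged illustration only: a single linear layer `W` stands in for the Unet, it ignores the noise amount as an input, and the batch size, learning rate, and mixing range are all made-up values. What carries over is the shape of the loop: predict the noise, compare to the true noise with a mean-squared-error loss, and take a gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 64, 16

# Toy training batch: flattened "images", noise, and a per-example mix amount.
clean = rng.uniform(-1.0, 1.0, size=(batch, dim))
noise = rng.standard_normal((batch, dim))
keep = rng.uniform(0.1, 0.9, size=(batch, 1))  # assumed signal fraction
noisy = np.sqrt(keep) * clean + np.sqrt(1.0 - keep) * noise

# Hypothetical stand-in for the noise predictor: one linear layer.
W = np.zeros((dim, dim))
learning_rate = 0.5
losses = []

for step in range(200):
    # 1- Make prediction: guess the noise from the noisy input.
    predicted_noise = noisy @ W
    # 2- Compare to label, calculate loss (mean squared error).
    error = predicted_noise - noise
    loss = np.mean(error ** 2)
    losses.append(loss)
    # 3- Update model so it does better the next time (gradient descent).
    grad = (2.0 / error.size) * (noisy.T @ error)
    W -= learning_rate * grad

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A real setup would swap the linear layer for the Unet and use an autodiff framework for the update, but the predict / compare / update cycle is the same.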
-4-
The post then goes on to discuss how text prompts are added to the picture.
One prediction from the earlier draft has already come true (swapping CLIP for OpenCLIP).
-5-