# Analysis Depot
The Analysis Depot is a repository of Jupyter notebooks that we initially used to analyze subtasks and to debug the code later run on the HPC cluster. It now houses helpful, in-depth descriptions, analyses, and visualizations of our implementations, training, sampling, experiments, and evaluation of our diffusion models and their neural backbones.
## Background
Diffusion models (DMs) are a class of generative models that capture complex data distributions by simulating a stochastic process, known as a diffusion process, which gradually transforms data from a simple initial distribution into the complex data distribution. More specifically, the simple distribution is Gaussian noise, which is iteratively denoised into coherent images by modeling the data distribution of the training set.
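The forward half of this process has a well-known closed form: at step t, a sample is a mix of the original data and Gaussian noise, weighted by the cumulative noise schedule. The sketch below illustrates this with a scalar "pixel" and an assumed linear schedule (the endpoint values 1e-4 and 0.02 follow the DDPM paper, not necessarily our trained configurations):

```python
import math
import random

# Forward (noising) process of a DDPM in closed form:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, 1)
# Schedule endpoints below are assumptions for this sketch.

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alphas = [1.0 - b for b in betas]

# Cumulative products alpha_bar_t = prod_{s <= t} alpha_s
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

def q_sample(x0, t, rng=random):
    """Draw x_t ~ q(x_t | x_0) for a single scalar 'pixel' x0."""
    eps = rng.gauss(0.0, 1.0)
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

# Near t = 0 the sample is almost pure data; near t = T it is almost pure noise.
print(alpha_bars[0], alpha_bars[-1])
```

A denoising network is then trained to invert these steps, predicting the noise eps from x_t and t.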
## Overview
The repository is divided into three sections, one per type of diffusion model: unconditional, conditional, and latent DMs. We train them on a variety of datasets to perform several generative tasks: unconditional and class-conditional image generation, inpainting, and latent super-resolution.
Each of these sections contains notebooks with explanations and equations for the respective class of DM, the UNet backbone architecture, data loading, sampling, evaluation, etc. They document our thought process and provide further analysis of our implementation.
## Trained Models
All our models use AdamW as the optimizer and CosineAnnealingLR as the scheduler (starting at a learning rate of 0.0001 and decaying to an eta_min of 1e-10), and each is trained jointly with an EMA model with a decay of 0.9999. The diffusion process runs on a Markov chain of length T=1000.
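The shared training setup above can be sketched in a few lines. This is a minimal pure-Python illustration of the cosine-annealed learning rate and the EMA weight update, not our actual training loop (function names and the toy values are assumptions of this sketch):

```python
import math

LR_START, ETA_MIN, EMA_DECAY = 1e-4, 1e-10, 0.9999

def cosine_annealed_lr(step, total_steps):
    """CosineAnnealingLR: decay smoothly from LR_START to ETA_MIN."""
    cos = math.cos(math.pi * step / total_steps)
    return ETA_MIN + 0.5 * (LR_START - ETA_MIN) * (1.0 + cos)

def ema_update(ema_param, param, decay=EMA_DECAY):
    """One EMA step: the shadow weight slowly tracks the training weight."""
    return decay * ema_param + (1.0 - decay) * param

# LR starts at 1e-4 and reaches eta_min at the end of one annealing cycle.
print(cosine_annealed_lr(0, 1000), cosine_annealed_lr(1000, 1000))
```

With a decay of 0.9999, the EMA model averages over roughly the last 10,000 training steps, which smooths out gradient noise before sampling.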
Hyperparam. | LHQ-UDM | Bottleneck | No Attention | Celeb-UDM | Cosine Noise | True Variance | Class-CDM | No CFG | Inpaint-CDM | Small-CDM | LDM-16 | LDM-8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Task | Uncond. | Uncond. | Uncond. | Uncond. | Uncond. | Uncond. | Class Cond. | Class Cond. | Inpainting | Inpainting | SuperRes | SuperRes |
Dataset | LHQ | LHQ | LHQ | CelebAHQ | CelebAHQ | CelebAHQ | AFHQ | AFHQ | LHQ | LHQ | LHQ | LHQ |
Split | 80-20 | 80-20 | 80-20 | 90-10 | 90-10 | 90-10 | 90-10 | 90-10 | 90-10 | 90-10 | 90-10 | 90-10 |
Resolution (px) | 128×128 | 128×128 | 128×128 | 128×128 | 128×128 | 128×128 | 128×128 | 128×128 | 128×128 | 128×128 | 512×512 | 512×512 |
Noise β_t | linear | linear | linear | linear | cosine | linear | linear | linear | linear | linear | linear | linear |
Variance σ^2 | same | same | same | same | same | true | same | same | same | same | same | same |
CFG | - | - | - | - | - | - | yes | no | - | - | - | - |
VQGAN-f | - | - | - | - | - | - | - | - | - | - | 16 | 8 |
z-shape | - | - | - | - | - | - | - | - | - | - | (256,32,32) | (256,64,64) |
Parameters | 37M | 37M | 34M | 37M | 37M | 37M | 37M | 37M | 37M | 11M | 496M | 137M |
Channel Mults. | [1,2,4,4,8] | [1,2,4,8,10] | [1,2,4,4,8] | [1,2,4,4,8] | [1,2,4,4,8] | [1,2,4,4,8] | [1,2,4,4,8] | [1,2,4,4,8] | [1,2,4,4,8] | [1,2,2,2,4] | [1,2,4,4,8] | [1,2,2,4,0] |
Attention | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | no | yes |
RF* per Block | 7x7 | 3x3 | 7x7 | 7x7 | 7x7 | 7x7 | 7x7 | 7x7 | 7x7 | 7x7 | 7x7 | 7x7 |
Batch Size | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 6 | 8 |
Iters. | 450K | 225K | 225K | 506K | 506K | 506K | 445K | 468K | 427K | 225K | 1.75M | 1M |
Epochs | 200 | 100 | 100 | 600 | 600 | 600 | 950 | 1000 | 190 | 100 | 130 | 100 |
Cosine Steps^ | 2 | 1 | 1 | 3 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 |
*RF denotes the receptive field per block. ^Cosine Steps is the number of cosine-annealing cycles, i.e. how many times the learning rate undergoes a gradual reduction during training.
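The table's "Noise β_t" row distinguishes the linear schedule from the cosine schedule. The sketch below compares the two; the endpoint values (1e-4, 0.02) and the offset s=0.008 follow the original DDPM and improved-DDPM papers rather than our exact configurations:

```python
import math

# Linear beta schedule (Ho et al.) vs. cosine schedule (Nichol & Dhariwal).

T = 1000

def linear_betas(T, beta_1=1e-4, beta_T=0.02):
    return [beta_1 + (beta_T - beta_1) * t / (T - 1) for t in range(T)]

def cosine_betas(T, s=0.008, max_beta=0.999):
    def f(u):  # alpha_bar as a squared cosine, shifted by the offset s
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    betas = []
    for t in range(T):
        ab_t = f((t + 1) / T) / f(0)
        ab_prev = f(t / T) / f(0)
        betas.append(min(1.0 - ab_t / ab_prev, max_beta))
    return betas

# The cosine schedule adds noise more slowly at the start of the chain,
# preserving image structure over more of the early timesteps.
print(linear_betas(T)[0], cosine_betas(T)[0])
```

In practice this is why the cosine variant in the table is worth comparing against the linear baseline: slower early corruption tends to make more of the chain informative for the denoiser.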