CS180 Fun With Diffusion Models!

Part A: The Power of Diffusion Models!

Part 0: Setup

Snowy Mountain Village / Hat Man / Rocket Ship

Seed

The captions I used were 'an oil painting of a snowy mountain village', 'a man wearing a hat', and 'a rocket ship'.

The outputs are clean and detailed, and each one closely matches its caption.

I used seed 180.

Part 1: Sampling Loops

Part 1.1: The Forward Process

Forward Campanile

Forward

This figure shows the output of the forward() function applied to test_image (the Campanile) at noise levels [250, 500, 750].
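As a rough illustration of what a forward() function like this computes, here is a minimal sketch assuming the standard DDPM formulation, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The linear beta schedule, the helper make_alphas_cumprod, and the flat-list image representation are assumptions made for illustration; the project itself would rely on the pretrained model's own schedule and on image tensors.

```python
import math
import random


def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alphas_cumprod = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def forward(pixels, t, alphas_cumprod, rng):
    """Noise a flat list of pixels to timestep t.

    Computes x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1).
    """
    abar = alphas_cumprod[t]
    signal_scale = math.sqrt(abar)
    noise_scale = math.sqrt(1.0 - abar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    schedule = make_alphas_cumprod()
    rng = random.Random(180)  # hypothetical seed, reused from Part 0 for convenience
    clean = [0.5] * 8         # stand-in for a flattened image
    for t in (250, 500, 750):
        noisy = forward(clean, t, schedule, rng)
        print(t, [round(v, 2) for v in noisy])
```

Larger t means a smaller abar_t, so less of the original signal survives and more of the output is noise.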

Part 1.2: Classical Denoising

Denoised Campanile

Classical Denoising

This figure shows, side by side, the Gaussian-denoised versions of the 3 noisy test images from the previous part.
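Classical Gaussian denoising amounts to blurring the noisy image with a Gaussian kernel. The sketch below is a dependency-free illustration, assuming a separable 5-tap kernel with sigma = 1 and edge clamping; the kernel size, the sigma, and the function names are illustrative choices, not the settings used for the figure.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 1D Gaussian kernel with an odd number of taps."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, kernel):
    """Convolve one row with the kernel, clamping indices at the edges."""
    half = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * row[j]
        out.append(acc)
    return out


def gaussian_blur(image, size=5, sigma=1.0):
    """Separable blur of a 2D list of floats: rows first, then columns."""
    kernel = gaussian_kernel(size, sigma)
    rows = [blur_row(r, kernel) for r in image]
    cols = [blur_row(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Blurring averages out high-frequency noise, but it also removes high-frequency image detail, which is why this baseline struggles at higher noise levels.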

Part 1.3: One-Step Denoising

One-Step Denoised Campanile

One-Step Denoising

This figure shows the UNet-denoised versions of the 3 noisy images from 1.2.
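One-step denoising can be sketched as inverting the forward equation once the UNet has predicted the noise. This assumes the standard DDPM relation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps; the function name and the flat-list inputs are hypothetical, and the noise prediction itself (the UNet call) is taken as given.

```python
import math


def estimate_clean_image(noisy, predicted_noise, alpha_cumprod_t):
    """Solve the forward equation for x_0 given x_t and an estimate of eps.

    x_0 = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t)
    """
    signal_scale = math.sqrt(alpha_cumprod_t)
    noise_scale = math.sqrt(1.0 - alpha_cumprod_t)
    return [(x - noise_scale * e) / signal_scale for x, e in zip(noisy, predicted_noise)]
```

If the predicted noise were exact, this would recover the clean image exactly; in practice the estimate degrades as the noise level grows.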

Part 1.4: Iterative Denoising

Iteratively Denoised Campanile

Iterative Denoising

This figure shows every 5th image from the iterative denoising loop, each one less noisy than the last. The final predicted clean image was produced using iterative denoising.
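A single iterative denoising update can be sketched as follows, assuming the usual DDPM posterior mean with strided timesteps, where alpha = abar_t / abar_prev and beta = 1 - alpha. The function name is hypothetical, the clean-image estimate is taken as given, and the added-variance term is left out for brevity.

```python
import math


def denoise_step_mean(x_t, x0_estimate, abar_t, abar_prev):
    """Move from timestep t to a less noisy timestep t' (abar_prev > abar_t).

    Blends the current clean-image estimate with the current noisy image:
      x_t' = sqrt(abar_prev) * beta / (1 - abar_t) * x0_hat
           + sqrt(alpha) * (1 - abar_prev) / (1 - abar_t) * x_t
    The random variance term of the full update is omitted here.
    """
    alpha = abar_t / abar_prev
    beta = 1.0 - alpha
    w_clean = math.sqrt(abar_prev) * beta / (1.0 - abar_t)
    w_noisy = math.sqrt(alpha) * (1.0 - abar_prev) / (1.0 - abar_t)
    return [w_clean * c + w_noisy * n for c, n in zip(x0_estimate, x_t)]
```

Repeating this step over a decreasing list of timesteps, re-estimating the clean image each time, gives the gradual denoising visible in the figure.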

Part 1.5: Diffusion Model Sampling (DMS)

Noisy!!

DMS

Pixels are less clear

Part 1.6: Classifier-Free Guidance (CFG)

Less noisy!!

CFG

Pixels are quite clear
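The difference between the two sets of samples comes from how the noise estimate is formed. A minimal sketch of the classifier-free guidance combination, assuming the usual rule eps = eps_uncond + gamma * (eps_cond - eps_uncond); the function name and the default scale of 7 are illustrative assumptions.

```python
def cfg_noise(uncond_noise, cond_noise, guidance_scale=7.0):
    """Combine unconditional and conditional noise estimates.

    guidance_scale = 0 gives the unconditional estimate, 1 gives the
    conditional one, and values above 1 extrapolate past the conditional
    estimate, away from the unconditional one.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond_noise, cond_noise)]
```

Each denoising step then needs two UNet evaluations, one with the text prompt and one with an empty prompt.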

Part 1.7: Image-to-image Translation

Part 1.7.1: Editing Web Images

The Dark Side of the Moon

Pink Floyd

I picked Pink Floyd's album, The Dark Side of the Moon, as the image I will edit.

The Dark Side of the Moon -- Edited

Pink Floyd Edited

This image was edited using CFG at noise levels [1, 3, 5, 7, 10, 20].

Editing Hand-Drawn Images

Sam Kim

Here's my autograph.

Cat

And here's a cat I drew.

Edited Sam Kim and Cat

Sam and Cat Edited

And here are both drawings similarly edited using CFG at noise levels [1, 3, 5, 7, 10, 20].

Part 1.7.2: Inpainting

Inpainted Campanile

The top part of the Campanile here has a mask applied to it. I just covered the hole with a grey box.

Inpainted Ocean

The center of the image here has a circular mask applied to it. I also covered the hole with a grey box.

Inpainted Hat Man

The diagonal of the image here has a mask applied to it. I covered this hole with a grey box too.
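The masking used for inpainting can be sketched as a per-pixel blend applied after each denoising step. This assumes the common convention that the mask is 1 where new content should be generated and 0 where the original image is kept, and that the original has already been noised to the current timestep; the function and argument names are hypothetical.

```python
def apply_inpainting_mask(current, noised_original, mask):
    """Keep generated pixels inside the mask and original pixels outside it.

    current:         the image being denoised at timestep t
    noised_original: the original image noised to the same timestep t
    mask:            1.0 where content is regenerated, 0.0 where it is kept
    """
    return [m * c + (1.0 - m) * o for c, o, m in zip(current, noised_original, mask)]


if __name__ == "__main__":
    generated = [1.0, 2.0, 3.0, 4.0]
    original = [9.0, 9.0, 9.0, 9.0]
    hole = [1.0, 0.0, 0.0, 1.0]
    print(apply_inpainting_mask(generated, original, hole))
```

Forcing the unmasked region back to the noised original at every step is what keeps the rest of the image unchanged while the hole is filled in.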

Part 1.7.3: Text-Conditional Image-to-image Translation

Rocket

I was able to apply noise to the rocket image at the specified levels [1, 3, 5, 7, 10, 20], but I wasn't able to get the result to converge back to the Campanile.

Car

I was able to apply noise to the car image at the specified levels [1, 3, 5, 7, 10, 20], but I wasn't able to get the result to converge back to the Campanile.

Cat

I was able to apply noise to the cat image at the specified levels [1, 3, 5, 7, 10, 20], but I wasn't able to get the result to converge back to the Campanile.

Part 1.8: Visual Anagrams

The Old Man and the Campfire

Old Fire Man

You can see here that there is an old man sitting upright, but when flipped, we see a campfire.

Snowy Village Skull

Snowy Skull

You can see here that there is a snowy mountain village, but when flipped, we see a skull.

The Bartender in Amalfi

Amalfi Bartender

While incorrectly labeled, we see here a hipster bartender standing upright, but when flipped, we see the beautiful Amalfi Coast.
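The flip illusions above rely on averaging two noise estimates at each denoising step: one for the upright image under the first prompt, and one for the flipped image under the second prompt, flipped back before averaging. The sketch below assumes that recipe; predict_upright and predict_flipped are hypothetical stand-ins for prompt-conditioned UNet calls, and images are plain 2D lists.

```python
def flip_vertical(image):
    """Return a copy of a 2D list with its rows in reverse order."""
    return [row[:] for row in image[::-1]]


def anagram_noise(image, predict_upright, predict_flipped):
    """Average the upright estimate with the un-flipped estimate of the flipped image."""
    eps_upright = predict_upright(image)
    eps_flipped = flip_vertical(predict_flipped(flip_vertical(image)))
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(eps_upright, eps_flipped)
    ]
```

Because both estimates are expressed in the upright frame before averaging, a single image is pushed toward both prompts at once.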

Part 1.10: Hybrid Images

Hit the SkullFall

SkullFall

You can see a hybrid image of a skull and a waterfall. From close, you can see the details that make up a waterfall, but from afar, you will see a skull.

Pencil or Amalfi Coast

Pencil Amalfi

You can see a hybrid image of a pencil and the Amalfi Coast. From close, you can see the details that make up the famous coastal town, but from afar, you will see a pencil (sort of).

Rocket Dog

Dogcket

You can see a hybrid image of a dog and a rocket. From close, you can see the details that make up a dog, but from afar, you will see a rocket.
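Hybrid images like these can be produced by mixing frequency bands of two noise estimates: the low frequencies from the prompt seen from afar and the high frequencies from the prompt seen up close. The sketch below assumes that recipe and uses a 1D box filter for brevity where a Gaussian blur would be more typical; all names are hypothetical.

```python
def low_pass(signal, radius=1):
    """Box-filter a 1D list, clamping indices at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def high_pass(signal, radius=1):
    """Whatever the low-pass filter removes: signal minus its blurred copy."""
    return [s - l for s, l in zip(signal, low_pass(signal, radius))]


def hybrid_noise(noise_far, noise_near, radius=1):
    """Low frequencies of the far-view estimate plus high frequencies of the near-view one."""
    return [l + h for l, h in zip(low_pass(noise_far, radius), high_pass(noise_near, radius))]
```

For the dog and rocket example, the rocket prompt would supply noise_far and the dog prompt noise_near.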

Part B: Diffusion Models from Scratch!

1.2 UNet - Deliverables

Noising Visualization

Noising

Here is a visualization of the noising process with sigma values = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0].
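The noising process visualized here can be sketched as adding scaled Gaussian noise to a clean image, assuming z = x + sigma * eps with eps drawn from a standard normal. The function name, the seed, and the flat-list image are illustrative assumptions.

```python
import random


def add_noise(pixels, sigma, rng):
    """Return z = x + sigma * eps for a flat list of pixels, eps ~ N(0, 1)."""
    return [p + sigma * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical seed for repeatability
    clean = [0.0, 0.25, 0.5, 0.75, 1.0]
    for sigma in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0):
        print(sigma, [round(v, 2) for v in add_noise(clean, sigma, rng)])
```

With sigma = 0.0 the image is returned unchanged, and the noise grows with sigma until it dominates the image at 1.0.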

Epoch 1

Here is a visualization of the sample results on the test set after the first epoch.

Epoch 5

Here is a visualization of the sample results on the test set after the fifth epoch.

Training loss curve plot

Loss

Here is a visualization of the training loss curve plot for the whole training process.

Sample results w/ out-of-distribution noise levels

Out-of-distribution

I've varied the sigma values, but kept the same image.



2.3 Time-conditioned UNet - Deliverables

Training loss curve - Time-conditioned UNet


2.5 Deliverables

I was unable to complete part 2.5.