CS180 Fun With Diffusion Models!

Part A: The Power of Diffusion Models!

Part 0: Setup

Snowy Mountain Village / Hat Man / Rocket Ship

The captions I used were 'an oil painting of a snowy mountain village', 'a man wearing a hat', and 'a rocket ship'.

The output quality is quite good, and each image matches its caption well.

I used seed 180.

Part 1: Sampling Loops

Part 1.1: The Forward Process

Forward Campanile

The image shown here is the result of applying the forward() function to test_image (the beautiful Campanile) at noise levels [250, 500, 750].
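A minimal sketch of what a forward() function like this typically computes, assuming the standard DDPM formulation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The linear beta schedule, the `make_alphas_cumprod` helper, and the random stand-in image are all hypothetical; a pretrained model would supply its own schedule.

```python
import numpy as np

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule; a real model ships its own."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward(image, t, alphas_cumprod, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    abar_t = alphas_cumprod[t]
    eps = rng.standard_normal(image.shape)
    return np.sqrt(abar_t) * image + np.sqrt(1.0 - abar_t) * eps

rng = np.random.default_rng(180)
alphas_cumprod = make_alphas_cumprod()
# Random stand-in for the Campanile image, scaled to [-1, 1].
test_image = rng.uniform(-1.0, 1.0, size=(64, 64, 3))
noisy = {t: forward(test_image, t, alphas_cumprod, rng) for t in (250, 500, 750)}
```

Under these assumptions, larger t means a smaller abar_t, so less of the original image survives in the noisy sample.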

Part 1.2: Classical Denoising

Denoised Campanile

The image shown here is a side-by-side comparison of the Gaussian-denoised versions of the 3 noisy test images from the previous part.
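A rough sketch of classical denoising with a Gaussian blur, written in plain NumPy so it stands alone. The kernel size, sigma, and the flat stand-in image are assumptions chosen for illustration, not the settings used for the figure.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=2.0):
    """1-D Gaussian kernel normalized to sum to 1 (size should be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, size=5, sigma=2.0):
    """Separable blur over the H and W axes of an (H, W, C) image, edge-padded."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        widths = [(0, 0)] * out.ndim
        widths[axis] = (pad, pad)
        padded = np.pad(out, widths, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out

rng = np.random.default_rng(0)
clean = np.full((32, 32, 3), 0.5)                      # flat stand-in image
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
denoised = gaussian_blur(noisy)
```

Blurring averages out the noise, but on a real image it also removes fine detail, which is why this baseline tends to struggle at high noise levels.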

Part 1.3: One-Step Denoising

One-Step Denoised Campanile

The image shown here is the UNet's one-step denoised versions of the 3 noisy images from 1.2.
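A minimal sketch of the one-step estimate, assuming the UNet predicts the noise eps_hat and the noisy image follows the standard forward equation. Here the true noise stands in for the UNet output, and `abar_t = 0.3` is a made-up value, so the recovery is exact; a real network only approximates it.

```python
import numpy as np

def estimate_clean_image(x_t, eps_hat, abar_t):
    """Invert x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps for x_0."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8, 3))  # stand-in clean image
abar_t = 0.3                                  # hypothetical cumulative alpha
eps = rng.standard_normal(x0.shape)
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# Using the true noise in place of a UNet prediction.
x0_hat = estimate_clean_image(x_t, eps, abar_t)
```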

Part 1.4: Iterative Denoising

Iteratively Denoised Campanile

The image shown here is every 5th step of the denoising loop, with the image gradually becoming less noisy. The final predicted clean image was produced using iterative denoising.
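A sketch of a strided iterative denoising loop, assuming the usual DDPM posterior-mean update between a timestep t and an earlier timestep t'. The stride of 30, the starting index, the linear schedule, and the `oracle` noise predictor (which cheats by knowing the clean image) are all assumptions for illustration; the extra variance term that a real sampler adds at each step is left out.

```python
import numpy as np

def denoise_step(x_t, x0_hat, abar_t, abar_prev):
    """Posterior mean of x_{t'} given x_t and the current clean-image estimate."""
    alpha_t = abar_t / abar_prev
    beta_t = 1.0 - alpha_t
    coef_x0 = np.sqrt(abar_prev) * beta_t / (1.0 - abar_t)
    coef_xt = np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    return coef_x0 * x0_hat + coef_xt * x_t

def iterative_denoise(x_t, timesteps, alphas_cumprod, predict_noise):
    """Walk from timesteps[0] down to timesteps[-1], keeping every frame."""
    frames = [x_t]
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        abar_t, abar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps_hat = predict_noise(x_t, t)
        x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
        x_t = denoise_step(x_t, x0_hat, abar_t, abar_prev)
        frames.append(x_t)
    return x_t, frames

alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # hypothetical schedule
timesteps = list(range(990, -1, -30))                             # 990, 960, ..., 0
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8, 3))

def oracle(x_t, t):
    """Stand-in for the UNet: recovers the exact noise because it knows x0."""
    abar = alphas_cumprod[t]
    return (x_t - np.sqrt(abar) * x0) / np.sqrt(1.0 - abar)

start = timesteps[10]
x_start = (np.sqrt(alphas_cumprod[start]) * x0
           + np.sqrt(1.0 - alphas_cumprod[start]) * rng.standard_normal(x0.shape))
final, frames = iterative_denoise(x_start, timesteps[10:], alphas_cumprod, oracle)
every_fifth = frames[::5]  # the kind of subset a figure like this would show
```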

Part 1.5: Diffusion Model Sampling (DMS)

Noisy!!

The pixels in these samples are less clear.

Part 1.6: Classifier-Free Guidance (CFG)

Less noisy!!

The pixels are quite clear.
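A minimal sketch of the classifier-free guidance combination, assuming the usual form eps = eps_uncond + scale * (eps_cond - eps_uncond). The tiny arrays and the scale of 7 are made up for illustration.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, scale):
    """Push the noise estimate away from the unconditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_cond = np.array([1.0, 2.0])    # stand-in conditional noise estimate
eps_uncond = np.array([0.0, 1.0])  # stand-in unconditional noise estimate
guided = cfg_noise(eps_cond, eps_uncond, scale=7.0)
```

With scale = 1 this reduces to the conditional estimate, and with scale = 0 to the unconditional one; values above 1 extrapolate past the conditional estimate.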

Part 1.7: Image-to-image Translation

Part 1.7.1: Editing Web Images

The Dark Side of the Moon

I picked Pink Floyd's album, The Dark Side of the Moon, as the image to edit.

The Dark Side of the Moon -- Edited

The image I got here was edited using CFG at noise levels [1, 3, 5, 7, 10, 20].

Editing Hand-Drawn Images

Sam Kim

Here's my autograph.

Cat

And here's a cat I drew.

Edited Sam Kim and Cat

And here are both drawings similarly edited using CFG at noise levels [1, 3, 5, 7, 10, 20].

Part 1.7.2: Inpainting

Inpainted Campanile

The top part of the Campanile here has a mask applied to it. I just covered the hole with a grey box.

Inpainted Ocean

The center of the image here has a circular mask applied to it. I also covered the hole with a grey box.

Inpainted Hat Man

The diagonal of the image here has a mask applied to it. I covered this hole with a grey box too.
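A sketch of the constraint that inpainting usually applies after each denoising step, assuming mask = 1 marks the region to regenerate: everything outside the mask is reset to a freshly noised copy of the original image. The circular mask, the image size, and the function name `apply_inpaint_constraint` are hypothetical.

```python
import numpy as np

def apply_inpaint_constraint(x_t, x_orig, mask, abar_t, rng):
    """Keep x_t inside the mask; outside it, use the original noised to level t."""
    eps = rng.standard_normal(x_orig.shape)
    x_orig_t = np.sqrt(abar_t) * x_orig + np.sqrt(1.0 - abar_t) * eps
    return mask * x_t + (1.0 - mask) * x_orig_t

h = w = 16
yy, xx = np.mgrid[:h, :w]
# Circular mask in the image center, shaped (H, W, 1) so it broadcasts over channels.
mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= 4 ** 2).astype(float)[..., None]

rng = np.random.default_rng(0)
x_orig = rng.uniform(-1.0, 1.0, size=(h, w, 3))  # stand-in original image
x_t = rng.standard_normal((h, w, 3))              # stand-in current sample
# abar_t = 1.0 means "no noise", so the outside region equals the original exactly.
out = apply_inpaint_constraint(x_t, x_orig, mask, abar_t=1.0, rng=rng)
```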

Part 1.7.3: Text-Conditional Image-to-image Translation

Rocket

I was able to apply the noise to the rocket at the specified levels [1, 3, 5, 7, 10, 20], but wasn't able to get it to converge back to the Campanile.

Car

I was able to apply the noise to the car at the specified levels [1, 3, 5, 7, 10, 20], but wasn't able to get it to converge back to the Campanile.

Cat

I was able to apply the noise to the cat at the specified levels [1, 3, 5, 7, 10, 20], but wasn't able to get it to converge back to the Campanile.

Part 1.8: Visual Anagrams

The Old Man and the Campfire

You can see here that there is an old man sitting upright, but when flipped, we see a campfire.

Snowy Village Skull

You can see here that there is a snowy mountain village, but when flipped, we see a skull.

The Bartender in Amalfi

Although it is incorrectly labeled, you can see here a hipster bartender standing upright, but when flipped, we see the beautiful Amalfi Coast.
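A minimal sketch of how a visual anagram's noise estimate is commonly formed, assuming two prompts and a vertical flip: one estimate comes from the upright image, the other from the flipped image and is flipped back before averaging. The two predictor callables are stand-ins for prompt-conditioned UNet calls and are hypothetical.

```python
import numpy as np

def flip(x):
    """Vertical flip of an (H, W, C) array."""
    return x[::-1]

def anagram_noise(x_t, predict_upright, predict_flipped):
    """Average the upright estimate with the un-flipped estimate of the flipped image."""
    eps_1 = predict_upright(x_t)
    eps_2 = flip(predict_flipped(flip(x_t)))
    return 0.5 * (eps_1 + eps_2)

rng = np.random.default_rng(0)
x_t = rng.standard_normal((8, 8, 3))
ramp = np.broadcast_to(np.arange(8.0)[:, None, None], (8, 8, 3))

# Stand-ins for the UNet under the two prompts (for example, old man vs. campfire).
predict_upright = lambda x: x
predict_flipped = lambda x: ramp

eps = anagram_noise(x_t, predict_upright, predict_flipped)
```

Flipping the second estimate back is what keeps both estimates in the same orientation as x_t before they are averaged.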

Part 1.10: Hybrid Images

Hit the SkullFall

You can see a hybrid image of a skull and a waterfall. Up close, you can see the details that make up a waterfall, but from afar, you will see a skull.

Pencil or Amalfi Coast

You can see a hybrid image of a pencil and the Amalfi Coast. Up close, you can see the details that make up the famous coastal town, but from afar, you will see a pencil (sort of).

Rocket Dog

You can see a hybrid image of a dog and a rocket. Up close, you can see the details that make up a dog, but from afar, you will see a rocket.
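A rough sketch of how hybrid-image noise estimates are typically combined, assuming a Gaussian low-pass filter: the low frequencies come from the prompt meant to be seen from afar and the high frequencies from the prompt meant to be seen up close. The kernel size, sigma, and the random stand-in noise estimates are assumptions.

```python
import numpy as np

def low_pass(x, size=9, sigma=2.0):
    """Separable Gaussian blur over the H and W axes of an (H, W, C) array."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    pad = size // 2
    out = np.asarray(x, dtype=float)
    for axis in (0, 1):
        widths = [(0, 0)] * out.ndim
        widths[axis] = (pad, pad)
        padded = np.pad(out, widths, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded
        )
    return out

def hybrid_noise(eps_far, eps_near, size=9, sigma=2.0):
    """Low frequencies of eps_far plus high frequencies of eps_near."""
    high = eps_near - low_pass(eps_near, size, sigma)
    return low_pass(eps_far, size, sigma) + high

rng = np.random.default_rng(0)
eps_far = rng.standard_normal((16, 16, 3))   # stand-in estimate for the far prompt (e.g. rocket)
eps_near = rng.standard_normal((16, 16, 3))  # stand-in estimate for the near prompt (e.g. dog)
eps = hybrid_noise(eps_far, eps_near)
```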

Part B: Diffusion Models from Scratch!

1.2 UNet - Deliverables

Noising Visualization

Here is a visualization of the noising process with sigma values = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0].
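A minimal sketch of this kind of noising, assuming the simple additive form z = x + sigma * eps with eps drawn from a standard normal. The 28x28 stand-in image and the `add_noise` name are hypothetical.

```python
import numpy as np

def add_noise(x, sigma, rng):
    """Return z = x + sigma * eps with eps ~ N(0, I)."""
    return x + sigma * rng.standard_normal(x.shape)

sigmas = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(28, 28))  # stand-in for a clean training image
noised = [add_noise(x, s, rng) for s in sigmas]
```

With sigma = 0.0 the image is unchanged, and the spread of the added noise grows in proportion to sigma.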

Epoch 1

Here is a visualization of the sample results on the test set after the first epoch.

Epoch 5

Here is a visualization of the sample results on the test set after the fifth epoch.

Training loss curve plot

Here is a visualization of the training loss curve plot for the whole training process.

Sample results w/ out-of-distribution noise levels

I've varied the sigma values, but kept the same image.

2.3 Time-conditioned UNet - Deliverables

Training loss curve - Time-conditioned UNet

Here is a visualization of the training loss curve for the time-conditioned UNet.

2.5 Deliverables

I was unable to complete part 2.5.
