Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Image-to-Image Translation with Conditional Adversarial Networks. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. [GitHub] [arXiv]. Slides by Víctor Garcia [GDoc]. UPC Computer Vision Reading Group (25/11/2016).


TRANSCRIPT

Page 2: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Index
● Introduction
● State of the Art
● Method
    ○ Network Architecture
    ○ Losses
● Experiments
    ○ Experiment types
    ○ Evaluation Metrics
    ○ Cityscapes
    ○ Colorization
    ○ Map ↔ Aerial
● Conclusions

Page 3: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Introduction

Image → Image

GANs

Page 4: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Index
● Introduction
● State of the Art
    ○ Image to Image
● Method
● Experiments
● Conclusions

Page 5: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

State of the Art - Image to Image

CNN

Super-Resolution

Loss = MSE(Φ(Iin), Φ(Iout))
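As a rough sketch of this feature-space MSE (a perceptual loss), here is a minimal PyTorch version; the choice of a frozen VGG16 slice as Φ and all names are illustrative assumptions, not the cited paper's exact setup:

    import torch.nn.functional as F
    from torchvision.models import vgg16

    # Phi: a frozen, pretrained feature extractor (an illustrative choice
    # of VGG16 layers; any fixed feature network works the same way)
    phi = vgg16(pretrained=True).features[:16].eval()
    for p in phi.parameters():
        p.requires_grad = False

    def feature_mse(img_a, img_b):
        # Loss = MSE(Phi(I_a), Phi(I_b)): compare feature maps, not pixels
        return F.mse_loss(phi(img_a), phi(img_b))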

Page 6: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

State of the Art - Image to Image

CNN

CNN

Super-Resolution

Image colorization

Loss = MSE(Φ(Iin), Φ(Iout))

Loss = weighted CE(Φ(Iin), Φ(Iout)) (cross-entropy with class rebalancing)
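The colorization loss above is a class-rebalanced ("weighted") cross-entropy over quantized color bins. A minimal sketch, where the shapes and the weight vector are assumptions for illustration:

    import torch.nn.functional as F

    def weighted_ce(logits, target_bins, class_weights):
        # logits:        (B, Q, H, W) scores over Q quantized color bins
        # target_bins:   (B, H, W)    ground-truth bin index per pixel
        # class_weights: (Q,)         larger weights for rarer colors
        return F.cross_entropy(logits, target_bins, weight=class_weights)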

Page 7: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

State of the Art - Image to Image

Generator → Global Loss? (Can one learned, global loss replace the hand-designed per-task losses?)

Page 8: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

State of the Art - Image to Image

Diagram: the Generator produces Generated Pairs (input, output); the Discriminator compares them against Ground Truth Pairs from the Real World and outputs real/fake. Loss → BCE.
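A minimal sketch of this pair-conditioned BCE loss, assuming PyTorch and a discriminator D that scores an (input, output) pair concatenated along the channel axis (all names are illustrative):

    import torch
    import torch.nn.functional as F

    def discriminator_bce(D, x, y_real, y_fake):
        # Ground-truth pairs should score "real" (1),
        # generated pairs "fake" (0)
        real_logits = D(torch.cat([x, y_real], dim=1))
        fake_logits = D(torch.cat([x, y_fake.detach()], dim=1))
        ones = torch.ones_like(real_logits)
        zeros = torch.zeros_like(fake_logits)
        return (F.binary_cross_entropy_with_logits(real_logits, ones)
                + F.binary_cross_entropy_with_logits(fake_logits, zeros))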

Page 9: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

State of the Art - Image to Image

Some works already use conditional GANs for image-to-image translation:

CNN

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (2 weeks ago)

Loss1 = MSE_VGG(Φ(Iin), Φ(Iout))

Loss2 = Regularization

Loss3 = GAN Loss


Unsupervised Cross-Domain Image Generation (2 weeks ago)


Page 12: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

State of the Art - Image to Image

This paper addresses image-to-image translation with a single architecture that works across tasks, without the need to handcraft a task-specific loss function.


Page 13: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Index
● Introduction
● State of the Art
● Method
    ○ Conditional GANs
    ○ Generator - Skip Network
    ○ Discriminator - PatchGAN
    ○ Optimization Losses
● Experiments
● Conclusions

Page 14: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

GANs

Diagram: the Generator maps noise to samples; the Discriminator sees Real World samples and generated samples and outputs True/False. Training alternates: the Discriminator learns to separate the two, while the Generator learns to make the Discriminator answer "True" on generated samples (the sketch below shows one such step).
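A minimal sketch of that alternating loop (assumptions: PyTorch; G maps noise to samples, D returns one logit per sample; the optimizers and data come from elsewhere):

    import torch
    import torch.nn.functional as F

    def gan_step(G, D, opt_g, opt_d, real, z_dim=100):
        b = real.size(0)
        fake = G(torch.randn(b, z_dim))
        # Discriminator step: real -> True (1), generated -> False (0)
        d_loss = (F.binary_cross_entropy_with_logits(
                      D(real), torch.ones(b, 1))
                  + F.binary_cross_entropy_with_logits(
                      D(fake.detach()), torch.zeros(b, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
        # Generator step: try to make D answer "True" on generated samples
        g_loss = F.binary_cross_entropy_with_logits(
            D(fake), torch.ones(b, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()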


Page 20: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

GANs - Conditional

Diagram: the same setup, but both networks are conditioned on the input; the Discriminator judges the Generator's (input, output) pairs against Real World pairs and outputs True/False.

Page 21: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Generator - U-Net

Page 22: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Generator - U-Net

Skip Connections
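A minimal sketch of the skip-connection idea: a tiny two-level U-Net where encoder features are concatenated with the mirrored decoder features (channel counts and depth are illustrative, not the paper's exact architecture):

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        def __init__(self, channels=3):
            super().__init__()
            self.enc1 = nn.Conv2d(channels, 64, 4, stride=2, padding=1)
            self.enc2 = nn.Conv2d(64, 128, 4, stride=2, padding=1)
            self.dec2 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
            # decoder sees 64 upsampled + 64 skipped channels = 128
            self.dec1 = nn.ConvTranspose2d(128, channels, 4, stride=2, padding=1)

        def forward(self, x):
            e1 = torch.relu(self.enc1(x))    # H/2, low-level features
            e2 = torch.relu(self.enc2(e1))   # H/4, bottleneck
            d2 = torch.relu(self.dec2(e2))   # back up to H/2
            # skip connection: concatenate encoder features with decoder ones
            return torch.tanh(self.dec1(torch.cat([d2, e1], dim=1)))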

Page 23: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Discriminator - PatchGAN

Diagram: a 256x256x3 input is mapped to a single 1/0 (real/fake) decision.

Page 24: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Discriminator - PatchGAN

Diagram: a 512x512x3 input is mapped to an NxN grid of per-patch 1/0 decisions (output volume NxNxdepth).

● Faster
● Training with larger images
● Equal or better results
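A minimal sketch of a PatchGAN-style discriminator (layer sizes are illustrative, not the paper's 70x70 configuration). Because it is fully convolutional, the same network runs on larger inputs such as 512x512x3 and simply yields a larger NxN score grid, which is what makes training with larger images cheap:

    import torch.nn as nn

    # Input: an (input, output) image pair concatenated on channels -> 6 ch.
    patch_d = nn.Sequential(
        nn.Conv2d(6, 64, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),
        # Output: (B, 1, N, N) -- one real/fake logit per image patch
    )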

Page 25: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Discriminator - PatchGAN

Figure panels: PixelGAN | PatchGAN | ImageGAN

Page 26: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Optimization Losses

For training, they use only two losses:

● GAN loss:

● L1 loss (enforces correctness at low frequencies):

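The formulas themselves were images on the original slides; reconstructed from the paper (Isola et al.), the two losses and the combined objective are:

    \mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]

    \mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]

    G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)

where x is the input image, y the target image, and z the noise.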

Page 29: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Index
● Introduction
● State of the Art
● Method
● Experiments
    ○ Experiment types
    ○ Evaluation Metrics
    ○ Cityscapes
    ○ Colorization
    ○ Map ↔ Aerial
● Conclusions

Page 30: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Experiments
● Architectural labels → photo, trained on Facades
● Semantic labels ↔ photo, trained on Cityscapes
● Map ↔ aerial photo, from Google Maps
● BW → color photos, trained on ImageNet
● Edges → photo, trained on handbags and shoes
● Sketch → photo, human-drawn sketches
● Day → night


Page 38: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Evaluation Metrics - Cityscapes

Evaluating the quality of synthesized images is an open and difficult problem.

For semantic labels ↔ photo on Cityscapes, the paper uses the FCN-score: a pretrained semantic segmentation network (FCN-8s) is run on the synthesized photos, and its predictions are scored against the ground-truth labels.
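A minimal sketch of the idea behind the FCN-score (assumes a pretrained segmentation network fcn returning per-class logits; only per-pixel accuracy is shown):

    import torch

    def fcn_score(fcn, generated_photos, gt_labels):
        # Segment the *synthesized* photos with a pretrained network and
        # score predictions against the ground-truth label maps:
        # realistic outputs should still be segmentable.
        with torch.no_grad():
            preds = fcn(generated_photos).argmax(dim=1)   # (B, H, W)
        return (preds == gt_labels).float().mean()        # per-pixel acc.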

Page 39: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Evaluation Metrics - Colorization and Maps

Amazon Mechanical Turk (AMT): workers are shown a picture and asked "Is this picture real? Yes/No".

Page 40: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Cityscapes - FCN-score

[Figure: FCN-score results]


Page 42: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Cityscapes - PatchGAN

Figure panels: no-GAN | PixelGAN | PatchGAN | ImageGAN

Page 43: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Cityscapes - Color Distribution

Figure panels: L1 + PixelGAN | L1 + cGAN

Page 44: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Cityscapes - Autoencoder vs U-Net

Page 45: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Image Colorization

Method                     % labeled as real
L2                         16.3%
Classification (rebal.)    27.8%
L1+cGAN                    22.5%

Page 46: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Map to Aerial

Method     % labeled as real
L1         0.8%
L1+cGAN    18.9%

(512x512 images)

Page 47: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Aerial to Map

Method     % labeled as real
L1         2.8%
L1+cGAN    6.1%

Page 48: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Image Segmentation

Metric            L1      cGAN    L1+cGAN
Per-pixel acc.    0.86    0.74    0.83
Per-class acc.    0.42    0.28    0.36
Class IOU         0.35    0.22    0.29
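For reference, a minimal sketch of how these three metrics are typically computed from integer label maps (names and shapes are assumptions):

    import torch

    def seg_metrics(pred, gt, num_classes):
        # pred, gt: integer class maps of the same shape
        per_pixel = (pred == gt).float().mean()
        accs, ious = [], []
        for c in range(num_classes):
            p, g = (pred == c), (gt == c)
            if g.any():
                inter = (p & g).float().sum()
                accs.append(inter / g.float().sum())         # class accuracy
                ious.append(inter / (p | g).float().sum())   # class IoU
        return per_pixel, torch.stack(accs).mean(), torch.stack(ious).mean()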

Page 49: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Other Experiments - Labels → Facades

Page 50: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Other Experiments - Day → Night

Page 51: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Other Experiments - Edges → Handbags

Page 52: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Other Experiments - Edges → Shoes

Page 53: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Other Experiments - Edges → Shoes

Page 54: Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Conclusions

● Conditional adversarial networks are a promising approach for many image-to-image translation tasks.

● Using a U-Net generator is a big improvement: the skip connections forward low-level features through the network so that they can be partially reconstructed at the output.

● With the PatchGAN approach, training and generation scale to high-resolution images.
