sourcecycle-consistent adversarial networks, iccv 2017. ... (e.g., dog to cat) • can get confused...



Outline
• General conditional GANs
• BigGAN
• Paired image-to-image translation
• Unpaired image-to-image translation: CycleGAN


Conditional generation
• Suppose we want to condition the generation of samples on discrete side information (label) y
• How do we add y to the basic GAN framework?

[Figure: basic GAN, z → G → G(z) → D → D(G(z))]

[Figure: conditional GAN. The generator maps noise z and label y to G(z, y); the discriminator receives the sample together with y and outputs D(G(z, y), y)]
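
For reference, one common way to write the resulting two-player objective, following the Mirza and Osindero paper cited on the MNIST example slides. The notation is an adaptation: the paper writes D(x|y), while here the label y is passed as a second argument to both networks.

```latex
\min_G \max_D \;
\mathbb{E}_{(x, y) \sim p_{\text{data}}}\left[\log D(x, y)\right]
+ \mathbb{E}_{z \sim p_z,\; y \sim p_{\text{data}}}\left[\log\left(1 - D(G(z, y), y)\right)\right]
```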


Conditional generation
• Example: simple network for generating 28 x 28 MNIST digits

M. Mirza and S. Osindero, Conditional Generative Adversarial Nets, arXiv 2014

[Figure: generator; inputs are the noise vector and the class label (10-dim one-hot vector). Figure source: F. Fleuret]
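
A minimal sketch of how the label can enter such a network: the one-hot class vector is simply concatenated with the other input. The noise dimension (100) and the helper names are assumptions for illustration, not details taken from the slide or the paper.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (batch, num_classes) one-hot encoding of integer labels."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

def generator_input(z, labels, num_classes=10):
    """Concatenate the noise vector with the one-hot class label."""
    return np.concatenate([z, one_hot(labels, num_classes)], axis=1)

def discriminator_input(images, labels, num_classes=10):
    """Flatten 28 x 28 images and append the one-hot class label."""
    flat = images.reshape(len(images), -1)
    return np.concatenate([flat, one_hot(labels, num_classes)], axis=1)

rng = np.random.default_rng(0)
labels = np.array([3, 7])
z = rng.standard_normal((2, 100)).astype(np.float32)  # noise size is an assumption
g_in = generator_input(z, labels)                     # shape (2, 110)
images = np.zeros((2, 28, 28), dtype=np.float32)      # placeholder images
d_in = discriminator_input(images, labels)            # shape (2, 794)
```

Concatenation is only the simplest option; other conditioning mechanisms replace it in larger models.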


[Figure: discriminator; figure labels: class label (10-dim one-hot vector), noise vector. Figure source: F. Fleuret]



Conditional generation
• Another example: text-to-image synthesis

S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to image synthesis, ICML 2016

[Figure labels: LSTM text encoder; spatial replication, depth concatenation]
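
The "spatial replication, depth concatenation" label can be illustrated with a small sketch: a text embedding vector is tiled over every spatial position of a feature map and then stacked along the channel axis. The tensor sizes and the function name are hypothetical.

```python
import numpy as np

def replicate_and_concat(features, text_embedding):
    """Tile a text embedding over the spatial grid and concatenate along depth.

    features:       (H, W, C) feature map
    text_embedding: (E,) vector from the text encoder
    returns:        (H, W, C + E) feature map
    """
    h, w, _ = features.shape
    # Spatial replication: the same vector at every (row, column) position.
    tiled = np.broadcast_to(text_embedding, (h, w, text_embedding.shape[0]))
    # Depth concatenation: stack along the channel axis.
    return np.concatenate([features, tiled], axis=-1)

features = np.zeros((4, 4, 512), dtype=np.float32)  # illustrative sizes
embedding = np.arange(128, dtype=np.float32)
combined = replicate_and_concat(features, embedding)
```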


[Figure panels: previously unseen captions (zero-shot setting); captions seen in the training set]


BigGAN
• Synthesize ImageNet images, conditioned on class label, up to 512 x 512 resolution

A. Brock, J. Donahue, K. Simonyan, Large scale GAN training for high fidelity natural image synthesis, arXiv 2018


BigGAN
• Self-attention GAN to capture spatial structure
• Spectral normalization for generator and discriminator
• Conditioning the generator: conditional batch norm
• Conditioning the discriminator: projection
• 8x larger batch size, 50% more feature channels than baseline
• Hierarchical latent space: feed different “chunks” of noise vector into multiple layers of the generator
• Lots of other tricks (initialization, training, z sampling, etc.)
• Training observed to be unstable, but good results are achieved before collapse
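
As an illustration of the conditional batch norm bullet, here is a minimal sketch in which the scale and shift applied after normalization depend on the class label. A plain lookup table stands in for however the per-class parameters are actually produced in BigGAN; that choice, and all names and sizes, are assumptions.

```python
import numpy as np

def conditional_batch_norm(x, labels, gamma, beta, eps=1e-5):
    """Batch-normalize x, then apply a per-class scale and shift.

    x:      (N, C) activations
    labels: (N,) integer class labels
    gamma:  (num_classes, C) class-specific scales
    beta:   (num_classes, C) class-specific shifts
    """
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Each sample picks the scale and shift of its own class.
    return gamma[labels] * x_hat + beta[labels]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
labels = np.zeros(8, dtype=int)            # every sample belongs to class 0
gamma = np.array([[2.0] * 4, [1.0] * 4])
beta = np.array([[5.0] * 4, [0.0] * 4])
out = conditional_batch_norm(x, labels, gamma, beta)
```

With every sample in class 0, each output channel ends up with mean close to 5 and standard deviation close to 2, which are that class's shift and scale.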


BigGAN
• Results at 512 x 512 resolution


BigGAN

[Figure panels: nearest neighbors in pixel space; nearest neighbors in FC7 space]


BigGAN
• Results at 512 x 512 resolution

[Figure panels: easy classes; difficult classes]


Image-to-image translation

P. Isola, J.-Y. Zhu, T. Zhou, A. Efros, Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017


Image-to-image translation
• Produce modified image y conditioned on input image x (note change of notation)
• Generator receives x as input
• Discriminator receives an x, y pair and has to decide whether it is real or fake


Image-to-image translation
• Generator architecture: U-Net
• Note: no z used as input, transformation is basically deterministic


Encode: convolution → BatchNorm → ReLU

Decode: transposed convolution → BatchNorm → ReLU


Effect of adding skip connections to the generator
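
To make the encoder-decoder structure with skip connections concrete, the sketch below traces spatial sizes and channel counts through a small U-Net-like stack. The 4 x 4 kernels, stride 2, padding 1, the channel widths, and the rule that each decoder layer emits as many channels as the skip it meets are all assumptions chosen for illustration.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a strided convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel

def unet_shapes(size, enc_channels):
    """Trace (size, channels) through an encoder-decoder with skip connections."""
    enc = []
    for c in enc_channels:
        size = conv_out(size)
        enc.append((size, c))
    dec = []
    # Walk back up; each decoder output is concatenated with the
    # encoder output that has the same spatial size.
    for skip_size, skip_c in reversed(enc[:-1]):
        size = deconv_out(size)
        assert size == skip_size  # skips require matching spatial sizes
        dec.append((size, 2 * skip_c))  # decoder channels (assumed) + skip channels
    return enc, dec

enc, dec = unet_shapes(256, [64, 128, 256, 512])
```

Each skip connection hands the decoder the encoder features from the matching resolution, so fine spatial detail does not have to pass through the bottleneck.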


Image-to-image translation
• Generator loss: GAN loss plus L1 reconstruction penalty

$$G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \lambda \sum_i \| y_i - G(x_i) \|_1$$

Generated output $G(x_i)$ should be close to ground truth target $y_i$
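
A minimal numerical sketch of the generator side of this objective. Three choices here are assumptions rather than content of the slide: the GAN term uses the non-saturating form (the generator maximizes log D on its own outputs), the L1 term is averaged over pixels instead of summed over examples, and the default weight `lam=100.0` is only a placeholder.

```python
import numpy as np

def generator_loss(d_fake, fake, target, lam=100.0, eps=1e-12):
    """GAN term plus lam times the L1 reconstruction penalty.

    d_fake: discriminator outputs in (0, 1) for generated images
    fake:   generated images G(x_i)
    target: ground truth images y_i
    """
    # Non-saturating GAN term: the generator wants D to output 1 on fakes.
    gan_term = -np.mean(np.log(d_fake + eps))
    # L1 penalty between generated output and ground truth target.
    l1_term = np.mean(np.abs(target - fake))
    return gan_term + lam * l1_term

d_fake = np.full((2, 30, 30), 0.5)       # discriminator is undecided
fake = np.zeros((2, 8, 8, 3))
target = np.full((2, 8, 8, 3), 0.1)      # every pixel is off by 0.1
loss = generator_loss(d_fake, fake, target)
```

With these inputs the loss is log 2 from the GAN term plus 100 x 0.1 from the L1 term, which shows how strongly a large weight makes reconstruction dominate.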



Image-to-image translation
• Discriminator: PatchGAN
• Given input image x and second image y, decide whether y is a ground truth target or produced by the generator
• Output is a 30 x 30 map where each value (0 to 1) represents the quality of the corresponding section of the output image
• Fully convolutional network, effective patch size can be increased by increasing the depth

Effect of discriminator patch size on generator output
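
The link between depth, patch size, and output map size can be checked with a short receptive-field calculation. The layer stack below (three stride-2 and two stride-1 convolutions, all 4 x 4 with padding 1) and the 256 x 256 input are assumptions; under them the numbers match the 30 x 30 map mentioned above and the 70 x 70 patch size quoted for CycleGAN's discriminator.

```python
def receptive_field(layers):
    """Receptive field of one output unit; layers are (kernel, stride) pairs."""
    r = 1
    # Work backward from a single output unit to the input.
    for kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel
    return r

def output_size(size, layers, pad=1):
    """Spatial size of the output map for a square input."""
    for kernel, stride in layers:
        size = (size + 2 * pad - kernel) // stride + 1
    return size

# Assumed layer stack: three stride-2 and two stride-1 convolutions, all 4 x 4.
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
patch = receptive_field(layers)                # 70
out_map = output_size(256, layers)             # 30
deeper = receptive_field([(4, 2)] + layers)    # one more stride-2 layer: 142
```

Adding a stride-2 layer roughly doubles the patch each output value sees, which is the sense in which depth controls the effective patch size.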


Image-to-image translation: Results
• Translating between maps and aerial photos
• Human study:


Image-to-image translation: Results
• Semantic labels to scenes
• Evaluation: FCN score
  • The higher the quality of the output, the better the FCN should do at recovering the original semantic labels
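
The slide does not say which segmentation metrics make up the FCN score. The sketch below assumes two common ones, per-pixel accuracy and mean class IoU, computed between the labels an FCN predicts on a generated image and the label map the image was generated from. The tiny arrays are made-up stand-ins.

```python
import numpy as np

def per_pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the original label map."""
    return float(np.mean(pred == truth))

def mean_class_iou(pred, truth, num_classes):
    """Mean intersection-over-union over classes present in pred or truth."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (truth == c))
        union = np.sum((pred == c) | (truth == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

truth = np.array([[0, 0], [1, 1]])   # label map used as generator input
pred = np.array([[0, 1], [1, 1]])    # stand-in for FCN output on the generated image
acc = per_pixel_accuracy(pred, truth)
miou = mean_class_iou(pred, truth, 2)
```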


Image-to-image translation: Results
• Scenes to semantic labels
• Accuracy is worse than that of regular FCNs or generator with L1 loss


Image-to-image translation: Results
• Semantic labels to facades
• Day to night
• Edges to photos
• pix2pix demo


Image-to-image translation: Limitations
• Visual quality could be improved
• Requires x, y pairs for training
• Does not model conditional distribution p(y|x), returns a single mode instead


Unpaired image-to-image translation
• Given two unordered image collections X and Y, learn to “translate” an image from one into the other and vice versa

J.-Y. Zhu, T. Park, P. Isola, A. Efros, Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, ICCV 2017


CycleGAN
• Given: domains X and Y
• Train two generators F and G and two discriminators D_X and D_Y
• G translates from X to Y, F translates from Y to X
• D_X recognizes images from X, D_Y from Y
• We want F(G(x)) ≈ x and G(F(y)) ≈ y
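
A toy illustration of the cycle-consistency requirement, using hypothetical affine maps in place of the two generator networks. Because the second map is the exact inverse of the first, both round trips return the starting point and the cycle errors are zero.

```python
import numpy as np

def G(x):
    """Toy translator from X to Y (hypothetical affine map)."""
    return 2.0 * x + 1.0

def F(y):
    """Toy translator from Y to X; the exact inverse of G."""
    return (y - 1.0) / 2.0

x = np.array([0.0, 0.5, 1.0])
y = np.array([1.0, 2.0, 3.0])
forward_cycle_error = np.mean(np.abs(F(G(x)) - x))   # x -> Y -> X
backward_cycle_error = np.mean(np.abs(G(F(y)) - y))  # y -> X -> Y
```

Real generators are not exact inverses of each other, so training penalizes these two errors instead of assuming they vanish.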


CycleGAN: Architecture
• Generators: [Figure: generator architecture]
• Discriminators: PatchGAN on 70 x 70 patches


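CycleGAN: Loss
• Requirements:
  • G translates from X to Y, F translates from Y to X
  • D_X recognizes images from X, D_Y from Y
  • We want F(G(x)) ≈ x and G(F(y)) ≈ y
• CycleGAN discriminator loss: LSGAN

$$\mathcal{L}_{GAN}(D_Y) = \mathbb{E}_{y \sim p_{data}(y)}\left[(D_Y(y) - 1)^2\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[D_Y(G(x))^2\right]$$

$$\mathcal{L}_{GAN}(D_X) = \mathbb{E}_{x \sim p_{data}(x)}\left[(D_X(x) - 1)^2\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[D_X(F(y))^2\right]$$

• CycleGAN generator loss:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[(D_Y(G(x)) - 1)^2\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[(D_X(F(y)) - 1)^2\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\| F(G(x)) - x \|_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\| G(F(y)) - y \|_1\right]$$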
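
The losses on this slide can be sketched numerically as follows. The functions take discriminator outputs and cycled images as plain arrays, so no networks are involved; expectations become batch means, the L1 norm is averaged over elements, and the function and argument names are hypothetical.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) to 1 and D(fake) to 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def cyclegan_g_loss(dy_fake, dx_fake, x, x_cycled, y, y_cycled):
    """Adversarial terms for both generators plus both L1 cycle terms.

    dy_fake:  D_Y(G(x))     dx_fake:  D_X(F(y))
    x_cycled: F(G(x))       y_cycled: G(F(y))
    """
    adv = np.mean((dy_fake - 1.0) ** 2) + np.mean((dx_fake - 1.0) ** 2)
    cyc = np.mean(np.abs(x_cycled - x)) + np.mean(np.abs(y_cycled - y))
    return adv + cyc  # terms are added without extra weights in this sketch

d_loss = lsgan_d_loss(np.array([0.9, 1.0]), np.array([0.1, 0.0]))
g_loss = cyclegan_g_loss(
    dy_fake=np.array([0.5]), dx_fake=np.array([0.5]),
    x=np.zeros(4), x_cycled=np.full(4, 0.1),
    y=np.zeros(4), y_cycled=np.full(4, 0.2),
)
```

In this example the discriminator loss is 0.01, and the generator loss is 0.5 from the two adversarial terms plus 0.3 from the two cycle terms.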


CycleGAN
• Illustration of cycle consistency:


CycleGAN: Results
• Translation between maps and aerial photos
• Other pix2pix tasks
• Scene to labels and labels to scene
  • Worse performance than pix2pix due to lack of paired training data
• Tasks for which paired data is unavailable
• Style transfer


CycleGAN: Failure cases


CycleGAN: Limitations
• Cannot handle shape changes (e.g., dog to cat)
• Can get confused on images outside of the training domains (e.g., horse with rider)
• Cannot close the gap with paired translation methods
• Does not account for the fact that one transformation direction may be more challenging than the other


Multimodal image-to-image translation

J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, E. Shechtman, Toward Multimodal Image-to-Image Translation, NIPS 2017


Unsupervised image-to-image translation

M.-Y. Liu, T. Breuel, and J. Kautz, Unsupervised Image-to-Image Translation Networks, NIPS 2017


Interesting New Yorker article

https://www.newyorker.com/magazine/2018/11/12/in-the-age-of-ai-is-seeing-still-believing