



Invertible Conditional GANs for image editing

Guim Perarnau, Joost van de Weijer*, Bogdan Raducanu*, Jose M. Álvarez†

* Computer Vision Center, Barcelona, Spain † Data61 @ CSIRO, Canberra, Australia

[email protected], {joost,bogdan}@cvc.uab.es, [email protected]


Overview

Problem
Complex image editing often requires human supervision and professional image editing tools. How can we automate these complex operations?

Solution
We propose Invertible Conditional GANs (IcGANs), a model that combines a conditional GAN with an encoder.

How?
1. Generate realistic images via GANs.
2. Condition the generated images on attributes.
3. Encode real images in order to reconstruct them with the desired changes.

4. Invertible Conditional GANs (IcGANs)

Now we can combine the cGAN and the encoder to create an IcGAN. With the encoder, we can invert the generator and map a real image into a latent representation (z, y). In this space, we can arbitrarily change key aspects of the image and then reconstruct the modified image with the generator.
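The encode, edit, and regenerate flow can be sketched as below. The "networks" here are random linear stand-ins purely to show the data flow; the shapes (100-dim z, five binary attributes, 64×64×3 images flattened to vectors) are illustrative assumptions, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trained networks (hypothetical shapes).
W_enc_z = rng.normal(size=(64 * 64 * 3, 100))  # sub-encoder E_z
W_enc_y = rng.normal(size=(64 * 64 * 3, 5))    # sub-encoder E_y
W_gen = rng.normal(size=(105, 64 * 64 * 3))    # generator G(z, y)

def encode(x):
    """Map a real image to its latent code z and binary attribute vector y."""
    return x @ W_enc_z, (x @ W_enc_y > 0).astype(float)

def generate(z, y):
    """Reconstruct an image by feeding the concatenation of z and y to G."""
    return np.tanh(np.concatenate([z, y]) @ W_gen)

x = rng.normal(size=64 * 64 * 3)   # a "real" image, flattened
z, y = encode(x)                   # invert the generator
y_edited = y.copy()
y_edited[4] = 1.0                  # flip one attribute, e.g. add sunglasses
x_edited = generate(z, y_edited)   # same z, modified y
```

The key design point is that the edit happens purely in (z, y) space: z holds everything about the image except the attributes, so changing y while keeping z fixed changes only the targeted aspects.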

Diagram: the generator produces fake images, the discriminator classifies real vs. fake images, and both networks are trained via backpropagation.

1. Generative Adversarial Networks (GANs)

GANs are composed of two networks, a generator and a discriminator. The generator is trained to fool the discriminator by creating realistic images, and the discriminator is trained not to be fooled by the generator.
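The two adversarial objectives can be sketched numerically. This is a toy illustration, not the paper's implementation: the "discriminator" is a fixed logistic regression and "real" and "fake" samples are just shifted Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    """Toy discriminator: probability that each sample x is real."""
    return 1.0 / (1.0 + np.exp(-x @ w))

def gan_losses(real, fake, w):
    """The two sides of the minimax game."""
    d_real = discriminator(real, w)
    d_fake = discriminator(fake, w)
    # Discriminator: maximize log D(real) + log(1 - D(fake)).
    d_loss = -(np.log(d_real) + np.log(1.0 - d_fake)).mean()
    # Generator: fool the discriminator, i.e. maximize log D(fake).
    g_loss = -np.log(d_fake).mean()
    return d_loss, g_loss

w = rng.normal(size=8)
real = rng.normal(loc=1.0, size=(16, 8))   # stand-in for real images
fake = rng.normal(loc=-1.0, size=(16, 8))  # stand-in for generated images
d_loss, g_loss = gan_losses(real, fake, w)
```

In actual training the two losses are minimized in alternation, each updating only its own network's parameters.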

5. Results

One way to evaluate these models is to directly inspect how visually appealing the generated samples are. Here we show some qualitative examples of what an IcGAN is capable of by playing with both the latent space z and the conditional information y. We fix z for every row and modify y for each column to obtain variations of real images.

Interpolations between faces.
Swapping two face attributes.
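The face interpolations come from linearly interpolating the encoded (z, y) pairs of two real images and decoding each intermediate point with the generator. A minimal sketch of the interpolation step, with random stand-ins for the two encoded pairs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical encodings of two real faces: (z, y) pairs.
z_a, y_a = rng.normal(size=100), np.array([1.0, 0.0, 1.0, 0.0, 0.0])
z_b, y_b = rng.normal(size=100), np.array([0.0, 1.0, 0.0, 0.0, 1.0])

# Interpolate in both z and y; each frame would be decoded as G(z, y).
steps = 5
frames = [((1 - t) * z_a + t * z_b, (1 - t) * y_a + t * y_b)
          for t in np.linspace(0.0, 1.0, steps)]
```

Attribute swapping is the degenerate case of this: keep each image's own z and exchange only the y vectors before decoding.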

Acknowledgments
This work is funded by the project TIN2013-41751-P of the Spanish Ministry of Science and the CHIST-ERA project PCIN-2015-226.

Scheme of an IcGAN and how it is used.

Example of complex editing operations: Original, Bangs, Blonde, Smile, Male.

Code available! → https://github.com/Guim3/IcGAN

2. Conditional GANs (cGANs)

With cGANs, we add to the model conditional information y that describes some aspect of the data. This allows us to control certain aspects of the generated images, e.g. generate a blonde woman with sunglasses. We refine cGANs by testing the optimal position at which y is inserted in the generator and the discriminator.
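As an illustration of where y can be inserted, the sketch below shows the two positions that worked best: concatenated with z at the generator input, and replicated spatially and concatenated with early feature maps in the discriminator. The shapes (100-dim z, five attributes, 32×32×64 activations) are illustrative assumptions, not the actual architecture dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=100)                     # latent noise vector
y = np.array([1.0, 0.0, 0.0, 1.0, 0.0])      # attribute vector

# Generator: y is concatenated with z at the input.
gen_input = np.concatenate([z, y])           # shape (105,)

# Discriminator: y is replicated spatially and concatenated
# with a layer's feature maps as extra channels.
feat = rng.normal(size=(32, 32, 64))         # hypothetical conv activations
y_maps = np.broadcast_to(y, (32, 32, 5))     # same y at every spatial location
disc_input = np.concatenate([feat, y_maps], axis=-1)
```

The spatial replication is what makes concatenation at a convolutional layer possible: a flat vector cannot be appended to a feature map directly, but constant per-channel planes can.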

Diagram: encoder (Conv 1–5) and generator (full-conv 1–5, fed a 100-dim input); example attribute vector y (female, black hair, brown hair, make-up, sunglasses) and the change vector used to swap attributes in the cGAN/IcGAN pipeline (Input, Reconstruction, Swap y).

6. Conclusions

• We introduce IcGANs, which solve the problem of GANs lacking the ability to reconstruct real images, while also allowing explicit control over complex attributes of generated samples.
• We refine the performance of cGANs by inserting the conditional information 𝑦 at the input level for the generator and at the first layer for the discriminator.
• We evaluate several approaches to training an encoder; two independent encoders 𝐸𝑧 and 𝐸𝑦 (IND) are the best option.

Chart: cGAN F1-score as a function of the position where y is inserted (input, layers 1–4), for the generator and the discriminator. Higher is better.
Chart: encoder type comparison by reconstruction loss (SNG, IND, IND-COND). Lower is better.

3. Encoder

Then, we train an encoder to reconstruct real images. It is trained after the cGAN and is composed of two sub-encoders: 𝐸𝑧, which encodes an image to 𝑧, and 𝐸𝑦, which encodes an image to 𝑦′. We test different strategies to make them interact and improve the encoding process:

• SNG: 𝐸𝑧 and 𝐸𝑦 are embedded in a single encoder.
• IND: 𝐸𝑧 and 𝐸𝑦 are trained separately.
• IND-COND: two independent encoders, where 𝐸𝑧 is conditioned on the output of 𝐸𝑦.
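A sketch of the squared-error objectives that train the two sub-encoders. The data here are random stand-ins: in practice 𝐸𝑧 is trained to regress the latent codes of cGAN-generated samples and 𝐸𝑦 to regress attribute vectors, but all shapes and targets below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training targets: known (z, y) pairs behind generated images.
z_true = rng.normal(size=(16, 100))
y_true = (rng.random(size=(16, 5)) > 0.5).astype(float)

def encoder_losses(z_pred, y_pred):
    """Squared-error objectives for the two sub-encoders."""
    loss_z = np.mean((z_pred - z_true) ** 2)  # E_z regresses latent codes
    loss_y = np.mean((y_pred - y_true) ** 2)  # E_y regresses attributes
    return loss_z, loss_y

# With IND, each loss trains its own network independently;
# here we simply evaluate the losses for random predictions.
loss_z, loss_y = encoder_losses(rng.normal(size=(16, 100)),
                                rng.random(size=(16, 5)))
```

Because the two losses share no parameters under IND, they can be minimized entirely separately, which is the configuration that performed best.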
