
Page 1:

Evolution of features in Computer Vision

Towards deep visual features for Generative Adversarial Networks

Page 2:

Computer vision

• Aim of CV: to imitate human sight, i.e., the functionality of the eye and the brain

Page 3:

Feature extraction

• Example problem: human detection

• Histogram of Oriented Gradients (HOG) → features → classification
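As an illustration (not part of the original slides), here is a minimal sketch of this HOG-features-then-classifier pipeline using scikit-image and scikit-learn; the 128x64 window size and the random placeholder data are assumptions.

```python
# Minimal HOG + linear-SVM sketch (illustrative; window size and data are placeholders).
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(window):
    # 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks: the classic HOG setup for person detection.
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Placeholder data: grayscale 128x64 crops and person / no-person labels.
windows = np.random.rand(20, 128, 64)
labels = np.random.randint(0, 2, size=20)

X = np.stack([hog_features(w) for w in windows])   # feature extraction
clf = LinearSVC().fit(X, labels)                   # classification on top of the HOG features
```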

Page 4:

Convolutional neural networks

• From hand-crafted features to CNN features

• Images are progressively down-sampled through the layers

• The number of filters progressively increases
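A minimal PyTorch sketch of this pattern (illustrative, not from the slides): the spatial resolution shrinks layer by layer while the number of filters grows; the exact sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Each stage halves the spatial size (stride 2) and doubles the number of filters.
cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),    # 64x64 -> 32x32
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
)

print(cnn(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 128, 8, 8])
```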

Page 5:

Typical CNNs used in Computer Vision

• AlexNet: the first CNN to win the ImageNet challenge (2012)

• VGGNet: deep stacks of simple 3x3 filters

Page 6:

Typical CNNs used in Computer Vision

• GoogLeNet: the first Inception network

Page 7:

Visualization of filters

• Use gradient ascent on the input image to maximise the activation of individual filters
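A minimal PyTorch sketch of this idea (illustrative, not the original code): take gradient-ascent steps on the input image to maximise the mean activation of one filter in a pretrained network; the choice of VGG16, the layer/filter indices and the step size are assumptions.

```python
import torch
from torchvision import models

features = models.vgg16(weights="IMAGENET1K_V1").features.eval()
layer_idx, filter_idx = 10, 5                 # which layer / filter to visualise (arbitrary)

img = torch.rand(1, 3, 128, 128, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)

for _ in range(50):
    opt.zero_grad()
    x = img
    for i, layer in enumerate(features):
        x = layer(x)
        if i == layer_idx:
            break
    loss = -x[0, filter_idx].mean()           # minimising the negative = gradient ascent on the activation
    loss.backward()
    opt.step()                                # img gradually becomes a pattern that excites the chosen filter
```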

Page 8:

DeepDream

• Start from a random image and optimize it to maximise the score of a chosen class

Page 9:

DeepDream

• Iterate the optimization over its own output (feed the result back in)
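A rough sketch of iterating over its own output (assumptions: the same pretrained VGG16 features as above, an arbitrary layer cut-off, and a small crop-and-zoom between passes): each pass amplifies the activations of the current image and the result is fed back in.

```python
import torch
import torch.nn.functional as F
from torchvision import models

features = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()

img = torch.rand(1, 3, 224, 224)
for _ in range(5):                                    # outer loop: feed the output back in
    img = img.detach().requires_grad_(True)
    for _ in range(20):                               # gradient ascent on the current image
        loss = features(img).norm()
        loss.backward()
        with torch.no_grad():
            img += 0.01 * img.grad / (img.grad.abs().mean() + 1e-8)
            img.grad.zero_()
    # crop and zoom slightly before the next pass, so structures get re-amplified at new scales
    img = F.interpolate(img.detach()[:, :, 10:-10, 10:-10], size=(224, 224), mode="bilinear")
```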

Page 10:


GAN: Generative Adversarial Networks

DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

ConGAN: Conditional Generative Adversarial Networks

GAN-T2I: Generative Adversarial Text to Image Synthesis

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

AAE: Adversarial Autoencoders

VAE/GAN: Autoencoding beyond pixels using a learned similarity metric

Page 11:

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Unsupervised learning that works well for both generating and discriminating images across several image datasets

GANs are focused on the optimization of competing criteria, as proposed in: Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2672–2680.
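For reference (the formula comes from the cited paper, not from the slide itself), the competing criteria form the minimax game

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

where D tries to maximise V and G tries to minimise it.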

Main contribution: "extensive model exploration" to identify "a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models":

→ scaling up GANs using CNN architectures is not trivial

[1] A. Radford, L. Metz, and S. Chintala (2016). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1511.06434v2
Original slides from http://kawahara.ca/deep-convolutional-generative-adversarial-nets-slides/

Page 13:

Generated - Faces


Page 14:

Generated - Album Covers


Page 15:

Generated - ImageNet


Page 16:

Generator G(·): input = random numbers, output = generated image

Discriminator D(·): input = a generated or real image, output = prediction of whether the image is real

Uniform noise vector z (random numbers) → generated image G(z)

Real image x, so the goal is D(x) = 1

Generated image G(z), so the goal is D(G(z)) = 0

Page 17:

(Same GAN diagram as the previous slide.)

Training alternates stochastic gradient steps: within each training iteration the discriminator is updated by stochastic gradient ascent (SGA) for a number of inner steps, and then the generator is updated by stochastic gradient descent (SGD).
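A minimal PyTorch sketch of that alternating scheme (an illustration, not the paper's code: G, D, the optimizers and the data loader are placeholders, and the DCGAN paper actually uses Adam rather than plain SGD).

```python
import torch
import torch.nn.functional as F

def train_gan(G, D, opt_g, opt_d, dataloader, z_dim=100, k=1):
    # G: noise vector -> image, D: image -> probability of being real (in (0, 1)).
    # k is the number of discriminator steps per iteration (k = 1 in the original GAN paper).
    for real in dataloader:
        ones = torch.ones(real.size(0), 1)
        zeros = torch.zeros(real.size(0), 1)

        for _ in range(k):                                   # inner steps: update D (ascend its objective)
            z = torch.rand(real.size(0), z_dim) * 2 - 1      # uniform noise in [-1, 1]
            fake = G(z).detach()                             # do not backpropagate into G here
            d_loss = F.binary_cross_entropy(D(real), ones) + F.binary_cross_entropy(D(fake), zeros)
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        z = torch.rand(real.size(0), z_dim) * 2 - 1          # then update G so that D(G(z)) -> 1
        g_loss = F.binary_cross_entropy(D(G(z)), ones)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```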

Page 18:

(Same GAN diagram as the previous slide.)

Generator goal: fool the discriminator, i.e., generate an image G(z) such that D(G(z)) is wrong, i.e., D(G(z)) = 1

Discriminator goal: discriminate between real and generated images, i.e., D(x) = 1 where x is a real image, and D(G(z)) = 0 where G(z) is a generated one

Page 19:

(Same GAN diagram, generator goal and discriminator goal as the previous slide.)

0. The two goals conflict.

1. Both goals are unsupervised.

2. Training is optimal when D(·) = 0.5 (i.e., the discriminator cannot tell the difference between real and generated images) and G(z) has learned the distribution of the training images.
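For reference (a result from Goodfellow et al. (2014), not printed on the slide): for a fixed generator the optimal discriminator is

D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},

which equals 0.5 everywhere exactly when the generator's distribution p_g matches p_data.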

Page 20:

Generator

Uniform noise vector (random numbers)

Fractionally-strided convolutions: 8x8 input, 5x5 conv window → 16x16 output

Fully-connected layer (composed of weights) reshaped to have width, height and feature maps

Batch Normalization: normalize responses to have zero mean and unit variance over the entire mini-batch, preventing the generator from collapsing all samples to a single point, which is a common failure mode observed in GANs

Uses ReLU activation functions

Uses Tanh to scale generated image output between -1 and 1

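A minimal PyTorch sketch of a generator in this style (an illustration following these guidelines, not the paper's exact architecture): a fully-connected projection reshaped to 4x4 feature maps, fractionally-strided (transposed) convolutions with BatchNorm and ReLU in between, and Tanh at the output.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 256 * 4 * 4)  # project the noise vector, then reshape to 4x4 feature maps
        self.net = nn.Sequential(
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(True),  # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(True),    # 8x8 -> 16x16
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),                              # 16x16 -> 32x32, in [-1, 1]
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 4, 4))

print(Generator()(torch.rand(2, 100) * 2 - 1).shape)  # torch.Size([2, 3, 32, 32])
```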

Page 21:

Fractionally-strided convolutions

Regular convolution: input = 5x5, zero-padding at the border = 6x6, filter = 3x3, stride = 2 → output = 3x3

Fractionally-strided convolution: input = 3x3, interlace zero-padding with the inputs = 7x7, filter = 3x3, stride = 2 → output = 5x5

(Clear dashed squares in the animations are zero-padded inputs.)

More info: http://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers

https://www.reddit.com/r/MachineLearning/comments/4bbod7/which_deep_learning_frameworks_support_fractional/

https://github.com/vdumoulin/conv_arithmetic
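A quick check of the 3x3 → 5x5 example with PyTorch's transposed convolution (mapping the animation onto ConvTranspose2d with stride = 2 and padding = 1 is my assumption):

```python
import torch
import torch.nn as nn

# ConvTranspose2d output size: (in - 1) * stride - 2 * padding + kernel
up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
print(up(torch.rand(1, 1, 3, 3)).shape)  # torch.Size([1, 1, 5, 5]): 3x3 -> 5x5, as in the animation
```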

Page 22:

Discriminator

Input: a real image or a generated one

Goal: discriminate between real and generated images, i.e., D(x) = 1 if x is a real image, and D(G(z)) = 0 if G(z) is a generated image

Uses LeakyReLU activation functions

Batch Normalization

No max pooling: spatial dimensionality is reduced through strided convolutions

Sigmoid output (between 0 and 1)

Image from: Yeh, R., Chen, C., Lim, T. Y., Hasegawa-Johnson, M., & Do, M. N. (2016). Semantic Image Inpainting with Perceptual and Contextual Losses.
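A minimal PyTorch discriminator sketch in this style (illustrative, sized for 32x32 inputs to pair with the generator sketch above): strided convolutions instead of max pooling, LeakyReLU, BatchNorm, and a sigmoid output.

```python
import torch
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),                           # 32x32 -> 16x16
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),    # 16x16 -> 8x8
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),   # 8x8 -> 4x4
    nn.Flatten(),
    nn.Linear(256 * 4 * 4, 1), nn.Sigmoid(),   # probability that the input image is real
)

print(discriminator(torch.rand(2, 3, 32, 32)).shape)  # torch.Size([2, 1])
```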

Page 23:

Discriminator

(Same discriminator diagram and properties as the previous slide.)

Strides

Input = 4x4, filter = 3x3, stride = 1 → output = 2x2

Input = 5x5, filter = 3x3, stride = 2 → output = 2x2

Larger strides reduce dimensionality faster

https://github.com/vdumoulin/conv_arithmetic
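For reference (standard convolution arithmetic, not printed on the slide), with input size i, filter size k, padding p and stride s the output size is

o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1,

which gives (4 - 3)/1 + 1 = 2 and \lfloor (5 - 3)/2 \rfloor + 1 = 2 for the two examples above.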

Page 24:

Interpolating between a series of random points in Z (Z = the 100-dimensional noise vector)

Note the smooth transition between each scene

← the window slowly appearing

This indicates that the learned space has smooth transitions (and the model is not simply memorizing)

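A minimal sketch of how such an interpolation can be produced (illustrative; G stands for a trained generator like the one sketched earlier):

```python
import torch

def interpolate_latents(G, steps=9, z_dim=100):
    # Two random endpoints in Z and a straight line between them.
    z0 = torch.rand(1, z_dim) * 2 - 1
    z1 = torch.rand(1, z_dim) * 2 - 1
    alphas = torch.linspace(0, 1, steps).view(-1, 1)
    z = (1 - alphas) * z0 + alphas * z1   # (steps, z_dim): one point per interpolation step
    with torch.no_grad():
        return G(z)                       # one generated image per step; neighbours should change smoothly
```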

Page 25:

(Top) Unmodified sample generated images

(Bottom) Samples generated after dropping out the "window" concept. Some windows are removed or transformed. The overall scene stays the same, indicating the generator has separated objects (windows) from the scene.


Page 26:

Find 3 exemplar images for each concept (e.g., 3 smiling women, 3 neutral women and 3 neutral men)

Average their Z vectors

Add small uniform noise to the new vector

All resulting samples show similar visual concepts

Generate an image based on this new vector

Do simple vector arithmetic operations on the averaged Z vectors (e.g., smiling woman - neutral woman + neutral man)

Arithmetic in pixel space (shown for comparison)

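A minimal sketch of the vector-arithmetic recipe (illustrative; z_smiling_woman, z_neutral_woman and z_neutral_man would each be the average Z of 3 exemplars, and the noise scale is an assumption):

```python
import torch

def latent_arithmetic(G, z_smiling_woman, z_neutral_woman, z_neutral_man, n_samples=9, noise=0.25):
    # Simple arithmetic on averaged Z vectors, e.g. "smiling woman" - "neutral woman" + "neutral man".
    z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
    # Add small uniform noise around the new vector and decode each sample with the generator.
    z_batch = z_new + noise * (torch.rand(n_samples, z_new.numel()) * 2 - 1)
    with torch.no_grad():
        return G(z_batch)
```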

Page 27:

Average 4 Z vectors from exemplar faces looking left and 4 from faces looking right

Interpolating between the "left" and "right" average vectors creates a "turn" vector

Page 28:

http://karpathy.github.io/2011/04/27/manually-classifying-cifar10/

CIFAR-10

1) Train on ImageNet
2) Get the responses from all of the discriminator's convolutional layers
3) Max-pool each layer's responses to get a 4x4 spatial grid
4) Flatten to form a 1-D feature vector
5) Train a regularized linear L2-SVM classifier for CIFAR-10

(note: while other approaches achieve higher performance, this network was not trained on CIFAR-10!)
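A minimal sketch of this feature-extraction pipeline (illustrative: disc_conv_layers stands for the trained discriminator's convolutional blocks, and the CIFAR-10 arrays are placeholders; the regularized linear L2-SVM is approximated here with scikit-learn's LinearSVC):

```python
import torch
import torch.nn.functional as F
from sklearn.svm import LinearSVC

def discriminator_features(disc_conv_layers, images):
    # Pass the images through each convolutional block, max-pool its response to a 4x4 grid,
    # flatten, and concatenate everything into a single feature vector per image.
    feats, x = [], images
    with torch.no_grad():
        for block in disc_conv_layers:
            x = block(x)
            feats.append(F.adaptive_max_pool2d(x, 4).flatten(1))
    return torch.cat(feats, dim=1)

# X_train = discriminator_features(disc_conv_layers, cifar_train_images).numpy()
# clf = LinearSVC(C=1.0).fit(X_train, cifar_train_labels)   # linear L2-SVM classifier
```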


Page 29:

State-of-the-art results

The results are not just due to the modified architecture; they come from the GAN training

Street View House Numbers (SVHN)

Similar training procedure as before


Page 30:

Summary

✔ Visualizations indicate that the Generator is learning something close to the true distribution of real images.

✔ Classification performance using the Discriminator features indicates that features learned are discriminative of the underlying classes.

➔ Instability: as models are trained for longer, they sometimes collapse a subset of filters to a single oscillating mode.

➔ Further research on GANs: investigate properties of the learnt latent space.
