
Page 1: Deep Learning behind Prisma

Deep Learning behind Prisma: Image style transfer with Convolutional Neural Networks

lostleaf

Page 2: Deep Learning behind Prisma

Agenda

• Introduce deep learning models for image style transfer via recent papers

• Prisma is kind of a stunt, but it most likely uses similar techniques

• Agenda:

• A brief introduction to convolutional neural network

• Neural style

• Real-Time Style Transfer

Page 3: Deep Learning behind Prisma

Prisma

• A Russian mobile app

• Turns your photos into awesome artworks

• With Deep Learning!!!

Hotel Ukraine rendered by Prisma, from Prime Minister Medvedev's Instagram

Page 4: Deep Learning behind Prisma

Image Style Transfer

[Figure: Arch photo + The Starry Night (van Gogh) → the Arch painted in the style of van Gogh]

Page 5: Deep Learning behind Prisma

A brief introduction to convolutional neural network

Some of the images are from Prof. Fei-Fei Li's lecture notes

Page 6: Deep Learning behind Prisma

Neuron

• w: weight, b: bias
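In symbols, the neuron on this slide computes a weighted sum of its inputs plus the bias, passed through an activation function f (the standard formulation):

```latex
y = f\Big(\sum_i w_i x_i + b\Big) = f(\mathbf{w}^\top \mathbf{x} + b)
```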

Page 7: Deep Learning behind Prisma

Activation functions (common ones)

• An activation function is a nonlinear function

• Thresholding (e.g., ReLU): preferred in modern network structures

• Exponential ones (e.g., sigmoid, tanh): slower to compute, harder to train (vanishing gradients)
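A minimal NumPy sketch contrasting the two families above (the definitions are the textbook ones; the comments summarize the slide's trade-offs):

```python
import numpy as np

def relu(z):
    # Thresholding: cheap (a comparison), and its gradient is exactly 0 or 1,
    # so it does not shrink gradients in deep networks
    return np.maximum(0.0, z)

def sigmoid(z):
    # Exponential: slower to compute, and its gradient s * (1 - s) is at most 0.25,
    # so gradients shrink layer after layer (the vanishing gradient problem)
    return 1.0 / (1.0 + np.exp(-z))
```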

Page 8: Deep Learning behind Prisma

Fully connected neural network

Page 9: Deep Learning behind Prisma

Convolution

• The brown numbers in the yellow part are called the convolution kernel / filter

• Convolve the filter with the image: slide over the image spatially, computing dot products

• Right: a 3*3 convolution that sums up the diagonals

From Prof. Andrew Ng’s UFLDL tutorial
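A sketch of the sliding dot product in plain NumPy. Like most CNN libraries it skips the kernel flip of textbook convolution (strictly, cross-correlation); the example reproduces the slide's diagonal-summing 3*3 filter:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the filter over the image spatially, computing dot products ("valid" mode)
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.eye(3)                 # ones on the diagonal: sums up the diagonals
print(conv2d(image, kernel))       # e.g. top-left output = 0 + 6 + 12 = 18
```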

Page 10: Deep Learning behind Prisma

Convolutional layer

• Filters always extend the full depth of the input volume

• Why *3? 3 channels: R, G & B

Page 11: Deep Learning behind Prisma

Convolutional layer

1 number: the result of taking a dot product between the filter and a small 5*5*3 chunk of the image

Page 12: Deep Learning behind Prisma

Convolutional layer

Transform with activation function f


Page 13: Deep Learning behind Prisma

Convolutional layer

• A convolutional layer consists of several filters

• For example, if we had 6 5*5 filters, we'd get 6 separate activation maps

• Stack these up to get a tensor of size 28*28*6

• Padding may be added to keep the output the same size as the input
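The shape arithmetic behind these numbers, assuming the usual 32*32*3 input of this running example (the 32 is implied by 28*28 outputs from 5*5 filters):

```python
def conv_output_size(n, k, pad=0, stride=1):
    # Standard formula: floor((n + 2*pad - k) / stride) + 1
    return (n + 2 * pad - k) // stride + 1

print(conv_output_size(32, 5))         # 28: six 5*5 filters -> 28*28*6 stacked maps
print(conv_output_size(32, 5, pad=2))  # 32: padding keeps the output size the same
```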

Page 14: Deep Learning behind Prisma

Why convolution?

• Each value could be considered as the output of a neuron

• Features of image data:

• pixels are only related to a small neighborhood (local connection)

• patterns repeat & content moves around (weight sharing)

• Reduces the complexity and computation of the neural network by exploiting these properties of images
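A worked comparison of what local connection and weight sharing buy, using the same 32*32*3 input and 28*28*6 output as before (my arithmetic, not from the slides):

```python
# Fully connected: every output value has its own weight to every input value
fc_params = (32 * 32 * 3) * (28 * 28 * 6)     # 14,450,688 weights
# Convolutional: 6 filters of size 5*5*3, each with one bias, shared everywhere
conv_params = 6 * (5 * 5 * 3 + 1)             # 456 parameters
print(fc_params, conv_params)
```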

Page 15: Deep Learning behind Prisma

Pooling layer

• Right: max pooling, for example

• Operates independently on every depth slice of the input

• Reduces the spatial size of the activation map (fewer parameters and less computation)

• Increases shift invariance
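A minimal NumPy sketch of 2*2 max pooling on a single depth slice (applied independently to every slice):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Keep the strongest response in each window: spatial size halves,
    # and small shifts of the input barely change the output
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1., 1., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
print(max_pool(x))   # [[6. 8.] [3. 4.]]
```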

Page 16: Deep Learning behind Prisma

Case study 1: MNIST & LeNet

• MNIST handwritten digits recognition

• “hello world” of deep learning

Page 17: Deep Learning behind Prisma

LeNet

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[LeNet architecture figure: convolution → pooling → convolution → pooling → fully connected layers]
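For illustration, a LeNet-style network in PyTorch (modernized with ReLU and max pooling; the layer sizes follow the classic LeNet-5, not necessarily the exact network in the figure):

```python
import torch
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # 28*28 -> 14*14
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),            # 14*14 -> 5*5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),                                          # 10 digit classes
)
print(lenet(torch.randn(1, 1, 28, 28)).shape)                   # torch.Size([1, 10])
```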

Page 18: Deep Learning behind Prisma

Case study 2: ImageNet & VggNet

• ImageNet: a large image dataset with thousands of classes

Page 19: Deep Learning behind Prisma

VggNet (Vgg19)

Image by Mark Chang

• Runner-up of the ImageNet challenge 2014

• 19 trainable layers:

• 16 convolutional layers (3*3)

• 3 fully connected layers

• 5 max pooling layers (2*2; no trainable parameters)

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

Page 20: Deep Learning behind Prisma

Typical architecture

• Convolutional part & Fully connected part

• [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX
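One way to instantiate that pattern, sketched in PyTorch with N=1, M=2, K=1 (the channel counts and 32*32*3 input are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),    # (CONV-RELU)*1, POOL:
    nn.MaxPool2d(2),                              #   repeated M=2 times
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 128), nn.ReLU(),        # (FC-RELU)*1
    nn.Linear(128, 10), nn.Softmax(dim=1),        # SOFTMAX over class scores
)
print(net(torch.randn(1, 3, 32, 32)).shape)       # torch.Size([1, 10])
```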

Page 21: Deep Learning behind Prisma

Neural style

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Page 22: Deep Learning behind Prisma

Intuition

• Convolutional neural networks well trained on large datasets (e.g., VggNet) can be powerful feature extractors, like human brains

• Human painters are talented at combining content and style

Page 23: Deep Learning behind Prisma

Goal

• Given a content image p and a style image a

• Find an image x that

• Similar to p in content

• Similar to a in style

[Figure: content image p, style image a, and the synthesized x, with x ≈ p in content and x ≈ a in style]

Page 24: Deep Learning behind Prisma

Formulation

• Use Vgg19 (convolutional part) for feature extraction

• Two loss functions

• Content loss: difference in content between x and p

• Style loss: difference in style between x and a

• Find an image x that minimizes the weighted sum of the content and style losses
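Concretely, in Gatys et al.'s formulation: with F^l and P^l the layer-l VGG feature maps of x and p, G^l the Gram matrix of F^l, and A^l the corresponding Gram matrix for the style image a,

```latex
\mathcal{L}_{\mathrm{total}}(p, a, x)
  = \alpha\,\mathcal{L}_{\mathrm{content}}(p, x) + \beta\,\mathcal{L}_{\mathrm{style}}(a, x),
\qquad
\mathcal{L}_{\mathrm{content}} = \tfrac{1}{2}\sum_{i,j}\bigl(F^l_{ij} - P^l_{ij}\bigr)^2,

G^l_{ij} = \sum_k F^l_{ik} F^l_{jk},
\qquad
\mathcal{L}_{\mathrm{style}} = \sum_l \frac{w_l}{4 N_l^2 M_l^2}
  \sum_{i,j}\bigl(G^l_{ij} - A^l_{ij}\bigr)^2,
```

where N_l and M_l are the number of filters and the feature-map size at layer l, and w_l weights each layer's contribution.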

Page 25: Deep Learning behind Prisma

How to find x

Image by Mark Chang
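The crucial point in the figure: gradients flow back to the pixels of x, not to any network weights. A minimal PyTorch sketch (the paper optimizes with L-BFGS; Adam, random stand-in images, and my choice of VGG layer indices keep this short):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

def gram(f):                                    # Gram matrix of a (1, C, H, W) feature map
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

vgg = vgg19(weights="DEFAULT").features.eval()
for prm in vgg.parameters():
    prm.requires_grad_(False)                   # VGG stays fixed; only x changes

def extract(img, wanted=(3, 8, 17, 26, 35)):    # one activation per conv block
    feats = []
    for i, layer in enumerate(vgg):
        img = layer(img)
        if i in wanted:
            feats.append(img)
    return feats

p = torch.rand(1, 3, 256, 256)                  # stand-ins for real, normalized images
a = torch.rand(1, 3, 256, 256)
x = p.clone().requires_grad_(True)              # initialize x from the content image

content_feats = [f.detach() for f in extract(p)]
style_grams = [gram(f).detach() for f in extract(a)]
opt = torch.optim.Adam([x], lr=0.05)            # optimizing pixels, not weights

for step in range(200):
    feats = extract(x)
    c_loss = F.mse_loss(feats[3], content_feats[3])    # one deep layer for content
    s_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(feats, style_grams))
    loss = c_loss + 1e3 * s_loss                # the alpha/beta balance from before
    opt.zero_grad()
    loss.backward()
    opt.step()
```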

Page 26: Deep Learning behind Prisma

Some results

[Result images; styles: J.M.W. Turner, Vincent van Gogh, Edvard Munch]

Page 27: Deep Learning behind Prisma

Cont'd

[Result images; styles: Pablo Picasso, Wassily Kandinsky]

Page 28: Deep Learning behind Prisma

Balance content and style

• Weights of content and style: hyperparameters

• Try multiple combinations to match personal aesthetics

Page 29: Deep Learning behind Prisma

Photorealistic style transfer

[Photos: New York and London]

Page 30: Deep Learning behind Prisma

Drawbacks

• Iterative optimization

• Slow: 65 s to render the 600*400 arch image on a GTX 980M

• Power-hungry: not acceptable for mobile apps like Prisma

Page 31: Deep Learning behind Prisma

Real-Time Style Transfer

Page 32: Deep Learning behind Prisma

Intuition

• Style transfer is essentially an image transformation problem: image in, image out

• Generative CNNs have proven powerful in many other image transformation problems

Page 33: Deep Learning behind Prisma

Goal

• For a specific style image a, train a CNN that

• Accepts a content image p as input

• Outputs a synthesized image x whose content is similar to p and whose style is similar to a

Page 34: Deep Learning behind Prisma

Generative CNN

• Pre-trained VggNet for formulating the loss function

• Style target: a fixed style image, e.g. Starry Night

• Input image & content target: images sampled from a large dataset

• Image Transform Net: fully convolutional network (and some fancy new stuff)

Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual losses for real-time style transfer and super-resolution." arXiv preprint arXiv:1603.08155 (2016).
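A sketch of that training setup: the generator's weights are trained once against a VGG-based loss, after which stylization is a single forward pass. The tiny `TransformNet` and the stub `perceptual_loss` below are placeholders for the paper's transform net and perceptual losses, not the real architecture:

```python
import torch
import torch.nn as nn

class TransformNet(nn.Module):
    """Placeholder generator: fully convolutional, so it accepts any input size."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )
    def forward(self, x):
        return torch.sigmoid(self.body(x))      # keep output pixels in [0, 1]

def perceptual_loss(x, content, style):
    # Stub: the real loss compares VGG features (content) and Gram matrices (style),
    # exactly as in the neural-style formulation shown earlier
    return ((x - content) ** 2).mean() + (x.mean() - style.mean()) ** 2

net = TransformNet()
a = torch.rand(1, 3, 256, 256)                  # the one fixed style image
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(100):                            # content images from a large dataset
    p = torch.rand(1, 3, 256, 256)
    loss = perceptual_loss(net(p), p, a)
    opt.zero_grad(); loss.backward(); opt.step()  # update weights, not pixels

stylized = net(torch.rand(1, 3, 256, 256))      # after training: one forward pass
```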

Page 35: Deep Learning behind Prisma

Details & Improvements

• Image size 256 * 256

• Trained on a large image dataset for 4 hours on a GTX Titan X

• 200 ~ 1000X rendering speedup

Page 36: Deep Learning behind Prisma

Some results

Page 37: Deep Learning behind Prisma

Cont'd

Page 38: Deep Learning behind Prisma

Comparison

• Original neural style: hundreds of optimization iterations

• Generative CNN: tens of thousands of training iterations up front, then a single forward pass to synthesize an image

• Prisma's offline mode probably uses similar technologies

Page 39: Deep Learning behind Prisma

Parallel work: Texture Networks

Ulyanov, Dmitry, et al. "Texture Networks: Feed-forward Synthesis of Textures and Stylized Images." arXiv preprint arXiv:1603.03417 (2016).

Page 40: Deep Learning behind Prisma

Take home

• What makes up a CNN

• Convolution, pooling, fully connected layer...

• How neural style works

• CNN for feature extraction & iterative optimization

• Fast style transfer

• Train a generative CNN for a specific style

Page 42: Deep Learning behind Prisma

Some open course resources

• Introduction to Computer Vision, Udacity

• Deep Learning, Udacity

• Convolutional Neural Networks for Visual Recognition, Stanford CS231n

• Deep Learning for Natural Language Processing, Stanford CS224d