
Source: ttic.uchicago.edu/~shubhendu/Pages/Files/Lecture7_flat.pdf

Lecture 7: Convolutional Neural Networks

CMSC 35246: Deep Learning

Shubhendu Trivedi & Risi Kondor

University of Chicago

April 17, 2017

We saw before: a fully connected network mapping inputs x1, x2, x3, x4 to an output y.

A series of matrix multiplications:

$x \mapsto W_1^T x \mapsto h_1 = f(W_1^T x) \mapsto W_2^T h_1 \mapsto h_2 = f(W_2^T h_1) \mapsto W_3^T h_2 \mapsto h_3 = f(W_3^T h_2) \mapsto W_4^T h_3 = y$

Convolutional Networks

Neural networks that use convolution in place of general matrix multiplication in at least one layer.

Next:
• What is convolution?
• What is pooling?
• What is the motivation for such architectures (remember LeNet?)

LeNet-5 (LeCun, 1998)

The original Convolutional Neural Network model goes back to 1989 (LeCun).

AlexNet (Krizhevsky, Sutskever, Hinton 2012)

ImageNet 2012: 15.4% error rate.

Convolutional Neural Networks

Figure: Andrej Karpathy

Now let's deconstruct them...

Convolution

Grayscale image, 3 × 3 kernel:

w1 w2 w3
w4 w5 w6
w7 w8 w9

Convolve the image with a kernel having weights w (learned by backpropagation): the kernel slides across the image, computing wᵀx over each patch x it covers; each result is one entry of the feature map.

What is the number of parameters?
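The sliding-window computation above can be sketched in a few lines of NumPy (a minimal illustration, not the lecture's code: stride 1, no padding, single channel; `conv2d` is a made-up name, and real frameworks use far faster implementations):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image, computing w^T x at each
    location (no padding, no kernel flip -- i.e. cross-correlation)."""
    K = kernel.shape[0]
    N = image.shape[0]
    out = (N - K) // stride + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+K, j*stride:j*stride+K]
            fmap[i, j] = np.sum(patch * kernel)  # w^T x for this patch
    return fmap

image = np.arange(36, dtype=float).reshape(6, 6)  # a 6 x 6 example image
kernel = np.ones((3, 3)) / 9.0                    # a 3 x 3 averaging kernel
print(conv2d(image, kernel).shape)                # (4, 4)
```

The 6 × 6 input and 3 × 3 kernel give a 4 × 4 feature map, which matches the output-size formula on the next slide.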

Output Size

We used a stride of 1, and a kernel with a receptive field of size 3 by 3.

Output size: (N − K)/S + 1

In the previous example: N = 6, K = 3, S = 1, output size = 4

For N = 8, K = 3, S = 1, output size is 6
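As a quick sanity check, the formula can be encoded directly (`conv_output_size` is a name invented here, not part of any library):

```python
def conv_output_size(n, k, s):
    """Output size (N - K)/S + 1 for input N, kernel K, stride S."""
    return (n - k) // s + 1

print(conv_output_size(6, 3, 1))  # 4, as in the previous example
print(conv_output_size(8, 3, 1))  # 6
```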

Zero Padding

Often, we want the output of a convolution to have the same size as the input. Solution: zero padding.

In our previous example, the 6 × 6 input is surrounded by a one-pixel border of zeros, giving an 8 × 8 padded image.

Common to see convolution layers with stride of 1, filters of size K, and zero padding of (K − 1)/2 to preserve size.
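A rough NumPy check that padding by (K − 1)/2 preserves the spatial size (a sketch assuming odd K, not from the lecture):

```python
import numpy as np

N, K, S = 6, 3, 1
image = np.ones((N, N))
pad = (K - 1) // 2                        # = 1 for a 3 x 3 kernel
padded = np.pad(image, pad)               # zero border on all four sides
out_size = (padded.shape[0] - K) // S + 1
print(padded.shape, out_size)             # (8, 8) 6 -- output matches input
```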

Learn Multiple Filters

If we use 100 filters, we get 100 feature maps.

Figure: I. Kokkinos

In General

We have only considered a 2-D image as a running example.

But we could operate on volumes (e.g. RGB images would be a depth-3 input; the filter would have the same depth).

Image from Wikipedia

In General: Output Size

For a convolutional layer:
• Suppose the input is of size W1 × H1 × D1
• Filter size is K and stride S
• We obtain another volume of dimensions W2 × H2 × D2
• As before: W2 = (W1 − K)/S + 1 and H2 = (H1 − K)/S + 1
• Depths will be equal

Convolutional Layer Parameters

Example volume: 28 × 28 × 3 (RGB image)
100 filters of size 3 × 3, stride 1
What is the zero padding needed to preserve size?
Number of parameters in this layer?
For every filter: 3 × 3 × 3 + 1 = 28 parameters
Total parameters: 100 × 28 = 2800
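The counts on this slide can be reproduced with a line or two of arithmetic (one bias per filter, as the slide's +1 indicates):

```python
depth, k, n_filters = 3, 3, 100
per_filter = k * k * depth + 1    # 3*3*3 weights + 1 bias = 28
total = n_filters * per_filter    # 2800
print(per_filter, total)
```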

Figure: Andrej Karpathy

Non-Linearity

max{0, wᵀx}

After obtaining the feature map, apply an elementwise non-linearity to obtain a transformed feature map (same size).

Figure: Andrej Karpathy

Pooling

max{a_i}: take the maximum over each pooled region.

Other options: average pooling, L2-norm pooling, random pooling

Pooling

We have multiple feature maps, and get an equal number of subsampled maps.

This changes if cross-channel pooling is done.

So what's left: Fully Connected Layers

Figure: Andrej Karpathy

LeNet-5

Filters are of size 5 × 5, stride 1

Pooling is 2 × 2, with stride 2

How many parameters?

AlexNet

Input image: 227 × 227 × 3

First convolutional layer: 96 filters with K = 11 applied with stride = 4

Width and height of output: (227 − 11)/4 + 1 = 55

AlexNet

Number of parameters in first layer?

11 × 11 × 3 × 96 = 34,848
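Both AlexNet numbers above can be checked the same way (note the slide's count covers weights only, ignoring the 96 biases):

```python
W, K, S = 227, 11, 4
depth, n_filters = 3, 96
out = (W - K) // S + 1               # spatial output size
weights = K * K * depth * n_filters  # weights in the first conv layer
print(out, weights)                  # 55 34848
```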

AlexNet

Next layer: pooling with 3 × 3 filters, stride of 2

Size of output volume: 27

Number of parameters?

AlexNet

Popularized the use of ReLUs

Used heavy data augmentation (flipped images, random crops of size 227 by 227)

Parameters: dropout rate 0.5, batch size 128, weight decay 0.0005, momentum α = 0.9, learning rate η = 0.01, manually reduced by a factor of ten on monitoring validation loss

Short Digression: What do the features look like?

Layer 1 filters

This and the next few illustrations are from Rob Fergus

Layer 2 Patches

Layer 3 Patches

Layer 4 Patches

Evolution of Filters

Caveat?

Back to Architectures

ImageNet 2013

Was won by a network similar to AlexNet (Matthew Zeiler and Rob Fergus)

Changed the first convolutional layer from 11 × 11 with stride of 4, to 7 × 7 with stride of 2

AlexNet used 384, 384, and 256 filters in the next three convolutional layers; ZF used 512, 1024, 512

ImageNet 2013: 14.8% (reduced from 15.4%) (top-5 error)

VGGNet (Simonyan and Zisserman, 2014)

Best model: column D

Error: 7.3% (top-five error)

VGGNet (Simonyan and Zisserman, 2014)

Total number of parameters: 138 million (calculate!)

Memory (Karpathy): 24 million × 4 bytes ≈ 93 MB per image

For the backward pass the memory usage is doubled per image

Observations:
• Early convolutional layers take most of the memory
• Most parameters are in the fully connected layers

Going Deeper

Figure: Kaiming He, MSR

Network in Network

M. Lin, Q. Chen, S. Yan, Network in Network, ICLR 2014

GoogLeNet

C. Szegedy et al, Going Deeper With Convolutions, CVPR 2015

Error: 6.7% (top-five error)

The Inception Module

Parallel paths with different receptive field sizes capture sparse patterns of correlation in the stack of feature maps

Also includes auxiliary classifiers for ease of training

Also note the 1 × 1 convolutions

GoogLeNet

C. Szegedy et al, Going Deeper With Convolutions, CVPR 2015

GoogLeNet

Has 5 million parameters, 12× fewer than AlexNet

Gets rid of fully connected layers

Inception v2, v3

C. Szegedy et al, Rethinking the Inception Architecture for Computer Vision, CVPR 2016

Use batch normalization during training to reduce dependence on auxiliary classifiers

More aggressive factorization of filters

Why do CNNs make sense? (Brain stuff next time)

Convolutions: Motivation

Convolution leverages four ideas that can help ML systems:
• Sparse interactions
• Parameter sharing
• Equivariant representations
• Ability to work with inputs of variable size

Sparse Interactions
• Plain vanilla NN (y ∈ R^n, x ∈ R^m): need matrix multiplication y = Wx to compute activations for each layer (every output interacts with every input)
• Convolutional networks have sparse interactions by making the kernel smaller than the input
• ⟹ need to store fewer parameters, and computing the output needs fewer operations (O(m × n) versus O(k × n))

Motivation: Sparse Connectivity

x1 x2 x3 x4 x5 x6
h1 h2 h3 h4 h5 h6

Fully connected network: h3 is computed by full matrix multiplication with no sparse connectivity

Motivation: Sparse Connectivity

x1 x2 x3 x4 x5 x6
h1 h2 h3 h4 h5 h6

Kernel of size 3, moved with stride of 1

h3 only depends on x2, x3, x4

Motivation: Sparse Connectivity

x1 x2 x3 x4 x5 x6
h1 h2 h3 h4 h5 h6
s1 s2 s3 s4 s5 s6

Connections in CNNs are sparse, but units in deeper layers are connected to all of the input (larger receptive field sizes)

Motivation: Parameter Sharing

Plain vanilla NN: each element of W is used exactly once to compute the output of a layer

In convolutional networks, parameters are tied: the weight applied to one input is tied to the value of a weight applied elsewhere

The same kernel is used throughout the image, so instead of learning a parameter for each location, only one set of parameters is learned

Forward propagation remains unchanged: O(k × n). Storage improves dramatically since k ≪ m, n

Motivation: Equivariance

Let's first formally define convolution:

$s(t) = (x * w)(t) = \int x(a)\, w(t - a)\, da$

In convolutional network terminology, x is referred to as the input, w as the kernel, and s as the feature map.

Discrete convolution:

$S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(m, n)\, K(i - m, j - n)$

Convolution is commutative, thus:

$S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(i - m, j - n)\, K(m, n)$

Aside

The latter is usually more straightforward to implement in ML libraries (less variation in the range of valid values of m and n)

Neither is usually used in practice in neural networks

Libraries implement cross-correlation: the same as convolution, but without flipping the kernel:

$S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(i + m, j + n)\, K(m, n)$
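The distinction is just a double flip of the kernel, which a small NumPy sketch makes concrete (`correlate` and `convolve` here are illustrative helpers, not a library API):

```python
import numpy as np

def correlate(I, K):
    """Cross-correlation: S(i, j) = sum_mn I(i+m, j+n) K(m, n)."""
    k = K.shape[0]
    out = I.shape[0] - k + 1
    return np.array([[np.sum(I[i:i+k, j:j+k] * K)
                      for j in range(out)] for i in range(out)])

def convolve(I, K):
    """True convolution is cross-correlation with the kernel
    flipped along both axes."""
    return correlate(I, K[::-1, ::-1])

rng = np.random.default_rng(0)
I, K = rng.random((5, 5)), rng.random((3, 3))
print(np.allclose(convolve(I, K), correlate(I, K[::-1, ::-1])))  # True
```

For a symmetric kernel the two coincide, which is one reason the distinction rarely matters when the kernel weights are learned anyway.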

Motivation: Equivariance

Equivariance: f is equivariant to g if f(g(x)) = g(f(x))

The form of parameter sharing used by CNNs causes each layer to be equivariant to translation

That is, if g is any function that translates the input, the convolution function is equivariant to g

Motivation: Equivariance

Implication: while processing time-series data, convolution produces a timeline that shows when different features appeared (if an event is shifted in time in the input, the same representation will appear in the output)

Images: if we move an object in the image, its representation will move the same amount in the output

This property is useful when we know some local function is useful everywhere (e.g. edge detectors)

Convolution is not equivariant to other operations such as change in scale or rotation
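Translation equivariance can be verified numerically in 1-D (a toy check, not from the slides; the zero borders keep the circular shift of `np.roll` equivalent to a plain translation):

```python
import numpy as np

def correlate1d(x, w):
    k = len(w)
    return np.array([np.dot(x[i:i+k], w) for i in range(len(x) - k + 1)])

x = np.array([0., 0., 1., 2., 3., 0., 0., 0.])
w = np.array([1., -1.])                  # a tiny edge-detector kernel

shifted = np.roll(x, 2)                  # translate the input by 2
lhs = correlate1d(shifted, w)            # f(g(x))
rhs = np.roll(correlate1d(x, w), 2)      # g(f(x))
print(np.allclose(lhs, rhs))             # True: the feature map shifts too
```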

Pooling: Motivation

Pooling helps the representation become slightly invariant to small translations of the input

Reminder: invariance: f(g(x)) = f(x)

If the input is translated by a small amount, the values of most pooled outputs don't change
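A 1-D max-pooling toy example shows this (illustrative only; `max_pool1d` is a made-up helper):

```python
import numpy as np

def max_pool1d(x, size=3):
    """Max over each window of the given size, stride 1."""
    return np.array([x[i:i+size].max() for i in range(len(x) - size + 1)])

x = np.zeros(8)
x[3] = 5.0                          # a single detected feature
shifted = np.roll(x, 1)             # translate the input by one position

print(max_pool1d(x))                # [0. 5. 5. 5. 0. 0.]
print(max_pool1d(shifted))          # [0. 0. 5. 5. 5. 0.]
```

Four of the six pooled outputs are unchanged after the shift, illustrating the invariance to small translations.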

Pooling: Invariance

Figure: Goodfellow et al.

Pooling

Invariance to local translation can be useful if we care more about whether a certain feature is present rather than exactly where it is

Pooling over spatial regions produces invariance to translation; what if we pool over separately parameterized convolutions?

Features can learn which transformations to become invariant to (example: Maxout Networks, Goodfellow et al 2013)

One more advantage: since pooling is used for downsampling, it can be used to handle inputs of varying sizes

Next time

More Architectures

Variants on the CNN idea

More motivation

Group Equivariance

Equivariance to Rotation

Quiz!