all you want to know about cnnsfidler/teaching/2015/slides/... · if f ∈ b then af ∈ b for all...

79
All You Want To Know About CNNs Yukun Zhu

Upload: others

Post on 28-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

All You Want To Know About CNNsYukun Zhu

Page 2: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Page 3: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Image from http://imgur.com/

Page 4: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Image from http://imgur.com/

Page 5: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Image from http://imgur.com/

Page 6: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Image from http://imgur.com/

Page 7: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Image from http://imgur.com/

Page 8: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning

Image from http://imgur.com/

Page 9: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning in Vision

33.4

DPM(2010)

Object detection performance, PASCAL VOC 2010

Page 10: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning in Vision

33.4

DPM(2010)

40.4

segDPM(2014)

Object detection performance, PASCAL VOC 2010

Page 11: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning in Vision

33.4

DPM(2010)

40.4

segDPM(2014)

53.7

RCNN(2014)

Object detection performance, PASCAL VOC 2010

Page 12: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning in Vision

33.4

DPM(2010)

40.4

segDPM(2014)

53.7

RCNN(2014)

Object detection performance, PASCAL VOC 2010

62.9

RCNN*(Oct 2014)

Page 13: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning in Vision

33.4

DPM(2010)

40.4

segDPM(2014)

53.7

RCNN(2014)

67.2

segRCNN(Jan 2015)

Object detection performance, PASCAL VOC 2010

62.9

RCNN*(Oct 2014)

Page 14: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Deep Learning in Vision

33.4

DPM(2010)

40.4

segDPM(2014)

53.7

RCNN(2014)

67.2

segRCNN(Jan 2015)

Object detection performance, PASCAL VOC 2010

62.9

RCNN*(Oct 2014)

70.8

Fast RCNN(Jun 2015)

Page 15: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

A Neuron

Image from http://cs231n.github.io/neural-networks-1/

Page 16: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

A Neuron in Neural Network

Image from http://cs231n.github.io/neural-networks-1/

Page 17: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Activation Functions● Sigmoid: f(x) = 1 / (1 + e

-x

)

● ReLU: f(x) = max(0, x)

● Leaky ReLU: f(x) = max(ax, x)

● Maxout: f(x) = max(w

0

x + b

0

, w

1

x + b

1

)

● and many others…

Page 18: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Neural Network (MLP)

Image modified from http://cs231n.github.io/neural-networks-1/

The network simulates a function y = f(x; w)

Page 19: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-3.00

-2.00

-3.00

Page 20: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

Page 21: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

Page 22: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

Page 23: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00

Page 24: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00

Page 25: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37

Page 26: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37

Page 27: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Forward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

Page 28: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Loss FunctionLoss function measures how well prediction matches true value

Commonly used loss function:

● Squared loss: (y - y’)

2

● Cross-entropy loss: -sum

i

(y

i

’ * log(y

i

))

● and many others

Page 29: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Loss FunctionDuring training, we would like to minimize the total loss on a set

of training data

● We want to find w* = argmin{sum

i

[loss(f(x

i

; w), y

i

)]}

Page 30: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Loss FunctionDuring training, we would like to minimize the total loss on a set

of training data

● We want to find w* = argmin{sum

i

[loss(f(x

i

; w), y

i

)]}

● Usually we use gradient based approach

○ w

t+1

= w

t

- a∇w

Page 31: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00

Page 32: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53

f = 1/xdf/dx = -1/x2

Page 33: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53-0.53

f = x + 1df/dx = 1

Page 34: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53-0.53-0.20

f = ex

df/dx = ex

Page 35: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53-0.53-0.200.20

f = -xdf/dx = -1

Page 36: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53-0.53-0.200.20

0.20

0.20

f = x + adf/dx = 1

Page 37: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53-0.53-0.200.20

0.20

0.20

0.20

0.20

Page 38: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Backward Computation

Image and code modified from http://cs231n.github.io/optimization-2/

f(x

0

, x

1

) = 1 / (1 + exp(-(w

0

x

0

+ w

1

x

1

+ w

2

)))

x

0

x

1

1

sigmoid

2.00

-1.00

-2.00

-3.00

-2.00

-3.00

6.00

4.00

1.00 -1.00 0.37 1.37 0.73

1.00-0.53-0.53-0.200.20

0.20

0.20

0.200.40

-0.20

-0.40

-0.60

0.20

f = axdf/dx = a

Page 39: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Why NNs?

Page 40: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Universal Approximation TheoremA feed-forward network with a single hidden layer containing a

finite number of neurons, can approximate continuous functions

on compact subsets of R

n

, under mild assumptions on the

activation function.

https://en.wikipedia.org/wiki/Universal_approximation_theorem

Page 41: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Stone’s Theorem● Suppose X is a compact Hausdorff space and B is a

subalgebra in C(X, R) such that:

○ B separates points.

○ B contains the constant function 1.

○ If f ∈ B then af ∈ B for all constants a ∈ R.

○ If f, g ∈ B, then f + g, max{f, g} ∈ B.

● Then every continuous function defined on C(X, R) can be

approximated as closely as desired by functions in B

Page 42: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Why CNNs?

Page 43: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Problems of MLP in VisionFor input as a 10 * 10 image:

● A 3 layer MLP with 200 hidden units contains ~100k

parameters

For input as a 100 * 100 image:

● A 1 layer MLP with 20k hidden units contains ~200m

parameters

Page 44: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Can We Do Better?

Page 45: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Can We Do Better?

Page 46: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Can We Do Better?

Page 47: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Can We Do Better?

Page 48: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Can We Do Better?Based on such observation, MLP can be improved in two ways:

● Locally connected instead of fully connected

● Sharing weights between neurons

We achieve those by using convolution neurons

Page 49: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Convolutional Layers

Image from http://cs231n.github.io/convolutional-networks/

Page 50: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Convolutional Layers

Image from http://cs231n.github.io/convolutional-networks/. See this page for an excellent example of convolution.

depth

height

width

Page 51: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Pooling Layers

Image from http://cs231n.github.io/convolutional-networks/

Page 52: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Pooling Layers Example: Max Pooling

Image from http://cs231n.github.io/convolutional-networks/

Page 53: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Pooling LayersCommonly used pooling layers:

● Max pooling

● Average pooling

Why pooling layers?

● Reduce activation dimensionality

● Robust against tiny shifts

Page 54: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

CNN Architecture: An Example

Image from http://cs231n.github.io/convolutional-networks/

Page 55: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Layer Activations for CNNs

Image modified from http://cs231n.github.io/convolutional-networks/

Conv:1 ReLU:1 Conv:2 ReLU:2 MaxPool:1 Conv:3

Page 56: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Layer Activations for CNNs

Image modified from http://cs231n.github.io/convolutional-networks/

MaxPool:2 Conv:5 ReLU:5 Conv:6 ReLU:6 MaxPool:3

Page 57: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Learnt Weights for CNNs: First Conv Layer of AlexNet

Image from http://cs231n.github.io/convolutional-networks/

Page 58: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Why CNNs Work Now?

Page 59: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Convolutional Neural Networks● Faster heterogeneous parallel computing

○ CPU clusters, GPUs, etc.

● Large dataset

○ ImageNet: 1.2m images of 1,000 object classes

○ CoCo: 300k images of 2m object instances

● Improvements in model architecture

○ ReLU, dropout, inception, etc.

Page 60: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

AlexNet

Krizhevsky, Alex, et al. "Imagenet classification with deep convolutional neural networks." NIPS 2012

Page 61: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

GoogLeNet

Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).

Page 62: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Quiz# of parameters for the first conv layer of AlexNet?

Page 63: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Quiz# of parameters if the first layer is fully-connected?

Page 64: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

QuizGiven a convolution operation written as

f(x

3x3

; w

3x3

, b) = sum

i,j

(x

i,j

w

i,j

) + b

Can you derive its gradients (df/dx, df/dw, df/db)?

Page 65: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Ready to Build Your Own Networks?

Page 66: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs ● Know your data, clean your data, and normalize your data

○ A common trick: subtract the mean and divide by its std.

Image from http://cs231n.github.io/neural-networks-2/

Page 67: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs ● Augment your data

Page 68: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs ● Organize your data:

○ Keep training data balanced

○ Shuffle data before batching

● Feed your data in the correct way

○ Image channel order

○ Tensor storage order

Page 69: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs

First Order, in order. First Order, out of order.

Page 70: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Common tensor storage order:

● BDRC

○ Used in Caffe, Torch, Theano, supported by CuDNN

○ Pros: faster for convolution (FFT, memory access)

● BRCD

○ Used in TensorFlow, limited support by CuDNN

○ Pros: Fast batch normalization, easier batching

Page 71: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Designing model architecture

● Convolution, max pooling, then fully connected layers

● Nonlinearity

○ Stay away from sigmoid (except for output)

○ ReLU preferred

○ Leaky ReLU after

○ Use Maxout if most ReLU units die (have zero activation)

Page 72: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Setting parameters

● Weights

○ Random initialization with proper variance

● Biases

○ For ReLU we prefer a small positive bias to activate ReLU

Page 73: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Setting hyperparameters

● Learning Rate / Momentum (Δw

t*

= Δw

t

+ mΔw

t-1

)

○ Decrease learning rate while training

○ Setting momentum to 0.8 - 0.9

● Batch Size

○ For large dataset: set to whatever fits your memory

○ For smaller dataset: find a tradeoff between instance

randomness and gradient smoothness

Page 74: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Monitoring your training:

● Split your dataset to training, validation and test

○ Optimize your hyperparameter in val and evaluate on test

○ Keep track of training and validation loss during training

○ Do early stopping if training and validation loss diverge

○ Loss doesn’t tell you all. Try precision, class-wise precision, and

more

Page 75: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Borrow knowledge from another dataset

● Pre-train your CNN on a large dataset (e.g. ImageNet)

● Remove / reshape the last a few layers

● Fix the parameters of first a few layers, or make the learning

rate small for them

● Fine-tune the parameters on your own dataset

Page 76: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Tips and Tricks for CNNs Debugging

● import unittest, not import pdb

● Check your gradient [**deprecated**]

● Make your model large enough, and try overfitting training

● Check gradient norms, weight norms, and activation norms

Page 77: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Talk is Cheap, Show Me Some Code

Page 78: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Image from http://www.linkresearchtools.com/

Page 79: All You Want To Know About CNNsfidler/teaching/2015/slides/... · If f ∈ B then af ∈ B for all constants a ∈ R. If f, g ∈ B, then f + g, max{f, g} ∈ B. Then every continuous

Fully Convolutional Networks

Long, Jonathan, et al. "Fully convolutional networks for semantic segmentation." arXiv preprint arXiv:1411.4038 (2014).