TRANSCRIPT
Learning Functions and Neural Networks II
24-787 Lecture 9
Luoting Fu Spring 2012
2
Previous lecture
Applications
Physiological basis
Demos
Perceptron
[Figure: a two-input perceptron. Inputs X0 and X1, weighted by W0 and W1 plus a bias weight Wb, feed a summation whose result passes through a step activation to give the output Y.]
Y = u(W0 X0 + W1 X1 + Wb)
Δ Wi = η (Y0 - Y) Xi
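To make the recap concrete, here is a minimal NumPy sketch of the perceptron above trained with the update rule Δ Wi = η (Y0 - Y) Xi; the AND data set, learning rate, and epoch count are illustrative choices, not part of the original slide.

```python
import numpy as np

def u(a):
    # Step activation from the slide: output 1 when the weighted sum is positive.
    return (a > 0).astype(float)

# Illustrative training set (logical AND); Y0 holds the desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y0 = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)        # W0, W1
wb = 0.0               # bias weight Wb
eta = 0.1              # learning rate (eta)

for epoch in range(20):
    for x, y0 in zip(X, Y0):
        y = u(w @ x + wb)              # Y = u(W0 X0 + W1 X1 + Wb)
        w += eta * (y0 - y) * x        # delta rule: dWi = eta * (Y0 - Y) * Xi
        wb += eta * (y0 - y)           # bias updated as an input fixed at 1

print(w, wb, u(X @ w + wb))            # learned weights and predictions
```

On a linearly separable set like AND the weights stop changing once every example is classified correctly; the XOR slides that follow show where this breaks down.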
3
In this lecture
• Multilayer perceptron (MLP)
  – Representation
  – Feed forward
  – Back-propagation
• Break
• Case studies
• Milestones & forefront
4
Perceptron
A 400-26 perceptron: 400 inputs, 26 outputs (one per letter A–Z)
[Figure © Springer]
5
XOR (Exclusive OR)
6
Root cause
Consider a 2-1 perceptron,
y = σ(w1 x1 + w2 x2 + w0)
Let y = 0.5. Since σ(a) = 0.5 exactly when a = 0, the decision boundary is
w1 x1 + w2 x2 + w0 = 0,
a straight line in the (x1, x2) plane: a single perceptron can only draw linear boundaries.
[Figure: the two-input perceptron diagram, as on the earlier slide]
7
A single perceptron is limited to learning linearly separable cases.
Minsky, M. L. and Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
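A short argument (not on the slides) makes the XOR failure concrete: write out what the step output u would have to satisfy on the four XOR cases, and the constraints contradict each other.

```latex
% Constraints a single perceptron y = u(w_1 x_1 + w_2 x_2 + w_b) would need for XOR:
\begin{aligned}
(0,0) \mapsto 0 &:\quad w_b \le 0 \\
(1,0) \mapsto 1 &:\quad w_1 + w_b > 0 \\
(0,1) \mapsto 1 &:\quad w_2 + w_b > 0 \\
(1,1) \mapsto 0 &:\quad w_1 + w_2 + w_b \le 0
\end{aligned}
% Adding the two middle inequalities gives w_1 + w_2 + 2 w_b > 0, i.e.
% w_1 + w_2 + w_b > -w_b >= 0, which contradicts the last line: no weights work.
```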
8
9
Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.
An MLP can learn any continuous function.
A single perceptron is limited to learning linearly separable cases (a linear function).
10
How’s that relevant?
Function approximation underlies intelligent behavior:
• Recognition: waveform → words (speech recognition)
• Regression: the road ahead, speed, bearing → wheel turn, pedal depression (driving)
11–18
[Figure-only slides]
19
Matrix representation
x ∈ ℝ^D
w(1) ∈ ℝ^(M×D)
z = h(w(1) x) ∈ ℝ^M
w(2) ∈ ℝ^(K×M)
y = σ(w(2) z) ∈ ℝ^K
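The matrix form above maps directly to code. Below is a minimal feed-forward sketch in NumPy; the choice of tanh for the hidden nonlinearity h, a logistic sigmoid for σ, and the sizes D, M, K are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

D, M, K = 4, 8, 3                      # input, hidden, and output sizes (arbitrary)
rng = np.random.default_rng(0)
w1 = rng.normal(size=(M, D))           # w(1) in R^(M x D)
w2 = rng.normal(size=(K, M))           # w(2) in R^(K x M)

x = rng.normal(size=D)                 # input vector in R^D
z = np.tanh(w1 @ x)                    # hidden layer: z = h(w(1) x)
y = sigmoid(w2 @ z)                    # output layer: y = sigma(w(2) z)
print(z.shape, y.shape)                # (M,) and (K,)
```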
20
Knowledge learned by an MLP is encoded in its layers of weights.
21
What does it learn?
• Decision boundary perspective
22
What does it learn?
• Highly non-linear decision boundaries
23
What does it learn?
• Real world decision boundaries
24
Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.
An MLP can learn any continuous function.
Think Fourier.
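The Fourier remark can be read almost literally: Cybenko's result says a continuous function on a compact set can be approximated to any accuracy by a finite superposition of sigmoids, just as a truncated Fourier series approximates it by a superposition of sinusoids. In symbols (notation mine, not from the slide):

```latex
f(\vec{x}) \;\approx\; \sum_{j=1}^{M} \alpha_j \,\sigma\!\left(\vec{w}_j^{\top} \vec{x} + b_j\right)
```

which is exactly the form computed by one hidden layer of M sigmoidal units followed by a linear output.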
25
What does it learn?
• Weight perspective
A 64-M-3 MLP
x ∈ ℝ^D
w(1) ∈ ℝ^(M×D)
z = h(w(1) x) ∈ ℝ^M
w(2) ∈ ℝ^(K×M)
y = σ(w(2) z) ∈ ℝ^K
26
How does it learn?
• From examples
• By back-propagation
[Figure: labeled training examples: handwritten digits 0–9; images labeled “Polar bear” / “Not a polar bear”]
27
Back-propagation
28
Gradient descent
“epoch”
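The rule behind the slide is the standard gradient-descent update (generic form, not copied from the lecture): every weight takes a small step against the gradient of the training error E, with step size set by the learning rate η, and one complete pass through the training set is an “epoch”.

```latex
w_{ij} \;\leftarrow\; w_{ij} \;-\; \eta\, \frac{\partial E}{\partial w_{ij}}
```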
29
30
Back-propagation
31
Back-propagation
• Steps
Think about this: What happens when you train a 10-layer MLP?
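As a concrete instance of the steps, here is a minimal NumPy sketch of back-propagation for a 2-H-1 MLP trained on XOR with sigmoid units and squared error; the hidden width H, learning rate, initialization, and epoch count are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR training set; targets in T.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

H = 4                                        # hidden units (illustrative)
rng = np.random.default_rng(1)
W1 = rng.normal(size=(H, 3))                 # hidden weights, inputs (x1, x2, bias)
W2 = rng.normal(size=(1, H + 1))             # output weights, inputs (z1..zH, bias)
eta = 0.5

for epoch in range(10000):                   # raise this (or reseed) if it has not converged
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)               # forward pass
        z = sigmoid(W1 @ xb)
        zb = np.append(z, 1.0)
        y = sigmoid(W2 @ zb)

        delta_out = (y - t) * y * (1 - y)    # backward pass: output error signal
        delta_hid = (W2[:, :H].T @ delta_out) * z * (1 - z)   # propagate it back

        W2 -= eta * np.outer(delta_out, zb)  # gradient-descent weight updates
        W1 -= eta * np.outer(delta_hid, xb)

for x in X:
    zb = np.append(sigmoid(W1 @ np.append(x, 1.0)), 1.0)
    print(x, round(sigmoid(W2 @ zb).item(), 2))
```

Note how each backward step multiplies the error signal by z(1 − z) ≤ 0.25; with ten such layers stacked, the signal reaching the early layers shrinks dramatically, which is one answer to the question on the slide.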
32
Overfitting and cross-validation
[Figure: learning curve of error over training epochs]
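One simple way to estimate the validation error behind such a learning curve is k-fold cross-validation: split the data into k folds, train on k − 1 of them, measure the error on the held-out fold, and average. A minimal index-splitting sketch (k = 5 and N = 100 are assumptions):

```python
import numpy as np

N, k = 100, 5                                  # number of examples and folds (illustrative)
idx = np.random.default_rng(0).permutation(N)  # shuffle example indices
folds = np.array_split(idx, k)

for i in range(k):
    val_idx = folds[i]                                              # held-out fold
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # train the MLP on train_idx, record the error on val_idx, then average over folds
```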
33
Break
34
Design considerations
• Learning task
• X - input
• Y - output
• D
• M
• K
• # layers
• Training epochs
• Training data
  – #
  – Source
35
Case study 1: digit recognition
[Figure: a handwritten digit as a 28 × 28 pixel image]
A 784-1000-10 MLP (28 × 28 = 784 input pixels)
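In the notation of the matrix-representation slide, this network has D = 784 (the 28 × 28 pixels unrolled into one vector), M = 1000 hidden units, and K = 10 output classes. A shape-only sketch with random weights and a random stand-in image (purely illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

D, M, K = 28 * 28, 1000, 10             # 784 inputs, 1000 hidden units, 10 digit classes
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.01, size=(M, D))
w2 = rng.normal(scale=0.01, size=(K, M))

image = rng.random((28, 28))            # stand-in for one handwritten digit
x = image.reshape(D)                    # unroll the pixels into the input vector
y = sigmoid(w2 @ np.tanh(w1 @ x))       # forward pass; argmax gives the predicted digit
print(int(np.argmax(y)))
```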
36
Case study 1: digit recognition
37
Milestones: a race to 100% accuracy on MNIST
38
Milestones: a race to 100% accuracy on MNIST
Classifier | Error rate (%) | Reported by
Perceptron | 12.0 | LeCun et al. 1998
2-layer NN, 1000 hidden units | 4.5 | LeCun et al. 1998
5-layer convolutional net | 0.95 | LeCun et al. 1998
5-layer convolutional net | 0.4 | Simard et al. 2003
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU) | 0.35 | Ciresan et al. 2010
See full list at http://yann.lecun.com/exdb/mnist/
39
Milestones: a race to 100% accuracy on MNIST
40
Milestones: a race to 100% accuracy on MNIST
41
Case study 2: sketch recognition
?
42
Case study 2: sketch recognition
• Convolutional neural network
[Figure: a convolutional network alternating convolution and sub-sampling stages; output classes are sketched symbols such as Scope, Transf. Fun., Gain, Sum, Sine wave, …]
(LeCun, 1998)
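To unpack the two operations named in the figure, here is a minimal NumPy sketch of one convolution stage (valid mode) followed by 2 × 2 average sub-sampling; the kernel values and image size are illustrative, and a real network such as LeNet stacks several such stages with kernels learned by back-propagation.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Slide the kernel over the image and take dot products (no padding).
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample2x2(fmap):
    # Average over non-overlapping 2 x 2 blocks (sub-sampling / pooling).
    H, W = fmap.shape
    return fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

img = np.random.default_rng(0).random((28, 28))   # stand-in for a sketched symbol image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # illustrative 3 x 3 edge-like kernel
fmap = np.tanh(conv2d_valid(img, kernel))         # convolution + nonlinearity -> 26 x 26
pooled = subsample2x2(fmap)                       # sub-sampling -> 13 x 13
print(fmap.shape, pooled.shape)
```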
43
Case study 2: sketch recognition
Case study 2: sketch recognition
44
45
Case study 3: autonomous driving
Pomerleau, 1995
46
Case study 4: sketch beautification
Orbay and Kara, 2011
47
Case study 4: sketch beautification
48
Case study 4: sketch beautification
49
Research forefront
• Deep belief network
  – Critique, or classify
  – Create, synthesize
Demo at: http://www.cs.toronto.edu/~hinton/adi/index.htm
50
In summary
1. Powerful machinery
2. Feed-forward
3. Back-propagation
4. Design considerations