Learning Functions and Neural Networks II
24-787 Lecture 9
Luoting Fu, Spring 2012

TRANSCRIPT

Page 1

Learning Functions and Neural Networks II
24-787 Lecture 9
Luoting Fu, Spring 2012

Page 2

Previous lecture

Applications

Physiological basis

Demos

Perceptron

[Figure: a two-input perceptron. Inputs X0, X1 with weights W0, W1 and a bias weight Wb feed a summing node followed by a hard-threshold activation fH.]

Y = u(W0·X0 + W1·X1 + Wb)

ΔWi = η (Y0 − Y) Xi
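The update rule ΔWi = η (Y0 − Y) Xi can be run directly. A minimal Python sketch (the AND task, the function names, and the learning-rate value are illustrative, not from the slides):

```python
def step(v):
    # Hard-threshold activation u(.): fires when the weighted sum is non-negative.
    return 1.0 if v >= 0 else 0.0

def train_perceptron(samples, eta=0.1, epochs=100):
    """Train a 2-input perceptron with the rule dWi = eta * (Y0 - Y) * Xi."""
    w0 = w1 = wb = 0.0
    for _ in range(epochs):
        for (x0, x1), y0 in samples:
            y = step(w0 * x0 + w1 * x1 + wb)
            err = y0 - y            # (Y0 - Y): target minus prediction
            w0 += eta * err * x0
            w1 += eta * err * x1
            wb += eta * err         # the bias sees a constant input of 1
    return w0, w1, wb

# Logical AND is linearly separable, so the rule converges on it.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, wb = train_perceptron(AND)
print([step(w0 * x0 + w1 * x1 + wb) for (x0, x1), _ in AND])  # [0.0, 0.0, 0.0, 1.0]
```

The learned weights trace out a single separating line in the input plane, which is exactly the limitation the XOR example below exposes.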

Page 3

In this lecture

• Multilayer perceptron (MLP)
  – Representation
  – Feed forward
  – Back-propagation
• Break
• Case studies
• Milestones & forefront

Page 4

Perceptron

[Figure: a 400-26 perceptron mapping 400 pixel inputs to the 26 letter classes A, B, C, D, …, Z. © Springer]

Page 5

XOR (exclusive OR)

Page 6

Root cause

Consider a 2-1 perceptron,

y = σ(w1·x1 + w2·x2 + w0)

Setting y = 0.5 gives the decision boundary

w1·x1 + w2·x2 + w0 = 0,

which is a straight line in the (x1, x2) plane.

[Figure: the same two-input perceptron diagram as before.]
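The root cause can be checked by brute force: no choice of (w1, w2, w0) makes a single threshold unit compute XOR. A small grid sweep (the grid and the helper name are illustrative, and a sweep is a demonstration rather than a proof):

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

def separates(w1, w2, w0, data):
    # A threshold unit outputs 1 exactly when w1*x1 + w2*x2 + w0 >= 0.
    return all((w1 * x1 + w2 * x2 + w0 >= 0) == bool(y)
               for (x1, x2), y in data.items())

# Sweep a grid of weight settings.
grid = [i / 4 for i in range(-8, 9)]
found_xor = any(separates(w1, w2, w0, XOR)
                for w1, w2, w0 in product(grid, repeat=3))
found_and = any(separates(w1, w2, w0, AND)
                for w1, w2, w0 in product(grid, repeat=3))
print(found_xor, found_and)  # False True: no line realizes XOR, but AND is easy
```

Adding the four XOR constraints pairwise yields a contradiction, so the failure holds for all real weights, not just this grid.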

Page 7

A single perceptron is limited to learning linearly separable cases.

Minsky M. L. and Papert S. A. 1969. Perceptrons. Cambridge, MA: MIT Press.

Page 8

Page 9

Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.

An MLP can approximate any continuous function.

A single perceptron is limited to learning linearly separable cases (a linear decision boundary).

Page 10

How’s that relevant?

Function approximation ↔ intelligence:

• Recognition: waveform → words
• Regression: the road ahead, speed, bearing → wheel turn, pedal depression

Page 11

[Pages 12–18: figure-only slides; the figures did not survive transcription.]

Page 19

Matrix representation

x ∈ ℝ^D

w(1) ∈ ℝ^(M×D)

z = h(w(1) x) ∈ ℝ^M

w(2) ∈ ℝ^(K×M)

y = σ(w(2) z) ∈ ℝ^K
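The two layer equations can be traced in code. A minimal pure-Python forward pass, with tanh standing in for h and the logistic function for σ (the slides leave both generic, and all weights here are made up for illustration):

```python
import math

def feed_forward(x, w1, w2):
    """One forward pass of a D-M-K MLP: z = h(w1 x), then y = sigma(w2 z)."""
    # Hidden layer: w1 is M x D, so z has M components.
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    # Output layer: w2 is K x M, so y has K components, each squashed to (0, 1).
    y = [1 / (1 + math.exp(-sum(w * zi for w, zi in zip(row, z)))) for row in w2]
    return y

# A tiny 2-3-1 network with fixed weights, just to show the shapes.
w1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]   # M=3 rows, D=2 columns
w2 = [[0.7, -0.5, 0.2]]                        # K=1 row,  M=3 columns
y = feed_forward([1.0, 0.0], w1, w2)
print(len(y))  # 1 output, as K = 1
```

The matrix shapes are the whole story: each layer is a matrix-vector product followed by an elementwise nonlinearity.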

Page 20

Knowledge learned by an MLP is encoded in its layers of weights.

Page 21

What does it learn?
• Decision boundary perspective

Page 22

What does it learn?
• Highly non-linear decision boundaries

Page 23

What does it learn?
• Real-world decision boundaries

Page 24

Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.

An MLP can approximate any continuous function.

Think Fourier.

Page 25

What does it learn?
• Weight perspective

A 64-M-3 MLP:

x ∈ ℝ^D

w(1) ∈ ℝ^(M×D)

z = h(w(1) x) ∈ ℝ^M

w(2) ∈ ℝ^(K×M)

y = σ(w(2) z) ∈ ℝ^K

Page 26

How does it learn?

• From examples
• By back propagation

[Figure: labeled training examples — the handwritten digits 0–9, and images labeled "polar bear" / "not a polar bear".]

Page 27

Back propagation

Page 28

Gradient descent

An “epoch” is one full pass through the training set.
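Gradient descent over epochs can be sketched on a one-parameter least-squares fit (the data points and the learning rate are invented for the demo):

```python
# Fit y = w*x to data by gradient descent; one epoch is one full pass
# over the training set.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x
w, eta = 0.0, 0.05
for epoch in range(100):
    # Gradient of the mean squared error (1/N) * sum (w*x - y)^2 w.r.t. w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= eta * grad  # step downhill
print(round(w, 2))  # close to 2
```

An MLP is trained the same way, except w is the full set of layer weights and the gradient is computed by back propagation.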

Page 29

Page 30

Back propagation

Page 31

Back propagation
• Steps

Think about this: what happens when you train a 10-layer MLP?
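One answer to the question: the gradient signal shrinks as it is propagated back through each layer. For sigmoid units, each layer contributes a factor σ′(a) = σ(a)(1 − σ(a)), which is at most 0.25, so ten layers can attenuate the gradient by roughly a millionfold:

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

# sigma'(a) = sigma(a) * (1 - sigma(a)); its maximum is 0.25, reached at a = 0.
peak = sigmoid(0) * (1 - sigmoid(0))

# Backprop through 10 sigmoid layers multiplies 10 such factors, so even in
# the best case the gradient to the first layer shrinks by 0.25**10.
best_case = peak ** 10
print(peak, best_case)  # 0.25 and about 9.5e-07
```

This vanishing-gradient effect is one reason very deep MLPs of this era were hard to train with plain back propagation.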

Page 32

Overfitting and cross-validation

[Figure: learning curve — error vs. training epochs for the training and validation sets.]
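A common response to the rising validation branch of the learning curve is early stopping on a held-out set. A sketch with invented error sequences (the patience value is likewise illustrative):

```python
# Training error keeps falling while validation error turns back up --
# the classic overfitting pattern. These numbers are made up to mimic it.
train_err = [0.9, 0.5, 0.3, 0.2, 0.15, 0.12, 0.10, 0.09]
valid_err = [0.95, 0.6, 0.4, 0.35, 0.33, 0.36, 0.41, 0.48]

patience, best, best_epoch, waited = 2, float("inf"), 0, 0
for epoch, err in enumerate(valid_err):
    if err < best:
        best, best_epoch, waited = err, epoch, 0   # new best: keep these weights
    else:
        waited += 1
        if waited >= patience:
            break  # validation error has risen for `patience` epochs: stop
print(best_epoch, best)  # epoch 4, error 0.33: stop there, discard later epochs
```

Cross-validation plays the same role: it estimates the validation curve so the model is chosen at its minimum rather than at the minimum of the training curve.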

Page 33

Break

Page 34

Design considerations

• Learning task
• X – input
• Y – output
• D
• M
• K
• #layers
• Training epochs
• Training data
  – #
  – Source

Page 35

Case study 1: digit recognition

[Figure: a 28 × 28 pixel digit image.]

A 784-1000-10 MLP (28 × 28 = 784 inputs)

Page 36

Case study 1: digit recognition

Page 37

Milestones: a race to 100% accuracy on MNIST

Page 38

Milestones: a race to 100% accuracy on MNIST

CLASSIFIER                                            ERROR RATE (%)   REPORTED BY
Perceptron                                            12.0             LeCun et al. 1998
2-layer NN, 1000 hidden units                         4.5              LeCun et al. 1998
5-layer convolutional net                             0.95             LeCun et al. 1998
5-layer convolutional net                             0.4              Simard et al. 2003
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU)    0.35             Ciresan et al. 2010

See the full list at http://yann.lecun.com/exdb/mnist/

Page 39

Milestones: a race to 100% accuracy on MNIST

Page 40

Milestones: a race to 100% accuracy on MNIST

Page 41

Case study 2: sketch recognition

?

Page 42

Case study 2: sketch recognition
• Convolutional neural network

[Figure: LeNet-style architecture — alternating convolution and sub-sampling layers feeding a classifier (LeCun, 1998). The sketch classes include diagram symbols such as scope, transfer function, gain, sum, sine wave, …]

Page 43

Case study 2: sketch recognition

Page 44

Case study 2: sketch recognition

Page 45

Case study 3: autonomous driving

Pomerleau, 1995

Page 46

Case study 4: sketch beautification

Orbay and Kara, 2011

Page 47

Case study 4: sketch beautification

Page 48

Case study 4: sketch beautification

Page 49

Research forefront
• Deep belief network
  – Critique, or classify
  – Create, synthesize

Demo at: http://www.cs.toronto.edu/~hinton/adi/index.htm

Page 50

In summary

1. Powerful machinery
2. Feed-forward
3. Back propagation
4. Design considerations