TRANSCRIPT
Learning Functions and Neural Networks II
24-787 Lecture 9
Luoting Fu Spring 2012
2
Previous lecture
Applications
Physiological basis
Demos
Perceptron
[Figure: a two-input perceptron. Inputs X0 and X1, weighted by W0 and W1 plus a bias weight Wb, feed a summation whose result passes through a step activation to give the output Y.]
Y = u(W0 X0 + W1 X1 + Wb)
Δ Wi = η (Y0 - Y) Xi
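To make the recap concrete, here is a minimal NumPy sketch of the perceptron above trained with the update rule Δ Wi = η (Y0 - Y) Xi; the AND data set, learning rate, and epoch count are illustrative choices, not part of the original slide.

```python
import numpy as np

def u(a):
    # Step activation from the slide: output 1 when the weighted sum is positive.
    return (a > 0).astype(float)

# Illustrative training set (logical AND); Y0 holds the desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y0 = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)        # W0, W1
wb = 0.0               # bias weight Wb
eta = 0.1              # learning rate (eta)

for epoch in range(20):
    for x, y0 in zip(X, Y0):
        y = u(w @ x + wb)              # Y = u(W0 X0 + W1 X1 + Wb)
        w += eta * (y0 - y) * x        # delta rule: dWi = eta * (Y0 - Y) * Xi
        wb += eta * (y0 - y)           # bias updated as an input fixed at 1

print(w, wb, u(X @ w + wb))            # learned weights and predictions
```

On a linearly separable set like AND the weights stop changing once every example is classified correctly; the XOR slides that follow show where this breaks down.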
3
In this lecture
• Multilayer perceptron (MLP)
  – Representation
  – Feed forward
  – Back-propagation
• Break
• Case studies
• Milestones & forefront
4
Perceptron
A 400-26 perceptron: 400 inputs, 26 outputs (one per letter A–Z)
[Figure © Springer]
5
XOR (Exclusive OR)
6
Root cause
Consider a 2-1 perceptron,
y = σ(w1 x1 + w2 x2 + w0)
Let y = 0.5. Since σ(a) = 0.5 exactly when a = 0, the decision boundary is
w1 x1 + w2 x2 + w0 = 0,
a straight line in the (x1, x2) plane: a single perceptron can only draw linear boundaries.
[Figure: the two-input perceptron diagram, as on the earlier slide]
7
A single perceptron is limited to learning linearly separable cases.
Minsky, M. L. and Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
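A short argument (not on the slides) makes the XOR failure concrete: write out what the step output u would have to satisfy on the four XOR cases, and the constraints contradict each other.

```latex
% Constraints a single perceptron y = u(w_1 x_1 + w_2 x_2 + w_b) would need for XOR:
\begin{aligned}
(0,0) \mapsto 0 &:\quad w_b \le 0 \\
(1,0) \mapsto 1 &:\quad w_1 + w_b > 0 \\
(0,1) \mapsto 1 &:\quad w_2 + w_b > 0 \\
(1,1) \mapsto 0 &:\quad w_1 + w_2 + w_b \le 0
\end{aligned}
% Adding the two middle inequalities gives w_1 + w_2 + 2 w_b > 0, i.e.
% w_1 + w_2 + w_b > -w_b >= 0, which contradicts the last line: no weights work.
```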
8
9
Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.
An MLP can learn any continuous function.
A single perceptron is limited to learning linearly separable cases (a linear function).
10
How’s that relevant?
Function approximation underlies intelligent behavior:
• Recognition: waveform → words (speech recognition)
• Regression: the road ahead, speed, bearing → wheel turn, pedal depression (driving)
11–18
[Figure-only slides]
19
Matrix representation
x ∈ ℝ^D
w(1) ∈ ℝ^(M×D)
z = h(w(1) x) ∈ ℝ^M
w(2) ∈ ℝ^(K×M)
y = σ(w(2) z) ∈ ℝ^K
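The matrix form above maps directly to code. Below is a minimal feed-forward sketch in NumPy; the choice of tanh for the hidden nonlinearity h, a logistic sigmoid for σ, and the sizes D, M, K are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

D, M, K = 4, 8, 3                      # input, hidden, and output sizes (arbitrary)
rng = np.random.default_rng(0)
w1 = rng.normal(size=(M, D))           # w(1) in R^(M x D)
w2 = rng.normal(size=(K, M))           # w(2) in R^(K x M)

x = rng.normal(size=D)                 # input vector in R^D
z = np.tanh(w1 @ x)                    # hidden layer: z = h(w(1) x)
y = sigmoid(w2 @ z)                    # output layer: y = sigma(w(2) z)
print(z.shape, y.shape)                # (M,) and (K,)
```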
20
Knowledge learned by an MLP is encoded in its layers of weights.
21
What does it learn?
• Decision boundary perspective
22
What does it learn?
• Highly non-linear decision boundaries
23
What does it learn?
• Real world decision boundaries
24
Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314.
An MLP can learn any continuous function.
Think Fourier.
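The Fourier remark can be read almost literally: Cybenko's result says a continuous function on a compact set can be approximated to any accuracy by a finite superposition of sigmoids, just as a truncated Fourier series approximates it by a superposition of sinusoids. In symbols (notation mine, not from the slide):

```latex
f(\vec{x}) \;\approx\; \sum_{j=1}^{M} \alpha_j \,\sigma\!\left(\vec{w}_j^{\top} \vec{x} + b_j\right)
```

which is exactly the form computed by one hidden layer of M sigmoidal units followed by a linear output.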
25
What does it learn?
• Weight perspective
A 64-M-3 MLP
x ∈ ℝ^D
w(1) ∈ ℝ^(M×D)
z = h(w(1) x) ∈ ℝ^M
w(2) ∈ ℝ^(K×M)
y = σ(w(2) z) ∈ ℝ^K
26
How does it learn?
• From examples
• By back-propagation
[Figure: labeled training examples: handwritten digits 0–9; images labeled “Polar bear” / “Not a polar bear”]
27
Back-propagation
28
Gradient descent
“epoch”
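The rule behind the slide is the standard gradient-descent update (generic form, not copied from the lecture): every weight takes a small step against the gradient of the training error E, with step size set by the learning rate η, and one complete pass through the training set is an “epoch”.

```latex
w_{ij} \;\leftarrow\; w_{ij} \;-\; \eta\, \frac{\partial E}{\partial w_{ij}}
```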
29
30
Back-propagation
31
Back-propagation
• Steps
Think about this: What happens when you train a 10-layer MLP?
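As a concrete instance of the steps, here is a minimal NumPy sketch of back-propagation for a 2-H-1 MLP trained on XOR with sigmoid units and squared error; the hidden width H, learning rate, initialization, and epoch count are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR training set; targets in T.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

H = 4                                        # hidden units (illustrative)
rng = np.random.default_rng(1)
W1 = rng.normal(size=(H, 3))                 # hidden weights, inputs (x1, x2, bias)
W2 = rng.normal(size=(1, H + 1))             # output weights, inputs (z1..zH, bias)
eta = 0.5

for epoch in range(10000):                   # raise this (or reseed) if it has not converged
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)               # forward pass
        z = sigmoid(W1 @ xb)
        zb = np.append(z, 1.0)
        y = sigmoid(W2 @ zb)

        delta_out = (y - t) * y * (1 - y)    # backward pass: output error signal
        delta_hid = (W2[:, :H].T @ delta_out) * z * (1 - z)   # propagate it back

        W2 -= eta * np.outer(delta_out, zb)  # gradient-descent weight updates
        W1 -= eta * np.outer(delta_hid, xb)

for x in X:
    zb = np.append(sigmoid(W1 @ np.append(x, 1.0)), 1.0)
    print(x, round(sigmoid(W2 @ zb).item(), 2))
```

Note how each backward step multiplies the error signal by z(1 − z) ≤ 0.25; with ten such layers stacked, the signal reaching the early layers shrinks dramatically, which is one answer to the question on the slide.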
32
Overfitting and cross-validation
[Figure: learning curve of error over training epochs]
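One simple way to estimate the validation error behind such a learning curve is k-fold cross-validation: split the data into k folds, train on k − 1 of them, measure the error on the held-out fold, and average. A minimal index-splitting sketch (k = 5 and N = 100 are assumptions):

```python
import numpy as np

N, k = 100, 5                                  # number of examples and folds (illustrative)
idx = np.random.default_rng(0).permutation(N)  # shuffle example indices
folds = np.array_split(idx, k)

for i in range(k):
    val_idx = folds[i]                                              # held-out fold
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # train the MLP on train_idx, record the error on val_idx, then average over folds
```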
33
Break
34
Design considerations
• Learning task
• X - input
• Y - output
• D
• M
• K
• # layers
• Training epochs
• Training data
  – #
  – Source
35
Case study 1: digit recognition
[Figure: a handwritten digit as a 28 × 28 pixel image]
A 784-1000-10 MLP (28 × 28 = 784 input pixels)
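In the notation of the matrix-representation slide, this network has D = 784 (the 28 × 28 pixels unrolled into one vector), M = 1000 hidden units, and K = 10 output classes. A shape-only sketch with random weights and a random stand-in image (purely illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

D, M, K = 28 * 28, 1000, 10             # 784 inputs, 1000 hidden units, 10 digit classes
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.01, size=(M, D))
w2 = rng.normal(scale=0.01, size=(K, M))

image = rng.random((28, 28))            # stand-in for one handwritten digit
x = image.reshape(D)                    # unroll the pixels into the input vector
y = sigmoid(w2 @ np.tanh(w1 @ x))       # forward pass; argmax gives the predicted digit
print(int(np.argmax(y)))
```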
36
Case study 1: digit recognition
37
Milestones: a race to 100% accuracy on MNIST
38
Milestones: a race to 100% accuracy on MNIST
Classifier | Error rate (%) | Reported by
Perceptron | 12.0 | LeCun et al. 1998
2-layer NN, 1000 hidden units | 4.5 | LeCun et al. 1998
5-layer convolutional net | 0.95 | LeCun et al. 1998
5-layer convolutional net | 0.4 | Simard et al. 2003
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU) | 0.35 | Ciresan et al. 2010
See full list at http://yann.lecun.com/exdb/mnist/
39
Milestones: a race to 100% accuracy on MNIST
40
Milestones: a race to 100% accuracy on MNIST
41
Case study 2: sketch recognition
?
42
Case study 2: sketch recognition
• Convolutional neural network
[Figure: a convolutional network alternating convolution and sub-sampling stages; output classes are sketched symbols such as Scope, Transf. Fun., Gain, Sum, Sine wave, …]
(LeCun, 1998)
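To unpack the two operations named in the figure, here is a minimal NumPy sketch of one convolution stage (valid mode) followed by 2 × 2 average sub-sampling; the kernel values and image size are illustrative, and a real network such as LeNet stacks several such stages with kernels learned by back-propagation.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Slide the kernel over the image and take dot products (no padding).
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def subsample2x2(fmap):
    # Average over non-overlapping 2 x 2 blocks (sub-sampling / pooling).
    H, W = fmap.shape
    return fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

img = np.random.default_rng(0).random((28, 28))   # stand-in for a sketched symbol image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # illustrative 3 x 3 edge-like kernel
fmap = np.tanh(conv2d_valid(img, kernel))         # convolution + nonlinearity -> 26 x 26
pooled = subsample2x2(fmap)                       # sub-sampling -> 13 x 13
print(fmap.shape, pooled.shape)
```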
43
Case study 2: sketch recognition
Case study 2: sketch recognition
44
45
Case study 3: autonomous driving
Pomerleau, 1995
46
Case study 4: sketch beautification
Orbay and Kara, 2011
47
Case study 4: sketch beautification
48
Case study 4: sketch beautification
49
Research forefront
• Deep belief network
  – Critique, or classify
  – Create, synthesize
Demo at: http://www.cs.toronto.edu/~hinton/adi/index.htm
50
In summary
1. Powerful machinery
2. Feed-forward
3. Back-propagation
4. Design considerations