Introduction to Machine Learning
Multilayer Perceptron
Barnabás Póczos
The Multilayer Perceptron
Multilayer Perceptron
ALVINN: An Autonomous Land Vehicle in a Neural Network
Dean A. Pomerleau, Carnegie Mellon University, 1989
Training: uses a simulated road generator
Gradient Descent
We want to solve:
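In standard notation (a minimal restatement): find $\mathbf{w}^{*} = \arg\min_{\mathbf{w}} E(\mathbf{w})$ for a differentiable error function $E$, by iterating
$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \eta\,\nabla E(\mathbf{w}_{t}),$$
where $\eta > 0$ is the step size.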
Starting Point
Fixed step size can be too big
Fixed step size can be too small
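A minimal sketch of both failure modes on the toy objective f(w) = w²; the starting point and step-size values are assumptions, not from the slides:

% Gradient descent on f(w) = w^2 with a fixed step size
f    = @(w) w.^2;          % toy objective
grad = @(w) 2*w;           % its gradient
w0   = 1;                  % starting point (assumed)
for eta = [1.1, 0.01]      % too big, then too small (assumed values)
    w = w0;
    for t = 1:20
        w = w - eta*grad(w);   % fixed-step gradient descent update
    end
    fprintf('eta = %.2f -> w after 20 steps: %g\n', eta, w);
end
% eta = 1.10 overshoots and diverges (|w| grows every step);
% eta = 0.01 converges, but after 20 steps w has barely moved.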
Character Recognition with MLP
Matlab: appcr1
The network
Noise-free input: 26 different letters of size 7×5
Noisy inputs
Matlab MLP Training
% Load the appcr1 demo data: 26 letters as 7x5 pixel images, one-hot targets
[X, T] = prprob;
% Create MLP with two hidden layers of 10 and 25 neurons
hiddenlayers = [10, 25];
net1 = feedforwardnet(hiddenlayers);
net1 = configure(net1, X, T);
% View the network architecture
view(net1);
% Train on the noise-free letters
net1 = train(net1, X, T);
% Test: simulate the trained network on test inputs Xtest
Y1 = net1(Xtest);
Prediction errors
▪ Network 1 was trained on clean images.
▪ Network 2 was trained on noisy images: 30 noisy copies of each letter were created.
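A sketch of how such a noisy training set can be built (variable names and the noise level are assumptions, following the style of the appcr1 demo):

numCopies = 30;                                 % 30 noisy copies of each letter
Xn = repmat(X, 1, numCopies);                   % replicate the clean letters
Xn = min(max(Xn + 0.2*randn(size(Xn)), 0), 1);  % add Gaussian pixel noise, clip to [0,1]
Tn = repmat(T, 1, numCopies);                   % matching targets
net2 = feedforwardnet([10, 25]);                % same architecture as net1
net2 = train(net2, Xn, Tn);                     % Network 2: trained on noisy images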
The Backpropagation Algorithm
Multilayer Perceptron
The gradient of the error
Notation
Some observations
The backpropagated error
Lemma: the error terms at one layer can be computed from the error terms at the layer above, so the gradient of the error follows from a single backward pass.
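In standard notation (a reconstruction, assuming units with activation $g$, pre-activations $a_j^{(l)}$, outputs $z_j^{(l)} = g(a_j^{(l)})$, and weights $w_{ji}^{(l)}$), the backpropagated error $\delta_j^{(l)} = \partial E / \partial a_j^{(l)}$ satisfies
$$\delta_j^{(l)} = g'\!\big(a_j^{(l)}\big)\sum_{k} w_{kj}^{(l+1)}\,\delta_k^{(l+1)}, \qquad \frac{\partial E}{\partial w_{ji}^{(l)}} = \delta_j^{(l)}\, z_i^{(l-1)},$$
so all weight gradients are obtained in one backward pass.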
The backpropagation algorithm
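A from-scratch sketch of the algorithm (an illustration, not the slides' code): one hidden layer, sigmoid units, squared error, batch gradient descent on XOR. The hidden size, step size, and epoch count are assumptions.

X = [0 0 1 1; 0 1 0 1];   T = [0 1 1 0];   % XOR data
nh = 4;  eta = 0.5;                        % hidden size and step size (assumed)
W1 = randn(nh, 2);  b1 = randn(nh, 1);     % input -> hidden weights
W2 = randn(1, nh);  b2 = randn(1, 1);      % hidden -> output weights
sig = @(a) 1./(1 + exp(-a));               % sigmoid activation
for epoch = 1:20000
    Z = sig(W1*X + b1);                    % forward pass: hidden layer
    Y = sig(W2*Z + b2);                    % forward pass: output layer
    d2 = (Y - T) .* Y .* (1 - Y);          % output error (squared loss, sigmoid)
    d1 = (W2' * d2) .* Z .* (1 - Z);       % backpropagated hidden error
    W2 = W2 - eta * (d2 * Z');  b2 = b2 - eta * sum(d2, 2);
    W1 = W1 - eta * (d1 * X');  b1 = b1 - eta * sum(d1, 2);
end
disp(Y)                                    % approximately [0 1 1 0]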
What functions can multilayer perceptrons represent?
Perceptrons cannot represent the XOR function (the values below define its complement, XNOR, which is equally not linearly separable):
f(0,0)=1, f(1,1)=1, f(0,1)=0, f(1,0)=0
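A two-layer network, however, does represent XOR; here is a hand-built sketch with threshold units (the weights are chosen by hand, not from the slides):

step = @(a) double(a >= 0);                        % threshold unit
xorNet = @(x1, x2) step( step(x1 + x2 - 0.5) ...   % hidden unit 1: OR(x1,x2)
                       - step(x1 + x2 - 1.5) ...   % hidden unit 2: AND(x1,x2)
                       - 0.5 );                    % output: OR and not AND
[xorNet(0,0), xorNet(0,1), xorNet(1,0), xorNet(1,1)]   % -> 0 1 1 0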
Hilbert's 13th Problem
1902: Hilbert poses his 23 "most important" problems in mathematics.
The 13th Problem: "Solve the 7-th degree equation using continuous functions of two parameters."
Related conjecture: let f be the function of 3 arguments such that x = f(a,b,c) solves x^7 + ax^3 + bx^2 + cx + 1 = 0. Prove that f cannot be rewritten as a composition of finitely many continuous functions of two arguments.
Another rewritten form: prove that there is a nonlinear continuous function of three variables that cannot be decomposed with finitely many functions of two variables.
Hilbert's conjecture: it can't be solved…
Function decompositions
Example: f(x,y,z) = Φ1(ψ1(x), ψ2(y)) + Φ2(c1ψ3(y) + c2ψ4(z), x)
[Diagram: inputs x, y, z feed the one-variable functions ψ1, ψ2, ψ3, ψ4; a Σ node forms c1ψ3(y) + c2ψ4(z); the two-variable functions Φ1 and Φ2 are then summed to give f(x,y,z).]
Function decompositions
In 1957, Arnold disproved Hilbert's conjecture.
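The result behind this slide is the Kolmogorov–Arnold representation theorem; in its standard form: every continuous $f : [0,1]^n \to \mathbb{R}$ can be written as
$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\Big(\sum_{p=1}^{n} \psi_{q,p}(x_p)\Big)$$
with continuous one-variable functions $\Phi_q$ and $\psi_{q,p}$: sums and compositions of one-variable functions suffice.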
Function decompositions
Issue: this statement is not constructive.
Corollary:
Universal Approximators
Kurt Hornik, Maxwell Stinchcombe and Halbert White: "Multilayer feedforward networks are universal approximators", Neural Networks, 2(5):359–366, 1989
Definition: Σ^N(g) is the class of neural networks with 1 hidden layer and activation g.
Theorem: one hidden layer already suffices for universal approximation.
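In a standard formulation (reconstructed, not verbatim from the slide):
$$\Sigma^{N}(g) = \Big\{\, h : \mathbb{R}^{N} \to \mathbb{R} \;\Big|\; h(x) = \sum_{j=1}^{q} \beta_j\, g\big(w_j^{\top} x + b_j\big),\; q \in \mathbb{N},\; \beta_j, b_j \in \mathbb{R},\; w_j \in \mathbb{R}^{N} \Big\}$$
Theorem (Hornik, Stinchcombe & White, 1989): if $g$ is a squashing function (non-decreasing, with $\lim_{a\to-\infty} g(a) = 0$ and $\lim_{a\to\infty} g(a) = 1$), then $\Sigma^{N}(g)$ is uniformly dense on compacta in $C(\mathbb{R}^{N})$: a single hidden layer can approximate any continuous function arbitrarily well on compact sets.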
Universal Approximators
Theorem (Blum & Li, 1991):
Definition:
Formal statement:
Proof
GOAL: approximate the target function f by a weighted sum of indicator functions.
Integral approximation in 1-dim: partition the domain into intervals Xi with sample points xi, so that f(x) ≈ Σi f(xi) 1(x ∈ Xi).
Integral approximation in 2-dim: likewise, partition the plane into small polygons Xi.
Proof
The indicator function of the polygon Xi can be learned by a neural network:
1 if x is in Xi, −1 otherwise.
The weighted linear combination of these indicator functions will be a good approximation of the original function f.
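One way such a network can be built (a sketch for a convex polygon, under the ±1 output convention above): represent $X_i$ as the intersection of $m$ half-planes, use one sign unit per edge, and let a second layer check that all $m$ units fire:
$$h_j(x) = \operatorname{sgn}\big(w_j^{\top} x + b_j\big), \qquad \mathbb{1}_{X_i}(x) = \operatorname{sgn}\Big(\sum_{j=1}^{m} h_j(x) - m + \tfrac{1}{2}\Big),$$
which outputs $1$ when all $h_j = 1$ (i.e., $x \in X_i$) and $-1$ otherwise.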
Proof
The remaining linear system for the combination weights can also be solved.
Thanks for your attention!