Introduction to Machine Learning
Multilayer Perceptron
Barnabás Póczos
The Multilayer Perceptron
Multilayer Perceptron
ALVINN: An Autonomous Land Vehicle in a Neural Network
Dean A. Pomerleau, Carnegie Mellon University, 1989
Training: uses a simulated road generator
Gradient Descent
We want to solve:
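In standard notation (a minimal restatement): find $\mathbf{w}^{*} = \arg\min_{\mathbf{w}} E(\mathbf{w})$ for a differentiable error function $E$, by iterating
$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \eta\,\nabla E(\mathbf{w}_{t}),$$
where $\eta > 0$ is the step size.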
Starting Point
Fixed step size can be too big
Fixed step size can be too small
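A minimal sketch of both failure modes on the toy objective f(w) = w²; the starting point and step-size values are assumptions, not from the slides:

% Gradient descent on f(w) = w^2 with a fixed step size
f    = @(w) w.^2;          % toy objective
grad = @(w) 2*w;           % its gradient
w0   = 1;                  % starting point (assumed)
for eta = [1.1, 0.01]      % too big, then too small (assumed values)
    w = w0;
    for t = 1:20
        w = w - eta*grad(w);   % fixed-step gradient descent update
    end
    fprintf('eta = %.2f -> w after 20 steps: %g\n', eta, w);
end
% eta = 1.10 overshoots and diverges (|w| grows every step);
% eta = 0.01 converges, but after 20 steps w has barely moved.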
Character Recognition with MLP
Matlab: appcr1
The network
Noise-free input: 26 different letters of size 7×5
Noisy inputs
Matlab MLP Training
% Load the appcr1 demo data: 26 letters as 7x5 pixel images, one-hot targets
[X, T] = prprob;
% Create MLP with two hidden layers of 10 and 25 neurons
hiddenlayers = [10, 25];
net1 = feedforwardnet(hiddenlayers);
net1 = configure(net1, X, T);
% View the network architecture
view(net1);
% Train on the noise-free letters
net1 = train(net1, X, T);
% Test: simulate the trained network on test inputs Xtest
Y1 = net1(Xtest);
Prediction errors
▪ Network 1 was trained on clean images.
▪ Network 2 was trained on noisy images: 30 noisy copies of each letter were created.
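A sketch of how such a noisy training set can be built (variable names and the noise level are assumptions, following the style of the appcr1 demo):

numCopies = 30;                                 % 30 noisy copies of each letter
Xn = repmat(X, 1, numCopies);                   % replicate the clean letters
Xn = min(max(Xn + 0.2*randn(size(Xn)), 0), 1);  % add Gaussian pixel noise, clip to [0,1]
Tn = repmat(T, 1, numCopies);                   % matching targets
net2 = feedforwardnet([10, 25]);                % same architecture as net1
net2 = train(net2, Xn, Tn);                     % Network 2: trained on noisy images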
The Backpropagation Algorithm
Multilayer Perceptron
The gradient of the error
Notation
Some observations
The backpropagated error
Lemma: the error terms at one layer can be computed from the error terms at the layer above, so the gradient of the error follows from a single backward pass.
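In standard notation (a reconstruction, assuming units with activation $g$, pre-activations $a_j^{(l)}$, outputs $z_j^{(l)} = g(a_j^{(l)})$, and weights $w_{ji}^{(l)}$), the backpropagated error $\delta_j^{(l)} = \partial E / \partial a_j^{(l)}$ satisfies
$$\delta_j^{(l)} = g'\!\big(a_j^{(l)}\big)\sum_{k} w_{kj}^{(l+1)}\,\delta_k^{(l+1)}, \qquad \frac{\partial E}{\partial w_{ji}^{(l)}} = \delta_j^{(l)}\, z_i^{(l-1)},$$
so all weight gradients are obtained in one backward pass.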
The backpropagation algorithm
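A from-scratch sketch of the algorithm (an illustration, not the slides' code): one hidden layer, sigmoid units, squared error, batch gradient descent on XOR. The hidden size, step size, and epoch count are assumptions.

X = [0 0 1 1; 0 1 0 1];   T = [0 1 1 0];   % XOR data
nh = 4;  eta = 0.5;                        % hidden size and step size (assumed)
W1 = randn(nh, 2);  b1 = randn(nh, 1);     % input -> hidden weights
W2 = randn(1, nh);  b2 = randn(1, 1);      % hidden -> output weights
sig = @(a) 1./(1 + exp(-a));               % sigmoid activation
for epoch = 1:20000
    Z = sig(W1*X + b1);                    % forward pass: hidden layer
    Y = sig(W2*Z + b2);                    % forward pass: output layer
    d2 = (Y - T) .* Y .* (1 - Y);          % output error (squared loss, sigmoid)
    d1 = (W2' * d2) .* Z .* (1 - Z);       % backpropagated hidden error
    W2 = W2 - eta * (d2 * Z');  b2 = b2 - eta * sum(d2, 2);
    W1 = W1 - eta * (d1 * X');  b1 = b1 - eta * sum(d1, 2);
end
disp(Y)                                    % approximately [0 1 1 0]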
What functions can multilayer perceptrons represent?
Perceptrons cannot represent the XOR function (the values below define its complement, XNOR, which is equally not linearly separable):
f(0,0)=1, f(1,1)=1, f(0,1)=0, f(1,0)=0
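A two-layer network, however, does represent XOR; here is a hand-built sketch with threshold units (the weights are chosen by hand, not from the slides):

step = @(a) double(a >= 0);                        % threshold unit
xorNet = @(x1, x2) step( step(x1 + x2 - 0.5) ...   % hidden unit 1: OR(x1,x2)
                       - step(x1 + x2 - 1.5) ...   % hidden unit 2: AND(x1,x2)
                       - 0.5 );                    % output: OR and not AND
[xorNet(0,0), xorNet(0,1), xorNet(1,0), xorNet(1,1)]   % -> 0 1 1 0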
Hilbert's 13th Problem
1902: Hilbert poses his 23 "most important" problems in mathematics.
The 13th Problem: "Solve the 7-th degree equation using continuous functions of two parameters."
Related conjecture: let f be the function of 3 arguments such that x = f(a,b,c) solves x^7 + ax^3 + bx^2 + cx + 1 = 0. Prove that f cannot be rewritten as a composition of finitely many continuous functions of two arguments.
Another rewritten form: prove that there is a nonlinear continuous function of three variables that cannot be decomposed with finitely many functions of two variables.
Hilbert's conjecture: it can't be solved…
Function decompositions
Example: f(x,y,z) = Φ1(ψ1(x), ψ2(y)) + Φ2(c1ψ3(y) + c2ψ4(z), x)
[Diagram: inputs x, y, z feed the one-variable functions ψ1, ψ2, ψ3, ψ4; a Σ node forms c1ψ3(y) + c2ψ4(z); the two-variable functions Φ1 and Φ2 are then summed to give f(x,y,z).]
Function decompositions
In 1957, Arnold disproved Hilbert's conjecture.
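The result behind this slide is the Kolmogorov–Arnold representation theorem; in its standard form: every continuous $f : [0,1]^n \to \mathbb{R}$ can be written as
$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\Big(\sum_{p=1}^{n} \psi_{q,p}(x_p)\Big)$$
with continuous one-variable functions $\Phi_q$ and $\psi_{q,p}$: sums and compositions of one-variable functions suffice.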
Function decompositions
Issue: this statement is not constructive.
Corollary:
Universal Approximators
Kurt Hornik, Maxwell Stinchcombe and Halbert White: "Multilayer feedforward networks are universal approximators", Neural Networks, 2(5):359–366, 1989
Definition: Σ^N(g) is the class of neural networks with 1 hidden layer and activation g.
Theorem: one hidden layer already suffices for universal approximation.
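In a standard formulation (reconstructed, not verbatim from the slide):
$$\Sigma^{N}(g) = \Big\{\, h : \mathbb{R}^{N} \to \mathbb{R} \;\Big|\; h(x) = \sum_{j=1}^{q} \beta_j\, g\big(w_j^{\top} x + b_j\big),\; q \in \mathbb{N},\; \beta_j, b_j \in \mathbb{R},\; w_j \in \mathbb{R}^{N} \Big\}$$
Theorem (Hornik, Stinchcombe & White, 1989): if $g$ is a squashing function (non-decreasing, with $\lim_{a\to-\infty} g(a) = 0$ and $\lim_{a\to\infty} g(a) = 1$), then $\Sigma^{N}(g)$ is uniformly dense on compacta in $C(\mathbb{R}^{N})$: a single hidden layer can approximate any continuous function arbitrarily well on compact sets.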
Universal Approximators
Theorem (Blum & Li, 1991):
Definition:
Formal statement:
Proof
GOAL: approximate the target function f by a weighted sum of indicator functions.
Integral approximation in 1-dim: partition the domain into intervals Xi with sample points xi, so that f(x) ≈ Σi f(xi) 1(x ∈ Xi).
Integral approximation in 2-dim: likewise, partition the plane into small polygons Xi.
Proof
The indicator function of the polygon Xi can be learned by a neural network:
1 if x is in Xi, −1 otherwise.
The weighted linear combination of these indicator functions will be a good approximation of the original function f.
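One way such a network can be built (a sketch for a convex polygon, under the ±1 output convention above): represent $X_i$ as the intersection of $m$ half-planes, use one sign unit per edge, and let a second layer check that all $m$ units fire:
$$h_j(x) = \operatorname{sgn}\big(w_j^{\top} x + b_j\big), \qquad \mathbb{1}_{X_i}(x) = \operatorname{sgn}\Big(\sum_{j=1}^{m} h_j(x) - m + \tfrac{1}{2}\Big),$$
which outputs $1$ when all $h_j = 1$ (i.e., $x \in X_i$) and $-1$ otherwise.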
Proof
The remaining linear system for the combination weights can also be solved.
Thanks for your attention!