
Jan Voracek (2001), 010597001 Soft Computing, Lecture 7

Warm-up example (1)

Having the well-known XOR problem and a NN for its approximation, answer the following questions:

• How many hidden layers would you use?

• How many hidden units per layer?

• How many connections would your net have?

• How would you select the initial weights of the connections?

• When would you stop the iterations of the error back propagation algorithm?
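One possible set of answers, as a minimal sketch in the old Neural Network Toolbox syntax used later in these slides; the 2-2-1 architecture, the training function and the stopping parameters are illustrative choices, not the only correct ones.

% XOR training data: 2 inputs (columns of P), 1 target output
P = [0 0 1 1; 0 1 0 1];
T = [0 1 1 0];
% one hidden layer with 2 tansig units is enough for XOR;
% 2*2 + 2 weights plus 2 + 1 biases = 9 adjustable connections in total
net = newff(minmax(P), [2 1], {'tansig' 'purelin'}, 'trainlm');
net = init(net);                 % small random initial weights (toolbox default)
net.trainParam.goal   = 1e-3;    % stop when the MSE falls below this goal ...
net.trainParam.epochs = 1000;    % ... or after a fixed number of epochs
net = train(net, P, T);
Y = sim(net, P)                  % should be close to [0 1 1 0]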


Warm-up example (2)

What would you do if the trained neural net does not generate the desired outputs and behaves as follows:

• The updated weights after an iteration of the error back propagation procedure are almost identical to the weights before that iteration, but the output is not the desired one?

• The number of iterations exceeds a pre-defined threshold?

• The output error seems to be increasing instead of decreasing?


Item-by-item learning (sequential)

Without shuffling:

for epoch = 1:num_epochs
    for t = 1:numSamples
        % forward pass
        % backward pass
    end
end

With the training data re-shuffled at the start of every epoch:

for epoch = 1:num_epochs
    % shuffle training data
    perm = randperm(numSamples);
    x = x(perm);
    d = d(perm);
    for t = 1:numSamples
        % forward pass
        % backward pass
    end
end
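The two comments % forward pass / % backward pass can be written out, for example, as in the following minimal sketch for a single tanh hidden layer with a linear output; every name here (W1, b1, W2, b2, LR, nHid) is introduced only for this illustration and is not toolbox API.

% item-by-item backpropagation, assuming x (inputs as columns) and d (targets as a row)
[nIn, numSamples] = size(x);
nHid = 5;  LR = 0.05;  num_epochs = 200;
W1 = 0.1*randn(nHid, nIn);  b1 = 0.1*randn(nHid, 1);   % small random initial weights
W2 = 0.1*randn(1, nHid);    b2 = 0.1*randn(1, 1);
for epoch = 1:num_epochs
    perm = randperm(numSamples);                 % shuffle once per epoch
    x = x(:, perm);  d = d(perm);
    for t = 1:numSamples
        % forward pass
        a1 = tanh(W1*x(:,t) + b1);               % hidden activations
        y  = W2*a1 + b2;                         % linear output
        % backward pass (squared-error gradient)
        delta2 = y - d(t);                       % output delta
        delta1 = (W2' * delta2) .* (1 - a1.^2);  % tanh derivative is 1 - a1.^2
        % weight update after every single sample
        W2 = W2 - LR * delta2 * a1';      b2 = b2 - LR * delta2;
        W1 = W1 - LR * delta1 * x(:,t)';  b1 = b1 - LR * delta1;
    end
end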


Batch learning

bs = #                                % batch size
for epoch = 1:num_epochs
    for s = 1:bs:numSamples
        % zero in_batch sums here
        for b = 1:bs
            t = s + b - 1;
            % forward pass
            % backward pass
            % update in_batch sums based on BP (deltas)
        end
        % update weights and biases here
        Wi = Wi - LR * (sumWi / bs);  % etc.
    end
end
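Continuing the same illustrative sketch from the sequential case above, the in-batch sums can be written out as follows; the batch size and the averaging over nb are again just one reasonable choice.

bs = 8;                                           % batch size (illustrative)
for epoch = 1:num_epochs
    for s = 1:bs:numSamples
        sumW1 = zeros(size(W1));  sumb1 = zeros(size(b1));   % zero in-batch sums
        sumW2 = zeros(size(W2));  sumb2 = zeros(size(b2));
        nb = min(bs, numSamples - s + 1);         % last batch may be smaller
        for b = 1:nb
            t = s + b - 1;
            a1 = tanh(W1*x(:,t) + b1);            % forward pass
            y  = W2*a1 + b2;
            delta2 = y - d(t);                    % backward pass
            delta1 = (W2' * delta2) .* (1 - a1.^2);
            sumW2 = sumW2 + delta2*a1';      sumb2 = sumb2 + delta2;
            sumW1 = sumW1 + delta1*x(:,t)';  sumb1 = sumb1 + delta1;
        end
        % one averaged update per batch
        W2 = W2 - LR * sumW2/nb;  b2 = b2 - LR * sumb2/nb;
        W1 = W1 - LR * sumW1/nb;  b1 = b1 - LR * sumb1/nb;
    end
end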


Generalization

• Overfitting, network pruning

(c) The MathWorks (Matlab help)


Strategies

• Regularization (see the code sketch after this list)
    ° 1) 'trainbr' (Bayesian regularization)
        » the bias/variance dilemma
    ° 2) Specific adjustment of weights
        » many techniques suggested, e.g. net.performFcn = 'msereg' + corresponding parameters
            MSEREG = A * MSE + (1 - A) * MSW
            MSW = (1/N) * sum(w_j^2)
        » decreases weights and biases

• Early stopping
    ° 3 sets (training, validation, testing; 40:30:30)
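In the old toolbox syntax, the two regularization options above look roughly like this; note that the performance function is spelled 'msereg' in Matlab, and the ratio value and the network size are illustrative only (p and t stand for training inputs and targets).

% option 2: regularized performance function  msereg = A*mse + (1-A)*msw
net = newff(minmax(p), [20 1], {'tansig' 'purelin'}, 'traingdx');
net.performFcn = 'msereg';
net.performParam.ratio = 0.5;          % A in the formula above
[net, tr] = train(net, p, t);

% option 1: Bayesian regularization, trainbr adapts the regularization itself
net = newff(minmax(p), [20 1], {'tansig' 'purelin'}, 'trainbr');
[net, tr] = train(net, p, t);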


Early stopping

• After some training, calculate the validation error (synaptic weights held fixed)

• Continue either with training or testing

(figure: MSE of the training sample and the validation sample vs. number of epochs, with the early stopping point marked)
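With the toolbox, early stopping is obtained simply by passing a validation set to train, as the Matlab example further below also does; a sketch, assuming the data have already been split into ptr/ttr (training) and val (a structure with fields P and T):

net = newff(minmax(ptr), [5 1], {'tansig' 'purelin'}, 'trainlm');
net.trainParam.max_fail = 5;     % stop after 5 consecutive increases of the validation error
[net, tr] = train(net, ptr, ttr, [], [], val);
plot(tr.epoch, tr.perf, tr.epoch, tr.vperf)    % training vs. validation MSE
legend('Training', 'Validation')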


Bayesian regularization

(c) The MathWorks (Matlab help)


Early stopping

(c) The MathWorks (Matlab help)


Matlab example 1/4

The goal is to determine serum cholesterol levels from measurements of the spectral content of a blood sample. There are 264 patients for whom we have measurements at 21 wavelengths of the spectrum. For the same patients we also have measurements of hdl, ldl, and vldl cholesterol levels, based on serum separation.

load choles_all
[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
[ptrans,transMat] = prepca(pn,0.001);
[R,Q] = size(ptrans)              % R = 4, Q = 264

iitst = 2:4:Q;
iival = 4:4:Q;
iitr  = [1:4:Q 3:4:Q];
val.P  = ptrans(:,iival);   val.T  = tn(:,iival);
test.P = ptrans(:,iitst);   test.T = tn(:,iitst);
ptr = ptrans(:,iitr);       ttr = tn(:,iitr);


Matlab example 2/4

net = newff(minmax(ptr),[5 3],{'tansig' 'purelin'},'trainlm');
[net,tr] = train(net,ptr,ttr,[],[],val,test);
TRAINLM, Epoch 0/100, MSE 3.11023/0, Gradient 804.959/1e-10
TRAINLM, Epoch 15/100, MSE 0.330295/0, Gradient 104.219/1e-10
TRAINLM, Validation stop.

plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
legend('Training','Validation','Test',-1);
ylabel('Squared Error'); xlabel('Epoch')

an = sim(net,ptrans);
a = poststd(an,meant,stdt);
for i=1:3
    figure(i)
    [m(i),b(i),r(i)] = postreg(a(i,:),t(i,:));
end


Matlab example 3/4

(c) The MathWorks (Matlab help)


Matlab example 4/4

hdl: R = 0.886; ldl: R = 0.862; vldl: R = 0.563

(c) The MathWorks (Matlab help)


Cover’s separability theorem

• A pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space

(figure: a scatter of two pattern classes, X and O, in the (x1, x2) plane, with X = (x1, x2))

phi(X) = [x1, x2]                                   (# of basis functions: 2)
    a1*x1 + a2*x2 + a0 = 0

phi(X) = [x1, x2, x1^2, x2^2]                       (# of basis functions: 4)
    a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a0 = 0

phi(X) = [x1, x2, x1^2, x2^2, x1*x2]                (# of basis functions: 5)
    a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2 + a0 = 0
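A small illustration of the idea on the XOR points: in the raw (x1, x2) plane the two classes cannot be separated by a line, but after the 5-basis-function map above they can; the particular hyperplane coefficients below are just one choice that happens to work.

X = [0 0; 0 1; 1 0; 1 1];                         % the four XOR points (rows)
y = [0; 1; 1; 0];                                 % class labels, for comparison
Phi = [X, X(:,1).^2, X(:,2).^2, X(:,1).*X(:,2)];  % phi(X) with 5 basis functions
a  = [1; 1; 0; 0; -2];  a0 = -0.5;                % hyperplane x1 + x2 - 2*x1*x2 - 0.5 = 0
sign(Phi*a + a0)                                  % gives [-1; 1; 1; -1]: class 0 as -1, class 1 as +1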


Radial Basis Function (RBF) networks

Gaussian basis function, s=0.5, 1.0, 1.5

radbas(n) = exp(-n^2)
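A quick sketch of what the three spreads mean, plotting a Gaussian of width s; the toolbox internally rescales the input distance so that radbas drops to 0.5 at the chosen spread, but the qualitative picture is the same.

n = -3:0.01:3;                                    % distance from the centre
plot(n, exp(-(n/0.5).^2), n, exp(-(n/1.0).^2), n, exp(-(n/1.5).^2))
legend('s = 0.5', 's = 1.0', 's = 1.5')
xlabel('distance from centre');  ylabel('basis function output')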

Architecture: (figure of the RBF network architecture)


Structure of RBF Networks

• Input layer

• Hidden layer
    ° Hidden units provide a set of basis functions
    ° The higher the dimension, the more likely the classes are linearly separable (by a linear combination of the basis functions)

• Output layer
    ° Linear combination of the hidden-unit outputs


XOR example

x1  x2 | y | phi1(x)  phi2(x) | y'
0   0  | 0 | 0.13     1       | ?
0   1  | 1 | 0.36     0.36    | ?
1   0  | 1 | 0.36     0.36    | ?
1   1  | 0 | 1        0.13    | ?

(figure: network with inputs x1 and x2 feeding two hidden basis functions phi1(x) and phi2(x), whose outputs are linearly combined into the output y')
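The phi values in the table can be reproduced with two Gaussian basis functions whose centres sit on the two class-0 points; a minimal sketch, with the centres c1 = (1,1) and c2 = (0,0) inferred from the tabulated values (0.13 and 0.36 are exp(-2) and exp(-1), truncated):

X  = [0 0; 0 1; 1 0; 1 1];                          % XOR inputs, one per row
c1 = [1 1];  c2 = [0 0];                            % assumed centres
phi1 = exp(-sum((X - repmat(c1,4,1)).^2, 2))        % approx [0.13; 0.37; 0.37; 1]
phi2 = exp(-sum((X - repmat(c2,4,1)).^2, 2))        % approx [1; 0.37; 0.37; 0.13]
% In the (phi1, phi2) plane the two class-1 points coincide at (0.37, 0.37),
% so a single straight line separates the classes and a linear output unit
% y' = w1*phi1 + w2*phi2 + b can realise XOR.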


This does the trick.

(c) The MathWorks (Matlab help)


RBF, well-estimated

RBF in Matlab:

net = newrbe(P,T,SPREAD)
net = newrb(P,T,GOAL,SPREAD)
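A small usage sketch of newrb; the data, the error goal and the spread are illustrative values only.

P = -1:0.1:1;                     % 1-D inputs
T = sin(2*pi*P);                  % targets
net = newrb(P, T, 0.01, 0.3);     % add basis functions until MSE <= 0.01, spread 0.3
Y = sim(net, P);
plot(P, T, 'o', P, Y, '-')        % targets vs. RBF approximation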


RBF, too few BF


RBF, too small stdev


RBF, too large stdev


NN taxonomy 1/2

1) Paradigm
    ° Supervised
    ° Unsupervised

2) Learning Rule
    ° Error-correction
    ° Memory-based
    ° Hebbian
    ° Competitive
    ° Boltzmann

According to: Jain, A.K. and Mao, J. (1996). Artificial Neural Networks: A Tutorial. IEEE Computer, vol. 29, no. 3, pp. 31-44.


NN taxonomy 2/2

3) Learning Algorithm
    ° Perceptron
    ° BP
    ° Kohonen SOM, ...

4) Network Architecture
    ° FF (feedforward)
    ° REC (recurrent)

5) Task
    ° Pattern classification
    ° Time-series modeling, ...