
Jan Voracek (2001), 010597001 Soft Computing, Lecture 7

Warm-up example (1)

Having the well-known XOR problem and a NN for its approximation, answer the following questions:

• How many hidden layers would you use?

• How many hidden units per layer?

• How many connections would your net have?

• How would you select the initial weights of the connections?

• When would you stop the iterations of the error back propagation algorithm?
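One possible set of answers, as a minimal sketch in the old Neural Network Toolbox syntax used later in these slides; the 2-2-1 architecture, the training function and the stopping parameters are illustrative choices, not the only correct ones.

% XOR training data: 2 inputs (columns of P), 1 target output
P = [0 0 1 1; 0 1 0 1];
T = [0 1 1 0];
% one hidden layer with 2 tansig units is enough for XOR;
% 2*2 + 2 weights plus 2 + 1 biases = 9 adjustable connections in total
net = newff(minmax(P), [2 1], {'tansig' 'purelin'}, 'trainlm');
net = init(net);                 % small random initial weights (toolbox default)
net.trainParam.goal   = 1e-3;    % stop when the MSE falls below this goal ...
net.trainParam.epochs = 1000;    % ... or after a fixed number of epochs
net = train(net, P, T);
Y = sim(net, P)                  % should be close to [0 1 1 0]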


Warm-up example (2)

What would you do if the trained neural net does not generate the desired outputs and behaves as follows:

• The updated weights after an iteration of the error back propagation procedure are almost identical to the weights before that iteration, but the output is not the desired one?

• The number of iterations exceeds a pre-defined threshold?

• The output error seems to be increasing instead of decreasing?


Item-by-item learning (sequential)

Without shuffling:

for epoch = 1:num_epochs
    for t = 1:numSamples
        % forward pass
        % backward pass
    end
end

With the training data re-shuffled at the start of every epoch:

for epoch = 1:num_epochs
    % shuffle training data
    perm = randperm(numSamples);
    x = x(perm);
    d = d(perm);
    for t = 1:numSamples
        % forward pass
        % backward pass
    end
end
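The two comments % forward pass / % backward pass can be written out, for example, as in the following minimal sketch for a single tanh hidden layer with a linear output; every name here (W1, b1, W2, b2, LR, nHid) is introduced only for this illustration and is not toolbox API.

% item-by-item backpropagation, assuming x (inputs as columns) and d (targets as a row)
[nIn, numSamples] = size(x);
nHid = 5;  LR = 0.05;  num_epochs = 200;
W1 = 0.1*randn(nHid, nIn);  b1 = 0.1*randn(nHid, 1);   % small random initial weights
W2 = 0.1*randn(1, nHid);    b2 = 0.1*randn(1, 1);
for epoch = 1:num_epochs
    perm = randperm(numSamples);                 % shuffle once per epoch
    x = x(:, perm);  d = d(perm);
    for t = 1:numSamples
        % forward pass
        a1 = tanh(W1*x(:,t) + b1);               % hidden activations
        y  = W2*a1 + b2;                         % linear output
        % backward pass (squared-error gradient)
        delta2 = y - d(t);                       % output delta
        delta1 = (W2' * delta2) .* (1 - a1.^2);  % tanh derivative is 1 - a1.^2
        % weight update after every single sample
        W2 = W2 - LR * delta2 * a1';      b2 = b2 - LR * delta2;
        W1 = W1 - LR * delta1 * x(:,t)';  b1 = b1 - LR * delta1;
    end
end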


Batch learning

bs = #                                % batch size
for epoch = 1:num_epochs
    for s = 1:bs:numSamples
        % zero in_batch sums here
        for b = 1:bs
            t = s + b - 1;
            % forward pass
            % backward pass
            % update in_batch sums based on BP (deltas)
        end
        % update weights and biases here
        Wi = Wi - LR * (sumWi / bs);  % etc.
    end
end
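Continuing the same illustrative sketch from the sequential case above, the in-batch sums can be written out as follows; the batch size and the averaging over nb are again just one reasonable choice.

bs = 8;                                           % batch size (illustrative)
for epoch = 1:num_epochs
    for s = 1:bs:numSamples
        sumW1 = zeros(size(W1));  sumb1 = zeros(size(b1));   % zero in-batch sums
        sumW2 = zeros(size(W2));  sumb2 = zeros(size(b2));
        nb = min(bs, numSamples - s + 1);         % last batch may be smaller
        for b = 1:nb
            t = s + b - 1;
            a1 = tanh(W1*x(:,t) + b1);            % forward pass
            y  = W2*a1 + b2;
            delta2 = y - d(t);                    % backward pass
            delta1 = (W2' * delta2) .* (1 - a1.^2);
            sumW2 = sumW2 + delta2*a1';      sumb2 = sumb2 + delta2;
            sumW1 = sumW1 + delta1*x(:,t)';  sumb1 = sumb1 + delta1;
        end
        % one averaged update per batch
        W2 = W2 - LR * sumW2/nb;  b2 = b2 - LR * sumb2/nb;
        W1 = W1 - LR * sumW1/nb;  b1 = b1 - LR * sumb1/nb;
    end
end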


Generalization

• Overfitting, network pruning

(c) The MathWorks (Matlab help)


Strategies

• Regularization (see the code sketch after this list)
    ° 1) 'trainbr' (Bayesian regularization)
        » the bias/variance dilemma
    ° 2) Specific adjustment of weights
        » many techniques suggested, e.g. net.performFcn = 'msereg' + corresponding parameters
            MSEREG = A * MSE + (1 - A) * MSW
            MSW = (1/N) * sum(w_j^2)
        » decreases weights and biases

• Early stopping
    ° 3 sets (training, validation, testing; 40:30:30)
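In the old toolbox syntax, the two regularization options above look roughly like this; note that the performance function is spelled 'msereg' in Matlab, and the ratio value and the network size are illustrative only (p and t stand for training inputs and targets).

% option 2: regularized performance function  msereg = A*mse + (1-A)*msw
net = newff(minmax(p), [20 1], {'tansig' 'purelin'}, 'traingdx');
net.performFcn = 'msereg';
net.performParam.ratio = 0.5;          % A in the formula above
[net, tr] = train(net, p, t);

% option 1: Bayesian regularization, trainbr adapts the regularization itself
net = newff(minmax(p), [20 1], {'tansig' 'purelin'}, 'trainbr');
[net, tr] = train(net, p, t);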


Early stopping

• After some training, calculate the validation error (synaptic weights held fixed)

• Continue either with training or testing

(figure: MSE of the training sample and the validation sample vs. number of epochs, with the early stopping point marked)
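With the toolbox, early stopping is obtained simply by passing a validation set to train, as the Matlab example further below also does; a sketch, assuming the data have already been split into ptr/ttr (training) and val (a structure with fields P and T):

net = newff(minmax(ptr), [5 1], {'tansig' 'purelin'}, 'trainlm');
net.trainParam.max_fail = 5;     % stop after 5 consecutive increases of the validation error
[net, tr] = train(net, ptr, ttr, [], [], val);
plot(tr.epoch, tr.perf, tr.epoch, tr.vperf)    % training vs. validation MSE
legend('Training', 'Validation')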


Bayesian regularization

(c) The MathWorks (Matlab help)


Early stopping

(c) The MathWorks (Matlab help)


Matlab example 1/4

The goal is to determine serum cholesterol levels from measurements of the spectral content of a blood sample. There are 264 patients for whom we have measurements at 21 wavelengths of the spectrum. For the same patients we also have measurements of hdl, ldl, and vldl cholesterol levels, based on serum separation.

load choles_all
[pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);
[ptrans,transMat] = prepca(pn,0.001);
[R,Q] = size(ptrans)              % R = 4, Q = 264

iitst = 2:4:Q;
iival = 4:4:Q;
iitr  = [1:4:Q 3:4:Q];
val.P  = ptrans(:,iival);   val.T  = tn(:,iival);
test.P = ptrans(:,iitst);   test.T = tn(:,iitst);
ptr = ptrans(:,iitr);       ttr = tn(:,iitr);


Matlab example 2/4

net = newff(minmax(ptr),[5 3],{'tansig' 'purelin'},'trainlm');
[net,tr] = train(net,ptr,ttr,[],[],val,test);
TRAINLM, Epoch 0/100, MSE 3.11023/0, Gradient 804.959/1e-10
TRAINLM, Epoch 15/100, MSE 0.330295/0, Gradient 104.219/1e-10
TRAINLM, Validation stop.

plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
legend('Training','Validation','Test',-1);
ylabel('Squared Error'); xlabel('Epoch')

an = sim(net,ptrans);
a = poststd(an,meant,stdt);
for i=1:3
    figure(i)
    [m(i),b(i),r(i)] = postreg(a(i,:),t(i,:));
end


Matlab example 3/4

(c) The MathWorks (Matlab help)


Matlab example 4/4

hdl: R = 0.886; ldl: R = 0.862; vldl: R = 0.563

(c) The MathWorks (Matlab help)


Cover’s separability theorem

• A pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space

(figure: a scatter of two pattern classes, X and O, in the (x1, x2) plane, with X = (x1, x2))

phi(X) = [x1, x2]                                   (# of basis functions: 2)
    a1*x1 + a2*x2 + a0 = 0

phi(X) = [x1, x2, x1^2, x2^2]                       (# of basis functions: 4)
    a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a0 = 0

phi(X) = [x1, x2, x1^2, x2^2, x1*x2]                (# of basis functions: 5)
    a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2 + a0 = 0
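A small illustration of the idea on the XOR points: in the raw (x1, x2) plane the two classes cannot be separated by a line, but after the 5-basis-function map above they can; the particular hyperplane coefficients below are just one choice that happens to work.

X = [0 0; 0 1; 1 0; 1 1];                         % the four XOR points (rows)
y = [0; 1; 1; 0];                                 % class labels, for comparison
Phi = [X, X(:,1).^2, X(:,2).^2, X(:,1).*X(:,2)];  % phi(X) with 5 basis functions
a  = [1; 1; 0; 0; -2];  a0 = -0.5;                % hyperplane x1 + x2 - 2*x1*x2 - 0.5 = 0
sign(Phi*a + a0)                                  % gives [-1; 1; 1; -1]: class 0 as -1, class 1 as +1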


Radial Basis Function (RBF) networks

Gaussian basis function, s=0.5, 1.0, 1.5

radbas(n) = exp(-n^2)
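A quick sketch of what the three spreads mean, plotting a Gaussian of width s; the toolbox internally rescales the input distance so that radbas drops to 0.5 at the chosen spread, but the qualitative picture is the same.

n = -3:0.01:3;                                    % distance from the centre
plot(n, exp(-(n/0.5).^2), n, exp(-(n/1.0).^2), n, exp(-(n/1.5).^2))
legend('s = 0.5', 's = 1.0', 's = 1.5')
xlabel('distance from centre');  ylabel('basis function output')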

Architecture: (figure of the RBF network architecture)


Structure of RBF Networks

• Input layer

• Hidden layer
    ° Hidden units provide a set of basis functions
    ° The higher the dimension, the more likely the classes are linearly separable (by a linear combination of the basis functions)

• Output layer
    ° Linear combination of the hidden-unit outputs


XOR example

x1  x2 | y | phi1(x)  phi2(x) | y'
0   0  | 0 | 0.13     1       | ?
0   1  | 1 | 0.36     0.36    | ?
1   0  | 1 | 0.36     0.36    | ?
1   1  | 0 | 1        0.13    | ?

(figure: network with inputs x1 and x2 feeding two hidden basis functions phi1(x) and phi2(x), whose outputs are linearly combined into the output y')
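The phi values in the table can be reproduced with two Gaussian basis functions whose centres sit on the two class-0 points; a minimal sketch, with the centres c1 = (1,1) and c2 = (0,0) inferred from the tabulated values (0.13 and 0.36 are exp(-2) and exp(-1), truncated):

X  = [0 0; 0 1; 1 0; 1 1];                          % XOR inputs, one per row
c1 = [1 1];  c2 = [0 0];                            % assumed centres
phi1 = exp(-sum((X - repmat(c1,4,1)).^2, 2))        % approx [0.13; 0.37; 0.37; 1]
phi2 = exp(-sum((X - repmat(c2,4,1)).^2, 2))        % approx [1; 0.37; 0.37; 0.13]
% In the (phi1, phi2) plane the two class-1 points coincide at (0.37, 0.37),
% so a single straight line separates the classes and a linear output unit
% y' = w1*phi1 + w2*phi2 + b can realise XOR.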


This does the trick.

(c) The MathWorks (Matlab help)


RBF, well-estimated

RBF in Matlab:

net = newrbe(P,T,SPREAD)
net = newrb(P,T,GOAL,SPREAD)
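A small usage sketch of newrb; the data, the error goal and the spread are illustrative values only.

P = -1:0.1:1;                     % 1-D inputs
T = sin(2*pi*P);                  % targets
net = newrb(P, T, 0.01, 0.3);     % add basis functions until MSE <= 0.01, spread 0.3
Y = sim(net, P);
plot(P, T, 'o', P, Y, '-')        % targets vs. RBF approximation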


RBF, too few BF


RBF, too small stdev


RBF, too large stdev


NN taxonomy 1/2

1) Paradigm
    ° Supervised
    ° Unsupervised

2) Learning Rule
    ° Error-correction
    ° Memory-based
    ° Hebbian
    ° Competitive
    ° Boltzmann

According to: Jain, A.K. and Mao, J. (1996). Artificial Neural Networks: A Tutorial. IEEE Computer, vol. 29, no. 3, pp. 31-44.


NN taxonomy 2/2

3) Learning Algorithm
    ° Perceptron
    ° BP
    ° Kohonen SOM, ...

4) Network Architecture
    ° FF (feedforward)
    ° REC (recurrent)

5) Task
    ° Pattern classification
    ° Time-series modeling, ...