Extreme Learning Machine: Theory and Applications
G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew
Neurocomputing, 2006
Presenter: James Chou
2012/03/15
Outline
Introduction
Single-hidden layer feed-forward neural networks
Neural Network Mathematical Model
Back Propagation algorithm
ELM Mathematical Model
Performance Evaluation
Conclusion
2
Introduction
For the past decades, gradient-descent-based methods have been the mainstream learning algorithms for feed-forward neural networks.
Traditionally, all the parameters of a feed-forward neural network must be tuned iteratively, so training takes a very long time.
When the input weights and the hidden layer biases are randomly assigned, SLFNs (single-hidden-layer feed-forward neural networks) can simply be considered as a linear system, and the output weights (linking the hidden layer to the output layer) can be computed through a simple generalized inverse operation.
3
Introduction (Cont.)
Based on this idea, this paper proposes a simple learning algorithm for SLFNs called the extreme learning machine (ELM).
Different from traditional learning algorithms, ELM not only achieves a smaller training error but also better generalization performance.
4
Single-hidden layer feed-forward neural networks
F(·) is the activation function.

Hard limiter function:
f(x) = 1, when x ≥ θ
f(x) = 0, when x < θ
(θ is the threshold)

Sigmoid function:
f(x) = 1 / (1 + e^(−x))

Output = Σ (i = 1..N) βi F(wi · x + bi)

5
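As a minimal sketch, the two activation functions above can be written in NumPy (the implementation language is our choice; the slides do not prescribe one):

```python
import numpy as np

def hard_limiter(x, theta=0.0):
    """Hard limiter: f(x) = 1 when x >= theta, else 0 (theta is the threshold)."""
    return np.where(np.asarray(x) >= theta, 1.0, 0.0)

def sigmoid(x):
    """Sigmoid: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x)))
```

The sigmoid is the usual choice for gradient-based training, since unlike the hard limiter it is differentiable everywhere.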
Single-hidden layer feed-forwardneural networks (Cont.)
6
G(·) is the activation function; L is the number of hidden layer nodes.
Neural Network Mathematical Model
7
Neural Network Mathematical Model (Cont.)
8
If ε = 0, it means FL(x) = f(x) = T, where T is the known target, and the cost function = 0.
Neural Network Mathematical Model (Cont.)
The mathematical model is Hβ = T.
From a linear algebra viewpoint: if the hidden layer has 20 nodes and there are 1000 training samples in total, H is a 1000×20 matrix.
How to compute the (generalized) inverse of such a large matrix is a traditional issue. We tried to compute the inverse of a 5000×50 matrix, but the PC crashed.
9
Back Propagation algorithm
The BP algorithm is the classic gradient-based algorithm for finding the best weight vectors by minimizing the cost function.
10
η is the learning rate.
Demo BP algorithm!
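A minimal sketch of one BP training run for an SLFN with a squared-error cost is below. The toy data, network size, and learning rate are our own illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy data (our own choice): fit y = sin(3x) on [-1, 1]
X = rng.uniform(-1.0, 1.0, size=(100, 1))
T = np.sin(3.0 * X)

L = 10                          # hidden nodes
W = rng.normal(size=(1, L))     # input weights
b = rng.normal(size=L)          # hidden biases
beta = rng.normal(size=(L, 1))  # output weights
eta = 0.1                       # learning rate (η)

def mse(W, b, beta):
    return float(np.mean((sigmoid(X @ W + b) @ beta - T) ** 2))

mse_before = mse(W, b, beta)
for _ in range(500):
    H = sigmoid(X @ W + b)                # forward pass
    err = H @ beta - T                    # output error
    g_beta = H.T @ err                    # gradient w.r.t. output weights
    g_H = (err @ beta.T) * H * (1.0 - H)  # back-propagate through sigmoid
    g_W = X.T @ g_H
    g_b = g_H.sum(axis=0)
    # gradient-descent updates, scaled by 1/N
    beta -= eta / len(X) * g_beta
    W -= eta / len(X) * g_W
    b -= eta / len(X) * g_b
mse_after = mse(W, b, beta)
```

Note that every parameter (W, b, beta) is updated on every iteration; this iterative tuning of all parameters is exactly what makes BP slow compared to ELM.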
ELM Mathematical Model
H+ is the Moore-Penrose generalized inverse of hidden layer output matrix H.
H+ = (HᵀH)⁻¹Hᵀ, valid when HᵀH is invertible (i.e., H has full column rank).
11
ELM Mathematical Model (Cont.)
Moore-Penrose generalized inverse matrix: an application of a linear algebra theorem.
For a general linear system Ax = y, where A ∈ R^(m×n) and y ∈ R^m, we say that x̂ is a least-squares solution (l.s.s.) if
‖Ax̂ − y‖ = min over x of ‖Ax − y‖,
where ‖·‖ means a norm in Euclidean space.
The resolution of a general linear system Ax = y, where A may be singular and may even not be square, can be made very simple by the use of the Moore-Penrose generalized inverse.
12
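To illustrate the point above, a short example with NumPy's `np.linalg.pinv` (our choice of library): A is rank-deficient and not square, yet the Moore-Penrose inverse still yields the minimum-norm least-squares solution.

```python
import numpy as np

# A is rank-deficient (second column = 2 * first) and not square,
# yet pinv still gives the minimum-norm least-squares solution of Ax = y.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.pinv(A) @ y  # -> [0.2, 0.4]
```

Here y lies in the column space of A, so A x̂ reproduces y exactly; among the infinitely many solutions, pinv selects the one with the smallest norm.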
ELM Mathematical Model (Cont.)
The mathematical model is Hβ = T. We can rewrite the formula as
β = H+T = (HᵀH)⁻¹HᵀT.
If the hidden layer has 20 nodes and there are 1000 training samples in total, H is 1000×20, so HᵀH is only a 20×20 matrix and is cheap to invert.
13
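The whole ELM training procedure then reduces to a few lines. The sketch below (NumPy, with our own toy data of 1000 samples and 20 hidden nodes, matching the sizes mentioned above) computes β in one shot via the pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(X, T, L=20):
    """Randomly assign input weights and biases, then solve beta = H+ T."""
    W = rng.normal(size=(X.shape[1], L))  # random input weights
    b = rng.normal(size=L)                # random hidden biases
    H = sigmoid(X @ W + b)                # N x L hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T          # output weights: no iteration at all
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# 1000 samples, 20 hidden nodes -> H is 1000 x 20, HᵀH is 20 x 20
X = rng.uniform(-3.0, 3.0, size=(1000, 1))
T = np.sin(X)
W, b, beta = elm_train(X, T, L=20)
mse = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

Only the output weights β are learned; W and b stay at their random values, which is what makes ELM training a single linear solve rather than an iterative loop.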
Performance Evaluation
Regression of SinC Function
15
Regression of SinC Function (Cont.)
100,000 training data with 5–20% noise; 100,000 testing data are noise-free. The results of training 50 times are shown in the following table.

Noise | Training Time AVG (sec) | Training RMS AVG | Testing RMS AVG
 5%   | 0.6462                  | 0.0113           | 2.201e-04
10%   | 0.6306                  | 0.0224           | 2.753e-04
15%   | 0.6427                  | 0.0334           | 8.336e-04
20%   | 0.6452                  | 0.0449           | 11.541e-04

16
Demo ELM!
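A small self-contained version of the SinC experiment can be sketched as follows. Note the assumptions: we use 5,000 samples instead of 100,000 to keep the demo fast, uniform ±0.2 noise as a stand-in for the "20% noise" setting, and uniform random weights; this is an illustrative sketch, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sinc(x):
    """SinC function sin(x)/x, with sinc(0) = 1."""
    out = np.ones_like(x)
    nz = x != 0
    out[nz] = np.sin(x[nz]) / x[nz]
    return out

# training targets carry uniform noise; test targets are noise-free
X_train = rng.uniform(-10.0, 10.0, size=(5000, 1))
T_train = sinc(X_train) + rng.uniform(-0.2, 0.2, size=X_train.shape)
X_test = np.linspace(-10.0, 10.0, 5000).reshape(-1, 1)
T_test = sinc(X_test)

L = 20                                   # hidden nodes, as in the paper
W = rng.uniform(-1.0, 1.0, size=(1, L))  # random input weights
b = rng.uniform(-1.0, 1.0, size=L)       # random biases
H = sigmoid(X_train @ W + b)
beta = np.linalg.pinv(H) @ T_train       # analytic output weights

Y = sigmoid(X_test @ W + b) @ beta
test_rmse = np.sqrt(np.mean((Y - T_test) ** 2))
```

Because β is fitted by least squares over many noisy samples, the learned curve averages the noise out, so the test error lands well below the noise level in the training targets.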
Real-World Regression Problems
17
Real-World Regression Problems (Cont.)
18
Real-World Regression Problems (Cont.)
19
Real-World Regression Problems (Cont.)
20
Real-World Very Large Complex Applications
21
Real Medical Diagnosis Application: Diabetes
22
Protein Sequence Classification
23
Conclusion
Advantages
ELM needs less training time compared to popular BP and SVM/SVR.
The prediction performance of ELM is usually slightly better than BP and close to SVM/SVR in many applications.
Only one parameter needs to be tuned: L (the number of hidden layer nodes).
Nonlinear activation functions still work in ELM.
Disadvantages
How to find the optimal solution? Local minima issue.
Prone to overfitting.
24