Extreme Learning Machine: Theory and Applications
G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew
Neurocomputing, 2006
Presenter: James Chou
2012/03/15
Outline
Introduction
Single-hidden layer feed-forward neural networks
Neural Network Mathematical Model
Back Propagation algorithm
ELM Mathematical Model
Performance Evaluation
Conclusion
2
Introduction
For the past decades, gradient-descent-based methods have been the mainstream learning algorithms for feed-forward neural networks.
Traditionally, all the parameters of a feed-forward neural network must be tuned iteratively, so training takes a very long time.
When the input weights and the hidden layer biases are randomly assigned, SLFNs (single-hidden-layer feed-forward neural networks) can simply be considered as a linear system, and the output weights (linking the hidden layer to the output layer) can be computed through a simple generalized inverse operation.
3
Introduction (Cont.)
Based on this idea, this paper proposes a simple learning algorithm for SLFNs called the extreme learning machine (ELM).
Different from traditional learning algorithms, ELM not only achieves a smaller training error but also better generalization performance.
4
Single-hidden layer feed-forward neural networks
F(·) is the activation function.

Hard limiter function:
f(x) = 1, when x ≥ θ
f(x) = 0, when x < θ
(θ is the threshold)

Sigmoid function:
f(x) = 1 / (1 + e^(−x))

Output = Σ (i = 1..N) βi F(wi · x + bi)

5
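As a minimal sketch, the two activation functions above can be written in NumPy (the implementation language is our choice; the slides do not prescribe one):

```python
import numpy as np

def hard_limiter(x, theta=0.0):
    """Hard limiter: f(x) = 1 when x >= theta, else 0 (theta is the threshold)."""
    return np.where(np.asarray(x) >= theta, 1.0, 0.0)

def sigmoid(x):
    """Sigmoid: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x)))
```

The sigmoid is the usual choice for gradient-based training, since unlike the hard limiter it is differentiable everywhere.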
Single-hidden layer feed-forwardneural networks (Cont.)
6
G(·) is the activation function; L is the number of hidden layer nodes.
Neural Network Mathematical Model
7
Neural Network Mathematical Model (Cont.)
8
If ε = 0, it means FL(x) = f(x) = T, where T is the known target, and the cost function = 0.
Neural Network Mathematical Model (Cont.)
The mathematical model is Hβ = T.
From a linear algebra viewpoint: if the hidden layer has 20 nodes and there are 1000 training samples in total, H is a 1000×20 matrix.
How to compute the (generalized) inverse of such a large matrix is a traditional issue. We tried to compute the inverse of a 5000×50 matrix, but the PC crashed.
9
Back Propagation algorithm
The BP algorithm is the classic gradient-based algorithm for finding the best weight vectors by minimizing the cost function.
10
η is the learning rate.
Demo BP algorithm!
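A minimal sketch of one BP training run for an SLFN with a squared-error cost is below. The toy data, network size, and learning rate are our own illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy data (our own choice): fit y = sin(3x) on [-1, 1]
X = rng.uniform(-1.0, 1.0, size=(100, 1))
T = np.sin(3.0 * X)

L = 10                          # hidden nodes
W = rng.normal(size=(1, L))     # input weights
b = rng.normal(size=L)          # hidden biases
beta = rng.normal(size=(L, 1))  # output weights
eta = 0.1                       # learning rate (η)

def mse(W, b, beta):
    return float(np.mean((sigmoid(X @ W + b) @ beta - T) ** 2))

mse_before = mse(W, b, beta)
for _ in range(500):
    H = sigmoid(X @ W + b)                # forward pass
    err = H @ beta - T                    # output error
    g_beta = H.T @ err                    # gradient w.r.t. output weights
    g_H = (err @ beta.T) * H * (1.0 - H)  # back-propagate through sigmoid
    g_W = X.T @ g_H
    g_b = g_H.sum(axis=0)
    # gradient-descent updates, scaled by 1/N
    beta -= eta / len(X) * g_beta
    W -= eta / len(X) * g_W
    b -= eta / len(X) * g_b
mse_after = mse(W, b, beta)
```

Note that every parameter (W, b, beta) is updated on every iteration; this iterative tuning of all parameters is exactly what makes BP slow compared to ELM.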
ELM Mathematical Model
H+ is the Moore-Penrose generalized inverse of hidden layer output matrix H.
H+ = (HᵀH)⁻¹Hᵀ, valid when HᵀH is invertible (i.e., H has full column rank).
11
ELM Mathematical Model (Cont.)
Moore-Penrose generalized inverse matrix: an application of a linear algebra theorem.
For a general linear system Ax = y, where A ∈ R^(m×n) and y ∈ R^m, we say that x̂ is a least-squares solution (l.s.s.) if
‖Ax̂ − y‖ = min over x of ‖Ax − y‖,
where ‖·‖ means a norm in Euclidean space.
The resolution of a general linear system Ax = y, where A may be singular and may even not be square, can be made very simple by the use of the Moore-Penrose generalized inverse.
12
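To illustrate the point above, a short example with NumPy's `np.linalg.pinv` (our choice of library): A is rank-deficient and not square, yet the Moore-Penrose inverse still yields the minimum-norm least-squares solution.

```python
import numpy as np

# A is rank-deficient (second column = 2 * first) and not square,
# yet pinv still gives the minimum-norm least-squares solution of Ax = y.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.pinv(A) @ y  # -> [0.2, 0.4]
```

Here y lies in the column space of A, so A x̂ reproduces y exactly; among the infinitely many solutions, pinv selects the one with the smallest norm.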
ELM Mathematical Model (Cont.)
The mathematical model is Hβ = T. We can rewrite the formula as
β = H+T = (HᵀH)⁻¹HᵀT.
If the hidden layer has 20 nodes and there are 1000 training samples in total, H is 1000×20, so HᵀH is only a 20×20 matrix and is cheap to invert.
13
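The whole ELM training procedure then reduces to a few lines. The sketch below (NumPy, with our own toy data of 1000 samples and 20 hidden nodes, matching the sizes mentioned above) computes β in one shot via the pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(X, T, L=20):
    """Randomly assign input weights and biases, then solve beta = H+ T."""
    W = rng.normal(size=(X.shape[1], L))  # random input weights
    b = rng.normal(size=L)                # random hidden biases
    H = sigmoid(X @ W + b)                # N x L hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T          # output weights: no iteration at all
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# 1000 samples, 20 hidden nodes -> H is 1000 x 20, HᵀH is 20 x 20
X = rng.uniform(-3.0, 3.0, size=(1000, 1))
T = np.sin(X)
W, b, beta = elm_train(X, T, L=20)
mse = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
```

Only the output weights β are learned; W and b stay at their random values, which is what makes ELM training a single linear solve rather than an iterative loop.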
Performance Evaluation
Regression of SinC Function
15
Regression of SinC Function (Cont.)
100,000 training data with 5–20% noise; 100,000 testing data are noise-free. The results of training 50 times are shown in the following table.

Noise | Training Time AVG (sec) | Training RMS AVG | Testing RMS AVG
 5%   | 0.6462                  | 0.0113           | 2.201e-04
10%   | 0.6306                  | 0.0224           | 2.753e-04
15%   | 0.6427                  | 0.0334           | 8.336e-04
20%   | 0.6452                  | 0.0449           | 11.541e-04

16
Demo ELM!
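A small self-contained version of the SinC experiment can be sketched as follows. Note the assumptions: we use 5,000 samples instead of 100,000 to keep the demo fast, uniform ±0.2 noise as a stand-in for the "20% noise" setting, and uniform random weights; this is an illustrative sketch, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sinc(x):
    """SinC function sin(x)/x, with sinc(0) = 1."""
    out = np.ones_like(x)
    nz = x != 0
    out[nz] = np.sin(x[nz]) / x[nz]
    return out

# training targets carry uniform noise; test targets are noise-free
X_train = rng.uniform(-10.0, 10.0, size=(5000, 1))
T_train = sinc(X_train) + rng.uniform(-0.2, 0.2, size=X_train.shape)
X_test = np.linspace(-10.0, 10.0, 5000).reshape(-1, 1)
T_test = sinc(X_test)

L = 20                                   # hidden nodes, as in the paper
W = rng.uniform(-1.0, 1.0, size=(1, L))  # random input weights
b = rng.uniform(-1.0, 1.0, size=L)       # random biases
H = sigmoid(X_train @ W + b)
beta = np.linalg.pinv(H) @ T_train       # analytic output weights

Y = sigmoid(X_test @ W + b) @ beta
test_rmse = np.sqrt(np.mean((Y - T_test) ** 2))
```

Because β is fitted by least squares over many noisy samples, the learned curve averages the noise out, so the test error lands well below the noise level in the training targets.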
Real-World Regression Problems
17
Real-World Regression Problems (Cont.)
18
Real-World Regression Problems (Cont.)
19
Real-World Regression Problems (Cont.)
20
Real-World Very Large Complex Applications
21
Real Medical Diagnosis Application: Diabetes
22
Protein Sequence Classification
23
Conclusion
Advantages
ELM needs less training time compared to popular BP and SVM/SVR.
The prediction performance of ELM is usually slightly better than BP and close to SVM/SVR in many applications.
Only one parameter needs to be tuned: L (the number of hidden layer nodes).
Nonlinear activation functions still work in ELM.
Disadvantages
How to find the optimal solution? Local minima issue.
Prone to overfitting.
24