introduction to artificial neural networksweb.cecs.pdx.edu › ... › biometrics ›...

27
1 Michel Verleysen Introduction - 1 Introduction to Artificial Neural Networks v 2.3 – August 2002 Michel Verleysen Introduction - 2 Introduction to Artificial Neural Networks Why ANNs ? Biological inspiration Some examples of problems Historical background Neural computation feature extraction classification – regression – adaptive filtering – projection principle of learning (non-linearity, error criterion) difficulties: learning and generalization, the curse of dimensionality, overfitting and model complexity NN structure and learning rule How to use ANN’s?

Upload: others

Post on 08-Jun-2020

31 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

1

Michel Verleysen Introduction - 1

Introduction to Artificial Neural Networks

v 2.3 – August 2002

Michel Verleysen Introduction - 2

Introduction to Artificial Neural Networks

p Why ANNs ?p Biological inspirationp Some examples of problemsp Historical backgroundp Neural computationp feature extractionp classification – regression – adaptive filtering – projection

p principle of learning (non-linearity, error criterion)p difficulties: learning and generalization, the curse of

dimensionality, overfitting and model complexityp NN structure and learning rulep How to use ANN’s?

Page 2: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

2

Michel Verleysen Introduction - 3

Why Anns ?

Von Neumann’s computer (Human) braindeterminism fuzzy behavioursequence of instructions parallelismhigh speed slow speedrepetitive tasks adaptation to situationsprogramming learninguniqueness of solutions different solutionsex: matrix product ex: face recognition

Von Neumann’s computer (Human) braindeterminism fuzzy behavioursequence of instructions parallelismhigh speed slow speedrepetitive tasks adaptation to situationsprogramming learninguniqueness of solutions different solutionsex: matrix product ex: face recognition

Michel Verleysen Introduction - 4

Brain “Properties”

p Robust (cells die!) and fault-tolerantp Flexible (does not need C or java…)p Information is fuzzy, probabilistic, noisy, or inconsistentp Highly parallelp Learningp Slowp Small, compact, dissipates little powerp Auto-organizationp Non-unique behavior

Page 3: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

3

Michel Verleysen Introduction - 5

Neurosciences - Neuroengineering

From “Handbook of neural computation”, E. Fiesler, R. Beale eds., IOP and Oxford University Press, 1996, p. A2.2:3

Michel Verleysen Introduction - 6

Some examples of problems

From “DARPA Neural Network Study”, 1988

Page 4: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

4

Michel Verleysen Introduction - 7

Examples of tasks

p face recognitionp time-series predictionp process identificationp process controlp optical character recognitionp adaptive filteringp etc.

Michel Verleysen Introduction - 8

Historical Background

1940 – 1965 Hebb biological learning ruleMcCulloch & Pitts binary decision unitsRosenblatt Perceptron - learning

1969 Minsky & Papert limits to Perceptrons

1974 Werbos back-propagation

1980 – Hopfield feedback networksParker, LeCun, Rumelhart,McClelland

back-propagation

Kohonen Self-Organizing Maps

1940 – 1965 Hebb biological learning ruleMcCulloch & Pitts binary decision unitsRosenblatt Perceptron - learning

1969 Minsky & Papert limits to Perceptrons

1974 Werbos back-propagation

1980 – Hopfield feedback networksParker, LeCun, Rumelhart,McClelland

back-propagation

Kohonen Self-Organizing Maps

Page 5: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

5

Michel Verleysen Introduction - 9

Biological Neuron - Hebb’s Rule

p ‘Basic’ scheme

p Hebbs’ rule: how coupling between neurons (synaptic values) influences and is influenced by the operation performed by a group of neurons

Michel Verleysen Introduction - 10

Hebb’s rule

p “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

weight

no change

p Symmetrical Hebb’s rule for artificial networks:p to avoid saturation

p because 1 and 0 are conventional

weight

weight

10

Page 6: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

6

Michel Verleysen Introduction - 11

Similarity levels

p “elements” levelp neurons sums & non-linear functionsp synapses multiplicationp other elements not useful for computation

p functional blocks levelp visual cortex pattern recognitionp sensory-motor cortex controlp…

p structure levelp distributed informationp ...

Michel Verleysen Introduction - 12

Artificial Neural Networks

p Artificial neural networks ARE NOT :p an attempt to understand biological systemsp an attempt to reproduce the behavior of a biological system

p Artificial neural networks ARE :p a set of tools aimed to solve “perceptive” problems (“perceptive”

<-> “rule-based”)

At least in the framework of this lecture, i.e. “Neural Computation”…

Page 7: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

7

Michel Verleysen Introduction - 13

Why “Artificial neural networks” ?

p Structure: many computation units in parallel(McCulloch & Pitts 1943)

p Learning rule (adaptation of parameters):similar to Hebb’s rule (1949)

inputs

outputsAnd that’s all...

Michel Verleysen Introduction - 14

Learning

p Give - many - examples (input-output pairs, or training samples)

p Compute the parameters to fit the examplesp Test the result!

inputs

outputs

Page 8: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

8

Michel Verleysen Introduction - 15

Why Neural Computation ?

p Hypothetical problem of optical character recognition (OCR)

p If: - image 256 x 256 pixels- 8-bits pixel values (256 gray levels)

then: - 10158000 different images

p Necessity to work with features !

Michel Verleysen Introduction - 16

Feature Extraction

p Example of feature in OCR: ratio height/width (x1) of the character

p Histogram of feature value

x1

“a” class“b” class

Page 9: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

9

Michel Verleysen Introduction - 17

Multi-dimensional Features

p Necessity to classify according to several, but not too muchfeatures

p Several: because of the overlap in histogram with 1 featurep Not too much: because classification methods are easier in small

dimensional spaces

x1

x2 C1

C2

Michel Verleysen Introduction - 18

What can be done with ANNs ?

p regression

p classification

p adaptive filtering

p projection0

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5

Page 10: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

10

Michel Verleysen Introduction - 19

What can be done with ANNs ?

p regression

p classification

p adaptive filtering

p projection 0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0 1 2 3 4

Michel Verleysen Introduction - 20

Classification - Regression

regressionfeaturesOutput value(continuous variable)

x1

x2

y

classificationfeaturesClass label(discrete variable)

x1

x2 C1

C2

Page 11: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

11

Michel Verleysen Introduction - 21

What can be done with ANNs ?

p regression

p classification

p adaptive filtering

p projection

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 0.5 1 1.5 2 2.5 3 3.5

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2 2.5 3 3.5

Michel Verleysen Introduction - 22

What can be done with ANNs ?

p regression

p classification

p adaptive filtering

p projection

Page 12: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

12

Michel Verleysen Introduction - 23

Principle

Non-linear parametricregression model

outputs

_

inputs

desired outputs(error criteria)

Parameteradaptation

Michel Verleysen Introduction - 24

Principle

p Model: restrictions about the possible relation

Non-linear parametricregression model

outputs

_

inputs

desired outputs(error criteria)

Parameteradaptation

Page 13: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

13

Michel Verleysen Introduction - 25

Principle

p Parametric: learning parameters to adjust(contain the information)

Non-linear parametricregression model

outputs

_

inputs

desired outputs(error criteria)

Parameteradaptation

Michel Verleysen Introduction - 26

Principle

p Non-linear: more powerful than linear

Non-linear parametricregression model

outputs

_

inputs

desired outputs(error criteria)

Parameteradaptation

Page 14: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

14

Michel Verleysen Introduction - 27

Principle

Universal approximator !

Non-linear parametricregression model

outputs

_

inputs

desired outputs(error criteria)

Parameteradaptation

Michel Verleysen Introduction - 28

Why non-linear ?

p If world was linear:

Push on a flexible bar, and it will shrink…

(one equilibrium point,stability, linearity, etc.)

Page 15: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

15

Michel Verleysen Introduction - 29

Why non-linear ?

p But world is non-linear:

Push on a flexible bar, and it will bend, to the left or to the right !

(unstable equilibrium point,bifurcation, non-linearity, etc.)

Michel Verleysen Introduction - 30

Error criterion

p regressionp classificationp adaptive filtering

p projection

outputs

_

inputs

desiredoutputs

0

1

2

3

4

5

6

7

8

9

0 0.5 1 1.5 2 2.5 3 3.5

∑∂i

i2

Page 16: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

16

Michel Verleysen Introduction - 31

Error criterion

p regressionp classificationp adaptive filtering

p projection

outputs

_

inputs

desiredoutputs

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0 1 2 3 4

Wrong classifications

Michel Verleysen Introduction - 32

Error criterion

p regressionp classificationp adaptive filtering

p projection

outputs

_

inputs

desiredoutputs

-1.5

-1

-0.5

0

0.5

1

1.5

0 2 4 6 8 10 12 14

-1.5

-1

-0.5

0

0.5

1

1.5

0 2 4 6 8 10 12 14

-1.5

-1

-0.5

0

0.5

1

1.5

0 2 4 6 8 10 12 14

-1.5

-1

-0.5

0

0.5

1

1.5

0 2 4 6 8 10 12 14

Independencebetweenoutputs

Page 17: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

17

Michel Verleysen Introduction - 33

Error criterion

p regressionp classificationp adaptive filtering

p projection

outputs

_

inputs

desiredoutputs

Independencebetweenoutputs

Michel Verleysen Introduction - 34

Polynomial Curve Fitting

p Strong similarity between polynomial curve fitting and regression with neural networks

p polynoms

p neural networks

p Error criterion

p Polynoms: E is quadratic in w min(E) is the solution of a setof linear equations

Neural networks: E is non-linear in w local minima

∑=

=++++=M

j

jj

MM xwxwxwxwwy

0

2210 L

( )∑=

−=P

p

pp tyE1

2

( )xw ,fy =

desired output(target)

Page 18: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

18

Michel Verleysen Introduction - 35

Learning - generalization

p Learning: process of p finding parameters w whichpminimize the error function Ep given a set of input-output pairs (xp,tp)

0

0.5

1

1.5

2

2.5

0 1 2 3 4 5

x

t

0

0.5

1

1.5

2

2.5

0 1 2 3 4 5

x

t

w

Michel Verleysen Introduction - 36

0

0.5

1

1.5

2

2.5

0 1 2 3 4 5

x

t

0

0.5

1

1.5

2

2.5

0 1 2 3 4 5

x

t

Learning - generalization

p generalization: process ofp assigning an output value y top a new input xp given the parameters vector w

w

Page 19: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

19

Michel Verleysen Introduction - 37

Overfitting

p Compromise betweenpmodel complexity andpnumber of learning pairs

p Neural networks are puniversal approximation models withpa (relatively) low complexity

p But overfitting must be a serious concern !

0

0.5

1

1.5

2

2.5

0 1 2 3 4 5

x

y

0

0.5

1

1.5

2

2.5

0 1 2 3 4 5

x

y

Michel Verleysen Introduction - 38

Overfitting in classification

x1

x2 C1

C2

x1

x2 C1

C2

x1

x2 C1

C2

Page 20: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

20

Michel Verleysen Introduction - 39

Model complexity

p Compromise between performance and complexity:difficult to evaluate a priori

p Alternative: to control the effective complexity:

p new error function

p penalty term (example)

p But:pmany possible forms of penalty terms

p λ difficult to choose/adjust

Ωλ+= EE~

dxdx

yd2

2

2

21

Michel Verleysen Introduction - 40

The Curse of Dimensionality

p Histograms: the number of boxes is exponential with the dimension

p More features reduced performancesp If # data is limited dimension must be lowp Concept of intrinsic dimensionality of data

Page 21: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

21

Michel Verleysen Introduction - 41

Supervised - unsupervised learning

p Supervised learning: building an input-output relation known through examples (input-output pairs)

p Unsupervised learning: modeling a property of the input data (for example density)

p Reinforcement learning: outputs are good/bad (but how muchgood or bad is not know!)

Neuralnetwork

Observedphenomenon

output

Desiredoutputs

-

Neuralnetwork

Observedphenomenon

output

Michel Verleysen Introduction - 42

What is a neural network ?

p Model = structure + learning rule

p structure

p learning ruleHow to compute (estimate) the parameters of the network ?

p Several learning rules for one structurep Similar learning rules for several structures

Page 22: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

22

Michel Verleysen Introduction - 43

How to use ANNs

1) examine problem

Michel Verleysen Introduction - 44

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis

Page 23: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

23

Michel Verleysen Introduction - 45

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning

Michel Verleysen Introduction - 46

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data

Page 24: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

24

Michel Verleysen Introduction - 47

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data5) choose an ANN model

Michel Verleysen Introduction - 48

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data5) choose an ANN model6) learn

Page 25: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

25

Michel Verleysen Introduction - 49

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data5) choose an ANN model6) learn7) test the results

+

Michel Verleysen Introduction - 50

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data5) choose an ANN model6) learn7) test the results

unsuccessful :

Page 26: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

26

Michel Verleysen Introduction - 51

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data5) choose an ANN model6) learn7) test the results

successful !

Michel Verleysen Introduction - 52

How to use ANNs

1) examine problem2) formulate problem in terms of data analysis3) collect -many- data for learning4) understand data5) choose an ANN model6) learn7) test the results

successful !

Page 27: Introduction to Artificial Neural Networksweb.cecs.pdx.edu › ... › Biometrics › NN-introduction.pdf · Michel Verleysen Introduction - 12 Artificial Neural Networks p Artificial

27

Michel Verleysen Introduction - 53

Sources and References

p Some ideas developed in these slides come from:p Handbook of neural computation, E. Fiesler, R. Beale eds., IOP

and oxford university press, 1996p Neural networks for pattern recognition, C. Bishop, Clarendon

press, Oxford, 1995

p Other references:p Les Réseaux de Neurones Artificiels, F. Blayo, M. Verleysen,

Presses Universitaires de France, collection "Que sais-je?", Paris (1996), 126 pages. Short introduction in French, not sufficient, but easy to read.

p Pattern classification, R.O. Duda, P.E. Hart, D.G. Strok, second edition, Wiley, 2000 (a good book on pattern classification, including, but not limited to, neural networks).