
Classification III

Lecturer: Dr. Bo Yuan

E-mail: [email protected]

Overview

Artificial Neural Networks

2

Biological Motivation

3

10^11: the number of neurons in the human brain

10^4: the average number of connections of each neuron

10^-3 s: the fastest switching time of neurons

10^-10 s: the switching speed of computers

10^-1 s: the time required to visually recognize your mother

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A: Using ANNs to study and model biological learning processes

Group B: Obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes

4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

[Figure: a perceptron — inputs x1, x2, …, xn with weights w1, w2, …, wn, a bias input x0 = 1 with weight w0, and a summation node Σ followed by a threshold.]

$\sum_{i=0}^{n} w_i x_i$

$o = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}$

$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ 0 & \text{otherwise} \end{cases}$

Power of Perceptrons

8

AND (w0 = -0.8, w1 = 0.5, w2 = 0.5)

Input    Sum    Output
0 0      -0.8   0
0 1      -0.3   0
1 0      -0.3   0
1 1       0.2   1

OR (w0 = -0.3, w1 = 0.5, w2 = 0.5)

Input    Sum    Output
0 0      -0.3   0
0 1       0.2   1
1 0       0.2   1
1 1       0.7   1
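Below is a minimal sketch (Python; the function and variable names are mine, not from the slides) that evaluates the threshold unit defined above with the AND and OR weights from the tables:

```python
# Threshold perceptron: output 1 if w0 + w1*x1 + ... + wn*xn > 0, else 0.
def perceptron(weights, x):
    """weights = [w0, w1, ..., wn]; x = [x1, ..., xn]; the bias input x0 is fixed to 1."""
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else 0

AND_W = [-0.8, 0.5, 0.5]   # w0, w1, w2 from the AND table
OR_W  = [-0.3, 0.5, 0.5]   # w0, w1, w2 from the OR table

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", perceptron(AND_W, x), "OR:", perceptron(OR_W, x))
```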

Error Surface

9

[Figure: the error E plotted as a surface over the weight space (w1, w2).]

Gradient Descent

10

$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$

$\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]$

$\Delta \vec{w} = -\eta \, \nabla E(\vec{w})$

$w_i \leftarrow w_i + \Delta w_i$, where $\Delta w_i = -\eta \dfrac{\partial E}{\partial w_i}$

Learning Rate

Batch Learning

Delta Rule

11

For a linear unit, $o(\vec{x}) = \vec{w} \cdot \vec{x}$.

$\dfrac{\partial E}{\partial w_i} = \dfrac{\partial}{\partial w_i} \dfrac{1}{2} \sum_{d \in D} (t_d - o_d)^2$

$= \dfrac{1}{2} \sum_{d \in D} \dfrac{\partial}{\partial w_i} (t_d - o_d)^2$

$= \dfrac{1}{2} \sum_{d \in D} 2\,(t_d - o_d) \dfrac{\partial}{\partial w_i} (t_d - o_d)$

$= \sum_{d \in D} (t_d - o_d) \dfrac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d)$

$= \sum_{d \in D} (t_d - o_d)(-x_{id})$

Therefore the weight update is

$\Delta w_i = -\eta \dfrac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$
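The derived gradient can be checked numerically. A short sketch (assuming a single linear unit $o(\vec{x}) = \vec{w} \cdot \vec{x}$ and a made-up two-example dataset; all names are mine) compares the analytic gradient with a finite-difference estimate:

```python
# Check  ∂E/∂w_i = -Σ_d (t_d - o_d) x_id  against a finite-difference estimate.
D = [([1.0, 0.5], 1.0), ([1.0, -0.3], 0.0)]   # (x, t) pairs; x includes the bias input x0 = 1
w = [0.2, -0.1]

def output(w, x):                 # linear unit: o(x) = w · x
    return sum(wi * xi for wi, xi in zip(w, x))

def E(w):                         # E(w) = 1/2 Σ_d (t_d - o_d)^2
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in D)

def grad_i(w, i):                 # the derived analytic gradient component
    return -sum((t - output(w, x)) * x[i] for x, t in D)

eps = 1e-6
for i in range(len(w)):
    w_eps = list(w); w_eps[i] += eps
    print(i, grad_i(w, i), (E(w_eps) - E(w)) / eps)   # the two values should nearly agree
```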

Batch Learning

GRADIENT_DESCENT(training_examples, η)

Initialize each wi to some small random value

Until the termination condition is met, Do

Initialize each Δwi to zero

For each <x, t> in training_examples, Do

• Input the instance x to the unit and compute the output o

• For each linear unit weight wi, Do: Δwi ← Δwi + η(t − o)xi

For each linear unit weight wi, Do: wi ← wi + Δwi

12
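A sketch of the GRADIENT_DESCENT procedure above for a single linear unit (assumptions: a fixed number of epochs as the termination condition and a small made-up dataset; the names are mine):

```python
import random

def gradient_descent(training_examples, eta=0.05, epochs=200):
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]       # small random initial weights
    for _ in range(epochs):                                    # "until the termination condition is met"
        delta = [0.0] * n                                      # initialize each Δw_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))           # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]               # Δw_i ← Δw_i + η(t − o)x_i
        w = [wi + d for wi, d in zip(w, delta)]                # w_i ← w_i + Δw_i
    return w

# toy usage: learn t = 0.3 + 0.7·x1 (the first component of x is the bias input x0 = 1)
data = [([1.0, x1], 0.3 + 0.7 * x1) for x1 in [0.0, 0.25, 0.5, 0.75, 1.0]]
print(gradient_descent(data))    # the weights approach [0.3, 0.7]
```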

Stochastic Learning

13

$w_i \leftarrow w_i + \Delta w_i$, where $\Delta w_i = \eta\,(t - o)\,x_i$

For example, if $x_i = 0.8$, $\eta = 0.1$, $t = 1$ and $o = 0$:

$\Delta w_i = \eta\,(t - o)\,x_i = 0.1 \times (1 - 0) \times 0.8 = 0.08$

[Figure: training examples labeled + and − in the input space.]
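A sketch (names mine) of one stochastic pass: the weights are adjusted after every single example rather than after a full pass over the training set; the last line reproduces the worked Δw = 0.08 example above:

```python
def stochastic_pass(w, training_examples, eta=0.1):
    for x, t in training_examples:
        o = sum(wi * xi for wi, xi in zip(w, x))                 # linear unit output
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]    # w_i ← w_i + η(t − o)x_i
    return w

print(stochastic_pass([0.0], [([0.8], 1.0)]))   # ≈ [0.08], matching the example above (up to rounding)
```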

Stochastic Learning NAND

14

Columns: Input (x0, x1, x2), Target (t), Initial Weights (w0, w1, w2), Individual Outputs (C0 = x0·w0, C1 = x1·w1, C2 = x2·w2), Sum (S = C0 + C1 + C2), Final Output (o), Error (E = t − o), Correction (R = LR × E), Final Weights (w0, w1, w2).

x0 x1 x2  t   w0   w1   w2    C0   C1   C2    S    o   E    R     w0   w1   w2
1  0  0   1   0    0    0     0    0    0     0    0   1   +0.1   0.1  0    0
1  0  1   1   0.1  0    0     0.1  0    0     0.1  0   1   +0.1   0.2  0    0.1
1  1  0   1   0.2  0    0.1   0.2  0    0     0.2  0   1   +0.1   0.3  0.1  0.1
1  1  1   0   0.3  0.1  0.1   0.3  0.1  0.1   0.5  0   0    0     0.3  0.1  0.1
1  0  0   1   0.3  0.1  0.1   0.3  0    0     0.3  0   1   +0.1   0.4  0.1  0.1
1  0  1   1   0.4  0.1  0.1   0.4  0    0.1   0.5  0   1   +0.1   0.5  0.1  0.2
1  1  0   1   0.5  0.1  0.2   0.5  0.1  0     0.6  1   0    0     0.5  0.1  0.2
1  1  1   0   0.5  0.1  0.2   0.5  0.1  0.2   0.8  1  -1   -0.1   0.4  0    0.1
1  0  0   1   0.4  0    0.1   0.4  0    0     0.4  0   1   +0.1   0.5  0    0.1
…  (training continues)  …
1  1  0   1   0.8 -0.2 -0.1   0.8 -0.2  0     0.6  1   0    0     0.8 -0.2 -0.1

threshold = 0.5, learning rate = 0.1
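A sketch (names mine) that re-runs the table above in code: a threshold perceptron trained on NAND with the incremental rule, threshold 0.5 and learning rate 0.1:

```python
def train_nand(epochs=30, lr=0.1, threshold=0.5):
    data = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]   # (x0, x1, x2), target
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, t in data:
            s = sum(wi * xi for wi, xi in zip(w, x))      # S = C0 + C1 + C2
            o = 1 if s > threshold else 0                 # final output
            r = lr * (t - o)                              # correction R = LR × E
            w = [wi + r * xi for wi, xi in zip(w, x)]     # per-weight update
    return w

# After enough passes the weights stop changing; the table above ends at w = (0.8, -0.2, -0.1).
print(train_nand())
```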

Multilayer Perceptron

15

XOR

16

$p \oplus q = (p \wedge \neg q) \vee (\neg p \wedge q) = (p \vee q) \wedge (\neg p \vee \neg q)$

Input    Output
0 0      0
0 1      1
1 0      1
1 1      0

Cannot be separated by a single line.

[Figure: the four XOR points in the (p, q) plane — the two positive (+) points and the two negative (−) points lie on opposite diagonals.]

XOR

17

$p \oplus q = (p \vee q) \wedge \neg(p \wedge q)$

[Figure: a two-layer network for XOR — inputs p and q feed an OR unit and a NAND unit, whose outputs feed an AND unit; the OR and NAND decision lines together separate the + points from the − points in the (p, q) plane.]

Hidden Layer Representations

18

Input    Hidden       Output
p  q     OR   NAND    AND
0  0     0    1       0
0  1     1    1       1
1  0     1    1       1
1  1     1    0       0

[Figure: the XOR network with the OR and NAND hidden units and the AND output unit; the hidden layer re-represents the inputs so that the + and − points become linearly separable.]
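A sketch (the weight values for the NAND unit are my own choice, not from the slides) showing how the three perceptrons compose into XOR — the hidden layer computes OR and NAND, and the output unit computes the AND of those two values:

```python
def unit(w, x):   # threshold perceptron as defined earlier: 1 if w0 + Σ wi·xi > 0, else 0
    return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0

OR_W   = [-0.3, 0.5, 0.5]
NAND_W = [0.8, -0.5, -0.5]    # assumption: one valid choice of NAND weights for a 0-threshold unit
AND_W  = [-0.8, 0.5, 0.5]

def xor(p, q):
    hidden = [unit(OR_W, (p, q)), unit(NAND_W, (p, q))]   # hidden layer representation
    return unit(AND_W, hidden)                            # output unit

for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, q, "->", xor(p, q))   # prints 0, 1, 1, 0
```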

The Sigmoid Threshold Unit

19

[Figure: a sigmoid unit — inputs x1, x2, …, xn with weights w1, w2, …, wn, a bias input x0 = 1 with weight w0, a summation node computing net, and a sigmoid output o = σ(net).]

$net = \sum_{i=0}^{n} w_i x_i$

$o = \sigma(net) = \dfrac{1}{1 + e^{-net}}$

Sigmoid function: $y(x) = \dfrac{1}{1 + e^{-x}}$, with derivative $\dfrac{d\,y(x)}{d\,x} = y(x)\,\big(1 - y(x)\big)$
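A sketch of the sigmoid unit above, together with a numerical check of the derivative identity (names are mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_unit(w, x):
    """w = [w0, w1, ..., wn]; x = [x1, ..., xn]; the bias input x0 is fixed to 1."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return sigmoid(net)

y = sigmoid(0.3)
print(y * (1 - y))                                      # derivative via y(1 − y)
print((sigmoid(0.3 + 1e-6) - sigmoid(0.3)) / 1e-6)      # finite-difference check, nearly identical
```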

Backpropagation Rule

20

$\Delta w_{ji} = -\eta \dfrac{\partial E_d}{\partial w_{ji}}$

$E_d(\vec{w}) = \dfrac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$

• xji = the i-th input to unit j
• wji = the weight associated with the i-th input to unit j
• netj = Σi wji xji (the weighted sum of inputs for unit j)
• oj = the output of unit j
• tj = the target output of unit j
• σ = the sigmoid function
• outputs = the set of units in the final layer
• Downstream(j) = the set of units directly taking the output of unit j as inputs

$\dfrac{\partial E_d}{\partial w_{ji}} = \dfrac{\partial E_d}{\partial net_j} \cdot \dfrac{\partial net_j}{\partial w_{ji}} = \dfrac{\partial E_d}{\partial net_j}\, x_{ji}$

Training Rule for Output Units

21

$\dfrac{\partial E_d}{\partial net_j} = \dfrac{\partial E_d}{\partial o_j} \cdot \dfrac{\partial o_j}{\partial net_j}$

$\dfrac{\partial E_d}{\partial o_j} = \dfrac{\partial}{\partial o_j} \dfrac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 = \dfrac{\partial}{\partial o_j} \dfrac{1}{2} (t_j - o_j)^2 = \dfrac{1}{2} \cdot 2\,(t_j - o_j) \cdot \dfrac{\partial (t_j - o_j)}{\partial o_j} = -(t_j - o_j)$

$\dfrac{\partial o_j}{\partial net_j} = \dfrac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)$

$\dfrac{\partial E_d}{\partial net_j} = -(t_j - o_j)\, o_j (1 - o_j)$

$\Delta w_{ji} = -\eta \dfrac{\partial E_d}{\partial w_{ji}} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}$
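As a made-up numerical illustration: if an output unit produces $o_j = 0.8$ for a target $t_j = 1$, and one of its inputs is $x_{ji} = 0.5$ with $\eta = 0.1$, then $\delta_j = (t_j - o_j)\, o_j (1 - o_j) = 0.2 \times 0.8 \times 0.2 = 0.032$ and $\Delta w_{ji} = \eta\, \delta_j\, x_{ji} = 0.1 \times 0.032 \times 0.5 = 0.0016$.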

Training Rule for Hidden Units

22

$\dfrac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \dfrac{\partial E_d}{\partial net_k} \cdot \dfrac{\partial net_k}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k \dfrac{\partial net_k}{\partial net_j}$

$= \sum_{k \in Downstream(j)} -\delta_k \dfrac{\partial net_k}{\partial o_j} \cdot \dfrac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, w_{kj} \dfrac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, w_{kj}\, o_j (1 - o_j)$

where $\delta_k = -\dfrac{\partial E_d}{\partial net_k}$.

Therefore

$\delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{kj}$

$\Delta w_{ji} = \eta\, \delta_j\, x_{ji}$

BP Framework

BACKPROPAGATION(training_examples, η, nin, nout, nhidden)

Create a network with nin inputs, nhidden hidden units, and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met, Do

For each <x, t> in training_examples, Do

• Input the instance x to the network and compute the output o of every unit

• For each output unit k, calculate its error term δk:  $\delta_k = o_k (1 - o_k)(t_k - o_k)$

• For each hidden unit h, calculate its error term δh:  $\delta_h = o_h (1 - o_h) \sum_{k \in outputs} \delta_k\, w_{kh}$

• Update each network weight wji:  $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\, \delta_j\, x_{ji}$

23
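A compact sketch of the procedure above for one hidden layer of sigmoid units (assumptions: per-example updates, a fixed epoch count as the termination condition, and a 2-2-1 XOR setup as toy usage; all function and variable names are mine):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_units, n_inputs):
    # one weight list per unit; index 0 is the bias weight w0 (its input x0 is fixed to 1)
    return [[random.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def forward(layer, x):
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in layer]

def backprop(training_examples, eta, n_in, n_hidden, n_out, epochs=10000):
    hidden, output = make_layer(n_hidden, n_in), make_layer(n_out, n_hidden)
    for _ in range(epochs):
        for x, t in training_examples:
            h = forward(hidden, x)                    # forward pass: hidden outputs
            o = forward(output, h)                    # forward pass: network outputs
            # error terms: δk = ok(1 − ok)(tk − ok);  δh = oh(1 − oh) Σk δk wkh
            d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            d_hid = [h[j] * (1 - h[j]) *
                     sum(d_out[k] * output[k][j + 1] for k in range(n_out))
                     for j in range(n_hidden)]
            # weight updates: wji ← wji + η δj xji
            for k in range(n_out):
                output[k][0] += eta * d_out[k]
                for j in range(n_hidden):
                    output[k][j + 1] += eta * d_out[k] * h[j]
            for j in range(n_hidden):
                hidden[j][0] += eta * d_hid[j]
                for i in range(n_in):
                    hidden[j][i + 1] += eta * d_hid[j] * x[i]
    return hidden, output

# Toy usage on XOR (2 inputs, 2 hidden units, 1 output).  With only 2 hidden units the
# search can occasionally get stuck in a local minimum (see the next slides); re-run
# with different random initial weights if the four outputs do not separate.
xor_data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
hid, out = backprop(xor_data, eta=0.5, n_in=2, n_hidden=2, n_out=1)
for x, t in xor_data:
    print(x, t, forward(out, forward(hid, x)))
```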

More about BP Networks …

Convergence and Local Minima: The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks: Black-box optimization techniques (e.g., Genetic Algorithms)

Usually better accuracy

Can do some advanced training (e.g., structure + parameters)

Xin Yao (1999), "Evolving Artificial Neural Networks", Proceedings of the IEEE, pp. 1423-1447

Representational Power

Deep Learning

24

More about BP Networks …

Overfitting: Tends to occur during later iterations. Use a validation dataset to terminate the training when necessary.

Practical Considerations: Momentum; Adaptive learning rate

• Small learning rate: slow convergence, easy to get stuck
• Large learning rate: fast convergence, unstable

25

$\Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n-1)$
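A one-step sketch of the momentum rule above (α is the momentum coefficient; the names are mine): keeping a fraction of the previous weight change helps the search roll through flat regions and small local bumps of the error surface.

```python
def momentum_step(w_ji, prev_delta, delta_j, x_ji, eta=0.1, alpha=0.9):
    delta = eta * delta_j * x_ji + alpha * prev_delta   # Δw_ji(n) = η δ_j x_ji + α Δw_ji(n−1)
    return w_ji + delta, delta                          # new weight and Δw_ji(n) for the next call
```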

[Figures: training error and validation error plotted against training time, and error plotted against a weight.]

Beyond BP Networks

26

Elman Network

XOR

In:  0 1 1 0 0 0 1 1 0 1 0 1 …

Out: 1 0 0 1 …

Beyond BP Networks

27

Hopfield Network

Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs. Input values can be any real values.

The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.

The training samples may contain errors.

Long training times are acceptable: they can range from a few seconds to several hours.

Fast evaluation of the learned target function may be required.

The ability to understand the learned function is not important: weights are difficult for humans to interpret.

29

Reading Materials

Text Book

Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.

Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill

http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demo

http://neuron.eng.wayne.edu/software.html

http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf

http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are some examples of other types of ANN?

31

Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic 1: Applications of ANN

Topic 2: Recurrent Neural Networks

Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network

Length: 20 minutes plus question time

32

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem (4 input samples)

• 0 0 → 0
• 1 0 → 1

Task 2: Identity Function (8 input samples)

• 10000000 → 10000000
• 00010000 → 00010000
• …

Use 3 hidden units

Deliverables: Report + Code (any programming language, with detailed comments)

Due: Sunday, 14 December

Credit: 15

33

Page 2: Classification III

Overview

Artificial Neural Networks

2

Biological Motivation

3

1011 The number of neurons in the human brain

104 The average number of connections of each neuron

10-3 The fastest switching times of neurons

10-10 The switching speeds of computers

10-1 The time required to visually recognize your mother

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A Using ANN to study and model biological learning processes

Group B Obtaining highly effective machine learning algorithms regardless of how closely

these algorithms mimic biological processes4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifon

iii

0

01)( 110

1 otherwisexwxwwif

xxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 3: Classification III

Biological Motivation

3

1011 The number of neurons in the human brain

104 The average number of connections of each neuron

10-3 The fastest switching times of neurons

10-10 The switching speeds of computers

10-1 The time required to visually recognize your mother

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A Using ANN to study and model biological learning processes

Group B Obtaining highly effective machine learning algorithms regardless of how closely

these algorithms mimic biological processes4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifon

iii

0

01)( 110

1 otherwisexwxwwif

xxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 4: Classification III

Biological Motivation

The power of parallelism

The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons

The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations

Sequential machines vs Parallel machines

Group A Using ANN to study and model biological learning processes

Group B Obtaining highly effective machine learning algorithms regardless of how closely

these algorithms mimic biological processes4

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifon

iii

0

01)( 110

1 otherwisexwxwwif

xxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 5: Classification III

Neural Network Representations

5

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifon

iii

0

01)( 110

1 otherwisexwxwwif

xxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 6: Classification III

Robot vs Human

6

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifon

iii

0

01)( 110

1 otherwisexwxwwif

xxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 7: Classification III

Perceptrons

7

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xw

0

0

010

otherwise

xwifon

iii

0

01)( 110

1 otherwisexwxwwif

xxo nnn

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

http://neuron.eng.wayne.edu/software.html

http://www.cbu.edu/~pong/ai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf

http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of the Backpropagation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN

31

Next Week's Class Talk

Volunteers are required for next week's class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network

Length: 20 minutes plus question time

32

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem (4 input samples)

• 0 0 0
• 1 0 1

Task 2: Identity Function (8 input samples)

• 10000000 10000000
• 00010000 00010000
• …

Use 3 hidden units

Deliverables: Report + Code (any programming language, with detailed comments)

Due: Sunday, 14 December

Credit: 15

33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 8: Classification III

Power of Perceptrons

8

-08

05 05

-03

05 05

Input sum Output0 0 -08 00 1 -03 01 0 -03 01 1 03 1

Input sum Output0 0 -03 00 1 02 11 0 02 11 1 07 1

AND OR

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 9: Classification III

Error Surface

9

Error

w1

w2

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 10: Classification III

Gradient Descent

10

2)(21)(

Dd

dd otwE

nwE

wE

wEwE )(

10

iiiii w

Ewwherewww

Learning Rate

Batch Learning

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 11: Classification III

Delta Rule

11

)()(

)()(

)()(221

)(21

)(21

2

2

idDd

dd

ddiDd

dd

ddiDd

dd

Dddd

i

ddDdii

xot

xwtw

ot

otw

ot

otw

otww

E

iddDd

di xotw )(

xwxo )(

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 12: Classification III

Batch Learning

GRADIENT_DESCENT (training_examples η)

Initialize each wi to some small random value

Until the termination condition is met Do

Initialize each Δwi to zero

For each ltx tgt in training_examples Do

bull Input the instance x to the unit and compute the output o

bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi

For each linear unit weight wi Dobull wi larr wi + Δwi

12

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 13: Classification III

Stochastic Learning

13

iiiii xotwwherewww )(

For example if xi=08 η=01 t=1 and o=0

Δwi = η(t-o)xi = 01times(1-0) times 08 = 008

+

-

++

+

--

-+

+

-

-

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 14: Classification III

Stochastic Learning NAND

14

InputTarget

InitialWeights

OutputError Correction Final

WeightsIndividual Sum Final Output

x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2

x0 w0

x1 w1

x2 w2

C0+C1+C2

t-o LR x E

1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01

1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 15: Classification III

Multilayer Perceptron

15

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia & Google

30

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are some examples of other types of ANN?

Next Week's Class Talk

Volunteers are required for next week's class talk.

Topic 1: Applications of ANN

Topic 2: Recurrent Neural Networks

Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network

Length: 20 minutes plus question time

32

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem (4 input samples)

• 0 0 → 0
• 1 0 → 1

Task 2: Identity Function (8 input samples)

• 10000000 → 10000000
• 00010000 → 00010000
• …

Use 3 hidden units.

Deliverables: Report + Code (any programming language, with detailed comments)

Due: Sunday, 14 December

Credit: 15%

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 16: Classification III

XOR

16

-

-

))(( pqqpqpqpqp

Input Output0 0 00 1 11 0 11 1 0

Cannot be separated by a single line

+

+p

q

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 17: Classification III

XOR

17

)()( qpqpqp

OR NAND

AND

OR

NAND-

+

+

-

p

q

p q

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 18: Classification III

Hidden Layer Representations

18

p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0

Input Hidden Output

- + +

-

AND

OR

NA

ND

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 19: Classification III

The Sigmoid Threshold Unit

19

sum

x1

x2

xn

w1

w2

wn

w0

x0=1

n

iii xwnet

0

neteneto

1

1)(

))(1()()(1

1)( yydyyd

ey

FunctionSigmoid

y

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 20: Classification III

Backpropagation Rule

20

ji

dji w

Ew

outputsk

kkd otwE 2)(21)(

bull xji = the i th input to unit j

bull wji = the weight associated with the i th input to unit j

bull netj = sumwjixji (the weighted sum of inputs for unit j )

bull oj= the output of unit j

bull tj= the target output of unit j

bull σ = the sigmoid function

bull outputs = the set of units in the final layer

bull Downstream (j ) = the set of units directly taking the output of unit j as inputs

jij

d

ji

j

j

d

ji

d xnetE

wnet

netE

wE

j

i

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 21: Classification III

Training Rule for Output Units

21

j

j

j

d

j

d

neto

oE

netE

outputskkk

jj

d otoo

E 2)(21

)(

)()(2

21

)(21 2

jj

j

jjjj

jjjj

d

ot

oot

ot

otoo

E

)1()(

jjj

j

j

j oonetnet

neto

)1()( jjjjj

d oootnetE

jijjjjji

dji xooot

wEw )1()(

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 22: Classification III

Training Rule for Hidden Units

22

)()(

)( )(

)1( jDownstreamk

jjkjkj

j

jDownstreamkkjk

jDownstreamk j

j

jDownstreamk j

kk

j

k

k

d

j

d

oowneto

w

neto

onet

netnet

netE

netE

)(

)1(jDownstreamkkjkjjj woo

jijji xw

k

dk net

E

k

j

jnet

k

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 23: Classification III

BP Framework

BACKPROPAGATION (training_examples η nin nout nhidden)

Create a network with nin inputs nhidden hidden units and nout output units

Initialize all network weights to small random numbers

Until the termination condition is met Do

For each ltx tgt in training_examples Do

bull Input the instance x to the network and computer the output o of every unit

bull For each output unit k calculate its error term δk

bull For each hidden unit h calculate its error term δh

bull Update each network weight wji 23

))(1( kkkkk otoo

outputsk

kkhhhh woo )1(

jijjijijiji xwwww

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 24: Classification III

More about BP Networks hellip

Convergence and Local Minima The search space is likely to be highly multimodal

May easily get stuck at a local solution

Need multiple trials with different initial weights

Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)

Usually better accuracy

Can do some advanced training (eg structure + parameter)

Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE

pp 1423-1447

Representational Power

Deep Learning24

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 25: Classification III

More about BP Networks hellip

Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary

Practical Considerations Momentum Adaptive learning rate

bull Small slow convergence easy to get stuckbull Large fast convergence unstable

25

)1()( nwxnw jijijji

Time

Error

Training

Validation

Weight

Error

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 26: Classification III

Beyond BP Networks

26Elman Network

XOR

0 1 1 0 0 0 1 1 0 1 0 1 hellip

1 0 0 1 hellip

In

Out

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Beyond BP Networks

28

When does ANN work

Instances are represented by attribute-value pairs Input values can be any real values

The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable Can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important Weights are difficult for humans to interpret

29

Reading Materials

Text Book

Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml

Online Demo

httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml

Online Tutorial

httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml

Wikipedia amp Google

30

Review

What is the biological motivation of ANN

When does ANN work

What is a perceptron

How to train a perceptron

What is the limitation of perceptrons

How does ANN solve non-linearly separable problems

What is the key idea of Backpropogation algorithm

What are the main issues of BP networks

What are the examples of other types of ANN31

Next Weekrsquos Class Talk

Volunteers are required for next weekrsquos class talk

Topic 1 Applications of ANN

Topic 2 Recurrent Neural Networks

Hints Robot Driving Character Recognition Face Recognition Hopfield Network

Length 20 minutes plus question time

32

Assignment

Topic Training Feedforward Neural Networks

Technique BP Algorithm

Task 1 XOR Problem 4 input samples

bull 0 0 0bull 1 0 1

Task 2 Identity Function 8 input samples

bull 10000000 10000000bull 00010000 00010000bull hellip

Use 3 hidden units

Deliverables Report Code (any programming language with detailed comments)

Due Sunday 14 December

Credit 15 33

  • Classification III
  • Overview
  • Biological Motivation
  • Biological Motivation (2)
  • Neural Network Representations
  • Robot vs Human
  • Perceptrons
  • Power of Perceptrons
  • Error Surface
  • Gradient Descent
  • Delta Rule
  • Batch Learning
  • Stochastic Learning
  • Stochastic Learning NAND
  • Multilayer Perceptron
  • XOR
  • XOR (2)
  • Hidden Layer Representations
  • The Sigmoid Threshold Unit
  • Backpropagation Rule
  • Training Rule for Output Units
  • Training Rule for Hidden Units
  • BP Framework
  • More about BP Networks hellip
  • More about BP Networks hellip (2)
  • Beyond BP Networks
  • Beyond BP Networks (2)
  • Beyond BP Networks (3)
  • When does ANN work
  • Reading Materials
  • Review
  • Next Weekrsquos Class Talk
  • Assignment
Page 27: Classification III

Beyond BP Networks

27

Hopfield Network Energy Landscape of Hopfield Network

Page 28: Classification III

Beyond BP Networks

28

Page 29: Classification III

When does ANN work?

Instances are represented by attribute-value pairs; input values can be any real values

The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes

The training samples may contain errors

Long training times are acceptable; training can range from a few seconds to several hours

Fast evaluation of the learned target function may be required

The ability to understand the learned function is not important; weights are difficult for humans to interpret

29

Page 30: Classification III

Reading Materials

Textbook

Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.
Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill
http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demo

http://neuron.eng.wayne.edu/software.html
http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial

http://www.autonlab.org/tutorials/neural13.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google

30

Page 31: Classification III

Review

What is the biological motivation of ANN?

When does ANN work?

What is a perceptron?

How to train a perceptron?

What is the limitation of perceptrons?

How does ANN solve non-linearly separable problems?

What is the key idea of the Backpropagation algorithm?

What are the main issues of BP networks?

What are some examples of other types of ANN?

31

Page 32: Classification III

Next Week's Class Talk

Volunteers are required for next week's class talk

Topic 1: Applications of ANN

Topic 2: Recurrent Neural Networks

Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network

Length: 20 minutes plus question time

32

Page 33: Classification III

Assignment

Topic: Training Feedforward Neural Networks

Technique: BP Algorithm

Task 1: XOR Problem (4 input samples)

• 0 0 → 0
• 1 0 → 1

Task 2: Identity Function (8 input samples)

• 10000000 → 10000000
• 00010000 → 00010000
• …

Use 3 hidden units

Deliverables: Report + Code (any programming language, with detailed comments)

Due: Sunday, 14 December

Credit: 15

33
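As a starting point rather than a reference solution, here is a minimal sketch of the BP algorithm on Task 1; the hidden-layer size, learning rate, epoch count, and random seed are illustrative choices, not values prescribed by the assignment.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, n_hidden=2, lr=0.5, epochs=20000, seed=1):
    """Stochastic gradient descent with backpropagated deltas for sigmoid units."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Small random initial weights; the last column of each matrix is the bias (x0 = 1).
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
    W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, T):
            x1 = np.append(x, 1.0)                             # forward pass
            h = sigmoid(W1 @ x1)
            h1 = np.append(h, 1.0)
            o = sigmoid(W2 @ h1)
            delta_o = (t - o) * o * (1 - o)                    # output-unit deltas
            delta_h = h * (1 - h) * (W2[:, :-1].T @ delta_o)   # hidden-unit deltas
            W2 += lr * np.outer(delta_o, h1)                   # weight updates
            W1 += lr * np.outer(delta_h, x1)
    return W1, W2

def predict(W1, W2, x):
    h = sigmoid(W1 @ np.append(x, 1.0))
    return sigmoid(W2 @ np.append(h, 1.0))

# Task 1: XOR (can occasionally get stuck in a local minimum; another seed or more hidden units helps).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_bp(X, T)
for x in X:
    print(x, "->", predict(W1, W2, x).round(3))

# Task 2: the 8-3-8 identity function uses the eight one-hot patterns as both inputs and targets,
# e.g. W1, W2 = train_bp(np.eye(8), np.eye(8), n_hidden=3).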
