Classification III

Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview

Artificial Neural Networks
Biological Motivation

10^11: the number of neurons in the human brain
10^4: the average number of connections per neuron
10^-3 s: the fastest switching time of neurons
10^-10 s: the switching speed of computers
10^-1 s: the time required to visually recognize your mother
Biological Motivation

The power of parallelism: the information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons. The motivation of ANNs is to capture this kind of highly parallel computation based on distributed representations.

Sequential machines vs. parallel machines

Group A: using ANNs to study and model biological learning processes.
Group B: obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.
Neural Network Representations

Robot vs. Human
Perceptrons

[Figure: a perceptron unit — inputs x1, ..., xn with weights w1, ..., wn, plus a constant input x0 = 1 carrying the bias weight w0, feeding a summation unit Σ.]

$$o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \dots + w_n x_n > 0 \\ 0 & \text{otherwise} \end{cases}$$

Equivalently, with $x_0 = 1$: $o = 1$ if $\sum_{i=0}^{n} w_i x_i > 0$, and $0$ otherwise.
Power of Perceptrons

AND (w0 = -0.8, w1 = w2 = 0.5):

| x1 | x2 | sum  | output |
|----|----|------|--------|
| 0  | 0  | -0.8 | 0      |
| 0  | 1  | -0.3 | 0      |
| 1  | 0  | -0.3 | 0      |
| 1  | 1  |  0.2 | 1      |

OR (w0 = -0.3, w1 = w2 = 0.5):

| x1 | x2 | sum  | output |
|----|----|------|--------|
| 0  | 0  | -0.3 | 0      |
| 0  | 1  |  0.2 | 1      |
| 1  | 0  |  0.2 | 1      |
| 1  | 1  |  0.7 | 1      |
Error Surface

[Figure: the error E plotted as a surface over the weight space (w1, w2).]
Gradient Descent

Training error, summed over the whole training set D (batch learning):

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

Gradient:

$$\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right]$$

Training rule:

$$\vec{w} \leftarrow \vec{w} + \Delta \vec{w}, \quad \text{where } \Delta w_i = -\eta \, \frac{\partial E}{\partial w_i}$$

η is the learning rate.
Delta Rule

For a linear unit, $o(\vec{x}) = \vec{w} \cdot \vec{x}$, so:

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \, \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 = \frac{1}{2} \sum_{d \in D} 2\,(t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d)$$

$$= \sum_{d \in D} (t_d - o_d)\,\frac{\partial}{\partial w_i}\left(t_d - \vec{w} \cdot \vec{x}_d\right) = -\sum_{d \in D} (t_d - o_d)\,x_{id}$$

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\,x_{id}$$
Batch Learning

GRADIENT_DESCENT(training_examples, η)

Initialize each wi to some small random value.
Until the termination condition is met, Do
  Initialize each Δwi to zero.
  For each <x, t> in training_examples, Do
    - Input the instance x to the unit and compute the output o.
    - For each linear unit weight wi, Do: Δwi ← Δwi + η(t - o)xi
  For each linear unit weight wi, Do: wi ← wi + Δwi
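A minimal Python sketch of this procedure for a linear unit (the toy dataset, learning rate, and epoch count are illustrative choices, not values from the slides):

```python
import random

def gradient_descent(training_examples, eta=0.05, epochs=100):
    """Batch gradient descent for a linear unit o = w . x (x[0] is the constant 1)."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        delta = [0.0] * n
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]      # accumulate over the whole batch
        w = [wi + di for wi, di in zip(w, delta)]     # one update per pass through D
    return w

# Toy example: learn t = 2*x1 - 1 (x = [1, x1], so w[0] plays the role of w0).
data = [([1, x], 2 * x - 1) for x in [0.0, 0.5, 1.0, 1.5]]
print(gradient_descent(data))  # approximately [-1.0, 2.0]
```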
Stochastic Learning

$$w_i \leftarrow w_i + \Delta w_i, \quad \text{where } \Delta w_i = \eta\,(t - o)\,x_i$$

For example, if xi = 0.8, η = 0.1, t = 1 and o = 0:

Δwi = η(t - o)xi = 0.1 × (1 - 0) × 0.8 = 0.08

[Figure: scatter plots of positive (+) and negative (-) examples in the plane, illustrating where a linear decision boundary can and cannot separate the two classes.]
Stochastic Learning: NAND

Threshold = 0.5, learning rate = 0.1. Each row applies the rule to one example: Ci = wi·xi, S = C0 + C1 + C2, final output o = 1 if S > threshold, error E = t - o, correction R = LR × E, and the final weights are wi' = wi + R·xi.

| x0 | x1 | x2 | t | w0  | w1   | w2   | C0  | C1   | C2  | S   | o | E  | R    | w0' | w1'  | w2'  |
|----|----|----|---|-----|------|------|-----|------|-----|-----|---|----|------|-----|------|------|
| 1  | 0  | 0  | 1 | 0   | 0    | 0    | 0   | 0    | 0   | 0   | 0 | 1  | +0.1 | 0.1 | 0    | 0    |
| 1  | 0  | 1  | 1 | 0.1 | 0    | 0    | 0.1 | 0    | 0   | 0.1 | 0 | 1  | +0.1 | 0.2 | 0    | 0.1  |
| 1  | 1  | 0  | 1 | 0.2 | 0    | 0.1  | 0.2 | 0    | 0   | 0.2 | 0 | 1  | +0.1 | 0.3 | 0.1  | 0.1  |
| 1  | 1  | 1  | 0 | 0.3 | 0.1  | 0.1  | 0.3 | 0.1  | 0.1 | 0.5 | 0 | 0  | 0    | 0.3 | 0.1  | 0.1  |
| 1  | 0  | 0  | 1 | 0.3 | 0.1  | 0.1  | 0.3 | 0    | 0   | 0.3 | 0 | 1  | +0.1 | 0.4 | 0.1  | 0.1  |
| 1  | 0  | 1  | 1 | 0.4 | 0.1  | 0.1  | 0.4 | 0    | 0.1 | 0.5 | 0 | 1  | +0.1 | 0.5 | 0.1  | 0.2  |
| 1  | 1  | 0  | 1 | 0.5 | 0.1  | 0.2  | 0.5 | 0.1  | 0   | 0.6 | 1 | 0  | 0    | 0.5 | 0.1  | 0.2  |
| 1  | 1  | 1  | 0 | 0.5 | 0.1  | 0.2  | 0.5 | 0.1  | 0.2 | 0.8 | 1 | -1 | -0.1 | 0.4 | 0    | 0.1  |
| 1  | 0  | 0  | 1 | 0.4 | 0    | 0.1  | 0.4 | 0    | 0   | 0.4 | 0 | 1  | +0.1 | 0.5 | 0    | 0.1  |
| …  |    |    |   |     |      |      |     |      |     |     |   |    |      |     |      |      |
| 1  | 1  | 0  | 1 | 0.8 | -0.2 | -0.1 | 0.8 | -0.2 | 0   | 0.6 | 1 | 0  | 0    | 0.8 | -0.2 | -0.1 |

Training converges to w = (0.8, -0.2, -0.1), which implements NAND.
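A short Python sketch that reproduces this kind of trace (the cycling example order and the stopping rule are assumptions matching the table above, not code from the lecture):

```python
def train_nand(eta=0.1, threshold=0.5, epochs=100):
    """Perceptron learning rule on NAND; x[0] = 1 is the constant bias input.

    The threshold stays fixed at 0.5 while the three weights are learned,
    matching the table above.
    """
    data = [([1, 0, 0], 1), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 0)]
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        errors = 0
        for x, t in data:
            s = sum(wi * xi for wi, xi in zip(w, x))   # S = C0 + C1 + C2
            o = 1 if s > threshold else 0              # final output
            if o != t:
                errors += 1
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:                                # converged: every example correct
            break
    return w

print(train_nand())  # converges to weights implementing NAND, e.g. [0.8, -0.2, -0.1]
```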
Multilayer Perceptron
XOR

$$p \oplus q = (p \wedge \neg q) \vee (\neg p \wedge q)$$

| p | q | output |
|---|---|--------|
| 0 | 0 | 0      |
| 0 | 1 | 1      |
| 1 | 0 | 1      |
| 1 | 1 | 0      |

The positive and negative examples cannot be separated by a single line.

[Figure: the four XOR points in the (p, q) plane — the two classes are not linearly separable.]
XOR

$$p \oplus q = (p \vee q) \wedge \neg(p \wedge q)$$

XOR can therefore be built from an OR unit and a NAND unit feeding an AND unit.

[Figure: a two-layer network for XOR — inputs p and q feed an OR unit and a NAND unit, whose outputs feed an AND output unit.]
Hidden Layer Representations

| p | q | OR | NAND | AND |
|---|---|----|------|-----|
| 0 | 0 | 0  | 1    | 0   |
| 0 | 1 | 1  | 1    | 1   |
| 1 | 0 | 1  | 1    | 1   |
| 1 | 1 | 1  | 0    | 0   |

Input → Hidden (OR, NAND) → Output (AND)

The hidden layer re-represents the inputs so that the output unit sees a linearly separable problem. A sketch of this composition appears below.
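A quick check of the composition using fixed perceptron units (the OR/AND weights are from the "Power of Perceptrons" slide; the NAND weights are the ones learned in the trace above, reused here for illustration):

```python
def unit(w0, w1, w2, threshold=0.0):
    # A perceptron with fixed weights: output 1 if w0 + w1*p + w2*q > threshold.
    return lambda p, q: 1 if w0 + w1 * p + w2 * q > threshold else 0

OR = unit(-0.3, 0.5, 0.5)
AND = unit(-0.8, 0.5, 0.5)
NAND = unit(0.8, -0.2, -0.1, threshold=0.5)   # weights from the NAND training trace

def XOR(p, q):
    # p XOR q = (p OR q) AND (p NAND q): the hidden layer (OR, NAND) feeds AND.
    return AND(OR(p, q), NAND(p, q))

for p in (0, 1):
    for q in (0, 1):
        print(p, q, XOR(p, q))  # prints outputs 0, 1, 1, 0
```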
The Sigmoid Threshold Unit

[Figure: the same unit as the perceptron — inputs x0 = 1, x1, ..., xn with weights w0, ..., wn feed a summation Σ — but the output now passes through a smooth sigmoid.]

$$net = \sum_{i=0}^{n} w_i x_i$$

$$o = \sigma(net) = \frac{1}{1 + e^{-net}}$$

Sigmoid function: $\sigma(y) = \frac{1}{1 + e^{-y}}$, with the convenient derivative

$$\frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$$
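This derivative identity is what makes the sigmoid convenient for gradient-based training. A quick numeric check in Python (the test points and step size are arbitrary choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Verify sigma'(x) = sigma(x) * (1 - sigma(x)) with a central difference.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    analytic = sigmoid(x) * (1 - sigmoid(x))
    assert abs(numeric - analytic) < 1e-9
```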
Backpropagation Rule

$$\Delta w_{ji} = -\eta\,\frac{\partial E_d}{\partial w_{ji}}, \qquad E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$$

Notation:
- xji = the i-th input to unit j
- wji = the weight associated with the i-th input to unit j
- netj = Σi wji xji (the weighted sum of inputs for unit j)
- oj = the output of unit j
- tj = the target output of unit j
- σ = the sigmoid function
- outputs = the set of units in the final layer
- Downstream(j) = the set of units directly taking the output of unit j as inputs

Since wji influences Ed only through netj, the chain rule gives:

$$\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,x_{ji}$$
Training Rule for Output Units

$$\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j}\,\frac{\partial o_j}{\partial net_j}$$

$$\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j}\,\frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 = \frac{1}{2}\cdot 2\,(t_j - o_j)\,\frac{\partial (t_j - o_j)}{\partial o_j} = -(t_j - o_j)$$

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j\,(1 - o_j)$$

$$\frac{\partial E_d}{\partial net_j} = -(t_j - o_j)\,o_j\,(1 - o_j)$$

$$\Delta w_{ji} = -\eta\,\frac{\partial E_d}{\partial w_{ji}} = \eta\,(t_j - o_j)\,o_j\,(1 - o_j)\,x_{ji}$$
Training Rule for Hidden Units

A hidden unit j influences Ed only through the units in Downstream(j):

$$\frac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k}\,\frac{\partial net_k}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\,\frac{\partial net_k}{\partial o_j}\,\frac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\,w_{kj}\,o_j\,(1 - o_j)$$

Defining $\delta_j = -\frac{\partial E_d}{\partial net_j}$:

$$\delta_j = o_j\,(1 - o_j) \sum_{k \in Downstream(j)} \delta_k\,w_{kj}$$

$$\Delta w_{ji} = \eta\,\delta_j\,x_{ji}$$
BP Framework

BACKPROPAGATION(training_examples, η, nin, nout, nhidden)

Create a network with nin inputs, nhidden hidden units and nout output units.
Initialize all network weights to small random numbers.
Until the termination condition is met, Do
  For each <x, t> in training_examples, Do
    - Input the instance x to the network and compute the output o of every unit.
    - For each output unit k, calculate its error term δk:
      $$\delta_k = o_k\,(1 - o_k)\,(t_k - o_k)$$
    - For each hidden unit h, calculate its error term δh:
      $$\delta_h = o_h\,(1 - o_h) \sum_{k \in outputs} w_{kh}\,\delta_k$$
    - Update each network weight wji:
      $$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}, \quad \text{where } \Delta w_{ji} = \eta\,\delta_j\,x_{ji}$$
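A compact Python sketch of this framework applied to XOR (the network size, learning rate, seed, and epoch count are illustrative choices; with an unlucky seed the search can settle in a local minimum, as the next slides note):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backpropagation(examples, eta=0.5, n_in=2, n_hidden=2, n_out=1, epochs=10000):
    """Stochastic-gradient BACKPROPAGATION for one hidden layer.

    w_h[j][i] is the weight from input i to hidden unit j; w_o[k][j] from
    hidden unit j to output unit k; index 0 is the bias weight (x0 = 1).
    """
    w_h = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, t in examples:
            xb = [1.0] + list(x)                       # forward pass, bias prepended
            h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_h]
            hb = [1.0] + h
            o = [sigmoid(sum(w * v for w, v in zip(ws, hb))) for ws in w_o]
            # Output error terms: delta_k = o_k (1 - o_k)(t_k - o_k)
            d_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # Hidden error terms: delta_h = o_h (1 - o_h) sum_k w_kh delta_k
            d_h = [h[j] * (1 - h[j]) * sum(w_o[k][j + 1] * d_o[k] for k in range(n_out))
                   for j in range(n_hidden)]
            # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w_o[k][j] += eta * d_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_h[j][i] += eta * d_h[j] * xb[i]
    return w_h, w_o

xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
w_h, w_o = backpropagation(xor)
for x, _ in xor:
    xb = [1.0] + list(x)
    hb = [1.0] + [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_h]
    print(x, [round(sigmoid(sum(w * v for w, v in zip(ws, hb))), 2) for ws in w_o])
# Outputs should approach 0, 1, 1, 0; if training stalls in a local
# minimum, rerun with a different random seed.
```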
More about BP Networks …

Convergence and Local Minima
- The search space is likely to be highly multimodal.
- May easily get stuck at a local solution.
- Need multiple trials with different initial weights.

Evolving Neural Networks
- Black-box optimization techniques (e.g., Genetic Algorithms).
- Usually better accuracy.
- Can do some advanced training (e.g., structure + parameters).
- Xin Yao (1999). "Evolving Artificial Neural Networks." Proceedings of the IEEE, pp. 1423-1447.

Representational Power

Deep Learning
More about BP Networks …

Overfitting
- Tends to occur during later iterations.
- Use a validation dataset to terminate the training when necessary.

Practical Considerations
- Momentum (see the sketch below):
  $$\Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1)$$
- Adaptive learning rate:
  - Small η: slow convergence, easy to get stuck.
  - Large η: fast convergence, unstable.

[Figures: training vs. validation error over time, with validation error rising while training error keeps falling; error plotted against a single weight.]
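A minimal sketch of the momentum update in isolation (α = 0.9 and the gradient steps are illustrative values):

```python
def momentum_update(w, grad_step, prev_delta, alpha=0.9):
    """One momentum step: delta(n) = grad_step + alpha * delta(n-1).

    grad_step is the plain gradient term (eta * delta_j * x_ji);
    prev_delta is Delta w_ji(n-1) from the previous update.
    """
    delta = grad_step + alpha * prev_delta
    return w + delta, delta

w, d = 0.0, 0.0
for step in [0.1, 0.1, 0.1]:   # repeated identical gradient steps accelerate
    w, d = momentum_update(w, step, d)
print(w)  # ≈ 0.1 + 0.19 + 0.271 = 0.561, vs. 0.3 without momentum
```

Consistent gradients accumulate speed, which helps carry the search through flat regions and shallow local minima.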
Beyond BP Networks

Elman Network

Temporal XOR: the input bit stream is built in triples whose third bit is the XOR of the first two, e.g. In: 0 1 1 0 0 0 1 1 0 1 0 1 …, so the predictable outputs are Out: 1 0 0 1 …

[Figure: an Elman network — inputs plus context units feed the hidden layer, whose state is copied back to the context units at each step.]
Beyond BP Networks

Hopfield Network

[Figures: a Hopfield network and the energy landscape of a Hopfield network.]
When does ANN work?

- Instances are represented by attribute-value pairs; input values can be any real values.
- The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training samples may contain errors.
- Long training times are acceptable: they can range from a few seconds to several hours.
- Fast evaluation of the learned target function may be required.
- The ability to understand the learned function is not important: weights are difficult for humans to interpret.
Reading Materials

Text Book
- Richard O. Duda et al. Pattern Classification, Chapter 6. John Wiley & Sons, Inc.
- Tom Mitchell. Machine Learning, Chapter 4. McGraw-Hill.
- http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demo
- http://neuron.eng.wayne.edu/software.html
- http://www.cbu.edu/~pongai/hopfield/hopfield.html

Online Tutorial
- http://www.autonlab.org/tutorials/neural13.pdf
- http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google
Review

- What is the biological motivation of ANN?
- When does ANN work?
- What is a perceptron?
- How to train a perceptron?
- What is the limitation of perceptrons?
- How does ANN solve non-linearly separable problems?
- What is the key idea of the Backpropagation algorithm?
- What are the main issues of BP networks?
- What are examples of other types of ANN?
Next Week's Class Talk

Volunteers are required for next week's class talk.

- Topic 1: Applications of ANN
- Topic 2: Recurrent Neural Networks
- Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network
- Length: 20 minutes plus question time
Assignment

- Topic: Training Feedforward Neural Networks
- Technique: BP Algorithm
- Task 1: XOR Problem, 4 input samples
  - 0 0 → 0
  - 1 0 → 1
  - …
- Task 2: Identity Function, 8 input samples
  - 10000000 → 10000000
  - 00010000 → 00010000
  - …
  - Use 3 hidden units.
- Deliverables: Report + Code (any programming language, with detailed comments)
- Due: Sunday, 14 December
- Credit: 15
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 9: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/9.jpg)
Error Surface
9
Error
w1
w2
Gradient Descent
10
2)(21)(
Dd
dd otwE
nwE
wE
wEwE )(
10
iiiii w
Ewwherewww
Learning Rate
Batch Learning
Delta Rule
11
)()(
)()(
)()(221
)(21
)(21
2
2
idDd
dd
ddiDd
dd
ddiDd
dd
Dddd
i
ddDdii
xot
xwtw
ot
otw
ot
otw
otww
E
iddDd
di xotw )(
xwxo )(
Batch Learning
GRADIENT_DESCENT (training_examples η)
Initialize each wi to some small random value
Until the termination condition is met Do
Initialize each Δwi to zero
For each ltx tgt in training_examples Do
bull Input the instance x to the unit and compute the output o
bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi
For each linear unit weight wi Dobull wi larr wi + Δwi
12
Stochastic Learning
13
iiiii xotwwherewww )(
For example if xi=08 η=01 t=1 and o=0
Δwi = η(t-o)xi = 01times(1-0) times 08 = 008
+
-
++
+
--
-+
+
-
-
Stochastic Learning NAND
14
InputTarget
InitialWeights
OutputError Correction Final
WeightsIndividual Sum Final Output
x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2
x0 w0
x1 w1
x2 w2
C0+C1+C2
t-o LR x E
1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01
1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 10: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/10.jpg)
Gradient Descent
10
2)(21)(
Dd
dd otwE
nwE
wE
wEwE )(
10
iiiii w
Ewwherewww
Learning Rate
Batch Learning
Delta Rule
11
)()(
)()(
)()(221
)(21
)(21
2
2
idDd
dd
ddiDd
dd
ddiDd
dd
Dddd
i
ddDdii
xot
xwtw
ot
otw
ot
otw
otww
E
iddDd
di xotw )(
xwxo )(
Batch Learning
GRADIENT_DESCENT (training_examples η)
Initialize each wi to some small random value
Until the termination condition is met Do
Initialize each Δwi to zero
For each ltx tgt in training_examples Do
bull Input the instance x to the unit and compute the output o
bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi
For each linear unit weight wi Dobull wi larr wi + Δwi
12
Stochastic Learning
13
iiiii xotwwherewww )(
For example if xi=08 η=01 t=1 and o=0
Δwi = η(t-o)xi = 01times(1-0) times 08 = 008
+
-
++
+
--
-+
+
-
-
Stochastic Learning NAND
14
InputTarget
InitialWeights
OutputError Correction Final
WeightsIndividual Sum Final Output
x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2
x0 w0
x1 w1
x2 w2
C0+C1+C2
t-o LR x E
1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01
1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 11: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/11.jpg)
Delta Rule
11
)()(
)()(
)()(221
)(21
)(21
2
2
idDd
dd
ddiDd
dd
ddiDd
dd
Dddd
i
ddDdii
xot
xwtw
ot
otw
ot
otw
otww
E
iddDd
di xotw )(
xwxo )(
Batch Learning
GRADIENT_DESCENT (training_examples η)
Initialize each wi to some small random value
Until the termination condition is met Do
Initialize each Δwi to zero
For each ltx tgt in training_examples Do
bull Input the instance x to the unit and compute the output o
bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi
For each linear unit weight wi Dobull wi larr wi + Δwi
12
Stochastic Learning
13
iiiii xotwwherewww )(
For example if xi=08 η=01 t=1 and o=0
Δwi = η(t-o)xi = 01times(1-0) times 08 = 008
+
-
++
+
--
-+
+
-
-
Stochastic Learning NAND
14
InputTarget
InitialWeights
OutputError Correction Final
WeightsIndividual Sum Final Output
x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2
x0 w0
x1 w1
x2 w2
C0+C1+C2
t-o LR x E
1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01
1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 12: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/12.jpg)
Batch Learning
GRADIENT_DESCENT (training_examples η)
Initialize each wi to some small random value
Until the termination condition is met Do
Initialize each Δwi to zero
For each ltx tgt in training_examples Do
bull Input the instance x to the unit and compute the output o
bull For each linear unit weight wi Dondash Δwi larr Δwi + η(t-o)xi
For each linear unit weight wi Dobull wi larr wi + Δwi
12
Stochastic Learning
13
iiiii xotwwherewww )(
For example if xi=08 η=01 t=1 and o=0
Δwi = η(t-o)xi = 01times(1-0) times 08 = 008
+
-
++
+
--
-+
+
-
-
Stochastic Learning NAND
14
InputTarget
InitialWeights
OutputError Correction Final
WeightsIndividual Sum Final Output
x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2
x0 w0
x1 w1
x2 w2
C0+C1+C2
t-o LR x E
1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01
1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 13: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/13.jpg)
Stochastic Learning
13
iiiii xotwwherewww )(
For example if xi=08 η=01 t=1 and o=0
Δwi = η(t-o)xi = 01times(1-0) times 08 = 008
+
-
++
+
--
-+
+
-
-
Stochastic Learning NAND
14
InputTarget
InitialWeights
OutputError Correction Final
WeightsIndividual Sum Final Output
x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2
x0 w0
x1 w1
x2 w2
C0+C1+C2
t-o LR x E
1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01
1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 14: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/14.jpg)
Stochastic Learning NAND
14
InputTarget
InitialWeights
OutputError Correction Final
WeightsIndividual Sum Final Output
x0 x1 x2 t w0 w1 w2 C0 C1 C2 S o E R w0 w1 w2
x0 w0
x1 w1
x2 w2
C0+C1+C2
t-o LR x E
1 0 0 1 0 0 0 0 0 0 0 0 1 +01 01 0 01 0 1 1 01 0 0 01 0 0 01 0 1 +01 02 0 011 1 0 1 02 0 01 02 0 0 02 0 1 +01 03 01 011 1 1 0 03 01 01 03 01 01 05 0 0 0 03 01 011 0 0 1 03 01 01 03 0 0 03 0 1 +01 04 01 011 0 1 1 04 01 01 04 0 01 05 0 1 +01 05 01 021 1 0 1 05 01 02 05 01 0 06 1 0 0 05 01 021 1 1 0 05 01 02 05 01 02 08 1 -1 -01 04 0 011 0 0 1 04 0 01 04 0 0 04 0 1 +01 05 0 01
1 1 0 1 08 -2 -1 08 -2 0 06 1 0 0 08 -2 -1threshold=05 learning rate=01
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 15: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/15.jpg)
Multilayer Perceptron
15
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
XOR
17
)()( qpqpqp
OR NAND
AND
OR
NAND-
+
+
-
p
q
p q
Hidden Layer Representations
18
p q OR NAND AND0 0 0 1 00 1 1 1 11 0 1 1 11 1 1 0 0
Input Hidden Output
- + +
-
AND
OR
NA
ND
The Sigmoid Threshold Unit
19
sum
x1
x2
xn
w1
w2
wn
w0
x0=1
n
iii xwnet
0
neteneto
1
1)(
))(1()()(1
1)( yydyyd
ey
FunctionSigmoid
y
Backpropagation Rule
20
ji
dji w
Ew
outputsk
kkd otwE 2)(21)(
bull xji = the i th input to unit j
bull wji = the weight associated with the i th input to unit j
bull netj = sumwjixji (the weighted sum of inputs for unit j )
bull oj= the output of unit j
bull tj= the target output of unit j
bull σ = the sigmoid function
bull outputs = the set of units in the final layer
bull Downstream (j ) = the set of units directly taking the output of unit j as inputs
jij
d
ji
j
j
d
ji
d xnetE
wnet
netE
wE
j
i
Training Rule for Output Units
21
j
j
j
d
j
d
neto
oE
netE
outputskkk
jj
d otoo
E 2)(21
)(
)()(2
21
)(21 2
jj
j
jjjj
jjjj
d
ot
oot
ot
otoo
E
)1()(
jjj
j
j
j oonetnet
neto
)1()( jjjjj
d oootnetE
jijjjjji
dji xooot
wEw )1()(
Training Rule for Hidden Units
22
)()(
)( )(
)1( jDownstreamk
jjkjkj
j
jDownstreamkkjk
jDownstreamk j
j
jDownstreamk j
kk
j
k
k
d
j
d
oowneto
w
neto
onet
netnet
netE
netE
)(
)1(jDownstreamkkjkjjj woo
jijji xw
k
dk net
E
k
j
jnet
k
BP Framework
BACKPROPAGATION (training_examples η nin nout nhidden)
Create a network with nin inputs nhidden hidden units and nout output units
Initialize all network weights to small random numbers
Until the termination condition is met Do
For each ltx tgt in training_examples Do
bull Input the instance x to the network and computer the output o of every unit
bull For each output unit k calculate its error term δk
bull For each hidden unit h calculate its error term δh
bull Update each network weight wji 23
))(1( kkkkk otoo
outputsk
kkhhhh woo )1(
jijjijijiji xwwww
More about BP Networks hellip
Convergence and Local Minima The search space is likely to be highly multimodal
May easily get stuck at a local solution
Need multiple trials with different initial weights
Evolving Neural Networks Black-box optimization techniques (eg Genetic Algorithms)
Usually better accuracy
Can do some advanced training (eg structure + parameter)
Xin Yao (1999) ldquoEvolving Artificial Neural Networksrdquo Proceedings of the IEEE
pp 1423-1447
Representational Power
Deep Learning24
More about BP Networks hellip
Overfitting Tend to occur during later iterations Use validation dataset to terminate the training when necessary
Practical Considerations Momentum Adaptive learning rate
bull Small slow convergence easy to get stuckbull Large fast convergence unstable
25
)1()( nwxnw jijijji
Time
Error
Training
Validation
Weight
Error
Beyond BP Networks
26Elman Network
XOR
0 1 1 0 0 0 1 1 0 1 0 1 hellip
1 0 0 1 hellip
In
Out
Beyond BP Networks
27
Hopfield Network Energy Landscape of Hopfield Network
Beyond BP Networks
28
When does ANN work
Instances are represented by attribute-value pairs Input values can be any real values
The target output may be discrete-valued real-valued or a vector of several real- or discrete-valued attributes
The training samples may contain errors
Long training times are acceptable Can range from a few seconds to several hours
Fast evaluation of the learned target function may be required
The ability to understand the learned function is not important Weights are difficult for humans to interpret
29
Reading Materials
Text Book
Richard O Duda et al Pattern Classification Chapter 6 John Wiley amp Sons Inc Tom Mitchell Machine Learning Chapter 4 McGraw-Hill httppagemifu-berlinderojasneuralindexhtmlhtml
Online Demo
httpneuronengwayneedusoftwarehtml httpwwwcbuedu~pongaihopfieldhopfieldhtml
Online Tutorial
httpwwwautonlaborgtutorialsneural13pdf httpwwwcscmueduafscscmueduusermitchellftpfaceshtml
Wikipedia amp Google
30
Review
What is the biological motivation of ANN
When does ANN work
What is a perceptron
How to train a perceptron
What is the limitation of perceptrons
How does ANN solve non-linearly separable problems
What is the key idea of Backpropogation algorithm
What are the main issues of BP networks
What are the examples of other types of ANN31
Next Weekrsquos Class Talk
Volunteers are required for next weekrsquos class talk
Topic 1 Applications of ANN
Topic 2 Recurrent Neural Networks
Hints Robot Driving Character Recognition Face Recognition Hopfield Network
Length 20 minutes plus question time
32
Assignment
Topic Training Feedforward Neural Networks
Technique BP Algorithm
Task 1 XOR Problem 4 input samples
bull 0 0 0bull 1 0 1
Task 2 Identity Function 8 input samples
bull 10000000 10000000bull 00010000 00010000bull hellip
Use 3 hidden units
Deliverables Report Code (any programming language with detailed comments)
Due Sunday 14 December
Credit 15 33
- Classification III
- Overview
- Biological Motivation
- Biological Motivation (2)
- Neural Network Representations
- Robot vs Human
- Perceptrons
- Power of Perceptrons
- Error Surface
- Gradient Descent
- Delta Rule
- Batch Learning
- Stochastic Learning
- Stochastic Learning NAND
- Multilayer Perceptron
- XOR
- XOR (2)
- Hidden Layer Representations
- The Sigmoid Threshold Unit
- Backpropagation Rule
- Training Rule for Output Units
- Training Rule for Hidden Units
- BP Framework
- More about BP Networks hellip
- More about BP Networks hellip (2)
- Beyond BP Networks
- Beyond BP Networks (2)
- Beyond BP Networks (3)
- When does ANN work
- Reading Materials
- Review
- Next Weekrsquos Class Talk
- Assignment
-
![Page 16: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/16.jpg)
XOR
16
-
-
))(( pqqpqpqpqp
Input Output0 0 00 1 11 0 11 1 0
Cannot be separated by a single line
+
+p
q
![Page 17: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/17.jpg)
XOR
17
$p \oplus q = (p \vee q) \wedge \neg (p \wedge q)$

[Figure: a two-layer network for XOR; inputs p and q feed an OR unit and a NAND unit, whose outputs feed an AND unit]
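A small sketch of this construction (not on the original slide): the OR and AND units reuse the bias/weight values from the earlier perceptron slides, while the NAND unit's values are an illustrative assumption.

```python
def step(weighted_sum):
    """Threshold unit: fires iff the weighted sum (including bias) is positive."""
    return 1 if weighted_sum > 0 else 0

def OR(p, q):    # bias -0.3, weights 0.5, 0.5
    return step(-0.3 + 0.5 * p + 0.5 * q)

def AND(p, q):   # bias -0.8, weights 0.5, 0.5
    return step(-0.8 + 0.5 * p + 0.5 * q)

def NAND(p, q):  # bias 0.8, weights -0.5, -0.5 (illustrative)
    return step(0.8 - 0.5 * p - 0.5 * q)

def XOR(p, q):
    # Hidden layer: OR and NAND; output layer: AND
    return AND(OR(p, q), NAND(p, q))

for p in (0, 1):
    for q in (0, 1):
        print(p, q, "->", XOR(p, q))  # prints 0, 1, 1, 0
```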
![Page 18: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/18.jpg)
Hidden Layer Representations
18
p q | OR NAND | AND
0 0 |  0   1  |  0
0 1 |  1   1  |  1
1 0 |  1   1  |  1
1 1 |  1   0  |  0

Input → Hidden → Output

[Figure: the network from the previous slide; the hidden layer computes (OR, NAND) and the output unit computes their AND]
![Page 19: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/19.jpg)
The Sigmoid Threshold Unit
19
[Figure: a sigmoid unit; inputs x1 … xn with weights w1 … wn, plus a bias input x0 = 1 with weight w0, feed a summing node followed by a sigmoid]

$net = \sum_{i=0}^{n} w_i x_i$

$o = \sigma(net) = \dfrac{1}{1 + e^{-net}}$

A useful property of the sigmoid function is its simple derivative:

$\dfrac{d\,\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$

[Figure: plot of the sigmoid function]
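A minimal sketch of the sigmoid and the derivative identity above (plain Python, no external libraries); the finite-difference check is an illustrative addition.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    """Derivative via the identity sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Numerical check of the identity at a sample point
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(sigmoid_deriv(x), numeric)  # the two values agree to ~1e-10
```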
![Page 20: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/20.jpg)
Backpropagation Rule
20
$\Delta w_{ji} = -\eta \dfrac{\partial E_d}{\partial w_{ji}}$, where $E_d(\vec{w}) = \dfrac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$

- x_ji = the i-th input to unit j
- w_ji = the weight associated with the i-th input to unit j
- net_j = Σ_i w_ji x_ji (the weighted sum of inputs for unit j)
- o_j = the output of unit j
- t_j = the target output of unit j
- σ = the sigmoid function
- outputs = the set of units in the final layer
- Downstream(j) = the set of units directly taking the output of unit j as inputs

By the chain rule, since w_ji influences the rest of the network only through net_j:

$\dfrac{\partial E_d}{\partial w_{ji}} = \dfrac{\partial E_d}{\partial net_j} \dfrac{\partial net_j}{\partial w_{ji}} = \dfrac{\partial E_d}{\partial net_j}\, x_{ji}$
![Page 21: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/21.jpg)
Training Rule for Output Units
21
$\dfrac{\partial E_d}{\partial net_j} = \dfrac{\partial E_d}{\partial o_j} \dfrac{\partial o_j}{\partial net_j}$

$\dfrac{\partial E_d}{\partial o_j} = \dfrac{\partial}{\partial o_j} \dfrac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 = \dfrac{\partial}{\partial o_j} \dfrac{1}{2} (t_j - o_j)^2 = \dfrac{1}{2} \cdot 2\,(t_j - o_j)\, \dfrac{\partial (t_j - o_j)}{\partial o_j} = -(t_j - o_j)$

$\dfrac{\partial o_j}{\partial net_j} = \dfrac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)$

$\dfrac{\partial E_d}{\partial net_j} = -(t_j - o_j)\, o_j (1 - o_j)$

$\Delta w_{ji} = -\eta \dfrac{\partial E_d}{\partial w_{ji}} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}$
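A hedged numeric illustration of this output-unit update (the values are chosen arbitrarily, not taken from the slides):

```python
# Output-unit update: delta_j = (t_j - o_j) * o_j * (1 - o_j); dw_ji = eta * delta_j * x_ji
eta, t_j, o_j, x_ji = 0.1, 1.0, 0.6, 0.8   # illustrative values
delta_j = (t_j - o_j) * o_j * (1 - o_j)     # 0.4 * 0.6 * 0.4 = 0.096
dw_ji = eta * delta_j * x_ji                # 0.1 * 0.096 * 0.8 = 0.00768
print(delta_j, dw_ji)
```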
![Page 22: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/22.jpg)
Training Rule for Hidden Units
22
With $\delta_k = -\dfrac{\partial E_d}{\partial net_k}$ for each unit k downstream of j:

$\dfrac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \dfrac{\partial E_d}{\partial net_k} \dfrac{\partial net_k}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, \dfrac{\partial net_k}{\partial o_j} \dfrac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, w_{kj}\, o_j (1 - o_j)$

$\delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{kj}$

$\Delta w_{ji} = \eta\, \delta_j\, x_{ji}$
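A hedged numeric illustration of the hidden-unit error term (the δ_k and w_kj values are arbitrary placeholders):

```python
# Hidden-unit error term: delta_j = o_j * (1 - o_j) * sum_k(delta_k * w_kj)
o_j = 0.7                                   # output of hidden unit j (illustrative)
downstream = [(0.096, 0.5), (-0.02, 0.3)]   # (delta_k, w_kj) pairs (illustrative)
delta_j = o_j * (1 - o_j) * sum(d_k * w_kj for d_k, w_kj in downstream)
print(delta_j)  # 0.21 * (0.048 - 0.006) = 0.00882
```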
![Page 23: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/23.jpg)
BP Framework
BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)

Create a network with n_in inputs, n_hidden hidden units, and n_out output units.

Initialize all network weights to small random numbers.

Until the termination condition is met, Do

For each <x, t> in training_examples, Do

- Input the instance x to the network and compute the output o of every unit
- For each output unit k, calculate its error term: $\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$
- For each hidden unit h, calculate its error term: $\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{kh}\, \delta_k$
- Update each network weight: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\, \delta_j\, x_{ji}$

23
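A compact, runnable sketch of this framework (not from the original slides) for a 2-2-1 network on the XOR task; the network size, learning rate η = 0.5, random seed, and fixed 20000-epoch termination condition are all illustrative assumptions.

```python
import math, random

random.seed(0)
ETA = 0.5                      # learning rate (illustrative)
N_IN, N_HID, N_OUT = 2, 2, 1   # a 2-2-1 network, enough for XOR

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# w[j][i]: the i-th weight of unit j; index 0 is the bias weight (x_0 = 1)
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)] for _ in range(N_OUT)]

def forward(x):
    xs = [1.0] + list(x)
    h = [sigmoid(sum(w * xi for w, xi in zip(wj, xs))) for wj in w_hid]
    hs = [1.0] + h
    o = [sigmoid(sum(w * hi for w, hi in zip(wk, hs))) for wk in w_out]
    return h, o

examples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

for _ in range(20000):                 # termination condition: fixed epoch budget
    for x, t in examples:
        h, o = forward(x)
        # delta_k <- o_k (1 - o_k)(t_k - o_k) for each output unit
        d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
        # delta_h <- o_h (1 - o_h) sum_k w_kh delta_k for each hidden unit
        d_hid = [h[j] * (1 - h[j]) *
                 sum(w_out[k][j + 1] * d_out[k] for k in range(N_OUT))
                 for j in range(N_HID)]
        # w_ji <- w_ji + eta * delta_j * x_ji
        for k in range(N_OUT):
            for i, v in enumerate([1.0] + h):
                w_out[k][i] += ETA * d_out[k] * v
        for j in range(N_HID):
            for i, v in enumerate([1.0] + [float(b) for b in x]):
                w_hid[j][i] += ETA * d_hid[j] * v

for x, t in examples:
    print(x, "->", round(forward(x)[1][0], 2), "target", t[0])
```

As the next slide notes, training can stall in a local minimum with an unlucky initialization; re-running with a different seed is the usual remedy.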
![Page 24: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/24.jpg)
More about BP Networks …

Convergence and Local Minima
- The search space is likely to be highly multimodal, so training may easily get stuck at a local solution.
- Need multiple trials with different initial weights.

Evolving Neural Networks
- Black-box optimization techniques (e.g., Genetic Algorithms): usually better accuracy, and can do some advanced training (e.g., structure + parameters).
- Xin Yao (1999). "Evolving Artificial Neural Networks". Proceedings of the IEEE, pp. 1423-1447.

Representational Power

Deep Learning

24
![Page 25: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/25.jpg)
More about BP Networks …

Overfitting
- Tends to occur during later iterations.
- Use a validation dataset to terminate training when necessary.

Practical Considerations
- Momentum (see the sketch after this slide): $\Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n-1)$
- Adaptive learning rate
  - Small η: slow convergence, easy to get stuck
  - Large η: fast convergence, unstable

[Figures: training vs. validation error over time, with validation error rising while training error keeps falling; error as a function of a single weight]

25
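A minimal sketch of the momentum update above; the learning rate, momentum coefficient α, and input values are illustrative assumptions.

```python
# Momentum update: keep the previous weight change and blend it into the new one
ETA, ALPHA = 0.1, 0.9            # learning rate and momentum term (illustrative)

def momentum_step(delta_j, x_ji, prev_dw):
    """Delta w_ji(n) = eta * delta_j * x_ji + alpha * Delta w_ji(n-1)."""
    return ETA * delta_j * x_ji + ALPHA * prev_dw

dw = momentum_step(delta_j=0.096, x_ji=0.8, prev_dw=0.0)  # first step: 0.00768
dw = momentum_step(delta_j=0.096, x_ji=0.8, prev_dw=dw)   # second step adds 0.9x the previous
print(dw)  # 0.014592; momentum accelerates movement along a consistent direction
```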
![Page 26: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/26.jpg)
Beyond BP Networks

26

Elman Network

[Figure: an Elman recurrent network trained on a temporal XOR task; the input stream In = 0 1 1 0 0 0 1 1 0 1 0 1 … produces the output stream Out = 1 0 0 1 …]
![Page 27: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/27.jpg)
Beyond BP Networks
27
[Figures: a Hopfield Network; the Energy Landscape of a Hopfield Network]
![Page 28: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/28.jpg)
Beyond BP Networks
28
![Page 29: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/29.jpg)
When does ANN work?

- Instances are represented by attribute-value pairs; input values can be any real values.
- The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training samples may contain errors.
- Long training times are acceptable (from a few seconds to several hours).
- Fast evaluation of the learned target function may be required.
- The ability to understand the learned function is not important; weights are difficult for humans to interpret.
29
![Page 30: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/30.jpg)
Reading Materials

Textbooks
Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.
Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.
http://page.mi.fu-berlin.de/rojas/neural/index.html.html

Online Demos
http://neuron.eng.wayne.edu/software.html
http://www.cbu.edu/~pong/ai/hopfield/hopfield.html

Online Tutorials
http://www.autonlab.org/tutorials/neural13.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html

Wikipedia & Google
![Page 31: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/31.jpg)
Review

What is the biological motivation of ANN?
When does ANN work?
What is a perceptron?
How do you train a perceptron?
What is the limitation of perceptrons?
How does ANN solve non-linearly separable problems?
What is the key idea of the Backpropagation algorithm?
What are the main issues of BP networks?
What are some examples of other types of ANN?
![Page 32: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/32.jpg)
Next Week's Class Talk

Volunteers are required for next week's class talk.
Topic 1: Applications of ANN
Topic 2: Recurrent Neural Networks
Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network
Length: 20 minutes plus question time
![Page 33: Classification III](https://reader035.vdocument.in/reader035/viewer/2022062501/56816381550346895dd46674/html5/thumbnails/33.jpg)
Assignment

Topic: Training Feedforward Neural Networks
Technique: BP Algorithm
Task 1: XOR Problem (4 input samples)
• 0 0 → 0
• 1 0 → 1
• …
Task 2: Identity Function (8 input samples)
• 10000000 → 10000000
• 00010000 → 00010000
• …
Use 3 hidden units.
Deliverables: Report + Code (any programming language, with detailed comments)
Due: Sunday, 14 December
Credit: 15
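Both tasks fit the same template: one hidden layer of sigmoid units trained with the BP algorithm. The following Python/NumPy sketch is a minimal starting point, not a substitute for your own commented submission; the learning rate, epoch counts, random seed, and weight-initialization range are illustrative assumptions and may need tuning for reliable convergence.

```python
# A minimal one-hidden-layer sigmoid network trained with the BP algorithm.
# Sketch only: eta, epochs, seed, and the init range are assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, n_hidden, eta=0.5, epochs=10000, seed=0):
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Small random initial weights; row 0 of each matrix is the bias (x0 = 1).
    W1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
    W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))
    Xb = np.hstack([np.ones((len(X), 1)), X])        # prepend bias input
    for _ in range(epochs):
        # Forward pass
        H = sigmoid(Xb @ W1)
        Hb = np.hstack([np.ones((len(H), 1)), H])
        O = sigmoid(Hb @ W2)
        # Backward pass: sigmoid delta rules for output and hidden units
        d_out = (T - O) * O * (1 - O)
        d_hid = (d_out @ W2[1:].T) * H * (1 - H)
        # Batch updates: gradients summed over all training samples
        W2 += eta * Hb.T @ d_out
        W1 += eta * Xb.T @ d_hid
    return W1, W2

def predict(X, W1, W2):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    Hb = np.hstack([np.ones((len(X), 1)), sigmoid(Xb @ W1)])
    return sigmoid(Hb @ W2)

# Task 1: XOR (all 4 samples; 2 hidden units are enough here)
X_xor = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T_xor = np.array([[0.], [1.], [1.], [0.]])
W1, W2 = train_bp(X_xor, T_xor, n_hidden=2)
print(predict(X_xor, W1, W2).round(2))

# Task 2: 8-3-8 identity function (one-hot inputs reproduced as outputs)
X_id = np.eye(8)
W1, W2 = train_bp(X_id, X_id, n_hidden=3, epochs=20000)
print(predict(X_id, W1, W2).round(1))
```

With only three hidden units, Task 2 forces the network to invent a compact internal code (roughly a 3-bit pattern) for the eight one-hot inputs, which is the point of the exercise; inspecting the hidden-unit activations after training is a natural thing to include in the report.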