Introduction to Artificial Neural Network
- theory, application and practice using WEKA -

Anto Satriyo Nugroho, Dr.Eng
Center for Information & Communication Technology, Agency for the Assessment & Application of Technology (PTIK-BPPT)
Email: asnugroho@gmail.com
URL: http://asnugroho.net
Agenda
1. Brain, Biological Neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of Neural Network
5. Practice using WEKA
6. Important & Useful References
Brain vs Computer

Aspect                      | Brain                                     | Computer
Information processing      | Low speed, fuzzy, parallel                | Fast, accurate, sequential
Specialization              | Pattern recognition                       | Numerical computation
Information representation  | Analog                                    | Digital
Number of elements          | 10 billion                                | ~10^6
Speed                       | Slow (10^3/s)                             | Fast (10^9/s)
Performance improvement     | Learning                                  | Software upgrade
Memory                      | Associative (distributed among synapses)  | Address-based
Biological Neural Network
1. Principal function of a neuron: collection, processing, and dissemination of electrical signals
2. The information-processing capacity of the brain emerges from networks of these neurons
• McCulloch & Pitts (1943):

$$y = f\left(\sum_{i=1}^{n} x_i w_i\right)$$

[Figure: input signals $x_1, x_2, x_3, \ldots, x_n$ enter the neuron through weights $w_1, w_2, w_3, \ldots, w_n$; the output signal is $y$]

$w$ = synapses (weights), $f$ = activation function
Mathematical Model of Neuron
• The input signals can be considered as the dendrites of a biological neuron
• The output signal can be considered as its axon
Components of a neuron:
• Synapses (weights)
• Summation of the weighted input signals
• Activation function
$$y = f\left(\sum_{i=1}^{n} x_i w_i\right)$$
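The model above in a few lines of Python (a minimal sketch; the function names are mine, not from the slides):

```python
# Minimal McCulloch-Pitts style neuron: weighted sum + activation function.

def heaviside(v):
    """Threshold activation: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0

def neuron_output(x, w, f=heaviside):
    """y = f(sum_i x_i * w_i) for input signals x and synaptic weights w."""
    v = sum(xi * wi for xi, wi in zip(x, w))
    return f(v)

print(neuron_output([1.0, 0.0, 1.0], [0.5, -0.3, 0.2]))  # v = 0.7 > 0, so y = 1
```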
Activation Function
1. Threshold function (Heaviside function)
$$f(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \le 0 \end{cases}$$
• used by McCulloch & Pitts
• all-or-none characteristic
2. Piecewise-linear function
$$f(v) = \begin{cases} 1 & v \ge +\tfrac{1}{2} \\ v & +\tfrac{1}{2} > v > -\tfrac{1}{2} \\ -1 & v \le -\tfrac{1}{2} \end{cases}$$
Activation Function
3. Sigmoid function
$$f(x) = \frac{1}{1 + e^{-cx}}$$

[Figure: sigmoid curves plotted for $x \in [-5, 5]$ with slope parameters c = 1, c = 2, and c = 4; larger c gives a steeper transition around x = 0]
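The three activation functions, sketched in Python (the piecewise-linear breakpoints follow the reconstruction above; this is one common variant):

```python
import math

def threshold(v):
    """Heaviside: all-or-none output."""
    return 1 if v > 0 else 0

def piecewise_linear(v):
    """Saturates outside [-1/2, +1/2], linear in between."""
    if v >= 0.5:
        return 1
    if v <= -0.5:
        return -1
    return v

def sigmoid(x, c=1.0):
    """f(x) = 1 / (1 + exp(-c*x)); larger c gives a steeper curve."""
    return 1.0 / (1.0 + math.exp(-c * x))

for v in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(v, threshold(v), piecewise_linear(v), round(sigmoid(v, c=2.0), 3))
```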
How to calculate a neuron's output (without bias)?

Input: $x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, weights $w = \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix}$

$$v = 0 \times 0.5 + 1 \times (-0.5) = -0.5$$

Heaviside activation function: $f(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \le 0 \end{cases}$, so $f(v) = 0$.

How to calculate a neuron's output (with bias)?

Input: $x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, weights $w = \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix}$, bias $-0.7$

$$v = (0 \times 0.5 + 1 \times (-0.5)) - (-0.7) = 0.2$$

With the Heaviside activation function, $f(v) = 1$.
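Both worked examples can be checked directly (a sketch; I treat the bias as a threshold θ subtracted from the weighted sum, matching the calculation above):

```python
def heaviside(v):
    return 1 if v > 0 else 0

def neuron(x, w, theta=0.0):
    """v = sum_i x_i * w_i - theta, then the Heaviside function."""
    v = sum(xi * wi for xi, wi in zip(x, w)) - theta
    return v, heaviside(v)

print(neuron([0, 1], [0.5, -0.5]))              # (-0.5, 0): without bias
print(neuron([0, 1], [0.5, -0.5], theta=-0.7))  # (~0.2, 1): with bias -0.7
```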
Artificial Neural Network
1. Architecture: how the neurons are connected to each other
   1. Feed-forward networks
   2. Recurrent networks
2. Learning algorithm: how the network is trained to fit an input-output mapping/function (LMS, Delta rule, Backpropagation, etc.)
Agenda
1. Brain, Biological Neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of Neural Network
5. Practice using WEKA
6. Important & Useful References
Perceptron Learning (taking the AND function as an example)
x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

$y = x_1 \wedge x_2$

[Figure: the four training points plotted in the $(x_1, x_2)$ plane]
Perceptron Learning (taking the AND function as an example)
Training set: 4 examples, each consisting of a 2-dimensional input vector and its teaching signal (desired output):

$$\left(\begin{bmatrix}0\\0\end{bmatrix}, 0\right),\ \left(\begin{bmatrix}0\\1\end{bmatrix}, 0\right),\ \left(\begin{bmatrix}1\\0\end{bmatrix}, 0\right),\ \left(\begin{bmatrix}1\\1\end{bmatrix}, 1\right)$$
Learn by adjusting the weights to reduce the error on the training set. The squared error for an example with input $x$ and true output (teaching signal) $y$ is

$$E = \frac{1}{2}\mathrm{Err}^2 \equiv \frac{1}{2}\left(y - h_W(x)\right)^2$$

where $h_W(x)$ is the perceptron's output.
Weight Update Rule

Perform optimization search by gradient descent:

$$\frac{\partial E}{\partial W_j} = \mathrm{Err} \times \frac{\partial \mathrm{Err}}{\partial W_j} = \mathrm{Err} \times \frac{\partial}{\partial W_j}\left(y - g\left(\sum_{j=0}^{n} W_j x_j\right)\right) = -\mathrm{Err} \times g'(in) \times x_j$$

where $in = \sum_{j=0}^{n} W_j x_j$.

Simple weight update rule:

$$W_j \leftarrow W_j + \alpha \times \mathrm{Err} \times g'(in) \times x_j$$
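One gradient-descent step as code (a sketch assuming a differentiable activation g with derivative g_prime; the names are illustrative):

```python
def perceptron_update(W, x, y, g, g_prime, alpha=0.1):
    """W_j <- W_j + alpha * Err * g'(in) * x_j, with in = sum_j W_j * x_j.

    x is assumed to include the constant bias input x_0 = 1.
    """
    in_ = sum(wj * xj for wj, xj in zip(W, x))
    err = y - g(in_)                 # Err = y - h_W(x)
    return [wj + alpha * err * g_prime(in_) * xj for wj, xj in zip(W, x)]
```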
What if we use the Sigmoid function as $g$? Like this:

$$g(x) = \frac{1}{1 + e^{-x}}$$

$$\frac{d}{dx}g(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \times \left(1 - \frac{1}{1 + e^{-x}}\right) = g(x) \times (1 - g(x))$$
Weight Update Rule (using Sigmoid as Activation Function)

Perform optimization search by gradient descent, where $in = \sum_{j=0}^{n} W_j x_j$:

$$\frac{\partial E}{\partial W_j} = \mathrm{Err} \times \frac{\partial \mathrm{Err}}{\partial W_j} = \mathrm{Err} \times \frac{\partial}{\partial W_j}\left(y - g\left(\sum_{j=0}^{n} W_j x_j\right)\right) = -\mathrm{Err} \times g'(in) \times x_j$$

Since $g'(in) = g(in)(1 - g(in))$ for the sigmoid, the simple weight update rule becomes:

$$W_j \leftarrow W_j + \alpha \times \mathrm{Err} \times g(in)(1 - g(in)) \times x_j$$
Perceptron Learning Algorithm

For each training example $e$ (e = 1, 2, ...) with input $x[e]$ and teaching signal $y[e]$:

1. Output calculation: $in = \sum_{j=0}^{n} W_j\, x_j[e]$, output $= g(in)$
2. Error calculation: $\mathrm{Err} \leftarrow y[e] - g(in)$
3. Weight update: $W_j \leftarrow W_j + \alpha \times \mathrm{Err} \times g'(in) \times x_j[e]$
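A runnable sketch of this loop on the AND training set with a sigmoid g (the learning rate, epoch count, and zero initialization are my choices, not the slides'):

```python
import math

def g(v):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-v))

# AND training set; x_0 = 1 is the constant bias input.
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
W = [0.0, 0.0, 0.0]
alpha = 0.5

for epoch in range(5000):
    for x, y in data:
        in_ = sum(wj * xj for wj, xj in zip(W, x))
        err = y - g(in_)
        gp = g(in_) * (1 - g(in_))   # g'(in) for the sigmoid
        W = [wj + alpha * err * gp * xj for wj, xj in zip(W, x)]

for x, y in data:
    in_ = sum(wj * xj for wj, xj in zip(W, x))
    print(x[1:], y, round(g(in_)))   # rounded outputs reproduce AND
```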
AND function using Perceptron

[Figure: a perceptron with weights $w_1 = 1.0$, $w_2 = 1.0$ and threshold 1.5]

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

$y = x_1 \wedge x_2$

Heaviside activation function: $f(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \le 0 \end{cases}$
OR function using Perceptron

[Figure: a perceptron with weights $w_1 = 1.0$, $w_2 = 1.0$ and threshold 0.5]

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   1

$y = x_1 \vee x_2$

Heaviside activation function: $f(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v \le 0 \end{cases}$
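Both fixed-weight perceptrons can be verified in a few lines (a sketch; I take the threshold to enter as v = w1·x1 + w2·x2 − θ, consistent with the bias example earlier):

```python
def heaviside(v):
    return 1 if v > 0 else 0

def perceptron(x1, x2, w1, w2, theta):
    return heaviside(w1 * x1 + w2 * x2 - theta)

for x1 in (0, 1):
    for x2 in (0, 1):
        y_and = perceptron(x1, x2, 1.0, 1.0, 1.5)  # threshold 1.5 -> AND
        y_or  = perceptron(x1, x2, 1.0, 1.0, 0.5)  # threshold 0.5 -> OR
        print(x1, x2, y_and, y_or)
```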
[Figure: MSE vs. iteration for the XOR output; the perceptron's error does not converge]

A problem appears when the perceptron is used to learn a NON-linear function:

x1  x2  y (XOR)
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a two-layer network realizing XOR, with weights and thresholds of magnitudes 5 and 2.5]

Non-linear mapping can be realized by inserting a hidden layer, but the learning algorithm for such networks was not known until 1986.
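One concrete hidden-layer solution for XOR (the specific weights below are my own choice, matching the magnitudes in the figure but not necessarily its exact wiring):

```python
def H(v):
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    h1 = H(5.0 * x1 + 5.0 * x2 - 2.5)    # hidden unit 1 computes OR
    h2 = H(2.5 * x1 + 2.5 * x2 - 2.5)    # hidden unit 2 computes AND
    return H(5.0 * h1 - 5.0 * h2 - 2.5)  # output: OR AND (NOT AND) = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints 0, 1, 1, 0
```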
Agenda
1. Brain, Biological Neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of Neural Network
5. Practice using WEKA
6. Important & Useful References
David E. Rumelhart: A Scientific Biography
http://www.cnbc.cmu.edu/derprize/

Rumelhart, Hinton & Williams, "Learning Internal Representations by Error Propagation," Parallel Distributed Processing, 1986, Chap. 8, pp. 318-362
Backpropagation Learning

Forward pass:
1. Input a datum X from the training set to the Input layer, and calculate the output of each neuron in the Hidden and Output layers.

[Figure: input data X entering the Input layer, flowing through weights w to the Hidden layer and then to the Output layer]
Backpropagation Learning

2. Calculate the Error, i.e. the difference (Δ) between the output of each neuron in the Output layer and the desired value (teaching signal).

[Figure: the teaching signal compared with the Output layer; the resulting Δ values determine the weight change Δw]
Backpropagation Learning

Example for step 2. Input data: an image of "B".

Output neuron   Output value   Teaching signal   Error
A               0.5            0                 Δ = 0 − 0.5
B               0.3            1                 Δ = 1 − 0.3
C               0.1            0                 Δ = 0 − 0.1
Backpropagation Learning

Backward pass:
3. Using the Δ values, update the weights between the Output and Hidden layers, and between the Hidden and Input layers.

[Figure: the Δ values propagate backward from the Output layer through the Hidden layer toward the Input layer]
Backpropagation Learning

4. Repeat steps 1 to 3 until a stopping criterion is satisfied.

Stopping criteria:
- maximum number of epochs/iterations
- MSE (Mean Squared Error) below a target value
BP for a 3-layer MLP

[Figure: Input layer (neurons i, outputs $I_i$), Hidden layer (neurons j, outputs $H_j$), Output layer (neurons k, outputs $O_k$); weights $w_{ji}$ connect Input to Hidden, weights $w_{kj}$ connect Hidden to Output; input vector $\vec{x}$]

Forward Pass (1): Input layer to Hidden layer

$$I_i = x_i, \qquad net_j = \sum_i w_{ji} I_i + \theta_j \quad (\theta_j:\ \text{bias}), \qquad H_j = f(net_j) = \frac{1}{1 + e^{-net_j}}$$

Forward Pass (2): Hidden layer to Output layer

$$net_k = \sum_j w_{kj} H_j + \theta_k, \qquad O_k = f(net_k) = \frac{1}{1 + e^{-net_k}}$$
Backward Pass 1: Hidden-Output Layer

Error (MSE: Mean Square Error):

$$E = \frac{1}{2}\sum_k (t_k - O_k)^2$$

The modification of the weights between the Output and Hidden layers due to the error $E$ is calculated as follows:

$$\frac{\partial E}{\partial O_k} = -(t_k - O_k), \qquad \frac{\partial O_k}{\partial net_k} = \frac{\partial}{\partial net_k}\left(\frac{1}{1 + e^{-net_k}}\right) = O_k(1 - O_k), \qquad \frac{\partial net_k}{\partial w_{kj}} = H_j$$

$$\frac{\partial E}{\partial w_{kj}} = -(t_k - O_k)\,O_k(1 - O_k)\,H_j = -\delta_k H_j, \qquad \text{where } \delta_k = (t_k - O_k)\,O_k(1 - O_k)$$

Thus, the weight correction is obtained as follows ($\eta$ is the learning rate):

$$\Delta w_{kj} = -\eta\,\frac{\partial E}{\partial w_{kj}} = \eta\,\delta_k H_j, \qquad w_{new} = w_{old} + \Delta w_{kj}$$
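The relation ∂E/∂w_kj = −δ_k H_j can be sanity-checked numerically (a sketch with made-up values; the finite-difference check is my addition, not from the slides):

```python
import math

def sig(v):
    return 1.0 / (1.0 + math.exp(-v))

H, w_kj, theta_k, t_k = [0.6, 0.9], [0.4, -0.2], 0.1, 1.0

def error(w):
    O_k = sig(sum(wj * hj for wj, hj in zip(w, H)) + theta_k)
    return 0.5 * (t_k - O_k) ** 2

# Analytic gradient w.r.t. w_k0: -delta_k * H_0
O_k = sig(sum(wj * hj for wj, hj in zip(w_kj, H)) + theta_k)
delta_k = (t_k - O_k) * O_k * (1 - O_k)
analytic = -delta_k * H[0]

# Finite-difference approximation of the same derivative
eps = 1e-6
numeric = (error([w_kj[0] + eps, w_kj[1]]) - error(w_kj)) / eps

print(analytic, numeric)  # the two values agree to several decimal places
```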
Backward Pass 2: Input-Hidden Layer

The weight corrections between the Hidden and Input layers are determined in a similar way, propagating the error through the hidden layer with the chain rule:

$$\frac{\partial E}{\partial w_{ji}} = \sum_k \left(\frac{\partial E}{\partial O_k}\frac{\partial O_k}{\partial net_k}\right)\frac{\partial net_k}{\partial H_j}\frac{\partial H_j}{\partial net_j}\frac{\partial net_j}{\partial w_{ji}}$$

with

$$\frac{\partial E}{\partial O_k}\frac{\partial O_k}{\partial net_k} = -(t_k - O_k)\,O_k(1 - O_k) = -\delta_k, \quad \frac{\partial net_k}{\partial H_j} = w_{kj}, \quad \frac{\partial H_j}{\partial net_j} = \frac{\partial}{\partial net_j}\left(\frac{1}{1 + e^{-net_j}}\right) = H_j(1 - H_j), \quad \frac{\partial net_j}{\partial w_{ji}} = I_i = x_i$$

hence

$$\frac{\partial E}{\partial w_{ji}} = -H_j(1 - H_j)\left(\sum_k \delta_k w_{kj}\right) I_i = -\delta_j I_i, \qquad \text{where } \delta_j = H_j(1 - H_j)\sum_k \delta_k w_{kj}$$

The correction of the weight is

$$\Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}} = \eta\,\delta_j x_i, \qquad w_{new} = w_{old} + \Delta w_{ji}$$
Momentum

Add inertia to the motion through weight space, preventing oscillation:

Without momentum: $\Delta w_{kj}(t) = \eta\,\delta_k H_j$ (Output-Hidden layer), $\Delta w_{ji}(t) = \eta\,\delta_j x_i$ (Hidden-Input layer)

With momentum: $\Delta w_{kj}(t) = \eta\,\delta_k H_j + \alpha\,\Delta w_{kj}(t-1)$, $\Delta w_{ji}(t) = \eta\,\delta_j x_i + \alpha\,\Delta w_{ji}(t-1)$
Training Process: Forward Pass

1. Calculate the output of the Input layer: $I_i = x_i$
2. Calculate the output of the Hidden layer: $net_j = \sum_i w_{ji} I_i + \theta_j$, $\quad H_j = f(net_j) = \frac{1}{1 + e^{-net_j}}$
3. Calculate the output of the Output layer: $net_k = \sum_j w_{kj} H_j + \theta_k$, $\quad O_k = f(net_k) = \frac{1}{1 + e^{-net_k}}$
Training Process: Backward Pass

1. Calculate the $\delta_k$ of the Output layer: $\delta_k = O_k(1 - O_k)(t_k - O_k)$
2. Update the weights between the Hidden & Output layers: $\Delta w_{k,j} = \eta\,\delta_k H_j$, $\quad w_{k,j(new)} = w_{k,j(old)} + \Delta w_{k,j}$
3. Calculate the $\delta_j$ of the Hidden layer: $\delta_j = H_j(1 - H_j)\sum_k \delta_k w_{k,j}$
4. Update the weights between the Input & Hidden layers: $\Delta w_{j,i} = \eta\,\delta_j I_i$, $\quad w_{j,i(new)} = w_{j,i(old)} + \Delta w_{j,i}$
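Putting the two passes and momentum together, a compact pure-Python sketch of the whole training loop for a 3-layer MLP, here applied to XOR (the network size, η = 0.5, α = 0.9, and epoch count are illustrative choices; small nets can occasionally stall in a local minimum):

```python
import math
import random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

n_in, n_hid, n_out = 2, 2, 1
eta, alpha = 0.5, 0.9  # learning rate and momentum

# w_ji: Input->Hidden weights; w_kj: Hidden->Output weights; theta: biases.
w_ji = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
theta_j = [random.uniform(-1, 1) for _ in range(n_hid)]
w_kj = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]
theta_k = [random.uniform(-1, 1) for _ in range(n_out)]
dw_ji_prev = [[0.0] * n_in for _ in range(n_hid)]
dw_kj_prev = [[0.0] * n_hid for _ in range(n_out)]

def forward(x):
    H = [sigmoid(sum(w_ji[j][i] * x[i] for i in range(n_in)) + theta_j[j])
         for j in range(n_hid)]
    O = [sigmoid(sum(w_kj[k][j] * H[j] for j in range(n_hid)) + theta_k[k])
         for k in range(n_out)]
    return H, O

data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]  # XOR

for epoch in range(10000):
    for x, t in data:
        H, O = forward(x)  # forward pass
        # Backward pass: delta_k and delta_j as in the summary above.
        d_k = [O[k] * (1 - O[k]) * (t[k] - O[k]) for k in range(n_out)]
        d_j = [H[j] * (1 - H[j]) * sum(d_k[k] * w_kj[k][j] for k in range(n_out))
               for j in range(n_hid)]
        # Weight updates with momentum: dw(t) = eta*delta*input + alpha*dw(t-1).
        for k in range(n_out):
            for j in range(n_hid):
                dw_kj_prev[k][j] = eta * d_k[k] * H[j] + alpha * dw_kj_prev[k][j]
                w_kj[k][j] += dw_kj_prev[k][j]
            theta_k[k] += eta * d_k[k]  # bias update
        for j in range(n_hid):
            for i in range(n_in):
                dw_ji_prev[j][i] = eta * d_j[j] * x[i] + alpha * dw_ji_prev[j][i]
                w_ji[j][i] += dw_ji_prev[j][i]
            theta_j[j] += eta * d_j[j]  # bias update

for x, t in data:
    print(x, t, [round(o, 2) for o in forward(x)[1]])
```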
Application 1: Implementation of a Neural Network for a Handwritten Numeral Recognition System in a Facsimile Autodialing System
[Figure: a facsimile draft form with the dial number "123-456-7890" written at its head, addressed "To Mr. Tanaka" and signed "H. Kawajiri"]
Hand-written Auto-dialing Facsimile (SFX-70CL):
① Write the dial number (e.g. 123-456-7890) at the head of the facsimile draft
② Insert the draft
③ The dial number is recognized and displayed
④ Auto-dialing
⑤ Sending the draft
Related Publication: "Hand-written Numeric Character Recognition for Facsimile Auto-dialing by Large Scale Neural Network CombNET-II," Proc. of the 4th International Conference on Engineering Applications of Neural Networks, pp. 40-46, June 10-12, 1998, Gibraltar
Application 2: Automatic System for Locating Characters Using a Stroke Analysis Neural Network

• Applications: robot eyes; support systems for the visually handicapped
• Pipeline: Camera → Input image → Find the text region → Character recognition → Text-to-Speech synthesizer

Related Publication: "An algorithm for locating characters in color image using stroke analysis neural network," Proc. of the 9th International Conference on Neural Information Processing (ICONIP'02), Vol. 4, pp. 2132-2136, November 18-22, 2002, Singapore
Application 3: Fog Forecasting by the Large Scale Neural Network CombNET-II

• Predicting fog events based on meteorological observations
• The prediction was made every 30 minutes and the result was used to support aircraft navigation
• The number of fog events was very small compared to no-fog events, so this can be considered a pattern classification problem involving imbalanced training sets
• Observations were made every 30 minutes at Long. 141.70 E, Lat. 42.77 N, 25 m above sea level, by the Shin Chitose Meteorological Observatory Station (Hokkaido Island, Japan)
• A Fog Event is defined as the condition where:
  - range of visibility < 1000 m
  - the weather shows the appearance of fog
• Winner of the competition (1999)
Observed Information (26 meteorological features):

No. 1-13: Year; Month; Date; Time; Atmospheric Pressure [hPa]; Temperature [°C]; Dew Point Temperature [°C]; Wind Direction [°]; Wind Speed [m/s]; Max. Inst. Wind Speed [m/s]; Change of Wind (1) [°]; Change of Wind (2) [°]; Range of Visibility

No. 14-26: Weather; Cloudiness (1st layer); Cloud Shape (1st layer); Cloud Height (1st layer); Cloudiness (2nd layer); Cloud Shape (2nd layer); Cloud Height (2nd layer); Cloudiness (3rd layer); Cloud Shape (3rd layer); Cloud Height (3rd layer); Cloudiness (4th layer); Cloud Shape (4th layer); Cloud Height (4th layer)

Example: 1984 1 1 4.5 1008 0.0 −7.0 270 6 −1 −1 −1 9999 85 0 2 10 0 4 25 −1 −1 −1 −1 −1 −1
Proposed Method: CombNET-II (Probabilistic NN + Modified Counter Propagation NN)
Result of the 1999 Fog Forecasting Contest

Problem: given the complete observation data of 1984-1988 and 1990-1994 for designing the model, predict the appearance of fog events during 1989 and 1995.

Fog Events (539 correct):
• Predictions: 622 / 169 / 908
• Correctly predicted: 374 / 127 / 178
• Number of false predictions: 370 / 445 / 734

Achievements

This study won the first prize award in the 1999 Fog Forecasting Contest sponsored by the Neurocomputing Technical Group of IEICE-Japan.
Related Publications:
1. "A Solution for Imbalanced Training Sets Problem by CombNET-II and Its Application on Fog Forecasting," IEICE Trans. on Information & Systems, Vol. E85-D, No. 7, pp. 1165-1174, July 2002
2. "Mathematical perspective of CombNET and its application to meteorological prediction," Special Issue of the Meteorological Society of Japan on Mathematical Perspective of Neural Network and its Application to Meteorological Problems, Meteorological Research Note, No. 203, pp. 77-107, October 2002
Application 4: NETtalk

• T.J. Sejnowski and C.R. Rosenberg: a parallel network that learns to read aloud, Cognitive Science, 14:179-211, 1990. Simulation: "Continuous Informal Speech," pp. 194-203
• Network architecture: 203-120-26 (trained in 30,000 iterations)
• Input: text (1000 words): THE OF AND TO IN ... etc.
• Output: phoneme (accuracy 98%)

http://www.cnl.salk.edu/ParallelNetsPronounce/index.php
Application 5: Handwritten Digit Recognition

• The MNIST database consists of 60,000 training examples and 10,000 testing examples
• Linear classifier: 8.4% error
• K-Nearest Neighbor classifier, L3: 1.22% error
• SVM, Gaussian kernel: 1.4% error
• SVM, degree-4 polynomial: 1.1% error
• 2-layer ANN with 800 hidden units: 0.9% error
• Currently (26 October 2009) the best accuracy is achieved using a Large Convolutional Network (0.39% error)

http://yann.lecun.com/exdb/mnist/
Agenda
1. Brain, Biological Neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of Neural Network
5. Practice using WEKA
6. Important & Useful References
Flow of an AI experiment

AI model:
• Training Set → model fitting
• Validation Set → error estimation of the selected model
• Testing Set → generalization assessment of the final chosen model → applied to the real world
How to run an experiment using an ANN

Step 1: Prepare three data sets that are independent of each other: a Training Set, a Validation Set, and a Testing Set.
Step 2: Train the neural network using an initial parameter setting:
- stopping criteria (training is stopped if it exceeds t iterations OR the MSE is lower than z)
- number of hidden neurons
- learning rate
- momentum
Step 3: Evaluate the performance of the initial model by measuring its accuracy on the validation set.
Step 4: Change the parameters and repeat steps 2 and 3 until a satisfactory result is achieved.
Step 5: Evaluate the performance of the neural network by measuring its accuracy on the testing set.
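The practice session uses WEKA's GUI; for reference, the same five-step protocol can be sketched in Python, with scikit-learn's MLPClassifier standing in for WEKA's MultilayerPerceptron (the dataset, split ratios, and the small parameter grid are my illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Step 1: three mutually independent sets (a 60/20/20 split, my choice).
X, y = load_digits(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Steps 2-4: train under different parameter settings, select on validation accuracy.
best_acc, best_model = 0.0, None
for n_hidden in (20, 50):              # number of hidden neurons
    for lr in (0.01, 0.1):             # learning rate
        model = MLPClassifier(hidden_layer_sizes=(n_hidden,), solver="sgd",
                              learning_rate_init=lr, momentum=0.9,
                              max_iter=500, random_state=0)
        model.fit(X_train, y_train)            # Step 2: training
        acc = model.score(X_val, y_val)        # Step 3: validation accuracy
        if acc > best_acc:
            best_acc, best_model = acc, model

# Step 5: generalization of the final chosen model, measured once on the test set.
print("validation accuracy:", round(best_acc, 3))
print("test accuracy:", round(best_model.score(X_test, y_test), 3))
```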
Performance Evaluation

• Training set: model fitting
• Validation set: estimation of the prediction error for model selection
• Testing set: assessment of the generalization error of the final chosen model

[Figure: the data divided into Train | Validation | Test portions]
Agenda
1. Brain, Biological Neuron, Artificial Neuron
2. Perceptron
3. Multilayer Perceptron & Backpropagation Algorithm
4. Application of Neural Network
5. Practice using WEKA
6. Important & Useful References
Important & Useful References for Neural Networks
• Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
• Simon Haykin, Neural Networks: A Comprehensive Foundation (2nd edition), Prentice Hall, 1998
• Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, John Wiley & Sons, 2000
• Stuart J. Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2002
• Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2006
• Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (2nd edition), Morgan Kaufmann, June 2005
• Neural Network FAQ: ftp://ftp.sas.com/pub/neural/FAQ.html
• Backpropagator's Review: http://www.dontveter.com/bpr/bpr.html
• UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/index.html
• WEKA: http://www.cs.waikato.ac.nz/~ml/weka/
• Kangaroos and Training Neural Networks: http://www.sasenterpriseminer.com/documents/Kangaroos%20and%20Training%20Neural%20Networks.txt