Supervised Learning: Linear Perceptron NN
Supervised Learning: Linear Perceptron NN
Distinction Between Approximation-Based vs. Decision-Based NNs
• Teachers in approximation-based NNs are quantitative, taking real or complex values.
• Teachers in decision-based NNs are symbols (class labels), not numeric values.
Decision-Based NN (DBNN)
•Linear Perceptron
•Discriminant function (Score function)
•Reinforced and Anti-reinforced Learning Rules
•Hierarchical and Modular Structures
[Figure: DBNN training loop — discriminant functions φ₁(x, w), φ₂(x, w), …, φ_M(x, w) feed a MAXNET; incorrect/correct classes are flagged before the next pattern is presented.]
Supervised Learning: Linear Perceptron NN
Two Classes: Linear Perceptron Learning Rule

The linear discriminant (score) function is
φ_j(x, w_j) = xᵀw_j + w₀ = zᵀŵ_j,
where z is the pattern x augmented with a constant 1 and ŵ_j folds the bias w₀ into the weight vector; its gradient is ∇φ_j(z, w_j) = z.

Upon the presentation of the m-th training pattern z(m), the weight vector w(m) is updated as
w(m+1) = w(m) + η (t(m) − d(m)) z(m),
where η is a positive learning rate, t(m) is the teacher value, and d(m) is the network's decision.
Linear Perceptron Convergence Theorem (Two Classes)

If a set of training patterns is linearly separable, then the linear perceptron learning algorithm
w(m+1) = w(m) + η (t(m) − d(m)) z(m)
converges to a correct solution in a finite number of iterations, provided the learning rate η is small enough.
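As an illustration of the two-class rule and its convergence on linearly separable data, here is a minimal Python sketch; the function name, the binary teacher t ∈ {0, 1}, and the threshold decision d = 1 iff zᵀw > 0 are illustrative assumptions, not from the slides.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_epochs=100):
    """Two-class linear perceptron: w(m+1) = w(m) + eta*(t(m) - d(m))*z(m)."""
    # Augment each pattern with a constant 1 so the bias is folded into w.
    Z = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for z, target in zip(Z, t):
            d = 1 if z @ w > 0 else 0   # current decision
            if d != target:             # update only on a wrong decision
                w += eta * (target - d) * z
                errors += 1
        if errors == 0:                 # converged: every pattern correct
            break
    return w
```

On a linearly separable set such as the AND function, the loop stops after a finite number of epochs, as the convergence theorem predicts.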
Multiple Classes: Linear Perceptron Learning Rule

For multiple classes, each class j keeps its own weight vector w_j, and a pattern z is assigned to the class with the largest score zᵀw_j. Two notions of separability apply: linearly separable and strongly linearly separable.

Linear Perceptron Convergence Theorem (Multiple Classes): if the given multiple-class training set is linearly separable, then the linear perceptron learning algorithm converges to a correct solution after a finite number of iterations.

The multi-class case can be reduced to a two-class one by forming composite vectors such as
p_{1j} = [zᵀ 0 0 … −zᵀ 0 … 0]ᵀ,
with z placed in the first class's block and −z in the j-th class's block.
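A hedged Python sketch of the winner-take-all multi-class rule: on a misclassification, the correct class is reinforced and the winning class anti-reinforced. The function name and defaults are illustrative, and patterns are assumed to be already augmented with a constant 1.

```python
import numpy as np

def train_multiclass_perceptron(Z, labels, n_classes, eta=0.1, max_epochs=100):
    """Winner-take-all linear perceptron: on a misclassified pattern,
    reinforce the correct class's weights and anti-reinforce the winner's."""
    W = np.zeros((n_classes, Z.shape[1]))
    for _ in range(max_epochs):
        mistakes = 0
        for z, i in zip(Z, labels):
            j = int(np.argmax(W @ z))   # winning class under current weights
            if j != i:                  # update only on misclassification
                W[i] += eta * z         # reinforced learning
                W[j] -= eta * z         # anti-reinforced learning
                mistakes += 1
        if mistakes == 0:
            break
    return W
```

For a linearly separable multi-class set, this loop also terminates after finitely many epochs, matching the multi-class convergence theorem.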
DBNN Structure for Nonlinear Discriminant Function

[Figure: a DBNN with inputs x and y; discriminant functions φ₁(x, w), φ₂(x, w), φ₃(x, w) with weights w₁, w₂, w₃ feed a MAXNET, and the teacher triggers training only when the teacher indicates the need.]
The decision-based learning rule is based on a minimal updating principle: it tends to avoid or minimize unnecessary side effects due to overtraining.
• One scenario is that the pattern is already correctly classified by the current network; then no updating is attributed to that pattern, and the learning process proceeds with the next training pattern.
• The second scenario is that the pattern is incorrectly classified to another winning class. In this case, the parameters of two classes must be updated: the score of the winning class is reduced by the anti-reinforced learning rule, while the score of the correct (but not winning) class is enhanced by the reinforced learning rule.
Reinforced and Anti-reinforced Learning

Suppose the m-th training pattern x(m) is known to belong to the i-th class. The leading challenger is denoted by
j = arg max_{l ≠ i} φ(x(m), Θ_l).

Reinforced learning: w_i ← w_i + η ∇φ(x, w_i)
Anti-reinforced learning: w_j ← w_j − η ∇φ(x, w_j)
For Simple RBF Discriminant Function

φ_j(x, w_j) = −0.5 ‖x − w_j‖², so that ∇φ_j(x, w_j) = x − w_j.

Upon the presentation of the m-th training pattern x(m), the reinforced rule moves the centroid of the correct class toward x(m), while the anti-reinforced rule moves the winning (wrong) centroid away from it.
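A minimal Python sketch of one decision-based update under the RBF score φ_j(x, w_j) = −0.5‖x − w_j‖² (gradient x − w_j), obeying the minimal updating principle; the function name and learning rate are illustrative.

```python
import numpy as np

def dbnn_update(x, W, true_class, eta=0.5):
    """One decision-based update with RBF scores phi_j = -0.5*||x - w_j||^2.
    Minimal updating principle: no change if x is already correctly
    classified; otherwise reinforce the true class, anti-reinforce the winner."""
    x = np.asarray(x, dtype=float)
    scores = -0.5 * ((W - x) ** 2).sum(axis=1)
    winner = int(scores.argmax())
    if winner != true_class:
        W[true_class] += eta * (x - W[true_class])   # reinforced: pull toward x
        W[winner]     -= eta * (x - W[winner])       # anti-reinforced: push away
    return W
```

Note that a correctly classified pattern leaves the centroids untouched, which is exactly the "no updating" scenario described above.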
The learning scheme of the DBNN consists of two phases:
• locally unsupervised learning.
• globally supervised learning.
Decision-Based Learning Rule
Several approaches can be used to estimate the number of hidden nodes, or the initial clustering can be determined by VQ or EM clustering methods.
Locally Unsupervised Learning Via VQ or EM Clustering Method
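As a sketch of this locally unsupervised phase, a crude Lloyd's-algorithm K-means can supply initial centroids for one class's hidden nodes; the function name, seed, and defaults are illustrative (the slides equally allow VQ or EM here).

```python
import numpy as np

def kmeans_init(X, k, n_iter=20, seed=0):
    """Crude K-means (Lloyd's algorithm) to pick initial centroids
    for one class's subnetwork."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids
```

For well-separated clusters the centroids settle on the cluster means within a few iterations, giving the DBNN a reasonable starting point before globally supervised fine-tuning.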
[Figure: training data projected onto the 1st and 2nd principal components, showing four clusters labeled 1–4.]
• EM allows the final decision to incorporate prior information, which can be instrumental in multiple-expert or multiple-channel information fusion.
•The objective of learning is minimum classification error (not maximum likelihood estimation) .
• Inter-class mutual information is used to fine-tune the decision boundaries (i.e., the globally supervised learning).
• In this phase, the DBNN applies the reinforced/anti-reinforced learning rule [Kung95], or the discriminative learning rule [Juang92], to adjust network parameters. Only misclassified patterns need to be involved in this training phase.
Globally Supervised Learning Rules
Pictorial Presentation of Hierarchical DBNN

[Figure: two scatter plots of training patterns from three classes a, b, and c, before and after the hierarchical DBNN partitions each class into subclusters.]
Discriminant function (Score function)
• LBF function (or mixture thereof)
• RBF function (or mixture thereof)
• Prediction error function
• Likelihood function: HMM
Hierarchical and Modular DBNN
•Subcluster DBNN
•Probabilistic DBNN
•Local Experts via K-mean or EM
•Reinforced and Anti-reinforced Learning
Subcluster DBNN

[Figure: subcluster DBNN — subcluster discriminant functions feed a MAXNET.]

Subcluster Decision-Based Learning Rule

Probabilistic DBNN

[Figure: probabilistic DBNN — class subnetworks feed a MAXNET.]
A subnetwork of a probabilistic DBNN is basically a mixture of local experts.

[Figure: the k-th subnetwork — RBF local experts applied to x combine into P(y | x, Θ_k).]
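A hedged sketch of one subnetwork's score as a mixture of local experts, assuming isotropic Gaussian RBF experts with a shared width σ; the function name and the specific Gaussian form are illustrative assumptions.

```python
import numpy as np

def subnet_score(x, centroids, priors, sigma=1.0):
    """Score of the k-th subnetwork as a mixture of Gaussian local experts:
    p(x | class k) = sum_r P(r | k) * N(x; mu_r, sigma^2 I)."""
    x = np.asarray(x, dtype=float)
    # Squared distance from x to each expert's centroid mu_r.
    d2 = ((centroids - x) ** 2).sum(axis=1)
    dim = x.size
    norm = (2 * np.pi * sigma ** 2) ** (dim / 2)
    likelihoods = np.exp(-0.5 * d2 / sigma ** 2) / norm
    # Weight each expert's likelihood by its prior P(r | k).
    return float(priors @ likelihoods)
```

The MAXNET then compares these per-class subnetwork scores and selects the winner.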
Probabilistic Decision-Based Neural Networks
Training of Probabilistic DBNN
• Selection of initial local experts: intra-class training
  – Unsupervised training
  – EM (probabilistic) training
• Training of the experts: inter-class training
  – Supervised training
  – Reinforced and anti-reinforced learning
Locally Unsupervised Phase vs. Globally Supervised Phase

[Flowchart: feature vectors x(t) are clustered by K-means, K-NNs, and EM (locally unsupervised phase); then each x(t) with its class ID is classified, misclassified vectors drive reinforced learning, and the loop repeats until convergence (globally supervised phase).]
Probabilistic Decision-Based Neural Networks

Training procedure: the locally unsupervised phase estimates the cluster parameters {μ_j, Σ_j, P(Θ_j)}; the globally supervised phase additionally learns the decision threshold T, yielding {μ_j, Σ_j, P(Θ_j), T}.
Probabilistic Decision-Based Neural Networks

2-D Vowel Problem:

[Figure: two F1 (Hz) vs. F2 (Hz) scatter plots of the vowels in "head, hid, hod, had, heard, who'd, hawed, hud, heed, hood", comparing the decision boundaries obtained by GMM and by PDBNN.]
Difference Between MOE and DBNN

• For the MOE, the influence of the training patterns on each expert is regulated by the gating network (which is itself under training), so that as training proceeds, each training pattern has higher influence on nearby experts and lower influence on far-away ones. The MOE updates all the classes.

• Unlike the MOE, the DBNN makes use of both unsupervised (EM-type) and supervised (decision-based) learning rules. The DBNN uses only misclassified training patterns for its globally supervised learning, and it updates only the "winner" class and the class to which the misclassified pattern actually belongs. Its training strategy abides by a "minimal updating principle".
DBNN/PDBNN Applications
• OCR (DBNN)
• Texture Segmentation (DBNN)
• Mammogram Diagnosis (PDBNN)
• Face Detection (PDBNN)
• Face Recognition (PDBNN)
• Money Recognition (PDBNN)
• Multimedia Library (DBNN)
OCR Classification (DBNN)
Image Texture Classification (DBNN)
Face Detection (PDBNN)
Face Recognition (PDBNN)
Multimedia Library(PDBNN)
MatLab Assignment #4: DBNN to separate 2 classes
•RBF DBNN with 4 centroids per class
• RBF DBNN with 4 and 6 centroids for the green and blue classes, respectively.
ratio=2:1
RBF-BP NN for Dynamic Resource Allocation
•use content to determine renegotiation time
•use content/ST-traffic to estimate how much resource to request
Neural network traffic predictor yields smaller prediction MSE and higher link utilization.
Modern information technology in the internet era should support interactive and intelligent processing that transforms and transfers information.
Intelligent Media Agent
Integration of signal processing and neural net techniques could be a versatile tool to a broad spectrum of multimedia applications.
EM Applications
•Uncertain Clustering/ Model
•Channel Confidence
Channel Fusion

[Figure: classes-in-channel network — Expert 1 and Expert 2 each receive one channel (Channel 1, Channel 2); Sensor = Channel = Expert.]
Sensor Fusion

[Figure: human sensory modalities and computer sensory modalities as channels for fusion.]

Fusion Example: a visual "Ga" paired with an audio "Ba" is perceived as "Da" (the McGurk effect).
Toy Car Recognition
References:
[1] Lin, S.H., Kung, S.Y. and Lin, L.J. (1997). "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. on Neural Networks, 8(1), pp. 114-132.
[2] Mak, M.W. et al. (1994). "Speaker identification using multilayer perceptrons and radial basis function networks," Neurocomputing, 6(1), pp. 99-118.