Least-Mean-Square Training of Cluster-Weighted-Modeling
National Taiwan University
Department of Computer Science and Information Engineering
Outline
• Introduction of CWM
• Least-Mean-Square Training of CWM
• Experiments
• Summary
• Future work
• Q&A
Cluster-Weighted Modeling (CWM)
• CWM is a supervised learning model which is based on the joint probability density estimation of a set of input and output (target) data.
• The joint probability is expanded into clusters which describe local subspaces well. Each local Gaussian expert can have its own local function (constant, linear, or quadratic).
• The global (nonlinear) model can be constructed by combining all the local models.
• The resulting model has transparent local structures and meaningful parameters.
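The expansion described above can be written out explicitly. In the standard cluster-weighted modeling formulation, the joint density over M clusters is (notation assumed; p(x|c_m) is a local Gaussian, f(x, β_m) the cluster's local function):

```latex
p(\mathbf{x}, y) = \sum_{m=1}^{M} p(y \mid \mathbf{x}, c_m)\, p(\mathbf{x} \mid c_m)\, p(c_m)
```

Here p(y|x, c_m) is a Gaussian centered on the local function f(x, β_m), and p(c_m) is the cluster prior.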
Prediction calculation
• Conditional forecast: the expected output given the input.
• Conditional error (output uncertainty): the expected output covariance given the input.
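The formulas for these two quantities did not survive transcription; under the standard CWM joint-density expansion they take the following form (notation assumed):

```latex
\langle y \mid \mathbf{x} \rangle
  = \frac{\sum_{m=1}^{M} f(\mathbf{x}, \beta_m)\, p(\mathbf{x} \mid c_m)\, p(c_m)}
         {\sum_{m=1}^{M} p(\mathbf{x} \mid c_m)\, p(c_m)}

\langle \sigma_y^2 \mid \mathbf{x} \rangle
  = \frac{\sum_{m=1}^{M} \bigl[\sigma_{y,m}^2
        + \bigl(f(\mathbf{x}, \beta_m) - \langle y \mid \mathbf{x} \rangle\bigr)^2\bigr]\,
        p(\mathbf{x} \mid c_m)\, p(c_m)}
         {\sum_{m=1}^{M} p(\mathbf{x} \mid c_m)\, p(c_m)}
```

where σ²_{y,m} is the output variance of cluster m.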
Training (EM Algorithm)
• Objective function: log-likelihood function.
• Initialization: cluster means via k-means; variances set to the maximal range of each dimension; priors set to 1/M, where M is the predetermined number of clusters.
• E-step: evaluate the posterior probability.
• M-step: update the cluster means and the prior probabilities.
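The E-step and M-step update equations were lost in transcription; the standard CWM/EM updates they refer to are (notation assumed):

```latex
% E-step: posterior responsibility of cluster m for sample (x_n, y_n)
p(c_m \mid \mathbf{x}_n, y_n)
  = \frac{p(y_n \mid \mathbf{x}_n, c_m)\, p(\mathbf{x}_n \mid c_m)\, p(c_m)}
         {\sum_{l=1}^{M} p(y_n \mid \mathbf{x}_n, c_l)\, p(\mathbf{x}_n \mid c_l)\, p(c_l)}

% M-step: cluster means and prior probabilities
\boldsymbol{\mu}_m
  = \frac{\sum_{n=1}^{N} p(c_m \mid \mathbf{x}_n, y_n)\, \mathbf{x}_n}
         {\sum_{n=1}^{N} p(c_m \mid \mathbf{x}_n, y_n)},
\qquad
p(c_m) = \frac{1}{N} \sum_{n=1}^{N} p(c_m \mid \mathbf{x}_n, y_n)
```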
M-step (cont.)
• Define the cluster-weighted expectation.
• Update the cluster-weighted covariance matrices.
• Update the cluster parameters which maximize the data likelihood.
• Update the output covariance matrices.
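The equations for these updates were lost; in the standard CWM formulation they are (notation assumed; f_i denotes the i-th basis term of the local model, e.g. 1 and x for a local linear model):

```latex
% cluster-weighted expectation of a quantity \theta(\mathbf{x}, y)
\langle \theta \rangle_m
  = \frac{\sum_{n} \theta(\mathbf{x}_n, y_n)\, p(c_m \mid \mathbf{x}_n, y_n)}
         {\sum_{n} p(c_m \mid \mathbf{x}_n, y_n)}

% cluster-weighted covariance and output variance
\mathbf{\Sigma}_m
  = \langle (\mathbf{x} - \boldsymbol{\mu}_m)(\mathbf{x} - \boldsymbol{\mu}_m)^{\mathrm{T}} \rangle_m,
\qquad
\sigma_{y,m}^2 = \langle [\,y - f(\mathbf{x}, \beta_m)\,]^2 \rangle_m

% local model parameters maximizing the data likelihood
\beta_m = \mathbf{B}^{-1}\mathbf{c},
\qquad \text{where } B_{ij} = \langle f_i f_j \rangle_m,\quad c_i = \langle y\, f_i \rangle_m
```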
Least-Mean-Square Training of CWM
• To train CWM's model parameters from a least-squares perspective.
• To minimize the squared-error function of CWM's training result and find another solution with better accuracy.
• To find another solution when CWM is trapped in local minima.
• To apply supervised selection of cluster centers instead of an unsupervised method.
LMS Learning Algorithm
• The instantaneous error produced by sample n is the difference between the target and the prediction.
• The prediction formula is the CWM conditional forecast.
• A softmax function is used to constrain the prior probabilities to have values between 0 and 1 and to sum to 1.
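One consistent reconstruction of the missing formulas, with γ_m an unconstrained softmax parameter per cluster (symbol assumed):

```latex
e_n = y_n - \hat{y}(\mathbf{x}_n),
\qquad
\hat{y}(\mathbf{x})
  = \frac{\sum_{m=1}^{M} f(\mathbf{x}, \beta_m)\, p(\mathbf{x} \mid c_m)\, p(c_m)}
         {\sum_{m=1}^{M} p(\mathbf{x} \mid c_m)\, p(c_m)},
\qquad
p(c_m) = \frac{e^{\gamma_m}}{\sum_{l=1}^{M} e^{\gamma_l}}
```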
LMS Learning Algorithm (cont.)
• The derivation of the gradients.
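The gradient expressions themselves were lost. As a sketch: writing the squared error as E_n = ½e_n² and the normalized cluster weight as w_m(x) = p(x|c_m)p(c_m) / Σ_l p(x|c_l)p(c_l) (notation assumed), the chain rule gives, for the local-model parameters,

```latex
\frac{\partial E_n}{\partial \beta_m}
  = -\,e_n\, w_m(\mathbf{x}_n)\,
    \frac{\partial f(\mathbf{x}_n, \beta_m)}{\partial \beta_m}
```

Analogous expressions for μ_m, Σ_m, and the softmax parameters γ_m follow by differentiating w_m; gradient descent then updates each parameter opposite its gradient with a learning rate η.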
LMS CWM Learning Algorithm
• Initialization: initialize the model parameters using CWM's training result.
• Iterate until convergence:
    For n = 1:N
      Estimate the error
      Estimate the gradients
      Update the parameters
    End
• E-step and M-step: parameters can also be refined by EM between LMS passes.
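The loop above can be sketched in code. This is a minimal, runnable illustration of the LMS refinement idea on a 1-D toy problem, not the paper's exact algorithm: symbols (mu, var, beta, gamma) and the learning rate eta are assumptions, only the local-linear coefficients are updated, and the EM interleaving is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                                  # number of clusters
x = rng.uniform(-3, 3, 200)
y = np.sin(x)                          # toy target (the "simple Sin function")

mu = np.linspace(-3, 3, M)             # cluster means (normally from CWM/EM)
var = np.full(M, 1.0)                  # cluster variances
beta = np.zeros((M, 2))                # local linear models: y = b0 + b1*x
gamma = np.zeros(M)                    # softmax parameters for the priors
eta = 0.01                             # learning rate (assumed)

def predict(xn):
    prior = np.exp(gamma) / np.exp(gamma).sum()           # softmax prior
    px = np.exp(-0.5 * (xn - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    w = prior * px
    w = w / w.sum()                                        # normalized weights
    f = beta[:, 0] + beta[:, 1] * xn                       # local linear outputs
    return (w * f).sum(), w, f

for epoch in range(200):
    for xn, yn in zip(x, y):
        yhat, w, f = predict(xn)
        err = yn - yhat                                    # instantaneous error
        # gradient steps on the local coefficients only (chain rule
        # through the weighted sum; cluster/prior updates omitted here)
        beta[:, 0] += eta * err * w
        beta[:, 1] += eta * err * w * xn

mse = np.mean([(yn - predict(xn)[0]) ** 2 for xn, yn in zip(x, y)])
```

Starting from zero coefficients (MSE ≈ 0.5 on this target), the stochastic updates drive the blended local-linear fit toward the sine curve, illustrating how the LMS pass refines an initialized CWM.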
Experiments
• A simple sin function.
• LMS-CWM has a better interpolation result.
Mackey-Glass Chaotic Time Series Prediction
• 1000 data points: the first 500 points are taken as the training set, the last 500 points as the test set.
• Single-step prediction
• Input: [s(t), s(t-6), s(t-12), s(t-18)]
• Output: s(t+85)
• Local linear model
• Number of clusters: 30
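The delay-embedding setup above can be sketched as follows. The series here is a synthetic stand-in (the actual Mackey-Glass data is not reproduced), and the exact train/test split indices are an assumption:

```python
import numpy as np

# placeholder for the 1000-point series (not real Mackey-Glass data)
s = np.sin(0.1 * np.arange(1000))

lags = [0, 6, 12, 18]                   # input: [s(t), s(t-6), s(t-12), s(t-18)]
horizon = 85                            # output: s(t+85)
t0 = max(lags)                          # first usable time index
t1 = len(s) - horizon                   # last usable time index (exclusive)

X = np.stack([[s[t - l] for l in lags] for t in range(t0, t1)])
Y = s[t0 + horizon : t1 + horizon]

# first half for training, second half for testing, as in the slides
half = len(X) // 2
X_train, Y_train = X[:half], Y[:half]
X_test, Y_test = X[half:], Y[half:]
```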
Results (2)
• Learning curve (figure: CWM vs. LMS-CWM)

MSE            CWM          LMS-CWM
Test set       0.0008027    0.0004480
Training set   0.0006568    0.0004293
Local Minima
• The initial locations of the four clusters.
• The resulting centers' locations after each training session of CWM and LMS-CWM.
Summary
• An LMS learning method for CWM is presented.
• It may lose the benefits of data density estimation and of characterizing the data.
• It provides an alternative training option.
• Parameters can be trained by EM and LMS alternately, combining the advantages of both.
• LMS-CWM learning can be viewed as a refinement of CWM if prediction accuracy is the main concern.
Future work
• Regularization.
• Comparison between different models (from theoretical and performance points of view).