SOME APPLICATIONS:
NONLINEAR REGRESSIONS BASED
ON KERNEL METHOD IN SOCIAL
SCIENCES AND ENGINEERING
Antoni Wibowo
Farewell Lecture – PPI Ibaraki
27 June 2009
EDUCATIONAL BACKGROUND
Dr.Eng., Social Systems and Management, Graduate School of Systems and Information Engineering, University of Tsukuba, 2009.
M.Eng., Social Systems Engineering, Graduate School of Systems and Information Engineering, University of Tsukuba, 2006.
M.Sc., Computer Science, University of Indonesia, 2000.
B.Sc./B.Eng., Mathematics Engineering, Sebelas Maret University, 1995.
TABLE OF CONTENTS
• Introduction.
• Ordinary Linear Regression (OLR).
• Principal Component Regression and Ridge Regression.
• Motivations.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
Ordinary Linear Regression (OLR)
• Regression Analysis: a model of the relationship between Y (response variable) and x1, x2, …, xp (regressor variables).
• Ordinary Linear Regression (OLR): Y = β0 + β1 x1 + … + βp xp + ε   (1.1)
• ε : random error (a random variable),
• β0, β1, …, βp : regression coefficients.
• Let
  • Yi : the response variable in the i-th observation,
  • xij : the i-th observation of the j-th regressor (j=1,…,p),
  • εi : random error on the i-th observation (i=1,…,N),
  • ℝ : the set of real numbers.
Ordinary Linear Regression (OLR)
The standard OLR model corresponding to model (1.1):
  Y = Xβ + ε,
where X is the N×(p+1) regressor matrix.
Assumption: E(ε) = 0 and Cov(ε) = σ²·I_N,
I_N : the N×N identity matrix.
Ordinary Linear Regression (OLR)
• The aim of regression analysis: to find an estimator of β, say β̂, that minimizes ‖Y − Xβ‖².   (1.3)
• The solution of (1.3) is given by β̂ = (XᵀX)⁻¹XᵀY.   (1.4)
Ordinary Linear Regression (OLR)
• Let y be the observed data corresponding to Y.
• Let b be the value of β̂ when Y is replaced by y in (1.4).
• Under the assumption that the column vectors of X are linearly independent:
  b = (XᵀX)⁻¹Xᵀy.
• Prediction value of y: ŷ = Xb.
• Residual between y and ŷ: e = y − ŷ.
Ordinary Linear Regression (OLR)
• Root Mean Square Error (RMSE): RMSE = sqrt( (1/N) Σi (yi − ŷi)² ).
• The prediction by OLR: ŷ(x) = b0 + b1 x1 + … + bp xp, a linear function of the regressors.
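As a concrete illustration of (1.3)-(1.4) and the RMSE above, here is a minimal NumPy sketch on made-up data; the data and variable names are purely illustrative and only mirror the notation of these slides.

```python
import numpy as np

# Made-up toy data for illustration only (not the household data of Example 01).
rng = np.random.default_rng(0)
N, p = 50, 2
x = rng.normal(size=(N, p))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(scale=0.1, size=N)

# Regressor matrix X with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones(N), x])

# b = (X^T X)^{-1} X^T y, computed via lstsq for numerical stability.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b                    # prediction of y
e = y - y_hat                    # residuals
rmse = np.sqrt(np.mean(e ** 2))  # Root Mean Square Error
print("b =", b, "RMSE =", rmse)
```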
OLR-Limitations
• OLR does not yield a nonlinear prediction.
• The existence of multicollinearity (collinearity) in X can seriously deteriorate the prediction by OLR:
  the variance of bj becomes very large.
⇒ We cannot be confident whether xj makes a contribution to the prediction by OLR or not.
Remarks:
• Collinearity is said to exist in X if XᵀX is a singular matrix.
• Multicollinearity is said to exist in X if XᵀX is a nearly singular matrix, i.e., some eigenvalues of XᵀX are close to zero.
• The eigenvalues of XᵀX are nonnegative real numbers.
• A vector a ≠ 0 is called an eigenvector of XᵀX if XᵀXa = λa for some scalar λ. The scalar λ is called an eigenvalue of XᵀX.
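A small NumPy sketch of the eigenvalue check described in these remarks; the tolerance below is an arbitrary illustrative choice, not a value from the lecture.

```python
import numpy as np

def collinearity_report(X, tol=1e-6):
    """Eigenvalues of X^T X and the ratios lambda_j / lambda_max.

    Ratios close to zero indicate (multi)collinearity, as in Example 01.
    The tolerance is only an illustrative choice.
    """
    eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]  # nonnegative, descending
    ratios = eigvals / eigvals[0]
    return eigvals, ratios, bool(np.any(ratios < tol))

# Example: a nearly collinear regressor matrix (the third column is almost 2*x2).
rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
X = np.column_stack([np.ones(100), x2, 2.0 * x2 + 1e-4 * rng.normal(size=100)])
print(collinearity_report(X))
```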
Example 01: The Household Consumption Data
yi : the i-th household consumption expenditure,
xi1 : the i-th household income,
xi2 : the i-th household wealth.
The OLR of the household consumption data:
Table 1: The household consumption data
Example 01:
The Household Consumption Data
Applying OLR to the consumption data:
Eigenvalues of XᵀX: λ1 = 3.4032e+7, λ2 = 6.7952e+1, λ3 = 1.0165;
ratios λ2/λ1 = 1.9967e-6 and λ3/λ1 = 2.9868e-8.
⇒ Multicollinearity/collinearity exists in X.
95% confidence interval of β2: [-0.2331, 0.1485].
⇒ We cannot be confident whether x2 makes a contribution to this prediction or not.
Table 1: The household consumption data
TABLE OF CONTENTS
• Introduction.
• Ordinary Linear Regression (OLR).
• Principal Component Regression and Ridge Regression.
• Motivations.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
PCR AND RR
To overcome the effects of multicollinearity (collinearity).
1. Principal Component Regression (PCR)
2. Ridge Regression (RR).
PCR
1. Principal Component Regression (PCR):
   OLR + PCA = PCR
What is PCA?
Principal Component Analysis (PCA)
PCA
• PCA: an orthogonal transformation of the regressors into uncorrelated principal components, ordered by decreasing variance.
• PCA's procedure:
PCR = OLR + PCA
PCR's procedure:
• r : the retained number of principal components for PCR.
• How to choose r?
• Estimator of PCR's regression coefficients.
Limitation: the prediction by PCR is a linear model.
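Below is a minimal NumPy sketch of one common way to carry out the PCR procedure above (center the regressors, run PCA, regress on the first r principal components, then map the coefficients back); it illustrates the idea rather than the exact formulation used in the slides.

```python
import numpy as np

def pcr_fit(X, y, r):
    """Principal Component Regression with r retained components.

    X is N x p (without an intercept column); returns the intercept and the
    coefficients expressed in the original regressors.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # PCA: eigenvectors of Xc^T Xc, sorted by decreasing eigenvalue.
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)
    V_r = V[:, np.argsort(eigvals)[::-1][:r]]          # retained loading vectors

    Z = Xc @ V_r                                       # principal component scores
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)     # OLR on the scores

    beta = V_r @ gamma                                 # back to the original regressors
    beta0 = y_mean - x_mean @ beta
    return beta0, beta
```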
Example 01:
The Household Consumption Data
Applying PCR to the household consumption data:
Eigenvalues: λ1 = 3.8525e+5, λ2 = 7.5329; λ2/λ1 = 1.9953e-5 ⇒ r = 1.
95% confidence interval of β1: [0.0409, 0.0581].
- The effects of multicollinearity/collinearity are avoided.
- But the prediction is still a linear regression.
Table 1: The household consumption data
RR
2. Ridge Regression (RR):
  b_RR = (XᵀX + qI)⁻¹Xᵀy for some q > 0   (RR)
  vs. b = (XᵀX)⁻¹Xᵀy   (OLR)
Prediction by ridge regression: ŷ = X b_RR.
An appropriate q can be obtained by the cross validation/holdout method.
Limitation: the prediction by RR is a linear model.
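A minimal NumPy sketch of RR with a holdout choice of q; the grid of q values and the split fraction are illustrative choices, not values from the lecture.

```python
import numpy as np

def ridge_fit(X, y, q):
    """b_RR = (X^T X + q I)^(-1) X^T y for a given q > 0 (intercept column not treated specially)."""
    return np.linalg.solve(X.T @ X + q * np.eye(X.shape[1]), X.T @ y)

def ridge_holdout(X, y, q_grid, train_frac=0.7, seed=0):
    """Choose q from q_grid by a simple holdout split (grid and split fraction are illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    best_q, best_err = None, np.inf
    for q in q_grid:
        b = ridge_fit(X[tr], y[tr], q)
        err = np.sqrt(np.mean((y[te] - X[te] @ b) ** 2))  # holdout RMSE
        if err < best_err:
            best_q, best_err = q, err
    return best_q, best_err
```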
Example 01:
The Household Consumption Data
Applying RR to the household consumption data: q = 20.
Table 1: The household consumption data
TABLE OF CONTENTS
• Introduction.
• Ordinary Linear Regression (OLR).
• Principal Component Regression and Ridge Regression.
• Motivations.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
MOTIVATIONS
• OLR, PCR and RR yield linear predictions. ⇒ Motivation 1.
• Equal variance of the random errors is assumed.
• What happens if the random errors have unequal variances and the observed data contain multicollinearity/collinearity? ⇒ Motivation 2.
• What happens if the observed data contain outliers? ⇒ Motivation 3.
Outliers: observations whose residuals are large.
Motivation 1: Linearity.
To overcome the limitation of PCR, Kernel Principal Component Regression (KPCR) has been proposed by:
• Rosipal et al. (Neural Computing and Applications [2001]; Journal of Machine Learning [2002]),
• Jade et al. (Chemical Engineering Science [2003]),
• Hoegaerts et al. (Neurocomputing [2005]).
However, the existing KPCR has theoretical difficulties in the procedure to obtain its prediction.
⇒ We revise the existing KPCR.
Motivation 2: Equal Variances
• Weighted Least Squares (WLS) is a widely used technique (a minimal WLS sketch is given after this list).
• Limitations:
  • WLS yields a linear prediction.
  • There is no guarantee that multicollinearity can be avoided.
• KPCR (KRR) can be inappropriate to use here, since they are constructed on the standard OLR model:
  Cov(ε) = σ²·I_N (standard OLR model) vs. Cov(ε) = σ²·W_N (feasible WLS model),
  where W_N is a diagonal matrix.
• We propose two methods:
  • a combination of WLS and KPCR (WLS-KPCR),
  • a combination of WLS and KRR (WLS-KRR).
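A minimal NumPy sketch of WLS and of a simple two-step feasible WLS under the diagonal-covariance model above; the variance-estimation step (squared OLS residuals) is only one crude illustrative choice, not necessarily the one used in the thesis.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: minimize sum_i w_i (y_i - x_i^T b)^2.

    Equivalent to OLS on the rescaled rows sqrt(w_i) * (x_i, y_i).
    """
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

def feasible_wls(X, y):
    """Two-step feasible WLS: OLS residuals -> crude variance estimates -> WLS.

    Real applications usually model the variance function; this is only a sketch.
    """
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    var_hat = np.maximum((y - X @ b_ols) ** 2, 1e-8)
    return wls_fit(X, y, 1.0 / var_hat)
```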
Motivation 3: Sensitive to Outliers
• OLR, PCR, RR, KPCR and KRR can be inappropriate.
• M-estimation is a widely used technique to eliminate the effect of outliers (a minimal sketch follows this list).
• Limitation: M-estimation yields a linear prediction.
• Famenko et al. [2006] proposed a nonlinear prediction based on M-estimation.
  • It needs a specific nonlinear model in advance.
• We propose two methods:
  • a combination of M-estimation and KPCR (R-KPCR),
  • a combination of M-estimation and KRR (R-KRR).
  • No need to specify a nonlinear model in advance.
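As background, a minimal sketch of linear M-estimation via iteratively reweighted least squares (IRLS) with Huber weights; the tuning constant and the MAD-based scale are the usual textbook choices, and this is not the R-KPCR/R-KRR procedure of the thesis itself.

```python
import numpy as np

def huber_m_estimation(X, y, c=1.345, n_iter=50):
    """Linear M-estimation via IRLS with Huber weights (illustrative sketch)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)            # start from OLS
    for _ in range(n_iter):
        r = y - X @ b
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)                 # Huber weights down-weight outliers
        sw = np.sqrt(w)
        b_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.allclose(b_new, b, atol=1e-10):
            break
        b = b_new
    return b
```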
Remark: Kernel Ridge Regression (KRR) is proposed to overcome the limitation of Ridge Regression.
MOTIVATIONS
Remarks: R-KPCR = Robust Kernel Principal Component Regression,
R-KRR = Robust Kernel Ridge Regression.

Model \ Method               | Linear                           | Nonlinear (Non-Kernel)    | Nonlinear (Kernel)
OLS                          | Ordinary Linear Regression (OLR) | Jukic's regression [2004] | KPCR, Revised KPCR (Chapter 4)
Ridge                        | Ridge Regression (RR)            |                           | KRR
Weighted Least Squares (WLS) | WLS Linear Regression (WLS-LR)   |                           | WLS-KPCR, WLS-KRR (Chapter 4-5)
Robust                       | M-estimation                     | Famenko [2006]            | R-KPCR, R-KRR (Chapter 4-5)
Nonparametric                |                                  |                           | Nadaraya [1964] - Watson [1964]
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
KERNEL PRINCIPAL COMPONENT
ANALYSIS (KPCA)
• The regressors x ∈ Rp are mapped by a nonlinear map ψ into a feature space F.
• F is assumed to be a Euclidean space of higher dimension, say pF >> p.
• Conceptual KPCA: perform PCA on the mapped data in F.
KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
• The map ψ, and hence the matrix K of inner products of the mapped data, is unknown explicitly.
• Problem: we don't know K explicitly.
• Use Mercer's Theorem:
KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
• Choose a symmetric, continuous and positive semi-definite (p.s.d.) function κ; κ is called the kernel function.
⇒ There exists φ such that κ(x,z) = φ(x)ᵀφ(z) for any x, z ∈ Rp.
• Instead of choosing ψ explicitly, we employ φ as ψ.
• K is now known explicitly, and KPCA amounts to finding the eigenvalues/eigenvectors of K.
�Conceptual KPCA’s procedure:via kernel κ.
normalized
eigenvectors of
via kernel κ, it is known explicitly
Conceptual
KPCA
KPCA
Let’s consider:
via kernel κ
It is known
explicitly
�Actual KPCA’s procedure:
The nonlinear principal
component corresponding to κ.
When the assumption
does not
hold,
K is replaced by
KN=K-EK-KE+EKE,
where ENxN=[1/N]
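Putting the steps above together, a minimal NumPy sketch of KPCA with the centered kernel matrix K_N; the Gaussian kernel and its parameter c are illustrative choices, not necessarily those of the thesis.

```python
import numpy as np

def gaussian_kernel_matrix(X, c=5.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / c); the parameterization of c is illustrative."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / c)

def kpca(K, r):
    """Centered KPCA: top-r eigenvalues/eigenvectors of K_N = K - EK - KE + EKE."""
    N = K.shape[0]
    E = np.full((N, N), 1.0 / N)                 # E has all entries equal to 1/N
    K_N = K - E @ K - K @ E + E @ K @ E          # centering in the feature space
    eigvals, A = np.linalg.eigh(K_N)
    order = np.argsort(eigvals)[::-1][:r]
    lam, A = eigvals[order], A[:, order]
    scores = A * np.sqrt(lam)                    # nonlinear principal component scores
    return lam, A, scores
```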
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
REVISED KPCR
• Conceptual revised KPCR: apply the PCR procedure to the mapped data; the estimator of PCR's regression coefficients in the feature space becomes the estimator of the revised KPCR's regression coefficients.
→ Via the kernel κ, these quantities are known explicitly.

REVISED KPCR
• Eqs. (3.3) and (3.5) are known explicitly (via the kernel κ), since K and the associated eigenvectors are known explicitly.
• The retained number of principal components for the revised KPCR plays the same role as r does in PCR.
REVISED KPCR
Summary of the revised KPCR's procedure (actual revised KPCR):
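Since the procedure's equations are not reproduced in this transcript, the following NumPy sketch only illustrates the general kernel-PCR idea (regress y on the leading centered-kernel principal component scores and predict through kernel evaluations); it should not be read as the thesis' exact revised KPCR.

```python
import numpy as np

def kpcr_fit(X, y, r, c=5.0):
    """Kernel-PCR-style fit: regress y on the top-r centered-kernel PC scores."""
    N = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / c)   # Gaussian kernel matrix
    E = np.full((N, N), 1.0 / N)
    K_N = K - E @ K - K @ E + E @ K @ E                            # centered kernel matrix
    eigvals, A = np.linalg.eigh(K_N)
    order = np.argsort(eigvals)[::-1][:r]                          # assumes these eigenvalues are > 0
    lam, A = eigvals[order], A[:, order]
    Z = A * np.sqrt(lam)                                           # training PC scores
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    return {"X": X, "K": K, "A": A, "lam": lam, "gamma": gamma, "y_mean": y_mean, "c": c}

def kpcr_predict(model, X_new):
    """Predict at new points using kernel evaluations only (no explicit feature map)."""
    X, K, c = model["X"], model["K"], model["c"]
    N, M = X.shape[0], X_new.shape[0]
    sq_tr, sq_new = np.sum(X ** 2, axis=1), np.sum(X_new ** 2, axis=1)
    K_new = np.exp(-(sq_new[:, None] + sq_tr[None, :] - 2.0 * X_new @ X.T) / c)
    E, One = np.full((N, N), 1.0 / N), np.full((M, N), 1.0 / N)
    K_new_c = K_new - One @ K - K_new @ E + One @ K @ E            # centering of the test rows
    Z_new = K_new_c @ (model["A"] / np.sqrt(model["lam"]))         # test PC scores
    return model["y_mean"] + Z_new @ model["gamma"]
```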
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
EXAMPLES:
• Kernels used: Gaussian, sigmoid and polynomial (standard forms are sketched below).
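The standard textbook forms of these kernels, written as small Python functions; the parameter names (c, a, b, d) and defaults are illustrative, since the exact parameterization used in the slides (e.g. what "parameter Gaussian = 5" refers to) is not reproduced in this transcript.

```python
import numpy as np

def gaussian_kernel(x, z, c=5.0):
    """kappa(x, z) = exp(-||x - z||^2 / c)."""
    return np.exp(-np.sum((x - z) ** 2) / c)

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """kappa(x, z) = tanh(a * x^T z + b); p.s.d. only for some choices of a and b."""
    return np.tanh(a * np.dot(x, z) + b)

def polynomial_kernel(x, z, c=1.0, d=2):
    """kappa(x, z) = (x^T z + c)^d."""
    return (np.dot(x, z) + c) ** d
```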
Applying the revised KPCR (Gaussian kernel with parameter 5) to the household consumption data:
EXAMPLE 01:
The Household Consumption Data
Comparison on the household consumption data (the model with the smallest AIC is the best):

Method        | RMSE   | AIC
OLR           | 5.6960 | 72.7589
PCR           | 6.2008 | 71.1182
RR            | 9.4442 | 109.3479
Revised KPCR  | 0.0214 | -215.0172

- Nonlinear prediction regression.
- The effects of multicollinearity (collinearity) are avoided.
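For reference, one common way to compute these criteria; the exact AIC variant used in the slides is not reproduced here, so the formula below (N ln(RSS/N) + 2k for a model with k parameters, up to an additive constant) is only an assumption.

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def aic(y, y_hat, k):
    """Gaussian-likelihood AIC up to an additive constant: N * ln(RSS / N) + 2k (assumed form)."""
    N = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return N * np.log(rss / N) + 2 * k
```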
EXAMPLES:
• The prediction by Nadaraya-Watson regression:
  f̂(x) = Σi K((x − xi)/h1) yi / Σi K((x − xi)/h1),
  where K is a smoothing kernel and h1 is the bandwidth.
• In our examples:
  • p = 1,
  • h1 is estimated by Bowman-Azzalini's method (h1ba) and by Silverman's method (h1s).
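A minimal NumPy sketch of the Nadaraya-Watson predictor for p = 1 with a Gaussian smoothing kernel; a simplified Silverman rule-of-thumb bandwidth is included as one simple choice (the Bowman-Azzalini bandwidth is not sketched here).

```python
import numpy as np

def silverman_bandwidth(x):
    """Simplified Silverman rule of thumb: 1.06 * std(x) * N^(-1/5)."""
    return 1.06 * np.std(x, ddof=1) * len(x) ** (-1.0 / 5.0)

def nadaraya_watson(x_train, y_train, x_new, h):
    """f_hat(x) = sum_i K_h(x - x_i) y_i / sum_i K_h(x - x_i), with a Gaussian K_h."""
    u = (np.asarray(x_new)[:, None] - np.asarray(x_train)[None, :]) / h
    w = np.exp(-0.5 * u ** 2)                     # Gaussian smoothing-kernel weights
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)
```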
EXAMPLE 02: Sinc Function
[Figure: training data (standard deviation of noise = 0.2). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Bowman-Azzalini's method (h1ba = 0.6967); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
[Figure: testing data (standard deviation of noise = 0.5). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Bowman-Azzalini's method (h1ba = 0.6967); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
[Figure: training data (standard deviation of noise = 0.2). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method (h1s = 2.4680); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
[Figure: testing data (standard deviation of noise = 0.5). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method (h1s = 2.4680); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
The retained number of PCs for the revised KPCR.
Table 2: Comparison of OLR, Nadaraya-Watson regression and the revised KPCR for the sinc function (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method).
EXAMPLE 03: Stock of Cars
Table 3: The stock of cars (in thousands) in the Netherlands.
Jukic et al. [2003] used the Gompertz function to fit these data.
Table 4: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the stock of cars.
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 62.8357), (b) N-W with Silverman's method (h1s = 4.0981); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 04: The Weight of Chickens
Table 5: The weight of female chickens.
Jukic et al. [2003] used the Gompertz function to fit these data.
Table 6: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the female chickens.
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 1.7682), (b) N-W with Silverman's method (h1s = 2.4715); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 05: Growth of the Son
Table 7: Growth of the son [Seber et al., 1998, Nonlinear Programming].
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 9.1208), (b) N-W with Silverman's method (h1s = 2.8747); red: revised KPCR with Gaussian parameter 5.]
Table 8: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the growth of the son.
EXAMPLE 06: The Puromycin Data
Table 9: The puromycin data [Montgomery, 2006, Introduction to Linear Regression Analysis].
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 2.3170), (b) N-W with Silverman's method (h1s = 0.2571); red: revised KPCR with Gaussian parameter 5.]
xi : the i-th substrate concentration of puromycin,
yi : the i-th reaction velocity of puromycin.
Table 10: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the puromycin data.
EXAMPLE 07: Radioactive Tracer Data
Table 11: Radioactive tracer data [Seber et al., 1998, Nonlinear Programming].
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 9.1208), (b) N-W with Silverman's method (h1s = 1.1079); red: revised KPCR with Gaussian parameter 5.]
xi : the i-th time,
yi : the i-th radioactive tracer measurement.
Table 12: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the radioactive tracer data.
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
CONCLUSIONS
• KPCR is a novel method to perform nonlinear prediction in regression analysis.
• We showed that the previous works on KPCR have theoretical difficulties in deriving the prediction and in obtaining the retained number of PCs.
• We revised the previous KPCR and showed that its difficulties are eliminated by the revised KPCR.
• In our case studies, the revised KPCR together with the Gaussian kernel gives better results than Jukic's regression does.
• The revised KPCR together with an appropriate parameter of the Gaussian kernel gives better results than Nadaraya-Watson regression does.
Remark:
KPCR = Kernel Principal Components Regression,
KRR = Kernel Ridge Regression.
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
Thank you for your attention.
EXAMPLE 08: Sinc Function + Outliers (Robust KPCR)
[Figure: Black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian parameter 5; red: robust KPCR with Gaussian parameter 5.]
EXAMPLE 09: Sine Function + Outliers (Robust KRR)
[Figure: Black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian parameter 5; red: robust KRR with Gaussian parameter 5.]