SOME APPLICATIONS:
NONLINEAR REGRESSIONS BASED
ON KERNEL METHOD IN SOCIAL
SCIENCES AND ENGINEERING
Antoni Wibowo
Farewell Lecture – PPI Ibaraki
27 June 2009
EDUCATIONAL BACKGROUND
Dr.Eng., Social Systems and Management, Graduate School of Systems and Information Engineering, University of Tsukuba, 2009.
M.Eng., Social Systems Engineering, Graduate School of Systems and Information Engineering, University of Tsukuba, 2006.
M.Sc., Computer Science, University of Indonesia, 2000.
B.Sc./B.Eng., Mathematics Engineering, Sebelas Maret University, 1995.
TABLE OF CONTENTS
• Introduction.
• Ordinary Linear Regression (OLR).
• Principal Component Regression and Ridge Regression.
• Motivations.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
Ordinary Linear Regression (OLR)
• Regression Analysis: a model of the relationship between Y (response variable) and x1, x2, …, xp (regressor variables).
• Ordinary Linear Regression (OLR): Y = β0 + β1 x1 + … + βp xp + ε   (1.1)
• ε : random error (a random variable),
• β0, β1, …, βp : regression coefficients.
• Let
  • Yi : the response variable in the i-th observation,
  • xij : the i-th observation of the j-th regressor (j=1,…,p),
  • εi : random error on the i-th observation (i=1,…,N),
  • ℝ : the set of real numbers.
Ordinary Linear Regression (OLR)
The standard OLR model corresponding to model (1.1):
  Y = Xβ + ε,
where X is the N×(p+1) regressor matrix.
Assumption: E(ε) = 0 and Cov(ε) = σ²·I_N,
I_N : the N×N identity matrix.
Ordinary Linear Regression (OLR)
• The aim of regression analysis: to find an estimator of β, say β̂, that minimizes ‖Y − Xβ‖².   (1.3)
• The solution of (1.3) is given by β̂ = (XᵀX)⁻¹XᵀY.   (1.4)
Ordinary Linear Regression (OLR)
• Let y be the observed data corresponding to Y.
• Let b be the value of β̂ when Y is replaced by y in (1.4).
• Under the assumption that the column vectors of X are linearly independent:
  b = (XᵀX)⁻¹Xᵀy.
• Prediction value of y: ŷ = Xb.
• Residual between y and ŷ: e = y − ŷ.
Ordinary Linear Regression (OLR)
• Root Mean Square Error (RMSE): RMSE = sqrt( (1/N) Σi (yi − ŷi)² ).
• The prediction by OLR: ŷ(x) = b0 + b1 x1 + … + bp xp, a linear function of the regressors.
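As a concrete illustration of (1.3)-(1.4) and the RMSE above, here is a minimal NumPy sketch on made-up data; the data and variable names are purely illustrative and only mirror the notation of these slides.

```python
import numpy as np

# Made-up toy data for illustration only (not the household data of Example 01).
rng = np.random.default_rng(0)
N, p = 50, 2
x = rng.normal(size=(N, p))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(scale=0.1, size=N)

# Regressor matrix X with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones(N), x])

# b = (X^T X)^{-1} X^T y, computed via lstsq for numerical stability.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b                    # prediction of y
e = y - y_hat                    # residuals
rmse = np.sqrt(np.mean(e ** 2))  # Root Mean Square Error
print("b =", b, "RMSE =", rmse)
```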
OLR-Limitations
• OLR does not yield a nonlinear prediction.
• The existence of multicollinearity (collinearity) in X can seriously deteriorate the prediction by OLR:
  the variance of bj becomes very large.
⇒ We cannot be confident whether xj makes a contribution to the prediction by OLR or not.
Remarks:
• Collinearity is said to exist in X if XᵀX is a singular matrix.
• Multicollinearity is said to exist in X if XᵀX is a nearly singular matrix, i.e., some eigenvalues of XᵀX are close to zero.
• The eigenvalues of XᵀX are nonnegative real numbers.
• A vector a ≠ 0 is called an eigenvector of XᵀX if XᵀXa = λa for some scalar λ. The scalar λ is called an eigenvalue of XᵀX.
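A small NumPy sketch of the eigenvalue check described in these remarks; the tolerance below is an arbitrary illustrative choice, not a value from the lecture.

```python
import numpy as np

def collinearity_report(X, tol=1e-6):
    """Eigenvalues of X^T X and the ratios lambda_j / lambda_max.

    Ratios close to zero indicate (multi)collinearity, as in Example 01.
    The tolerance is only an illustrative choice.
    """
    eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]  # nonnegative, descending
    ratios = eigvals / eigvals[0]
    return eigvals, ratios, bool(np.any(ratios < tol))

# Example: a nearly collinear regressor matrix (the third column is almost 2*x2).
rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
X = np.column_stack([np.ones(100), x2, 2.0 * x2 + 1e-4 * rng.normal(size=100)])
print(collinearity_report(X))
```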
Example 01: The Household Consumption Data
yi : the i-th household consumption expenditure,
xi1 : the i-th household income,
xi2 : the i-th household wealth.
The OLR of the household consumption data:
Table 1: The household consumption data
Example 01:
The Household Consumption Data
Applying OLR to the consumption data:
Eigenvalues of XᵀX: λ1 = 3.4032e+7, λ2 = 6.7952e+1, λ3 = 1.0165;
ratios λ2/λ1 = 1.9967e-6 and λ3/λ1 = 2.9868e-8.
⇒ Multicollinearity/collinearity exists in X.
95% confidence interval of β2: [-0.2331, 0.1485].
⇒ We cannot be confident whether x2 makes a contribution to this prediction or not.
Table 1: The household consumption data
TABLE OF CONTENTS
• Introduction.
• Ordinary Linear Regression (OLR).
• Principal Component Regression and Ridge Regression.
• Motivations.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
PCR AND RR
To overcome the effects of multicollinearity (collinearity).
1. Principal Component Regression (PCR)
2. Ridge Regression (RR).
PCR
1. Principal Component Regression (PCR):
   OLR + PCA = PCR
What is PCA?
Principal Component Analysis (PCA)
PCA
• PCA: an orthogonal transformation of the regressors into uncorrelated principal components, ordered by decreasing variance.
• PCA's procedure:
PCR = OLR + PCA
PCR's procedure:
• r : the retained number of principal components for PCR.
• How to choose r?
• Estimator of PCR's regression coefficients.
Limitation: the prediction by PCR is a linear model.
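Below is a minimal NumPy sketch of one common way to carry out the PCR procedure above (center the regressors, run PCA, regress on the first r principal components, then map the coefficients back); it illustrates the idea rather than the exact formulation used in the slides.

```python
import numpy as np

def pcr_fit(X, y, r):
    """Principal Component Regression with r retained components.

    X is N x p (without an intercept column); returns the intercept and the
    coefficients expressed in the original regressors.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # PCA: eigenvectors of Xc^T Xc, sorted by decreasing eigenvalue.
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)
    V_r = V[:, np.argsort(eigvals)[::-1][:r]]          # retained loading vectors

    Z = Xc @ V_r                                       # principal component scores
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)     # OLR on the scores

    beta = V_r @ gamma                                 # back to the original regressors
    beta0 = y_mean - x_mean @ beta
    return beta0, beta
```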
Example 01:
The Household Consumption Data
Applying PCR to the household consumption data:
Eigenvalues: λ1 = 3.8525e+5, λ2 = 7.5329; λ2/λ1 = 1.9953e-5 ⇒ r = 1.
95% confidence interval of β1: [0.0409, 0.0581].
- The effects of multicollinearity/collinearity are avoided.
- But the prediction is still a linear regression.
Table 1: The household consumption data
RR
2. Ridge Regression (RR):
  b_RR = (XᵀX + qI)⁻¹Xᵀy for some q > 0   (RR)
  vs. b = (XᵀX)⁻¹Xᵀy   (OLR)
Prediction by ridge regression: ŷ = X b_RR.
An appropriate q can be obtained by the cross validation/holdout method.
Limitation: the prediction by RR is a linear model.
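A minimal NumPy sketch of RR with a holdout choice of q; the grid of q values and the split fraction are illustrative choices, not values from the lecture.

```python
import numpy as np

def ridge_fit(X, y, q):
    """b_RR = (X^T X + q I)^(-1) X^T y for a given q > 0 (intercept column not treated specially)."""
    return np.linalg.solve(X.T @ X + q * np.eye(X.shape[1]), X.T @ y)

def ridge_holdout(X, y, q_grid, train_frac=0.7, seed=0):
    """Choose q from q_grid by a simple holdout split (grid and split fraction are illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    best_q, best_err = None, np.inf
    for q in q_grid:
        b = ridge_fit(X[tr], y[tr], q)
        err = np.sqrt(np.mean((y[te] - X[te] @ b) ** 2))  # holdout RMSE
        if err < best_err:
            best_q, best_err = q, err
    return best_q, best_err
```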
Example 01:
The Household Consumption Data
Applying RR to the household consumption data: q = 20.
Table 1: The household consumption data
TABLE OF CONTENTS
• Introduction.
• Ordinary Linear Regression (OLR).
• Principal Component Regression and Ridge Regression.
• Motivations.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
MOTIVATIONS
• OLR, PCR and RR yield linear predictions. ⇒ Motivation 1.
• Equal variance of the random errors is assumed.
• What happens if the random errors have unequal variances and the observed data contain multicollinearity/collinearity? ⇒ Motivation 2.
• What happens if the observed data contain outliers? ⇒ Motivation 3.
Outliers: observations whose residuals are large.
Motivation 1: Linearity.
To overcome the limitation of PCR, Kernel Principal Component Regression (KPCR) has been proposed by:
• Rosipal et al. (Neural Computing and Applications [2001]; Journal of Machine Learning [2002]),
• Jade et al. (Chemical Engineering Science [2003]),
• Hoegaerts et al. (Neurocomputing [2005]).
However, the existing KPCR has theoretical difficulties in the procedure to obtain its prediction.
⇒ We revise the existing KPCR.
Motivation 2: Equal Variances
• Weighted Least Squares (WLS) is a widely used technique (a minimal WLS sketch is given after this list).
• Limitations:
  • WLS yields a linear prediction.
  • There is no guarantee that multicollinearity can be avoided.
• KPCR (KRR) can be inappropriate to use here, since they are constructed on the standard OLR model:
  Cov(ε) = σ²·I_N (standard OLR model) vs. Cov(ε) = σ²·W_N (feasible WLS model),
  where W_N is a diagonal matrix.
• We propose two methods:
  • a combination of WLS and KPCR (WLS-KPCR),
  • a combination of WLS and KRR (WLS-KRR).
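A minimal NumPy sketch of WLS and of a simple two-step feasible WLS under the diagonal-covariance model above; the variance-estimation step (squared OLS residuals) is only one crude illustrative choice, not necessarily the one used in the thesis.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: minimize sum_i w_i (y_i - x_i^T b)^2.

    Equivalent to OLS on the rescaled rows sqrt(w_i) * (x_i, y_i).
    """
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

def feasible_wls(X, y):
    """Two-step feasible WLS: OLS residuals -> crude variance estimates -> WLS.

    Real applications usually model the variance function; this is only a sketch.
    """
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    var_hat = np.maximum((y - X @ b_ols) ** 2, 1e-8)
    return wls_fit(X, y, 1.0 / var_hat)
```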
Motivation 3: Sensitive to Outliers
• OLR, PCR, RR, KPCR and KRR can be inappropriate.
• M-estimation is a widely used technique to eliminate the effect of outliers (a minimal sketch follows this list).
• Limitation: M-estimation yields a linear prediction.
• Famenko et al. [2006] proposed a nonlinear prediction based on M-estimation.
  • It needs a specific nonlinear model in advance.
• We propose two methods:
  • a combination of M-estimation and KPCR (R-KPCR),
  • a combination of M-estimation and KRR (R-KRR).
  • No need to specify a nonlinear model in advance.
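As background, a minimal sketch of linear M-estimation via iteratively reweighted least squares (IRLS) with Huber weights; the tuning constant and the MAD-based scale are the usual textbook choices, and this is not the R-KPCR/R-KRR procedure of the thesis itself.

```python
import numpy as np

def huber_m_estimation(X, y, c=1.345, n_iter=50):
    """Linear M-estimation via IRLS with Huber weights (illustrative sketch)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)            # start from OLS
    for _ in range(n_iter):
        r = y - X @ b
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)                 # Huber weights down-weight outliers
        sw = np.sqrt(w)
        b_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        if np.allclose(b_new, b, atol=1e-10):
            break
        b = b_new
    return b
```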
Remark: Kernel Ridge Regression (KRR) is proposed to overcome the limitation of Ridge Regression.
MOTIVATIONS
Remarks: R-KPCR = Robust Kernel Principal Component Regression,
R-KRR = Robust Kernel Ridge Regression.

Model \ Method               | Linear                           | Nonlinear (Non-Kernel)    | Nonlinear (Kernel)
OLS                          | Ordinary Linear Regression (OLR) | Jukic's regression [2004] | KPCR, Revised KPCR (Chapter 4)
Ridge                        | Ridge Regression (RR)            |                           | KRR
Weighted Least Squares (WLS) | WLS Linear Regression (WLS-LR)   |                           | WLS-KPCR, WLS-KRR (Chapter 4-5)
Robust                       | M-estimation                     | Famenko [2006]            | R-KPCR, R-KRR (Chapter 4-5)
Nonparametric                |                                  |                           | Nadaraya [1964] - Watson [1964]
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
KERNEL PRINCIPAL COMPONENT
ANALYSIS (KPCA)
• The regressors x ∈ Rp are mapped by a nonlinear map ψ into a feature space F.
• F is assumed to be a Euclidean space of higher dimension, say pF >> p.
• Conceptual KPCA: perform PCA on the mapped data in F.
KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
• The map ψ, and hence the matrix K of inner products of the mapped data, is unknown explicitly.
• Problem: we don't know K explicitly.
• Use Mercer's Theorem:
KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
• Choose a symmetric, continuous and positive semi-definite (p.s.d.) function κ; κ is called the kernel function.
⇒ There exists φ such that κ(x,z) = φ(x)ᵀφ(z) for any x, z ∈ Rp.
• Instead of choosing ψ explicitly, we employ φ as ψ.
• K is now known explicitly, and KPCA amounts to finding the eigenvalues/eigenvectors of K.
�Conceptual KPCA’s procedure:via kernel κ.
normalized
eigenvectors of
via kernel κ, it is known explicitly
Conceptual
KPCA
KPCA
Let’s consider:
via kernel κ
It is known
explicitly
�Actual KPCA’s procedure:
The nonlinear principal
component corresponding to κ.
When the assumption
does not
hold,
K is replaced by
KN=K-EK-KE+EKE,
where ENxN=[1/N]
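Putting the steps above together, a minimal NumPy sketch of KPCA with the centered kernel matrix K_N; the Gaussian kernel and its parameter c are illustrative choices, not necessarily those of the thesis.

```python
import numpy as np

def gaussian_kernel_matrix(X, c=5.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / c); the parameterization of c is illustrative."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / c)

def kpca(K, r):
    """Centered KPCA: top-r eigenvalues/eigenvectors of K_N = K - EK - KE + EKE."""
    N = K.shape[0]
    E = np.full((N, N), 1.0 / N)                 # E has all entries equal to 1/N
    K_N = K - E @ K - K @ E + E @ K @ E          # centering in the feature space
    eigvals, A = np.linalg.eigh(K_N)
    order = np.argsort(eigvals)[::-1][:r]
    lam, A = eigvals[order], A[:, order]
    scores = A * np.sqrt(lam)                    # nonlinear principal component scores
    return lam, A, scores
```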
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
REVISED KPCR
• Conceptual revised KPCR: apply the PCR procedure to the mapped data; the estimator of PCR's regression coefficients in the feature space becomes the estimator of the revised KPCR's regression coefficients.
→ Via the kernel κ, these quantities are known explicitly.

REVISED KPCR
• Eqs. (3.3) and (3.5) are known explicitly (via the kernel κ), since K and the associated eigenvectors are known explicitly.
• The retained number of principal components for the revised KPCR plays the same role as r does in PCR.
REVISED KPCR
Summary of the revised KPCR's procedure (actual revised KPCR):
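Since the procedure's equations are not reproduced in this transcript, the following NumPy sketch only illustrates the general kernel-PCR idea (regress y on the leading centered-kernel principal component scores and predict through kernel evaluations); it should not be read as the thesis' exact revised KPCR.

```python
import numpy as np

def kpcr_fit(X, y, r, c=5.0):
    """Kernel-PCR-style fit: regress y on the top-r centered-kernel PC scores."""
    N = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / c)   # Gaussian kernel matrix
    E = np.full((N, N), 1.0 / N)
    K_N = K - E @ K - K @ E + E @ K @ E                            # centered kernel matrix
    eigvals, A = np.linalg.eigh(K_N)
    order = np.argsort(eigvals)[::-1][:r]                          # assumes these eigenvalues are > 0
    lam, A = eigvals[order], A[:, order]
    Z = A * np.sqrt(lam)                                           # training PC scores
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    return {"X": X, "K": K, "A": A, "lam": lam, "gamma": gamma, "y_mean": y_mean, "c": c}

def kpcr_predict(model, X_new):
    """Predict at new points using kernel evaluations only (no explicit feature map)."""
    X, K, c = model["X"], model["K"], model["c"]
    N, M = X.shape[0], X_new.shape[0]
    sq_tr, sq_new = np.sum(X ** 2, axis=1), np.sum(X_new ** 2, axis=1)
    K_new = np.exp(-(sq_new[:, None] + sq_tr[None, :] - 2.0 * X_new @ X.T) / c)
    E, One = np.full((N, N), 1.0 / N), np.full((M, N), 1.0 / N)
    K_new_c = K_new - One @ K - K_new @ E + One @ K @ E            # centering of the test rows
    Z_new = K_new_c @ (model["A"] / np.sqrt(model["lam"]))         # test PC scores
    return model["y_mean"] + Z_new @ model["gamma"]
```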
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
EXAMPLES:
• Kernels used: Gaussian, sigmoid and polynomial (standard forms are sketched below).
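The standard textbook forms of these kernels, written as small Python functions; the parameter names (c, a, b, d) and defaults are illustrative, since the exact parameterization used in the slides (e.g. what "parameter Gaussian = 5" refers to) is not reproduced in this transcript.

```python
import numpy as np

def gaussian_kernel(x, z, c=5.0):
    """kappa(x, z) = exp(-||x - z||^2 / c)."""
    return np.exp(-np.sum((x - z) ** 2) / c)

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """kappa(x, z) = tanh(a * x^T z + b); p.s.d. only for some choices of a and b."""
    return np.tanh(a * np.dot(x, z) + b)

def polynomial_kernel(x, z, c=1.0, d=2):
    """kappa(x, z) = (x^T z + c)^d."""
    return (np.dot(x, z) + c) ** d
```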
Applying the revised KPCR (Gaussian kernel with parameter 5) to the household consumption data:
EXAMPLE 01:
The Household Consumption Data
Comparison on the household consumption data (the model with the smallest AIC is the best):

Method        | RMSE   | AIC
OLR           | 5.6960 | 72.7589
PCR           | 6.2008 | 71.1182
RR            | 9.4442 | 109.3479
Revised KPCR  | 0.0214 | -215.0172

- Nonlinear prediction regression.
- The effects of multicollinearity (collinearity) are avoided.
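For reference, one common way to compute these criteria; the exact AIC variant used in the slides is not reproduced here, so the formula below (N ln(RSS/N) + 2k for a model with k parameters, up to an additive constant) is only an assumption.

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def aic(y, y_hat, k):
    """Gaussian-likelihood AIC up to an additive constant: N * ln(RSS / N) + 2k (assumed form)."""
    N = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return N * np.log(rss / N) + 2 * k
```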
EXAMPLES:
• The prediction by Nadaraya-Watson regression:
  f̂(x) = Σi K((x − xi)/h1) yi / Σi K((x − xi)/h1),
  where K is a smoothing kernel and h1 is the bandwidth.
• In our examples:
  • p = 1,
  • h1 is estimated by Bowman-Azzalini's method (h1ba) and by Silverman's method (h1s).
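A minimal NumPy sketch of the Nadaraya-Watson predictor for p = 1 with a Gaussian smoothing kernel; a simplified Silverman rule-of-thumb bandwidth is included as one simple choice (the Bowman-Azzalini bandwidth is not sketched here).

```python
import numpy as np

def silverman_bandwidth(x):
    """Simplified Silverman rule of thumb: 1.06 * std(x) * N^(-1/5)."""
    return 1.06 * np.std(x, ddof=1) * len(x) ** (-1.0 / 5.0)

def nadaraya_watson(x_train, y_train, x_new, h):
    """f_hat(x) = sum_i K_h(x - x_i) y_i / sum_i K_h(x - x_i), with a Gaussian K_h."""
    u = (np.asarray(x_new)[:, None] - np.asarray(x_train)[None, :]) / h
    w = np.exp(-0.5 * u ** 2)                     # Gaussian smoothing-kernel weights
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)
```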
EXAMPLE 02: Sinc Function
[Figure: training data (standard deviation of noise = 0.2). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Bowman-Azzalini's method (h1ba = 0.6967); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
[Figure: testing data (standard deviation of noise = 0.5). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Bowman-Azzalini's method (h1ba = 0.6967); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
[Figure: training data (standard deviation of noise = 0.2). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method (h1s = 2.4680); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
[Figure: testing data (standard deviation of noise = 0.5). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method (h1s = 2.4680); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 02: Sinc Function
The retained number of PCs for the revised KPCR.
Table 2: Comparison of OLR, Nadaraya-Watson regression and the revised KPCR for the sinc function (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method).
EXAMPLE 03: Stock of Cars
Table 3: The stock of cars (in thousands) in the Netherlands.
Jukic et al. [2003] used the Gompertz function to fit these data.
Table 4: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the stock of cars.
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 62.8357), (b) N-W with Silverman's method (h1s = 4.0981); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 04: The Weight of Chickens
Table 5: The weight of female chickens.
Jukic et al. [2003] used the Gompertz function to fit these data.
Table 6: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the female chickens.
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 1.7682), (b) N-W with Silverman's method (h1s = 2.4715); red: revised KPCR with Gaussian parameter 5.]
EXAMPLE 05: Growth of the Son
Table 7: Growth of the son [Seber et al., 1998, Nonlinear Programming].
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 9.1208), (b) N-W with Silverman's method (h1s = 2.8747); red: revised KPCR with Gaussian parameter 5.]
Table 8: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the growth of the son.
EXAMPLE 06: The Puromycin Data
Table 9: The puromycin data [Montgomery, 2006, Introduction to Linear Regression Analysis].
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 2.3170), (b) N-W with Silverman's method (h1s = 0.2571); red: revised KPCR with Gaussian parameter 5.]
xi : the i-th substrate concentration of puromycin,
yi : the i-th reaction velocity of puromycin.
Table 10: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the puromycin data.
EXAMPLE 07: Radioactive Tracer Data
Table 11: Radioactive tracer data [Seber et al., 1998, Nonlinear Programming].
[Figure: Black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = 9.1208), (b) N-W with Silverman's method (h1s = 1.1079); red: revised KPCR with Gaussian parameter 5.]
xi : the i-th time,
yi : the i-th radioactive tracer measurement.
Table 12: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the radioactive tracer data.
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
CONCLUSIONS
• KPCR is a novel method to perform nonlinear prediction in regression analysis.
• We showed that the previous works on KPCR have theoretical difficulties in deriving the prediction and in obtaining the retained number of PCs.
• We revised the previous KPCR and showed that its difficulties are eliminated by the revised KPCR.
• In our case studies, the revised KPCR together with the Gaussian kernel gives better results than Jukic's regression does.
• The revised KPCR together with an appropriate parameter of the Gaussian kernel gives better results than Nadaraya-Watson regression does.
Remark:
KPCR = Kernel Principal Components Regression,
KRR = Kernel Ridge Regression.
TABLE OF CONTENTS
• Introduction.
• Kernel Principal Component Analysis.
• Kernel Principal Component Regression (KPCR).
• Kernel Ridge Regression (KRR).
• Weighted Least Squares-KPCR.
• Weighted Least Squares-KRR.
• Robust KPCR.
• Robust KRR.
• Numerical Examples.
• Conclusions.
Thank you for your attention.
EXAMPLE 08: Sinc Function + Outliers (Robust KPCR)
[Figure: Black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian parameter 5; red: robust KPCR with Gaussian parameter 5.]
EXAMPLE 09: Sine Function + Outliers (Robust KRR)
[Figure: Black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian parameter 5; red: robust KRR with Gaussian parameter 5.]