PCR, CCA, and other pattern-based regression techniques
Michael K. Tippett
International Research Institute for Climate and Society, The Earth Institute, Columbia University
Statistical Methods in Seasonal Prediction, ICTP, Aug 2-13, 2010


Page 1: PCR, CCA, and other pattern-based regression techniques


Page 2: PCR, CCA, and other pattern-based regression techniques

Principal component analysis and regression

Key idea of PCA: data compression
- Many (collinear) variables are replaced by a few new variables.
- The new variables are optimally chosen to approximate the original variables.

[Assumption: “important” components have large variance]

How is PCA useful in regression problems?
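The compression idea can be sketched numerically; this is a minimal illustration with a toy field (the data, sizes, and the choice of three components are assumptions, not the lecture's data):

```python
import numpy as np

# Toy anomaly data: 40 time samples of a 200-point spatial field,
# with variance concentrated in the leading directions.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 200)) * np.linspace(2.0, 0.1, 200)

X = X - X.mean(axis=0)                 # remove the time mean
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 3                                  # keep the 3 leading components
pcs = U[:, :k] * s[:k]                 # PC time series
eofs = Vt[:k]                          # spatial patterns (EOFs)
X_k = pcs @ eofs                       # rank-k approximation of X

frac = (s[:k] ** 2).sum() / (s ** 2).sum()   # fraction of variance explained
```

The 200 original (correlated) variables are replaced by 3 PC time series, and `X_k` is the optimal rank-3 approximation of the original field.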

Page 3: PCR, CCA, and other pattern-based regression techniques

Pop quiz

- What is the variance of a PC (time series)?
- What is the correlation between PCs?

Fact about regression
- (Easy) If y = ax is the regression between x and y, and x and y have unit variance, what does a measure?
- (Hard) Linear (invertible) transformations of the data transform the regression coefficients the same way:

y = Ax

y′ = Ly, x′ = Mx

y = Ax → y′ = Ly = LAx = LAM⁻¹Mx = (LAM⁻¹)x′

LAM⁻¹ = regression coefficient matrix between x′ and y′
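The "hard" fact can be checked numerically. This sketch verifies that the least-squares coefficients transform as LAM⁻¹ (the particular invertible L and M here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal((n, 3))
y = x @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((n, 2))

def reg_coef(x, y):
    """Least-squares coefficient matrix A such that y ≈ x A^T."""
    return np.linalg.lstsq(x, y, rcond=None)[0].T

A = reg_coef(x, y)

# Invertible transformations of the data: y' = L y, x' = M x
L = np.array([[2.0, 1.0], [0.0, 3.0]])
M = np.array([[1.0, 0.5, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.5]])
A_new = reg_coef(x @ M.T, y @ L.T)

# The coefficients transform the same way as the data: A' = L A M^(-1)
print(np.allclose(A_new, L @ A @ np.linalg.inv(M)))   # True
```

The identity is exact algebra, so it holds even though the data contain noise.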


Page 7: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

Problem: predict gridded April-June precipitation over the Philippines from preceding (January-March) SST.

[Figure: map of SST anomalies, JFM 1971, over the tropical Pacific (100E-140W, 20S-20N), and map of rainfall anomalies, AMJ 1971, over the Philippines (115E-130E, 5N-20N)]

Page 8: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

Problem: predict gridded April-June precipitation over the Philippines from preceding (January-March) sea surface temperature.

Details:
- Data from 1971-2007 (37 years).
- 194 precipitation gridpoints.
- 1378 SST gridpoints.

What is the problem?

Page 9: PCR, CCA, and other pattern-based regression techniques

PCA and regression

For climate forecasts, the length of the historical record severely limits the number of predictors.

If the predictors are spatial fields such as SST or the output of a GCM, the number of gridpoint values (100s, 1000s) is large compared to the number of time samples (10s for climate).

We need to represent the information in the predictor spatial field using fewer numbers:
- Spatial averages, e.g., NINO 3.4.
- Principal component analysis (PCA):
  - a weighted spatial average;
  - weights chosen optimally to maximize explained variance.

Page 10: PCR, CCA, and other pattern-based regression techniques

Example: PCA of SST

EOF 1 – Correlation with NINO 3.4 = -0.96

[Figure: map of SST EOF 1 over the tropical Pacific (variance explained = 50%) and its PC time series, 1971-2007]

Page 11: PCR, CCA, and other pattern-based regression techniques

Example: PCA of SST

[Figure: map of SST EOF 2 over the tropical Pacific (variance explained = 16%) and its PC time series, 1971-2007]

Page 12: PCR, CCA, and other pattern-based regression techniques

Example: PCA of SST

[Figure: map of SST EOF 3 over the tropical Pacific (variance explained = 11%) and its PC time series, 1971-2007]

Page 13: PCR, CCA, and other pattern-based regression techniques

Principal component regression

PCR
- y = a1 x1 + a2 x2 + . . . + am xm + b
- The predictors xi are PCs.

In this example:
- y = observed precipitation at a gridpoint.
- Predictors: PCs of SST anomalies.

How many PCs to use?

Predictor selection problem.
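PCR can be sketched end to end; the synthetic stand-ins below echo the dimensions of the Philippines example, but the data themselves are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, m = 37, 1378, 3                    # years, SST gridpoints, PCs kept

X = rng.standard_normal((n, p))          # stand-in SST anomaly field
y = X[:, :50].mean(axis=1) + 0.5 * rng.standard_normal(n)  # stand-in predictand

X = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs = U[:, :m] * s[:m]                   # leading m PC time series

# Ordinary least squares on the m PCs plus an intercept:
# y = a1*x1 + ... + am*xm + b
G = np.column_stack([pcs, np.ones(n)])
coef, *_ = np.linalg.lstsq(G, y, rcond=None)
y_hat = G @ coef
```

Note that the 1378-column regression problem, hopeless with 37 samples, becomes a well-posed regression on m = 3 predictors.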


Page 15: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

Two models: climatology or ENSO PC as predictor.

Use AIC to select model.

[Figure: map of the AIC-selected model (CL vs. ENSO) at each Philippine gridpoint]
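The two-model comparison can be sketched with one common Gaussian form of AIC, n ln(RSS/n) + 2k; the data and the ENSO PC series below are synthetic assumptions:

```python
import numpy as np

def aic(y, y_hat, k):
    """Gaussian AIC: n * ln(RSS/n) + 2k, with k fitted parameters."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(3)
n = 37
pc_enso = rng.standard_normal(n)         # stand-in ENSO PC
y = 0.6 * pc_enso + 0.8 * rng.standard_normal(n)

# Model CL: climatology (intercept only).  Model ENSO: intercept + PC.
y_cl = np.full(n, y.mean())
G = np.column_stack([pc_enso, np.ones(n)])
y_enso = G @ np.linalg.lstsq(G, y, rcond=None)[0]

best = "ENSO" if aic(y, y_enso, k=2) < aic(y, y_cl, k=1) else "CL"
```

The extra predictor always lowers the RSS in-sample; the 2k penalty is what lets climatology win at gridpoints where the ENSO signal is weak.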

Page 16: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

- This model seems to have some skill (cross-validated).
- Why the negative correlation? [Later]
- How are the two skill measures related in-sample?

[Figure: maps of skill over the Philippines: correlation, and 1 − error variance / climatological variance]

Page 17: PCR, CCA, and other pattern-based regression techniques

PCA and regression

Is there any benefit to using PCA on the predictand as well as the predictors?

Is there any benefit to predicting the PCs of y rather than y?

Perhaps. One could imagine a spatial average (like a PC) being more predictable than a value at a gridpoint.


Page 19: PCR, CCA, and other pattern-based regression techniques

PCA and regression

- Predicting the PCs of y leads to a different predictor selection problem.
- Before: select a model for each gridpoint?
- Now: select a model for each PC?

Page 20: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

36 PCs of y. (Why?)

Use AIC to select the model: ENSO or climatology.

[Figure: AIC-selected model (CL vs. ENSO) and ΔAIC for each of the 36 PCs of y]

Page 21: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

First 2 EOFs of AMJ precipitation:

[Figure: maps of EOF 1 and EOF 2 of Philippine precipitation and their PC time series, 1971-2007]

Page 22: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

- Correlations of gridpoint and pattern regressions are similar.
- Normalized errors of gridpoint and pattern regressions are similar.

[Figure: maps of skill for the pattern regression: correlation, and 1 − error variance / climatological variance]

Page 23: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

Pattern regression is:

PCy1 = 0.61 PCx1

PCy2 = −0.36 PCx1

What do these numbers mean? (Hint: PCs have unit variance.)

[Figure: maps of EOF 1 and EOF 2 of Philippine precipitation and their PC time series, 1971-2007]

Page 24: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

Reconstructing the spatial field:

Predicted rainfall = Climatology + PCy1 EOFy1 + PCy2 EOFy2

The difference between the prediction and climatology (the anomaly) is:

Predicted anomaly = PCy1 EOFy1 + PCy2 EOFy2 = 0.61 PCx1 EOFy1 − 0.36 PCx1 EOFy2

= PCx1 (0.61 EOFy1 − 0.36 EOFy2)

Is this simpler? Why?

Page 25: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

One pattern of rainfall goes with one pattern of SST.

[Figure: map of SST EOF 1 (variance explained = 50%) with its PC time series, alongside the associated pattern of Philippine precipitation and its time series]

What is the time series of the pattern?

Page 26: PCR, CCA, and other pattern-based regression techniques

Example: Philippines

Define the pattern to be

P ≡ (1/√(0.61² + 0.36²)) (0.61 EOFy1 − 0.36 EOFy2)

(Why this scaling of P?) The time series of the pattern P is:

TS = (1/√(0.61² + 0.36²)) (0.61 PCy1 − 0.36 PCy2)

What is the variance of TS? (Hint: PCs are independent.)

Key:

Predicted anomaly = PCx1 (0.61 EOFy1 − 0.36 EOFy2) = 0.71 PCx1 P

TS = 0.71 PCx1

What is 0.71? Hint: TS and PCx1 have unit variance.
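The 0.71 is just the combined amplitude of the two fitted coefficients, 0.61 and −0.36 (taken from the pattern regression above), which a one-line check confirms:

```python
import numpy as np

# Amplitude of the coefficient vector (0.61, -0.36): the scaling applied to P
amp = np.hypot(0.61, 0.36)       # sqrt(0.61**2 + 0.36**2)
print(round(amp, 2))             # 0.71
```

Since TS and PCx1 both have unit variance, this amplitude is also the regression coefficient, i.e., the correlation, between them.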


Page 29: PCR, CCA, and other pattern-based regression techniques

In summary,

[Figure: the pattern of Philippine precipitation and its time series = 0.71 × SST EOF 1 (variance explained = 50%) and its PC time series]

In general, any pattern regression can be decomposed into pairs of patterns related by the correlation of their time series.

Canonical correlation analysis (CCA) is an example of such a decomposition.


Page 31: PCR, CCA, and other pattern-based regression techniques

Pattern regression

[ y1 ]   [ a11 a12 . . . a1m ] [ x1 ]
[ y2 ] = [ a21 a22 . . . a2m ] [ x2 ]
[ ⋮  ]   [  ⋮    ⋮         ⋮ ] [ ⋮  ]
[ yl ]   [ al1 al2 . . . alm ] [ xm ]

- l predictand PCs
- m predictor PCs
- l × m regression coefficients.

A = Cov(PCy, PCx) [Cov(PCx, PCxᵀ)]⁻¹

Page 32: PCR, CCA, and other pattern-based regression techniques

Pattern regression

y = Ax

A = Cov(PCy, PCx) [Cov(PCx, PCxᵀ)]⁻¹

What is Cov(PCx, PCxᵀ)? Why?

Hint: PCs are . . . .

A = Cov(PCy, PCx)

What do the elements of A measure?

A = Corr(PCy, PCx)


Page 34: PCR, CCA, and other pattern-based regression techniques

Pattern regression

y = Ax

A =
[ Corr(PCy1, PCx1)  Corr(PCy1, PCx2)  . . .  Corr(PCy1, PCxm) ]
[ Corr(PCy2, PCx1)  Corr(PCy2, PCx2)  . . .  Corr(PCy2, PCxm) ]
[        ⋮                  ⋮                        ⋮        ]
[ Corr(PCyl, PCx1)  Corr(PCyl, PCx2)  . . .  Corr(PCyl, PCxm) ]

In general, each predicted PC of y depends on all the PCs of x.

What if A were diagonal? Is it likely that A is diagonal?


Page 38: PCR, CCA, and other pattern-based regression techniques

Pattern regression

- To decompose the regression into pairs of patterns, diagonalize A.
- There are many ways to diagonalize A. The singular value decomposition (SVD) is:

A = USVᵀ

where U and V are orthogonal and S is diagonal.
(orthogonal matrix = columns are orthonormal unit vectors = preserves angles and magnitudes)

Substituting,

y = Ax = USVᵀx

or

y′ = Sx′

where y′ = Uᵀy and x′ = Vᵀx
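The whole construction, regression between unit-variance PCs followed by an SVD, can be sketched end to end (the fields and truncation sizes below are synthetic assumptions):

```python
import numpy as np

def unit_pcs(field, k):
    """Leading k PC time series, scaled to unit (1/n-normalized) variance."""
    anom = field - field.mean(axis=0)
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)
    return U[:, :k] * np.sqrt(anom.shape[0])

rng = np.random.default_rng(4)
n = 37
x_field = rng.standard_normal((n, 50))
y_field = 0.5 * x_field[:, :20] + rng.standard_normal((n, 20))

px = unit_pcs(x_field, 5)                # predictor PCs
py = unit_pcs(y_field, 4)                # predictand PCs

A = (py.T @ px) / n                      # A = Corr(PC_y, PC_x)
U, S, Vt = np.linalg.svd(A)              # A = U S V^T

y_new = py @ U                           # canonical variates y' = U^T y
x_new = px @ Vt.T                        # canonical variates x' = V^T x

# The diagonal of Corr(y', x') recovers the canonical correlations S
print(np.allclose(np.diag(y_new.T @ x_new / n), S))   # True
```

Because the PCs are exactly uncorrelated with unit variance in-sample, each singular value is a sample correlation and so cannot exceed one.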


Page 40: PCR, CCA, and other pattern-based regression techniques

Diagonalized pattern regression

What can we say about the new variables y′ = Uᵀy and x′ = Vᵀx?
- y′ and x′ have unit variance and are uncorrelated (like PCs).
- PCs have unit variance and are uncorrelated.
- An orthogonal transformation of the PCs gives new uncorrelated (angles) variables with unit variance (magnitudes).
- Each new predictand is related to just one new predictor:
- y′ = Sx′, with S diagonal.
- What do the values of S measure?

(Hint: what is the regression coefficient for two variables with unit variance?)

Page 41: PCR, CCA, and other pattern-based regression techniques

Diagonalized regression and CCA

This procedure is the same as canonical correlation analysis.
- Regress the PCs (uncorrelated, unit variance) of y on those of x:

y = Ax

- Use the SVD of A to get the diagonal relation y′ = Sx′.
- The new variables (canonical variates) are linear (orthogonal) combinations of the PCs.
- The new variables have unit variance and are uncorrelated.
- The associated patterns are linear combinations of EOFs (generally not orthogonal).
- The elements of S are correlations (canonical correlations).

Page 42: PCR, CCA, and other pattern-based regression techniques

More CCA

CCA is usually described as finding the linear combinations of the x's and the y's which have maximum correlation.

Did we do that???

Finding the maximum correlation between linear combinations of x and y is the same as finding the maximum correlation between linear combinations of x′ and y′. Why?

This means we can look at the correlation between linear combinations of x′ and y′.


Page 44: PCR, CCA, and other pattern-based regression techniques

A calculation

Corr(Σᵢ aᵢx′ᵢ, Σⱼ bⱼy′ⱼ) = Cov(Σᵢ aᵢx′ᵢ, Σⱼ bⱼy′ⱼ) / √(Var(Σᵢ aᵢx′ᵢ) Var(Σⱼ bⱼy′ⱼ))

Var(Σᵢ aᵢx′ᵢ) = Σᵢ aᵢ² Var(x′ᵢ) = Σᵢ aᵢ² = ‖a‖²

Var(Σⱼ bⱼy′ⱼ) = Σⱼ bⱼ² Var(y′ⱼ) = Σⱼ bⱼ² = ‖b‖²

Cov(Σᵢ aᵢx′ᵢ, Σⱼ bⱼy′ⱼ) = Σᵢ aᵢbᵢ Cov(x′ᵢ, y′ᵢ) = Σᵢ aᵢbᵢSᵢ ≤ S₁ Σᵢ aᵢbᵢ ≤ S₁ ‖a‖ ‖b‖

Corr(Σᵢ aᵢx′ᵢ, Σᵢ bᵢy′ᵢ) ≤ S₁
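The final inequality can be spot-checked numerically; the canonical correlations S below are hypothetical values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
S = np.array([0.9, 0.6, 0.2])    # hypothetical canonical correlations, sorted

worst = 0.0
for _ in range(1000):
    a = rng.standard_normal(3)
    b = rng.standard_normal(3)
    # Corr(sum a_i x'_i, sum b_j y'_j) = sum a_i b_i S_i / (||a|| ||b||)
    corr = np.sum(a * b * S) / (np.linalg.norm(a) * np.linalg.norm(b))
    worst = max(worst, corr)

print(worst <= S[0])             # True: no combination beats S1
# Equality is attained by a = b = (1, 0, 0):
print(np.isclose(np.sum(np.eye(3)[0] ** 2 * S), S[0]))  # True
```

The bound is attained by putting all the weight on the first components, which is exactly the first pair of canonical variates.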


Page 47: PCR, CCA, and other pattern-based regression techniques

A calculation

Corr(∑

aix ′i ,

∑bjy ′

j

)=

Cov(∑

aix ′i ,

∑bjy ′

j

)√

Var(∑

aix ′i

)Var

(∑bjy ′

j

)Var

(∑aix ′

i

)=

∑a2

i Var(x ′

i)

=∑

a2i = ‖a‖2

Var(∑

bjy ′j

)=

∑b2

j Var(

y ′j

)=

∑b2

j = ‖b‖2

Cov(∑

aix ′i ,

∑bjy ′

j

)=

∑aibiCov

(x ′

i , y ′i)

=∑

aibiSi ≤ S1∑

aibi ≤ ‖a‖ ‖b‖

Corr(∑

aix ′i ,

∑biy ′

i

)≤ S1

Page 48: PCR, CCA, and other pattern-based regression techniques

More CCA components

Looking for the linear combination of x and y that maximizes correlation but is uncorrelated with the first component means
- looking for the linear combination of the x′ᵢ and y′ᵢ, i = 2, 3, . . ., that maximizes correlation. Why?
- The previous argument gives that it is S₂.

Page 49: PCR, CCA, and other pattern-based regression techniques

CCA and regression

Knowing CCA is regression is useful . . .
- What happens if many (compared to the sample size) PCs are included in a CCA calculation? What happens to the canonical correlations?
- How can the number of PCs included in CCA be decided?


Page 52: PCR, CCA, and other pattern-based regression techniques

Other pattern regression methods

Other diagonalizations of the regression coefficient matrix A (based on variants of the SVD) give other diagonalized pattern regressions with components that optimize other quantities.

E.g., redundancy analysis gives components that maximize explained variance.

Maximum covariance analysis (MCA) finds components with maximum covariance. However, the regression between these patterns is generally not diagonal: there are no simple relations between pairs of patterns.

Page 53: PCR, CCA, and other pattern-based regression techniques

Summary

- PCA compresses data and is useful in regressions. In PCR, PCs are the predictors.
- It can be useful to use PCs as predictands, too.
- Diagonalizing the regression between PCs decomposes the regression into pairs of patterns.
- CCA diagonalizes the regression and finds the components with maximum correlation.