Post on 24-Mar-2020

Principal Component Regression & Canonical Correlation Analysis

Nachiketa Acharya nachiketa@iri.columbia.edu

Big Thanks to Dr. Simon Mason

Training Workshop on Seasonal Prediction of Southwest Monsoon Rainfall: 16th – 18th April, 2018

2 Seasonal Forecasting Using the Climate Predictability Tool

Making seasonal forecasts for the monsoon

There are two main methods in use (and in practice we often combine the two, and/or use a hybrid of them).

I: Models of past statistics – teleconnection

[Figure: teleconnection predictors (panel labels): EAST ASIA PR ANOMALY; WARM WATER VOLUME; N. ATL SST ANOMALY; EQ. SE INDIAN OCEAN SST ANOMALY; NORTH WEST EUROPE TEMP ANOMALY; NINO 3.4 SST ANOMALY; NATL PR ANOMALY; NCPAC U850 ANOMALY. Courtesy: IMD]

Making seasonal forecasts for the monsoon

II: Models of the physics – causation

• Climate models are computer codes based on fundamental laws of physics.

• However, the output from these ensemble prediction systems cannot be used directly and requires further calibration in order to produce reliable forecasts.

Seasonal forecasting tool

• The Climate Predictability Tool (CPT) is an easy-to-use software package for making seasonal forecasts using either empirical predictors or the output of GCMs.

• Developed and maintained by Dr. Simon Mason.

• CPT is available for Windows 95+ and as a batch version for Linux.

How does CPT make a forecast?

Options for making seasonal forecasts in CPT

• Multiple Linear Regression

• Principal Component Regression

• Canonical Correlation Analysis

Multiple Linear Regression

[Figure labels: area-average MAM rainfall for Thailand; ocean-based ENSO indices]

MAM rainfall over Thailand can be predicted using a single predictor such as Feb NIÑO4 SSTs:

$$\hat{y} = \beta_0 + \beta_1\,\mathrm{NINO4}, \qquad \beta_0 = 340\ \text{mm}, \quad \beta_1 = 50, \quad r = 0.48$$

A simple linear regression equation for predicting rainfall has two parameters:

• constant: how much rainfall can we expect on average when the value of the predictor is 0.

• coefficient: how much can we expect rainfall to increase or decrease when the predictor increases by 1.
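The two parameters can be estimated with a few lines of code. This is a minimal sketch, assuming NumPy is available; the function name `fit_simple_regression` and the synthetic data are illustrative only and are not the Thailand rainfall record or CPT's own code.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Return (constant, coefficient) of the least-squares line y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Coefficient: covariance of x and y divided by the variance of x.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Constant: expected y when the predictor is 0.
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Synthetic, illustrative data (assumption): a predictor index and noisy rainfall.
rng = np.random.default_rng(0)
nino4 = rng.normal(0.0, 0.6, size=50)
rain = 340.0 + 50.0 * nino4 + rng.normal(0.0, 40.0, size=50)

b0, b1 = fit_simple_regression(nino4, rain)
r = np.corrcoef(nino4, rain)[0, 1]
print(f"constant = {b0:.1f} mm, coefficient = {b1:.1f}, r = {r:.2f}")
```

Because the data are noisy, the fitted constant and coefficient will be close to, but not exactly, the values used to generate them.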

In CPT the MLR (multiple linear regression) option allows for more than one predictor:

Multiple Linear Regression

$$Y = b_0 + \sum_{i=1}^{n} b_i X_i$$

where:

Y = dependent variable

Xi = independent variables

bi = regression coefficients

n = number of independent variables
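The model above can be fitted by ordinary least squares. The following is a minimal sketch, assuming NumPy; `fit_mlr` and `predict_mlr` are hypothetical helper names, and the data are synthetic. It is not CPT's implementation.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimate of [b0, b1, ..., bn] for Y = b0 + sum_i b_i * X_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the constant b0.
    A = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def predict_mlr(coeffs, X):
    """Apply fitted coefficients to new predictor values (rows = cases)."""
    X = np.asarray(X, dtype=float)
    return coeffs[0] + X @ coeffs[1:]

# Synthetic example (assumption): three independent predictors, 40 years.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=40)

coeffs = fit_mlr(X, y)
print("b0..b3:", np.round(coeffs, 2))
print("prediction for first case:", round(float(predict_mlr(coeffs, X[:1])[0]), 2))
```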

Let’s have some equations! The Multiple Regression Model

The coefficients are estimated by least squares, that is, by minimizing the sum of squared differences between the observed and predicted values of Y.

Multiple Linear Regression

Problems with Multiple Linear Regression

Multiplicity - Too many predictors from which to choose.

With more than a handful of candidate predictors, the probability of including at least one spurious predictor (and therefore of subsequently making a bad prediction) becomes very high.
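A rough calculation shows how quickly this probability grows. The sketch below assumes that the candidate predictors are independent, that none has a real relationship with the predictand, and that each is screened at a significance level `alpha`; these assumptions and the function name are illustrative and not taken from the slides.

```python
def prob_at_least_one_spurious(n_candidates, alpha=0.05):
    """P(at least one of n unrelated, independent candidates passes a test at level alpha)."""
    return 1.0 - (1.0 - alpha) ** n_candidates

for n in (1, 5, 10, 20, 50):
    p = prob_at_least_one_spurious(n)
    print(f"{n:3d} candidates -> P(at least one spurious) = {p:.2f}")
```

In practice candidate predictors are rarely independent, so the numbers are indicative only.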

Problems with Multiple Linear Regression

• Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy.

• When two X variables are highly correlated, they both convey essentially the same information.

• In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data.

• When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value.

Multicollinearity: Example

$$\hat{y}_{\mathrm{MAM}} = 340 + 50\,\mathrm{NINO4}_{\mathrm{Feb}}, \qquad \hat{y}_{\mathrm{MAM}} = 344 + 38\,\mathrm{NINO4}_{\mathrm{Jan}}$$

MAM Jan Febˆ 332 NINO4 NIN131 O475y

For the first half of the data (1961 – 1985) only:

MAM Jan Febˆ 330 NINO4 NIN17 O419y

Predicting MAM 1961 – 2010 rainfall for Thailand from NIÑO4 SSTs:

Correlation between NINO4Jan and NINO4Feb is 0.97.

This is a perfect example of coefficient estimates changing erratically in response to small changes in the model or the data because the predictors are strongly correlated.
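The effect of a correlation of 0.97 between two predictors can be quantified with the variance inflation factor, which for two predictors is 1 / (1 - r²). The sketch below also refits a two-predictor regression on all the data and on the first half only. It assumes NumPy; the data are synthetic stand-ins (not the NIÑO4 or Thailand series), and all names are illustrative.

```python
import numpy as np

def variance_inflation_factor(r):
    """VIF for either of two predictors whose mutual correlation is r."""
    return 1.0 / (1.0 - r ** 2)

def fit(X, y):
    """Least-squares coefficients [constant, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

r = 0.97
print(f"VIF = {variance_inflation_factor(r):.1f}; "
      f"standard errors inflated by a factor of {variance_inflation_factor(r) ** 0.5:.1f}")

# Synthetic predictors with a population correlation of 0.97 by construction.
rng = np.random.default_rng(1)
n = 50
jan = rng.normal(size=n)
feb = r * jan + np.sqrt(1.0 - r ** 2) * rng.normal(size=n)
rain = 340.0 + 50.0 * feb + rng.normal(scale=60.0, size=n)
X = np.column_stack([jan, feb])

print("all data:   ", np.round(fit(X, rain), 1))
print("first half: ", np.round(fit(X[:25], rain[:25]), 1))
```

Comparing the two printed coefficient sets gives a feel for how unstable the individual coefficients are, even when their sum stays comparatively steady.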

MLR

Principal Components Regression

• Principal components regression is just like standard regression except that the independent variables are principal components rather than the original X variables.

• Principal components regression (PCR) is a method for combating multicollinearity; when predictors are highly correlated it can give better estimation and prediction than ordinary least squares.

What is principal components analysis?

• Principal components are new variables that are linear combinations of the X's. The new variables are not correlated with each other.

• The principal components transformation is equivalent to a rotation of axes.
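These properties can be checked numerically. The sketch below computes principal components with a singular value decomposition, assuming NumPy; the function name `pca` and the synthetic data are illustrative, and this is not CPT's implementation.

```python
import numpy as np

def pca(X):
    """Return (scores, loadings, explained_variance) for data X (rows = cases)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt.T                              # columns = weights of each component
    scores = Xc @ loadings                       # the new, uncorrelated variables
    explained_variance = s ** 2 / (len(X) - 1)   # variance of each component
    return scores, loadings, explained_variance

# Synthetic, correlated variables (assumption).
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(40, 1)) for _ in range(3)])

scores, loadings, ev = pca(X)
print("covariance of scores (off-diagonals ~ 0):")
print(np.round(np.cov(scores, rowvar=False), 6))
print("loadings are orthonormal (a rotation):",
      np.allclose(loadings.T @ loadings, np.eye(3)))
```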

Credit: Dave Garen

PCA was invented in 1901 by Karl Pearson.

It was later independently developed (and named) by Harold Hotelling in the 1930s.

Principal components

• Principal components analysis is specifically designed as a data reduction technique.

• Principal components analysis (PCA) is sometimes also known as empirical orthogonal function (EOF) analysis.

Understanding PCA in a simple way

Diagrammatic Representation of Principal Components

Selection of PCs

How many of the new variables should be retained to represent the total variability of the original variables adequately? A stopping rule is required to identify at which point additional principal components are no longer required.
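One common stopping rule is to keep the smallest number of components whose cumulative share of the total variance reaches a chosen threshold. The sketch below illustrates that rule only; the 90% threshold, the function name and the example variances are assumptions, and CPT may select the number of modes differently.

```python
import numpy as np

def n_components_for_variance(explained_variance, threshold=0.9):
    """Smallest number of leading components whose cumulative variance fraction >= threshold."""
    ev = np.asarray(explained_variance, dtype=float)
    cumulative = np.cumsum(ev) / ev.sum()
    # Index of the first component at which the cumulative fraction reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ev))

# Hypothetical component variances, largest first.
variances = [5.0, 3.0, 1.0, 0.6, 0.4]
for t in (0.75, 0.85, 0.95):
    print(f"threshold {t:.2f}: keep {n_components_for_variance(variances, t)} components")
```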

Visualization of Principal Components: Example from SST

Scores and loadings for first principal component of February 1961 – 2000 sea-surface temperatures.

A principal component is a weighted sum of a set of original variables, with the weights set so that the principal component has maximum variance.

Principal Components

The score indicates how intensely developed the loading pattern is for each year.

Scores and loadings for second principal component of February 1961 – 2000 sea-surface temperatures.

Separate patterns (“modes”) of variability can be defined. We can use just a few of these modes to represent the SST variability throughout the domain.

Principal Components

Principal Component Regression: Math (boring)

Principal Component

The principal components are orthogonal to each other, that is:

Elimination of Principal Components.

Transformation Back to the Original Variables:
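The steps named above (compute orthogonal components, eliminate the minor ones, regress on the rest, transform back to the original variables) can be sketched as follows. This assumes NumPy and centred predictors; `fit_pcr` is a hypothetical helper and the data are synthetic. It is an illustration of the general technique, not CPT's code.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression; returns (intercept, beta) in original-variable space."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Principal component weights are the right singular vectors of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T            # eliminate all but the leading components
    scores = Xc @ V_k                    # retained components (mutually uncorrelated)
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V_k @ gamma                   # transform back to the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic example (assumption): two nearly identical predictors.
rng = np.random.default_rng(5)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = 340.0 + 50.0 * x2 + rng.normal(scale=20.0, size=50)

intercept, beta = fit_pcr(X, y, n_components=1)
print("intercept:", round(float(intercept), 1), "coefficients:", np.round(beta, 1))
```

With one retained component the two coefficients come out nearly equal, because the component weights the two near-duplicate predictors almost equally.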

Selecting Models in CPT

• MLR can be used when there is one or a very small number of predictors (independent of each other).

• PCR can be used to address problems with MLR that arise when there are many predictors.

• But what if there are many predictands?

Canonical Correlation Analysis

Canonical Correlation Analysis

Some terminology of CCA

Visualization of Canonical Correlation Analysis: Example from SST and Rain

[Figure: CCA mode 1, r = 0.73; panels: Feb SSTs, 1961-2000, and MAM rainfall]

Canonical Correlation Analysis

[Figure: CCA mode 2, r = 0.67; panels: Feb SSTs, 1961-2000, and MAM rainfall]

Selecting Models in CPT

• MLR can be used when there is one or a very small number of predictors (independent of each other).

• PCR can be used to address problems with MLR that arise when there are many predictors.

• CCA can be used if there are many predictors AND many predictands. But it can also be used even if there are only a few of each.
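For readers who want to see what the canonical correlations (the r values quoted for each mode) are, here is a minimal sketch. It assumes NumPy, more cases than variables, and full-rank predictor and predictand matrices; it uses the QR-plus-SVD formulation of CCA, which is an illustrative choice and not necessarily how CPT computes its modes. The function name and data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and the columns of Y (rows = cases)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the angles between the two
    # subspaces, i.e. the canonical correlations, largest first.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic example (assumption): 3 predictors and 2 predictands sharing one signal.
rng = np.random.default_rng(2)
signal = rng.normal(size=(40, 1))
X = np.hstack([signal + rng.normal(scale=0.5, size=(40, 1)), rng.normal(size=(40, 2))])
Y = np.hstack([signal + rng.normal(scale=0.5, size=(40, 1)), rng.normal(size=(40, 1))])

print("canonical correlations:", np.round(canonical_correlations(X, Y), 2))
```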

Making probabilistic forecasts in CPT

• Seasonal forecasts are generally expressed as tercile probabilities.

• Let's do a hands-on exercise to understand this.

Example data: 10.13038, 27.59568, 13.42799, 13.96082, 21.76947, 16.92497, 18.6818, 25.95358, 30.46833, 18.02041, 23.27678, 17.61698, 22.29597, 24.39998, 13.83134, 22.74837, 26.01102, 20.92308, 37.29841, 13.91443, 12.6294, 2.501207, 29.10483, 28.67083, 19.20107, 28.98476, 21.83703, 22.9079, 21.12945, 24.39952

Mean = 21.0205 and SD = 7.0933. Based on percentiles we can estimate the thresholds: lower bound (33rd percentile) = 18.3511, upper bound (67th percentile) = 23.8382.
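The summary statistics and thresholds can be computed from the example data as sketched below, using only the Python standard library. The threshold convention used here (the midpoint between the two sorted values that straddle each tercile boundary) is an assumption chosen because it matches the thresholds quoted above for this sample of 30 values; other percentile conventions give slightly different numbers. The function name is illustrative.

```python
import statistics

data = [
    10.13038, 27.59568, 13.42799, 13.96082, 21.76947, 16.92497, 18.6818,
    25.95358, 30.46833, 18.02041, 23.27678, 17.61698, 22.29597, 24.39998,
    13.83134, 22.74837, 26.01102, 20.92308, 37.29841, 13.91443, 12.6294,
    2.501207, 29.10483, 28.67083, 19.20107, 28.98476, 21.83703, 22.9079,
    21.12945, 24.39952,
]

def tercile_thresholds(values):
    """Lower and upper tercile thresholds; assumes len(values) is divisible by 3."""
    s = sorted(values)
    k = len(s) // 3
    lower = (s[k - 1] + s[k]) / 2.0          # boundary between lowest and middle third
    upper = (s[2 * k - 1] + s[2 * k]) / 2.0  # boundary between middle and highest third
    return lower, upper

mean = statistics.mean(data)
sd = statistics.stdev(data)                  # sample standard deviation (n - 1)
lower, upper = tercile_thresholds(data)
print(f"mean = {mean:.4f}, SD = {sd:.4f}")
print(f"lower threshold = {lower:.4f}, upper threshold = {upper:.4f}")
```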

Plot the frequency and probabilities

$$P(X < 18.35), \qquad P(X > 23.83) = 1 - P(X < 23.83)$$

[Figure panels: histogram; probability distribution function based on the normal distribution]

• What will be the guess if no forecast is available?

– The outcome is equally likely to fall in any of the three categories

• So what are the chances of getting below normal?

– 33%

• Above normal?

– 33%

• Near normal?

– 33%

• So if the forecast is always a 33% probability for each of the three categories, we call it a no-skill forecast.

• Now suppose that in one year our models give a mean forecast of 30.

• By various methods we calculate the spread (standard deviation); let's take it to be 6.

• We then generate the forecast PDF.

[Figure: climatological PDF in blue and forecast PDF in red, annotated with probabilities of 2%, 14% and 84%]
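The tercile probabilities implied by a forecast PDF can be computed as sketched below. This is a minimal illustration assuming a normal forecast distribution and the thresholds from the example data (18.3511 and 23.8382); it uses the standard library's `statistics.NormalDist`, and the function name is hypothetical. The results may differ by a percentage point or so from the rounded labels on the slide.

```python
from statistics import NormalDist

def tercile_probabilities(mean, sd, lower, upper):
    """Return P(below normal), P(near normal), P(above normal) for a normal PDF."""
    dist = NormalDist(mu=mean, sigma=sd)
    below = dist.cdf(lower)            # P(X < lower threshold)
    above = 1.0 - dist.cdf(upper)      # P(X > upper threshold)
    near = 1.0 - below - above
    return below, near, above

LOWER, UPPER = 18.3511, 23.8382        # climatological tercile thresholds

# Climatological PDF (mean 21.0205, SD 7.0933) versus the forecast PDF (mean 30, SD 6).
for label, mu, sigma in (("climatology", 21.0205, 7.0933), ("forecast", 30.0, 6.0)):
    b, n, a = tercile_probabilities(mu, sigma, LOWER, UPPER)
    print(f"{label:12s} below = {b:.0%}, near = {n:.0%}, above = {a:.0%}")
```

The forecast shifts most of the probability into the above-normal category, which is what distinguishes it from the no-skill climatological forecast.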

So we learned about the methods in CPT. What do they really mean?

Sorry, I'm not prepared for in-depth questions.

Questions?

web: iri.columbia.edu

@climatesociety

…/climatesociety