1
Building the Regression Model –I
Selection and Validation
KNN Ch. 9 (pp. 343-375)
2
The Model Building Process
Collect and prepare data
Reduction of explanatory variables for exploratory/ observational studies
Refine model and select best model
Validate model – if it passes the checks then adopt it
All four of the above have several intermediate steps. These are outlined in Fig. 9.1, page 344 of KNN
3
The Model Building Process
Data collection
Controlled Experiments (levels, treatments)
With supplemental variables (incorporate uncontrollable variables in regression model rather than in the experiment)
Confirmatory Observational Studies (hypothesis testing, primary variables and risk factors)
Exploratory Observational Studies (Measurement errors/problems, duplication of variables, spurious variables, sample size; are but some of the issues here)
4
The Model Building Process
Data Preparation
What are the standard techniques here? It's an easy guess: a rough-cut approach is to look at various plots and identify obvious problems such as outliers, spurious variables, etc.
Preliminary Model Investigation
Scatter Plots and Residual Plots (For what?)
Functional forms and transformations (of entire data or some explanatory variables or predicted variable?)
Interactions and ….. intuition
5
The Model Building Process
Reduction of Explanatory Variables
Generally an issue for Controlled Experiments with Supplemental Variables and for Exploratory Observational Studies
It is not difficult to guess that for Exploratory Observational Studies, this is more serious
Identification of good subsets of the explanatory variables and their functional forms and any interactions, is perhaps the most difficult problem in multiple regression analysis
Need to be careful of specification bias and latent explanatory variables.
6
The Model Building Process
Model Refinement and Selection
Diagnostics for candidate models
Lack-of-fit tests if repeat obs. available
“Best” model’s # of variables should be used as benchmark for investigating other models with similar number of variables
Model Validation
Robustness and Usability of regression coefficients
Usability of regression function. Does it all make sense ?
7
All Possible Regressions: Variable Reduction
Usually many explanatory variables (p-1) present at the outset
Select the best subset of these variables
"Best": the smallest subset of variables which provides an adequate prediction of Y.
Multicollinearity is usually a problem when all variables are in the model.
Variable selection may be based on the coefficient of determination Rp2 or on the SSEp statistic (equivalent procedures).
8
Rp2 is highest (and SSEp lowest) when all the variables are in the model.
One intends to find the point at which adding more variables causes a very small increase in Rp2 or a very small decrease in SSEp.
Given a value of p, we compute the maximum of Rp2 (or the minimum of SSEp) and then compare the several maxima (minima).
See the Surgical Unit Example on page 350 of KNN.
All Possible Regressions: Variable Reduction
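This all-subsets search is easy to sketch with numpy alone. The data and the helper name `fit_sse` below are made up for illustration, not taken from KNN's Surgical Unit example:

```python
# Sketch: SSE_p and R^2_p for every subset of three candidate predictors.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))                    # three candidate predictors
y = 2.0 + 6.5 * X[:, 0] - 0.2 * X[:, 2] + rng.normal(scale=0.5, size=n)

def fit_sse(cols):
    """SSE of the least-squares fit using the given predictor columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

ssto = float(((y - y.mean()) ** 2).sum())      # SSE of the intercept-only model
for k in range(4):                             # subset sizes 0..3
    for cols in itertools.combinations(range(3), k):
        sse = fit_sse(cols)
        print(cols, round(sse, 3), round(1 - sse / ssto, 4))
```

Within each subset size one keeps the smallest SSEp (largest Rp2), then compares those winners across sizes, as described above.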
9
A Simple Example

Regression Analysis
The regression equation is
Y = 0.236 + 9.09 X1 - 0.330 X2 - 0.203 X3

Predictor       Coef     StDev       T      P
Constant      0.2361    0.2545    0.93  0.355
X1             9.090     1.718    5.29  0.000
X2           -0.3303    0.2229   -1.48  0.141
X3          -0.20286   0.05894   -3.44  0.001

S = 1.802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Regression Analysis
The regression equation is
Y = 0.408 + 6.55 X1 - 0.173 X3

Predictor       Coef     StDev       T      P
Constant      0.4078    0.2276    1.79  0.075
X1            6.5506    0.1201   54.54  0.000
X3          -0.17253   0.05551   -3.11  0.002

S = 1.810   R-Sq = 95.6%   R-Sq(adj) = 95.5%

Regression Analysis
The regression equation is
Y = 0.014 + 6.50 X1

Predictor       Coef     StDev       T      P
Constant      0.0144    0.1949    0.07  0.941
X1            6.4957    0.1225   53.05  0.000

S = 1.866   R-Sq = 95.3%   R-Sq(adj) = 95.3%
10
Rp2 does not take into account the number of parameters (p) and never decreases as p increases.
This is a mathematical property, but it may not make sense practically.
However, useless explanatory variables can actually worsen the predictive power of the model. How?
The adjusted coefficient of multiple determination always accounts for the increase in p:
Ra2 = 1 − [SSEp / (n − p)] / [SSTO / (n − 1)]
The Ra2 and MSEp criteria are equivalent.
When can MSEp actually increase with p?
All Possible Regressions: Variable Reduction
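The penalty built into Ra2 can be sketched numerically. The data, seed, and helper name `r2_and_adj` below are made up for illustration; the formula is the one above:

```python
# Sketch: adjusted R-squared penalizes extra parameters, while plain
# R-squared never decreases when a predictor is added. Data are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 25
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                      # predictor unrelated to y
y = 1.0 + 3.0 * x1 + rng.normal(scale=1.0, size=n)

def r2_and_adj(*cols):
    """Return (R^2, adjusted R^2) for a model with the given predictors."""
    Z = np.column_stack([np.ones(n), *cols])
    p = Z.shape[1]                             # number of parameters
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    sse = float(((y - Z @ beta) ** 2).sum())
    ssto = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - sse / ssto
    ra2 = 1 - (n - 1) / (n - p) * sse / ssto   # adjusted R^2
    return r2, ra2

r2_small, ra2_small = r2_and_adj(x1)
r2_big, ra2_big = r2_and_adj(x1, junk)
print(r2_small, r2_big, ra2_small, ra2_big)
```

Adding the junk column can only raise R^2, but the (n − 1)/(n − p) factor pushes Ra2 back down.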
11
A Simple Example

Regression Analysis
The regression equation is
Y = 21.7 + 12.8 X1 - 0.88 X2 - 5.93 X3

Predictor     Coef    StDev      T      P
Constant     21.69    14.77   1.47  0.381
X1          12.763    9.225   1.38  0.398
X2          -0.877    1.099  -0.80  0.571
X3          -5.927    2.033  -2.92  0.210

S = 2.878   R-Sq = 99.3%   R-Sq(adj) = 97.1%

Regression Analysis
The regression equation is
Y = 27.8 + 5.45 X1 - 6.37 X3

Predictor     Coef    StDev      T      P
Constant     27.76    11.45   2.43  0.136
X1          5.4534   0.9666   5.64  0.030
X3          -6.370    1.769  -3.60  0.069

S = 2.603   R-Sq = 98.8%   R-Sq(adj) = 97.7%

Regression Analysis
The regression equation is
Y = - 10.4 + 8.05 X1

Predictor      Coef    StDev      T      P
Constant    -10.363    9.738  -1.06  0.365
X1            8.049    1.439   5.59  0.011

S = 5.816   R-Sq = 91.2%   R-Sq(adj) = 88.3%
Interesting
12
The Cp criterion is concerned with the total MSE of the n fitted values.
The total error for any fitted value is a sum of a bias component and a random error component:
Ŷi − μi is the total error, where μi is the "true" mean response of Y when X = Xi.
The bias is E{Ŷi} − μi and the random error is Ŷi − E{Ŷi}.
Then the total mean squared error is shown to be:
Σ (i = 1 to n) E{(Ŷi − μi)²} = Σ (i = 1 to n) ( [E{Ŷi} − μi]² + σ²{Ŷi} )
When the above is divided by the variance of the actual Y values, i.e., by σ², we get the criterion Γp.
The estimator Cp of Γp is what we shall use.
All Possible Regressions: Variable Reduction
13
Choose a model with small Cp.
Cp should be as close as possible to p. When all variables are included then obviously Cp = p (= P).
If the model has very little bias, then E{Ŷi} ≈ μi and E(Cp) ≈ p.
When we plot a line through the origin at 45° and plot the (p, Cp) points: for models with little bias, the points will fall almost on the straight line; for models with substantial bias, the points will fall much above the line; and if the points fall below the line, then such models have no bias, just some random sampling error.
The estimator of Γp is:
Cp = SSEp / MSE(X1, …, XP−1) − (n − 2p)
All Possible Regressions: Variable Reduction
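The Cp formula above is easy to evaluate over all subsets. The data and the helper name `sse` below are made up for illustration; note that the full model lands exactly at Cp = P, as the slide says:

```python
# Sketch: Cp for every subset of three candidate predictors.
# Cp = SSE_p / MSE_full - (n - 2p). Data are made up.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 3))
y = 4.0 + 2.0 * X[:, 0] + rng.normal(size=n)   # only X1 truly matters

def sse(cols):
    """SSE of the least-squares fit using the given predictor columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(((y - Z @ beta) ** 2).sum())

P = 4                                          # parameters in the full model
mse_full = sse((0, 1, 2)) / (n - P)            # MSE(X1, X2, X3)
for k in range(1, 4):
    for cols in itertools.combinations(range(3), k):
        p = k + 1                              # intercept + k slopes
        cp = sse(cols) / mse_full - (n - 2 * p)
        print(cols, round(cp, 2))
```

Subsets containing the truly relevant predictor should give Cp near p; badly biased subsets sit far above it.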
14
The PRESSp criterion:
PRESSp = Σ (i = 1 to n) (Yi − Ŷi(i))²
where Ŷi(i) is the predicted value of Yi when the ith observation is not in the dataset.
Choose models with small values of PRESSp.
It may seem that one will have to run "n" separate regressions in order to calculate PRESSp. Not so, as we will see later.
All Possible Regressions: Variable Reduction
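The shortcut hinted at above is the hat-matrix identity Yi − Ŷi(i) = ei / (1 − hii), which gives PRESSp from a single fit. The data below are made up; the brute-force loop is included only to confirm the identity:

```python
# Sketch: PRESS from one regression via leverages, checked against
# n leave-one-out regressions. Data are made up.
import numpy as np

rng = np.random.default_rng(3)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# One regression: PRESS from ordinary residuals and hat-matrix diagonals.
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
e = y - H @ y                                  # ordinary residuals
press_fast = float(((e / (1 - np.diag(H))) ** 2).sum())

# Brute force: n separate leave-one-out regressions, for comparison.
press_slow = 0.0
for i in range(n):
    mask = np.arange(n) != i
    beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    press_slow += float((y[i] - X[i] @ beta) ** 2)

print(press_fast, press_slow)
```

The two numbers agree to machine precision, so the "n regressions" cost never has to be paid.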
15
Best Subsets Algorithm:
Best subsets (a limited number) are identified according to pre-specified criteria.
Requires much less computational effort than evaluating all possible subsets.
Provides "good" subsets along with the best, which is quite useful.
When the pool of X variables is large, this algorithm can run out of steam. What then? We will see in the ensuing discussion.
Best Subsets
16
Best Subsets Regression (Note: “s” is the square root of MSEp)
Response variable is Y
Adj.
Vars R-Sq R-Sq C-p s X1 X2 X3
1 95.3 95.3 11.9 1.8656 X
1 94.7 94.7 30.8 1.9801 X
2 95.6 95.5 4.2 1.8101 X X
2 95.3 95.2 13.8 1.8718 X X
3 95.7 95.6 4.0 1.8023 X X X
Response variable is Y
Adj.
Vars R-Sq R-Sq C-p s X1 X2 X3 X4
1 95.3 95.3 13.4 1.8656 X
1 94.7 94.7 32.4 1.9801 X
2 95.6 95.5 5.6 1.8101 X X
2 95.5 95.4 9.8 1.8374 X X
3 95.7 95.6 3.9 1.7927 X X X
3 95.7 95.6 5.3 1.8023 X X X
4 95.7 95.6 5.0 1.7936 X X X X
A Simple Example
17
Forward Stepwise Regression
An iterative procedure. Based on the partial F* (or t*) statistic, one decides whether to add a variable or not. One variable at a time is considered. Before we see the actual algorithm, here are some levers:
Minimum acceptable F to enter (FE)
Minimum acceptable F to remove (FR)
Minimum acceptable Tolerance (Tmin)
Maximum number of iterations (N)
And here is the general form of the test statistic:
F*k = MSR(Xk | other Xs already in the model) / MSE(Xk, other Xs already in the model) = [ bk / s{bk} ]²
18
Forward Stepwise Regression
The procedure:
1. Run a simple linear regression of each candidate X variable with the Y variable.
2. If none of the individual F values is larger than the cut-off FE value, then stop. Else, enter the variable with the largest F.
3. Now run the regression of each remaining variable with Y, given that the variable entered in step 2 is already in the model.
4. Repeat step 2. If a candidate is found, then check for tolerance. If the tolerance (1 − Rk2) is not larger than the cut-off tolerance value Tmin, then choose a different candidate. If none is available, then terminate. Else, add the candidate variable.
5. Calculate the partial F for the variable entered in step 2, given that the variable entered in step 4 is already in the model. If this F is less than FR, then remove the variable entered in step 2; else keep it. If the number of iterations equals N, terminate; if not, proceed to step 6.
6. Check from the results of step 1 which is the next candidate variable to enter. If the number of iterations is exceeded, then terminate.
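The steps above can be sketched in a simplified form. This is not KNN's exact algorithm: the tolerance check and iteration cap are omitted, and the data, thresholds, and helper names (`sse`, `partial_f`) are made up for illustration:

```python
# Simplified forward stepwise sketch: add the variable with the largest
# partial F if it exceeds FE; then re-test included variables against FR.
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = rng.normal(size=(n, 4))
y = 1.0 + 5.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(size=n)

def sse(cols):
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in sorted(cols)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(((y - Z @ beta) ** 2).sum())

def partial_f(var, current):
    """F* for adding `var` to the model already holding `current`."""
    with_var = current | {var}
    p = len(with_var) + 1                      # parameters incl. intercept
    extra_ss = sse(current) - sse(with_var)    # SS explained by `var` alone
    return extra_ss / (sse(with_var) / (n - p))

FE, FR = 4.0, 4.0                              # entry / removal cut-offs
model = set()
while True:
    candidates = [j for j in range(4) if j not in model]
    if not candidates:
        break
    best = max(candidates, key=lambda j: partial_f(j, model))
    if partial_f(best, model) < FE:
        break
    model.add(best)
    # Re-test each earlier variable; drop any whose partial F fell below FR.
    for j in sorted(model - {best}):
        if partial_f(j, model - {j}) < FR:
            model.remove(j)
print(sorted(model))
```

With strong signals on X1 and X3 of the simulated data (columns 0 and 2), both enter and survive the removal checks.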
19
Other Stepwise Regression Procedures
Backward Stepwise Regression
Exact opposite of the forward procedure. Sometimes preferred to forward stepwise. Think about how this procedure would work, and why (or under which conditions) you would use it instead of forward stepwise.
Forward Selection
Similar to forward stepwise, except that the variable-dropping part is not present.
Backward Elimination
Similar to backward stepwise, except that the variable-adding part is not present.
20
An Example
Let us go through the example (Fig. 9.7) on page 366 of KNN.
21
Some other Selection Criteria
Akaike Information Criterion (AIC)
– Imposes a penalty for adding regressors
– AIC = e^(2p/n) · SSEp/n, where e^(2p/n) is the penalty factor
– Harsher penalty than Ra2 (How?)
– Model with lowest AIC is preferred
– AIC is used for in-sample and out-of-sample forecasting performance measurement
– Useful for nested and non-nested models and for determining lag length in autoregressive models (Ch. 12)
22
Some other Selection Criteria
Schwarz Information Criterion (SIC)
– SIC = n^(p/n) · SSEp/n
– Similar to AIC
– Imposes a stricter penalty than AIC
– Has similar advantages as AIC
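Both criteria, in the multiplicative forms given on these slides, are one-liners. The numbers below are made up; the comparison only illustrates that the SIC penalty grows faster than the AIC penalty once n exceeds e²:

```python
# Sketch of the slide's AIC and SIC forms; SSE and n values are made up.
import numpy as np

def aic(sse, n, p):
    """AIC in the multiplicative form on the slide: e^(2p/n) * SSE/n."""
    return np.exp(2 * p / n) * sse / n

def sic(sse, n, p):
    """SIC in the matching form: n^(p/n) * SSE/n."""
    return n ** (p / n) * sse / n

n = 50
print(aic(100.0, n, 3), aic(100.0, n, 4))      # penalty grows with p
print(sic(100.0, n, 3), sic(100.0, n, 4))      # grows faster for n > e^2
```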
23
Model Validation
Checking the prediction ability of the model.
Methods for model validation:
1. Collection of new data:
- Select a new sample, with the same variables, of size n*;
- Compute the mean squared prediction error:
MSPR = Σ (i = 1 to n*) (Yi − Ŷi)² / n*
2. Comparison of results with theoretical expectations;
3. Data splitting into two data sets: model building and validation.
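A minimal sketch of method 3 (data splitting) combined with the MSPR formula above. The data, split sizes, and coefficients are made up; the point is that the model is fit on the building set only and MSPR is computed on the held-out cases:

```python
# Sketch: split data into build/validation sets, fit on the build set,
# and compute MSPR on the validation set. Data are made up.
import numpy as np

rng = np.random.default_rng(5)
n, n_star = 60, 20                             # build / validation sizes
X = rng.normal(size=(n + n_star, 2))
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n + n_star)

Z = np.column_stack([np.ones(n + n_star), X])
beta, *_ = np.linalg.lstsq(Z[:n], y[:n], rcond=None)   # fit on build set only

y_hat = Z[n:] @ beta                           # predict the held-out cases
mspr = float(((y[n:] - y_hat) ** 2).sum()) / n_star
mse_build = float(((y[:n] - Z[:n] @ beta) ** 2).sum()) / (n - 3)
print(round(mspr, 3), round(mse_build, 3))
```

If MSPR is close to the building-set MSE, the model's predictive ability holds up on new data.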