Post on 22-Jan-2018
Prediction-oriented Model Selection in PLS-PM
Pratyush Nidhi Sharma, University of Delaware
Galit Shmueli*, National Tsing Hua University
Marko Sarstedt, Otto-von-Guericke-University Magdeburg
Nicholas Danks, National Tsing Hua University
Soumya Ray, National Tsing Hua University
Goal of Study
• PLS: an “exploratory” yet causal-predictive technique; the role of model comparisons is highlighted.
• Prediction requires holdout sample: often expensive and impractical.
• R2 and related in-sample criteria are often (incorrectly) treated as predictive measures.
• Information theoretic criteria are designed as in-sample predictive measures.
• We asked: Can in-sample criteria substitute for out-of-sample predictive
criteria? If so, in which conditions?
Information theoretic criteria
AIC = −2 log(L) + 2pk              AIC = n[log(SSerror(k)/n) + 2pk/n]
BIC = −2 log(L) + pk log(n)        BIC = n[log(SSerror(k)/n) + pk log(n)/n]
HQ  = −2 log(L) + 2pk log(log(n))  HQ  = n[log(SSerror(k)/n) + 2pk log(log(n))/n]
SSerror(k) = sum of squared errors for kth model in a set of models
pk = number of coefficients in the kth model plus 1
• Well-developed for model comparison in parametric models
• Typically calculated using log-likelihood
• Under a normal error distribution assumption, the likelihood-based formulas can be rewritten in terms of SSerror, as shown above (Burnham & Anderson, 2002, p. 63; McQuarrie & Tsai, 1998).
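The SSE-based forms can be computed directly from a model's residuals. A minimal sketch with hypothetical numbers (lower values indicate a better model; the function name and inputs are illustrative, not from the study):

```python
import math

def it_criteria(sse_k, n, p_k):
    """SSE-based information criteria for the k-th model
    (normal-error forms, per Burnham & Anderson 2002; McQuarrie & Tsai 1998)."""
    base = math.log(sse_k / n)
    return {
        "AIC": n * (base + 2 * p_k / n),                       # = n*log(SSE/n) + 2p
        "BIC": n * (base + p_k * math.log(n) / n),             # = n*log(SSE/n) + p*log(n)
        "HQ":  n * (base + 2 * p_k * math.log(math.log(n)) / n),
    }

# Hypothetical comparison: a parsimonious model vs. a more complex one, n = 100.
crit_a = it_criteria(sse_k=48.0, n=100, p_k=3)
crit_b = it_criteria(sse_k=46.5, n=100, p_k=7)
best = min(("A", "B"), key=lambda m: {"A": crit_a, "B": crit_b}[m]["BIC"])
```

Here the small drop in SSE does not justify four extra coefficients, so all three criteria favor the parsimonious model A.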
Predictive model selection: Two lenses
1. Prediction only (P):
• Focus only on comparing the predictive accuracy of models (Gregor, 2006).
• Limited or no role of theory (no causal explanation).
• Select the model with best out-of-sample predictive accuracy.
• Out-of-sample criteria (e.g. RMSE) are the gold standard for judging.
• Exemplar technique: ANNs
• We ask: Can (& which) in-sample criteria be used (in place of RMSE)?
2. Explanation with Prediction (EP):
• Focus on balancing causal explanation and prediction (Gregor, 2006).
• Prominent role of theory (causal explanation is foremost).
• Requires a trade-off in predictive power to accommodate explanatory power.
• Exemplar technique: PLS (“causal-predictive” (Jöreskog and Wold, 1982)).
• We ask: Can (& which) in-sample criteria be used?
Study Design: Eight Competing Models
Experimental Design
Simulate composite data using the SEGIRLS package (Ringle et al. 2014):
● 6 sample sizes (50, 100, 150, 200, 250, and 500)
● 5 effect sizes on the structural path ξ2 → η1 (0.1, 0.2, 0.3, 0.4, and 0.5)
● 3 factor loading patterns (AVEs):
o High AVE with loadings: (0.9, 0.9, 0.9)
o Moderate AVE with loadings: (0.8, 0.8, 0.8)
o Low AVE with loadings: (0.7, 0.7, 0.7)
200 replications for each of the 90 (6 x 5 x 3) conditions (18,000 runs)
Generate Predictions using PLSpredict (Shmueli et al. 2016)
Measure Outcomes:
PLS criteria: R2, Adjusted R2, Q2, GoF.
IT criteria: FPE, Cp, AIC, AICu, AICc, BIC, GM, HQ, HQc.
Out-of-sample criteria: RMSE, MAD, MAPE, SMAPE.
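The four out-of-sample criteria have standard definitions; a minimal sketch (standard formulas, not code from the study; note MAPE is undefined when an actual value is zero, which foreshadows its poor behavior later):

```python
import numpy as np

def oos_metrics(y_true, y_pred):
    """Out-of-sample prediction-error measures: RMSE, MAD, MAPE, SMAPE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "RMSE":  float(np.sqrt(np.mean(err ** 2))),
        "MAD":   float(np.mean(np.abs(err))),
        # MAPE divides by the actuals, so it blows up near y_true = 0.
        "MAPE":  float(100 * np.mean(np.abs(err / y_true))),
        "SMAPE": float(100 * np.mean(np.abs(err)
                       / ((np.abs(y_true) + np.abs(y_pred)) / 2))),
    }

m = oos_metrics([2.0, 4.0, 6.0], [2.5, 3.5, 6.0])
```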
Procedure for assessing predictive model selection performance
Step 1: Generate training & holdout data from the data-generating model (Model 5).
Step 2: Estimate all 8 competing PLS models on the training data.
Step 3: Compute the in-sample criteria for all 8 competing models using the training data.
Step 4: Predict holdout items and compute out-of-sample criteria for all 8 competing models using PLSpredict (Shmueli et al. 2016).
Step 5: Compare the best model selected by each in-sample criterion to the RMSE-selected model.
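The five steps can be sketched as a loop. In this sketch, ordinary least squares on predictor subsets stands in for PLS estimation of the competing models (a hypothetical stand-in purely for illustration; the study estimates PLS path models and predicts via PLSpredict):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    """OLS fit, standing in for PLS model estimation (illustration only)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Step 1: training & holdout data from one data-generating model.
n_train, n_hold = 100, 100
X = rng.normal(size=(n_train + n_hold, 3))
y = X @ np.array([0.5, 0.3, 0.0]) + rng.normal(scale=0.5, size=n_train + n_hold)
Xtr, ytr, Xho, yho = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Competing "model specifications" = predictor subsets (hypothetical names).
specs = {"M1": [0], "M2": [0, 1], "M3": [0, 1, 2]}

bic, rmse = {}, {}
for name, cols in specs.items():
    beta = fit_ols(Xtr[:, cols], ytr)              # Step 2: estimate on training data
    sse = float(np.sum((ytr - Xtr[:, cols] @ beta) ** 2))
    p = len(cols) + 1
    bic[name] = n_train * np.log(sse / n_train) + p * np.log(n_train)  # Step 3
    pred = Xho[:, cols] @ beta                     # Step 4: predict holdout
    rmse[name] = float(np.sqrt(np.mean((yho - pred) ** 2)))

# Step 5: compare the in-sample pick with the RMSE pick.
pick_bic = min(bic, key=bic.get)
pick_rmse = min(rmse, key=rmse.get)
agree = pick_bic == pick_rmse
```

Repeating this over many replications, per condition, gives the agreement proportions reported in the tables below.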
Benchmarking: Which models are being selected by various criteria?
Overall proportion of model choice by each criterion (across all conditions)
Model #            1      2      3      4      5      6      7      8
PLS criteria
  R2           0.000  0.273  0.000  0.003  0.019  0.000  0.695  0.009
  Adjusted R2  0.000  0.537  0.000  0.005  0.074  0.000  0.303  0.081
  GoF          0.000  0.001  0.000  0.000  0.037  0.000  0.962  0.000
  Q2           0.003  0.305  0.000  0.004  0.224  0.002  0.179  0.281
Information theoretic criteria
  FPE          0.000  0.638  0.000  0.006  0.091  0.000  0.163  0.101
  Cp           0.000  0.686  0.000  0.006  0.100  0.001  0.096  0.111
  GM           0.000  0.743  0.000  0.006  0.109  0.007  0.011  0.123
  AIC          0.000  0.638  0.000  0.006  0.091  0.000  0.164  0.101
  AICu         0.000  0.688  0.000  0.006  0.099  0.002  0.093  0.112
  AICc         0.000  0.649  0.000  0.006  0.093  0.001  0.146  0.104
  BIC          0.000  0.731  0.000  0.006  0.107  0.005  0.032  0.120
  HQ           0.000  0.695  0.000  0.006  0.100  0.001  0.085  0.112
  HQc          0.000  0.705  0.000  0.006  0.102  0.002  0.070  0.114
Out-of-sample criteria
  MAD          0.000  0.351  0.000  0.000  0.183  0.000  0.236  0.229
  RMSE         0.000  0.365  0.000  0.000  0.186  0.000  0.218  0.230
  MAPE         0.094  0.044  0.247  0.076  0.044  0.347  0.090  0.058
  SMAPE        0.000  0.365  0.000  0.000  0.123  0.000  0.343  0.168
Summary: R2 and GoF overwhelmingly select the saturated model 7; Adjusted R2 prefers model 2.
IT criteria select the correctly specified but parsimonious model 2 and avoid model 7.
RMSE, MAD, SMAPE, and Q2 select among models 2, 5, 7, and 8.
Exception: MAPE selects incorrect models (1, 3, 4, 6).
Assessing the performance in the P lens
Can (& which) in-sample criteria help select the best predictive model?
(regardless of correct specification)
Prediction-only (P) lens
Percentage agreement with RMSE (across all conditions)

Model #            1      2      3      4      5      6      7      8   Success rate
PLS criteria
  R2           0.000  0.092  0.000  0.000  0.003  0.000  0.128  0.001  0.224
  Adjusted R2  0.000  0.183  0.000  0.000  0.011  0.000  0.031  0.014  0.238
  GoF          0.000  0.000  0.000  0.000  0.006  0.000  0.207  0.000  0.213
  Q2           0.000  0.101  0.000  0.000  0.034  0.000  0.018  0.054  0.207
Information theoretic criteria
  FPE          0.000  0.223  0.000  0.000  0.013  0.000  0.011  0.018  0.266
  Cp           0.000  0.244  0.000  0.000  0.015  0.000  0.006  0.021  0.285
  GM           0.000  0.267  0.000  0.000  0.016  0.000  0.000  0.024  0.308
  AIC          0.000  0.223  0.000  0.000  0.013  0.000  0.011  0.018  0.266
  AICu         0.000  0.244  0.000  0.000  0.015  0.000  0.005  0.022  0.285
  AICc         0.000  0.229  0.000  0.000  0.014  0.000  0.011  0.019  0.272
  BIC          0.000  0.263  0.000  0.000  0.016  0.000  0.001  0.023  0.303
  HQ           0.000  0.247  0.000  0.000  0.015  0.000  0.003  0.022  0.287
  HQc          0.000  0.252  0.000  0.000  0.015  0.000  0.003  0.022  0.292
Summary: Success rates (agreement with RMSE on the specific model) are too low!
None of the in-sample criteria can help when using the P lens.
Using RMSE (and a holdout sample) cannot be avoided when using the P lens.
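The success rate reported above is simply the proportion of replications in which a criterion selects the same model as RMSE. A toy sketch with made-up selections (illustrative data only, not study results):

```python
# Hypothetical per-replication selections: the model each criterion picked
# versus the model RMSE picked in the same replication.
picked_by_bic  = ["M2", "M2", "M7", "M5", "M2"]
picked_by_rmse = ["M2", "M5", "M7", "M5", "M8"]

# Success rate = share of replications where the two picks coincide.
success_rate = sum(a == b for a, b in zip(picked_by_bic, picked_by_rmse)) \
    / len(picked_by_rmse)
# 3 of 5 runs agree -> 0.6
```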
Assessing the performance in the EP lens
Can (& which) in-sample criteria help select a correctly specified (w.r.t. η2) but highly predictive model?
Explanation-Prediction (EP) lens
Percentage agreement with RMSE by model type (across all conditions)

Criterion        Correctly specified    Incorrectly specified    Saturated
                 (Model 2, 5, or 8)     (Model 1, 3, 4, or 6)    (Model 7)
PLS criteria
  R2                 0.211                  0.000                  0.128
  Adjusted R2        0.504                  0.000                  0.031
  GoF                0.026                  0.000                  0.207
  Q2                 0.611                  0.000                  0.018
Information theoretic criteria
  FPE                0.623                  0.000                  0.011
  Cp                 0.684                  0.000                  0.006
  GM                 0.757                  0.000                  0.000
  AIC                0.623                  0.000                  0.011
  AICu               0.685                  0.000                  0.005
  AICc               0.639                  0.000                  0.011
  BIC                0.740                  0.000                  0.001
  HQ                 0.692                  0.000                  0.003
  HQc                0.705                  0.000                  0.003
Summary: Overall, IT criteria offer a significant improvement over PLS criteria.
None of the PLS criteria provide comparable performance.
BIC & GM are the best in-sample candidates when using the EP lens.
How do experimental conditions affect model selection in the EP lens?
Impact of sample size (EP lens)
Percentage agreement with RMSE on the correctly specified model set, by sample size

Criterion         50     100    150    200    250    500
PLS criteria
  R2           0.266  0.212  0.226  0.199  0.201  0.162
  Adjusted R2  0.589  0.544  0.528  0.479  0.477  0.409
  GoF          0.044  0.028  0.022  0.018  0.024  0.020
  Q2           0.685  0.663  0.636  0.599  0.583  0.497
Information theoretic criteria
  FPE          0.704  0.676  0.661  0.605  0.591  0.504
  Cp           0.761  0.742  0.720  0.663  0.653  0.564
  GM           0.792  0.822  0.788  0.750  0.736  0.655
  AIC          0.702  0.675  0.659  0.605  0.591  0.504
  AICu         0.755  0.743  0.721  0.669  0.656  0.566
  AICc         0.737  0.697  0.675  0.612  0.603  0.509
  BIC          0.773  0.799  0.771  0.731  0.720  0.645
  HQ           0.742  0.743  0.726  0.682  0.674  0.589
  HQc          0.765  0.765  0.737  0.689  0.679  0.593
Summary: Agreement decreases as sample size increases, in all cases.
PLS criteria (including Q2) show lower rates of agreement than all IT criteria.
BIC & GM “peak” (~80%) at sample sizes 50-150, precisely when a holdout sample is impractical.
Impact of effect size (EP lens)
Percentage agreement with RMSE on the correctly specified model set, by effect size (ξ2 → η1)

Criterion        0.1    0.2    0.3    0.4    0.5
PLS criteria
  R2           0.148  0.182  0.220  0.239  0.265
  Adjusted R2  0.458  0.494  0.509  0.519  0.541
  GoF          0.024  0.026  0.024  0.025  0.032
  Q2           0.589  0.603  0.616  0.620  0.624
Information theoretic criteria
  FPE          0.587  0.611  0.630  0.637  0.652
  Cp           0.653  0.677  0.689  0.697  0.703
  GM           0.733  0.746  0.764  0.767  0.775
  AIC          0.586  0.610  0.630  0.636  0.652
  AICu         0.652  0.678  0.688  0.700  0.708
  AICc         0.603  0.627  0.646  0.651  0.666
  BIC          0.714  0.727  0.747  0.751  0.760
  HQ           0.663  0.684  0.696  0.706  0.713
  HQc          0.673  0.695  0.708  0.722  0.728
Summary: Agreement increases with effect size (signal strength).
PLS criteria (including Q2) show lower rates of agreement than all IT criteria.
Impact of item loadings (EP lens)
Percentage agreement with RMSE on the correctly specified model set, by loading values (AVE)

Criterion        0.7    0.8    0.9
PLS criteria
  R2           0.264  0.218  0.152
  Adjusted R2  0.504  0.510  0.499
  GoF          0.038  0.026  0.014
  Q2           0.603  0.610  0.618
Information theoretic criteria
  FPE          0.606  0.626  0.639
  Cp           0.648  0.688  0.716
  GM           0.726  0.762  0.784
  AIC          0.605  0.625  0.639
  AICu         0.658  0.689  0.708
  AICc         0.619  0.641  0.656
  BIC          0.708  0.744  0.767
  HQ           0.666  0.696  0.716
  HQc          0.678  0.708  0.729
Summary: R2, Adjusted R2, and GoF decrease in agreement as AVE increases (they start preferring model 7).
Q2 improves as AVE increases; however, it remains inferior to BIC and GM.
IT criteria improve with AVE; BIC & GM show the best performance.
Summary
• PLS: an “exploratory” yet causal-predictive technique: Role of model comparisons.
• Prediction requires holdout sample: often expensive and impractical.
• We asked: Can in-sample criteria substitute for out-of-sample criteria? If so, when?
• Prediction only (P): None of the in-sample criteria are useful substitutes. Use of holdout
sample cannot be avoided. RMSE & MAD behave per expectation. MAPE not
recommended.
• Explanation-Prediction (EP): Most relevant for PLS. IT criteria (BIC and GM) suitable
substitutes for RMSE. PLS criteria (R2, Adjusted R2, GoF, Q2) not recommended.
• Best conditions for using BIC and GM as substitutes for out-of-sample criteria:
• Sample sizes between 50 and 150: precisely where a holdout sample is impractical!
• High factor loadings (AVE): reliable & valid instruments.
• Higher expected effect sizes: relevant, theory-backed constructs.
Robustness check!
What if the data-generating model is not included in the competing model set?
We introduce data-generating Model X with a hidden variable ξ4. Model X is out of reach!
• Results almost perfectly mimic the earlier (main) results.
• Conclusion: BIC & GM provide the best predictive model selection ability regardless of whether the data-generating model is included or excluded (out of reach)!
• PLS criteria (R2, Adjusted R2, GoF, Q2) are not recommended.
Thank you!