mlbbaseballregressionsample


Upload: dan-tetrick

Post on 19-Aug-2015


Page 1: MLBBaseballRegressionSample

MLB Linear and Robust Modeling Sample

Dan Tetrick

Friday, May 01, 2015

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Section 1: Set Up Functions, Environment, and Data

A. Load All Packages and Functions

source('~/R/R Code/PackagesAndFunctions.R')
PackagesAndFunctions(All = TRUE)

B. Read BatModelData from CSV and Clean It with Data.Scrubber

BatModelData <- read.csv("~/Baseball Regression/Final Batting Data Full.csv", stringsAsFactors = FALSE)
BatModelData <- Data.Scrubber(BatModelData)

C. Remove Pitchers from BatModelData

BatModelData<-BatModelData[which(BatModelData$POSPlayedMost!="P"),]

D. Remove Batter Seasons with Fewer than 50 Plate Appearances

BatModelData<-BatModelData[which(BatModelData$PA>=50),]

E. Remove All NAs in Team Statistics (Rows Missing W)

BatModelData<-BatModelData[which(!is.na(BatModelData$W)),]


F. Remove All NAs and Zeros from the Salary Variable

BatModelData <- BatModelData[which(!is.na(BatModelData$Salary)), ]
BatModelData <- BatModelData[which(BatModelData$Salary != 0), ]
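Steps C through F index rows with which(). This matters whenever a column has missing values. A bare logical index turns each NA comparison into an extra all-NA row, while which() returns only the positions where the condition is TRUE. Below is a minimal sketch on a hypothetical five-row data frame. The values are invented and only the column names come from BatModelData. It also combines the four filters into one call:

```r
# Hypothetical toy rows; column names mirror BatModelData, values are invented
toy <- data.frame(
  POSPlayedMost = c("SS", "P", "C", "1B", "CF"),
  PA            = c(600, 20, NA, 45, 510),
  W             = c(90, 85, 70, NA, 88),
  Salary        = c(5e6, 4e5, 1e6, 0, NA),
  stringsAsFactors = FALSE
)

# Bare logical index: the NA in PA produces an extra all-NA row
bare <- toy[toy$PA >= 50, ]

# which() keeps only the positions where the condition is TRUE
kept <- toy[which(toy$PA >= 50), ]

# Steps C-F as a single filter; rows where any test is NA are dropped by which()
clean <- toy[which(toy$POSPlayedMost != "P" &
                   toy$PA >= 50 &
                   !is.na(toy$W) &
                   !is.na(toy$Salary) & toy$Salary != 0), ]

print(bare)
print(kept)
print(clean)
```

The combined filter keeps the same rows as applying the tests one after another, because a row survives only when every test is TRUE.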

G. Create a Summary Table of the Overall Model Data

summary(BatModelData)

Section 2: Summary Analysis, Plots, Histograms, Individual Regressions

A. Create a df Containing All of the Bat Model Data to Be Used That Can Be Factored

BatModelFactors<-BatModelData[,c("teamIDFirst","yearIDBat","bats","throws","USA","POSPlayedMost","UtilityPOS","College","TotalAllStarSelected","TotalGoldGlove","TotalMostValuablePlayer","TotalRookieoftheYear","TotalSilverSlugger")]

BatModelFactors<-as.data.frame(BatModelFactors)

B. Create a df Containing All of the Continuous Bat Model Data to Be Used

BatContinuous<-BatModelData[,c("Salary","weight","height","YearsExperience","HRBat","RBI","SBAttempts","BBBat","SOBat","OBPBat","SLGBat","RunsCreated","FieldingPCT","W","L","BPF","PPF","BodyMassIndex","RBat","HBat","IBB","HBPBat","SH","SFBat","GIDP","TotalMultiTeams","AVGBat","TBBat","OPSBat", "TotalRunsProduced","EBat","Age")]

C. Rebind the Bat Model Factor and Continuous Variable dfs

BatModel <- cbind(BatModelFactors, BatContinuous)
BatModel[] <- lapply(BatModel, function(x) if (is.character(x)) factor(x) else x)
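When as.data.frame() receives an object that is already a data frame, it returns it essentially unchanged and does not act on stringsAsFactors. Character columns such as teamIDFirst are therefore not converted by that argument. lm() turns character predictors into factors on its own, so the models below are unaffected either way. The sketch below runs a quick check on hypothetical two-row stand-ins and then converts the columns explicitly:

```r
# Hypothetical stand-ins for BatModelFactors and BatContinuous
f <- data.frame(teamIDFirst = c("NYA", "BOS"), stringsAsFactors = FALSE)
x <- data.frame(Salary = c(5e6, 1e6))

# stringsAsFactors has no effect when as.data.frame() receives a data frame
combined <- as.data.frame(cbind(f, x), stringsAsFactors = TRUE)
class_before <- class(combined$teamIDFirst)
print(class_before)

# Explicit conversion of every character column to a factor
combined[] <- lapply(combined, function(col) if (is.character(col)) factor(col) else col)
print(levels(combined$teamIDFirst))
```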

D. Make the Awards and All-Star Variables Binary

BatModel$TotalAllStarSelected <- ifelse(BatModel$TotalAllStarSelected != 0, 1, 0)
BatModel$TotalGoldGlove <- ifelse(BatModel$TotalGoldGlove != 0, 1, 0)
BatModel$TotalMostValuablePlayer <- ifelse(BatModel$TotalMostValuablePlayer != 0, 1, 0)
BatModel$TotalSilverSlugger <- ifelse(BatModel$TotalSilverSlugger != 0, 1, 0)
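The four ifelse() calls apply one rule: any nonzero award count becomes 1. The sketch below applies that rule across a vector of column names, using a hypothetical frame with invented counts. as.integer(x != 0) gives the same 0/1 coding as ifelse(x != 0, 1, 0) but stores the result as integers:

```r
# Hypothetical award counts; only the column names come from BatModel
BatModelToy <- data.frame(
  TotalAllStarSelected    = c(0, 3, 1),
  TotalGoldGlove          = c(2, 0, 0),
  TotalMostValuablePlayer = c(0, 0, 1),
  TotalSilverSlugger      = c(0, 4, 0)
)

AwardCols <- c("TotalAllStarSelected", "TotalGoldGlove",
               "TotalMostValuablePlayer", "TotalSilverSlugger")

# Any nonzero count becomes 1; zero stays 0
BatModelToy[AwardCols] <- lapply(BatModelToy[AwardCols], function(x) as.integer(x != 0))
print(BatModelToy)
```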


E. Create a Summary Table of the Bat Model Data to Be Used in the Modeling Processes

(SummaryBatModelData<-summary(BatModel))

F. Create a Simple Plot of All Continuous Variables in the BatModel Data

Plotter(df=BatContinuous,PlotterPath, PlotTitle = "MLB Bat Model Data Set Plot")

G. Create a Simple Histogram of All Continuous Variables

Histograms(df = BatContinuous, HistogramPath, Title = "BatContinuous Histograms")

H. Create a Scatterplot of Salary Regressed on Model Variables

SimpleScatter(df=BatContinuous,"Salary")

Section 3: Create a Linear Model Based on the Entire Data Set

A. Create Bat Model Specs

BatModSpecs <<- paste0("log(Salary) ~ teamIDFirst + yearIDBat + bats + throws + POSPlayedMost +","UtilityPOS + College + height + YearsExperience + W + ","BPF + TotalMultiTeams + RunsCreated + IBB +","TotalAllStarSelected + TotalGoldGlove + ","TotalRookieoftheYear" )
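BatModSpecs is built by pasting strings, so every join has to supply its own "+" and spacing. reformulate() from the stats package builds the same formula from a character vector of term labels. Passing the response as quote(log(Salary)) keeps the log transform as a call. The sketch below checks that both specifications produce the same terms:

```r
# Term labels copied from BatModSpecs
BatTerms <- c("teamIDFirst", "yearIDBat", "bats", "throws", "POSPlayedMost",
              "UtilityPOS", "College", "height", "YearsExperience", "W",
              "BPF", "TotalMultiTeams", "RunsCreated", "IBB",
              "TotalAllStarSelected", "TotalGoldGlove", "TotalRookieoftheYear")

# quote() keeps log(Salary) as a call on the left-hand side
BatFormula <- reformulate(BatTerms, response = quote(log(Salary)))

# The pasted-string specification from the document, for comparison
BatModSpecs <- paste0("log(Salary) ~ teamIDFirst + yearIDBat + bats + throws + POSPlayedMost +",
                      "UtilityPOS + College + height + YearsExperience + W + ",
                      "BPF + TotalMultiTeams + RunsCreated + IBB +",
                      "TotalAllStarSelected + TotalGoldGlove + ",
                      "TotalRookieoftheYear")

print(BatFormula)
```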

B. Create Linear Model for Initial Analysis

LinearModel<-lm(as.formula(BatModSpecs),BatModel)

C. Summarize Results


SummaryLinearModel<-summary(LinearModel)

D. Perform ANOVA

(anova(LinearModel))

## Analysis of Variance Table
##
## Response: log(Salary)
##                         Df Sum Sq Mean Sq    F value                Pr(>F)
## teamIDFirst             34 1128.2    33.2    56.0286 < 0.00000000000000022
## yearIDBat                1 4020.8  4020.8  6789.2926 < 0.00000000000000022
## bats                     2    3.4     1.7     2.8949               0.05534
## throws                   1   16.1    16.1    27.1872 0.0000001875186932552
## POSPlayedMost            8  362.1    45.3    76.4253 < 0.00000000000000022
## UtilityPOS               1 1620.5  1620.5  2736.2784 < 0.00000000000000022
## College                  1   39.8    39.8    67.2073 0.0000000000000002673
## height                   1   11.7    11.7    19.7012 0.0000091290701096658
## YearsExperience          1 7415.8  7415.8 12521.8488 < 0.00000000000000022
## W                        1   40.4    40.4    68.1410 < 0.00000000000000022
## BPF                      1    4.2     4.2     7.1300               0.00759
## TotalMultiTeams          1  188.8   188.8   318.7442 < 0.00000000000000022
## RunsCreated              1 2770.1  2770.1  4677.4347 < 0.00000000000000022
## IBB                      1   52.8    52.8    89.1738 < 0.00000000000000022
## TotalAllStarSelected     1   26.6    26.6    44.8479 0.0000000000221728398
## TotalGoldGlove           1   14.4    14.4    24.3445 0.0000008155481121597
## TotalRookieoftheYear     1   14.4    14.4    24.2804 0.0000008431199481025
## Residuals            12959 7674.7     0.6
##
## teamIDFirst          ***
## yearIDBat            ***
## bats                 .
## throws               ***
## POSPlayedMost        ***
## UtilityPOS           ***
## College              ***
## height               ***
## YearsExperience      ***
## W                    ***
## BPF                  **
## TotalMultiTeams      ***
## RunsCreated          ***
## IBB                  ***
## TotalAllStarSelected ***
## TotalGoldGlove       ***
## TotalRookieoftheYear ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
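anove() on an lm fit reports sequential (Type I) sums of squares. Each term is credited with the variation it explains after the terms listed before it. The Sum Sq column above therefore depends on the order of terms in BatModSpecs (teamIDFirst enters first, YearsExperience ninth), not only on the data. The sketch below uses two simulated, correlated predictors to show what happens when their order is swapped:

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.5)        # x2 is correlated with x1
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)

a12 <- anova(lm(y ~ x1 + x2, data = sim))  # x1 entered first
a21 <- anova(lm(y ~ x2 + x1, data = sim))  # x1 entered last

print(a12)
print(a21)
```

The residual line is identical in both tables; only the split of the explained variation between terms changes. For tests that do not depend on term order, drop1(fit, test = "F") or car::Anova() are the usual alternatives.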


E. Identify Potential Outliers

(outlierTest(LinearModel))

##        rstudent unadjusted p-value Bonferonni p
## 26179 -5.032587      0.00000049039    0.0063839
## 7624  -4.744933      0.00000210800    0.0274420
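The Bonferroni p column is the unadjusted p-value multiplied by the number of observations. Using the 13,018 rows reported in the rlm output below, 0.00000049039 × 13,018 ≈ 0.0063839. The sketch below reproduces that calculation in base R on simulated data with one planted outlier. It refers each externally studentized residual to a t distribution with df.residual - 1 degrees of freedom. This reproduces the arithmetic above but is not car's source code:

```r
set.seed(42)
n <- 200
x <- runif(n, 0, 10)
y <- 3 + 0.5 * x + rnorm(n)
y[17] <- y[17] + 8                            # plant one gross outlier at row 17
fit <- lm(y ~ x)

rs    <- rstudent(fit)                        # externally studentized residuals
p_raw <- 2 * pt(abs(rs), df = df.residual(fit) - 1, lower.tail = FALSE)
p_bon <- pmin(1, p_raw * length(rs))          # Bonferroni: multiply by n, cap at 1

worst <- which.max(abs(rs))
print(c(rstudent = rs[[worst]], unadjusted = p_raw[[worst]], bonferroni = p_bon[[worst]]))
```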

F. Identify Multicollinearity Issues

(vif(LinearModel))

##                           GVIF Df GVIF^(1/(2*Df))
## teamIDFirst          3.861708 34        1.020068
## yearIDBat            1.205054  1        1.097750
## bats                 1.642685  2        1.132110
## throws               1.624370  1        1.274508
## POSPlayedMost        2.065586  8        1.046382
## UtilityPOS           1.234240  1        1.110963
## College              1.065733  1        1.032343
## height               1.278291  1        1.130615
## YearsExperience      1.347075  1        1.160636
## W                    1.201951  1        1.096335
## BPF                  2.422080  1        1.556303
## TotalMultiTeams      1.253269  1        1.119495
## RunsCreated          2.134285  1        1.460919
## IBB                  1.842983  1        1.357565
## TotalAllStarSelected 1.540961  1        1.241355
## TotalGoldGlove       1.264041  1        1.124296
## TotalRookieoftheYear 1.104717  1        1.051055

sqrt(vif(LinearModel)) > 2

##                       GVIF    Df GVIF^(1/(2*Df))
## teamIDFirst          FALSE  TRUE           FALSE
## yearIDBat            FALSE FALSE           FALSE
## bats                 FALSE FALSE           FALSE
## throws               FALSE FALSE           FALSE
## POSPlayedMost        FALSE  TRUE           FALSE
## UtilityPOS           FALSE FALSE           FALSE
## College              FALSE FALSE           FALSE
## height               FALSE FALSE           FALSE
## YearsExperience      FALSE FALSE           FALSE
## W                    FALSE FALSE           FALSE
## BPF                  FALSE FALSE           FALSE
## TotalMultiTeams      FALSE FALSE           FALSE
## RunsCreated          FALSE FALSE           FALSE
## IBB                  FALSE FALSE           FALSE
## TotalAllStarSelected FALSE FALSE           FALSE
## TotalGoldGlove       FALSE FALSE           FALSE
## TotalRookieoftheYear FALSE FALSE           FALSE
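Taking sqrt() of the whole vif() matrix also tests the Df column. The only TRUE entries above are therefore sqrt(34) for teamIDFirst and sqrt(8) for POSPlayedMost, and they are not collinearity flags. For a one-degree-of-freedom term, GVIF^(1/(2*Df)) equals sqrt(VIF), which makes that column the natural one to compare with 2. Every value in it above is below 1.56.

The sketch below computes GVIF from its definition, det(R11) det(R22) / det(R), where R is the correlation matrix of the non-intercept model-matrix columns. gvif_sketch() is a hypothetical helper, not car's implementation, and the data are simulated:

```r
# GVIF for each model term: det(R11) * det(R22) / det(R), where R is the
# correlation matrix of the non-intercept model-matrix columns and R11 is
# the block belonging to the term (hypothetical helper for illustration)
gvif_sketch <- function(model) {
  mm   <- model.matrix(model)
  asg  <- attr(mm, "assign")
  keep <- asg != 0                             # drop the intercept column
  X    <- mm[, keep, drop = FALSE]
  asg  <- asg[keep]
  term_labels <- attr(terms(model), "term.labels")
  if (length(term_labels) < 2) stop("GVIF needs at least two terms")
  R    <- cor(X)
  detR <- det(R)
  out <- t(sapply(seq_along(term_labels), function(j) {
    idx  <- which(asg == j)
    gvif <- det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / detR
    c(GVIF = gvif, Df = length(idx), AdjGVIF = gvif^(1 / (2 * length(idx))))
  }))
  rownames(out) <- term_labels
  out
}

set.seed(7)
n   <- 300
x1  <- rnorm(n)
x2  <- x1 + rnorm(n, sd = 0.3)                 # nearly collinear with x1
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y   <- x1 + x2 + as.integer(grp) + rnorm(n)
fit <- lm(y ~ x1 + x2 + grp)

v <- gvif_sketch(fit)
print(v)
print(v[, "AdjGVIF"] > 2)                      # compare the adjusted column with 2
```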

G. Plot Normality of Residuals

sresid <- studres(LinearModel)
hist(sresid, freq = FALSE, main = "Distribution of Studentized Residuals")
xModel <- seq(min(sresid), max(sresid), length.out = 40)
yModel <- dnorm(xModel)
lines(xModel, yModel)
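The histogram compares the studentized residuals with a standard normal density. A numeric companion is the share of residuals beyond ±3, compared with the normal expectation 2 × pnorm(-3) ≈ 0.27%. A much larger share indicates heavy tails, the kind of departure the gvlma kurtosis test reports below. The sketch below uses simulated data with t-distributed (3 df) errors and takes the externally studentized residuals from stats::rstudent():

```r
set.seed(3)
n    <- 2000
pred <- rnorm(n)
y    <- 1 + 0.5 * pred + rt(n, df = 3)     # heavy-tailed errors
fit  <- lm(y ~ pred)

sres     <- rstudent(fit)                  # externally studentized residuals
observed <- mean(abs(sres) > 3)            # share beyond +/- 3
expected <- 2 * pnorm(-3)                  # share expected under normality

hist(sres, freq = FALSE, breaks = 40,
     main = "Distribution of Studentized Residuals")
curve(dnorm(x), add = TRUE)                # standard normal reference density

print(c(observed = observed, expected = expected))
```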

H. Evaluate Homoscedasticity with a Non-Constant Error Variance Test

(ncvTest(LinearModel))

## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 1245.702 Df = 1 p = 7.132171e-273
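With its default variance formula, ncvTest() performs a score test of whether the error variance changes with the fitted values. The sketch below computes the usual Cook-Weisberg form of that statistic in base R. Squared residuals, scaled by RSS / n, are regressed on the fitted values, and half the regression sum of squares is referred to a chi-square distribution on 1 df. The data are simulated with an error SD that grows with the predictor:

```r
set.seed(11)
n   <- 500
x   <- runif(n, 1, 10)
y   <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)   # error SD grows with x
fit <- lm(y ~ x)

e      <- residuals(fit)
u      <- e^2 / mean(e^2)                   # squared residuals scaled by RSS / n
aux    <- lm(u ~ fitted(fit))               # auxiliary regression on the fitted values
reg_ss <- sum((fitted(aux) - mean(u))^2)    # regression sum of squares
chisq  <- reg_ss / 2
p      <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(c(Chisquare = chisq, Df = 1, p = p))
```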

I. Test for Autocorrelated Errors

(durbinWatsonTest(LinearModel))

##  lag Autocorrelation D-W Statistic p-value
##    1        0.455044      1.089884       0
##  Alternative hypothesis: rho != 0
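The Durbin-Watson statistic is d = sum((e[t] - e[t-1])^2) / sum(e[t]^2). It is close to 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals. The output above fits that pattern, since 2 × (1 - 0.455044) ≈ 1.09. Because the statistic compares each row with the one before it, its value depends on how the rows of BatModel are ordered. The sketch below computes it in base R with simulated AR(1) errors:

```r
set.seed(5)
n   <- 500
x   <- rnorm(n)
err <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # AR(1) errors, phi = 0.5
y   <- 1 + 2 * x + err
fit <- lm(y ~ x)

e  <- residuals(fit)
dw <- sum(diff(e)^2) / sum(e^2)          # Durbin-Watson statistic
r1 <- sum(e[-1] * e[-n]) / sum(e^2)      # lag-1 autocorrelation of the residuals

print(c(DW = dw, approx = 2 * (1 - r1), r1 = r1))
```

The gap between d and 2(1 - r1) is exactly (e[1]^2 + e[n]^2) / sum(e^2), which is small for long series.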

J. More Diagnostic Plots

plot(LinearModel)

[Figure: plot(LinearModel) diagnostic panels for lm(as.formula(BatModSpecs)): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance). Observations 26179, 7624, and 22217 are labeled in the first three panels; 7624 is also labeled in the leverage panel.]

K. Global Test of Model Assumptions

gvmodel <- gvlma(LinearModel)
(summary(gvmodel))

##
## Call:
## lm(formula = as.formula(BatModSpecs), data = BatModel)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.8588 -0.4787 -0.0281  0.4639  3.0105
##
## Coefficients:
##                          Estimate Std. Error t value
## (Intercept)          -127.9132212  1.7564528 -72.825
## teamIDFirstARI         -0.2146555  0.0847453  -2.533
## teamIDFirstATL         -0.1328926  0.0785968  -1.691
## teamIDFirstBAL         -0.0791251  0.0778904  -1.016
## teamIDFirstBOS          0.0349063  0.0788499   0.443
## teamIDFirstCAL         -0.2362334  0.0888873  -2.658
## teamIDFirstCHA         -0.0901495  0.0786144  -1.147
## teamIDFirstCHN         -0.1134147  0.0787268  -1.441
## teamIDFirstCIN         -0.2416669  0.0782027  -3.090
## teamIDFirstCLE         -0.1927796  0.0784724  -2.457
## teamIDFirstCOL         -0.2649864  0.0894058  -2.964
## teamIDFirstDET         -0.1431190  0.0782229  -1.830
## teamIDFirstFLO         -0.3646289  0.0837314  -4.355
## teamIDFirstHOU         -0.2070188  0.0781166  -2.650
## teamIDFirstKCA         -0.2249127  0.0778122  -2.890
## teamIDFirstLAA         -0.0059988  0.0939017  -0.064
## teamIDFirstLAN         -0.0207224  0.0788527  -0.263
## teamIDFirstMIA         -0.4899488  0.1290776  -3.796
## teamIDFirstMIL         -0.1927897  0.0851475  -2.264
## teamIDFirstMIN         -0.1796819  0.0777329  -2.312
## teamIDFirstML4         -0.1748943  0.0877154  -1.994
## teamIDFirstMON         -0.3166123  0.0821394  -3.855
## teamIDFirstNYA          0.1187761  0.0786234   1.511
## teamIDFirstNYN         -0.0666162  0.0784213  -0.849
## teamIDFirstOAK         -0.1936208  0.0784794  -2.467
## teamIDFirstPHI         -0.1428800  0.0786145  -1.817
## teamIDFirstPIT         -0.2157641  0.0788839  -2.735
## teamIDFirstSDN         -0.2440672  0.0790886  -3.086
## teamIDFirstSEA         -0.0810341  0.0782881  -1.035
## teamIDFirstSFN         -0.1076044  0.0786374  -1.368
## teamIDFirstSLN         -0.0733263  0.0782593  -0.937
## teamIDFirstTBA         -0.3196683  0.0842131  -3.796
## teamIDFirstTEX         -0.2496555  0.0781528  -3.194
## teamIDFirstTOR         -0.0959604  0.0784763  -1.223
## teamIDFirstWAS         -0.2036729  0.0940338  -2.166
## yearIDBat               0.0696856  0.0008577  81.244
## batsL                  -0.0835461  0.0234209  -3.567
## batsR                  -0.0180927  0.0197007  -0.918
## throwsR                 0.0331896  0.0241859   1.372
## POSPlayedMost2B         0.0252328  0.0310816   0.812
## POSPlayedMost3B        -0.0187086  0.0293550  -0.637
## POSPlayedMostC         -0.1226876  0.0283942  -4.321
## POSPlayedMostCF         0.1180692  0.0296855   3.977
## POSPlayedMostDH        -0.2293316  0.0501588  -4.572
## POSPlayedMostLF         0.0725315  0.0275052   2.637
## POSPlayedMostRF         0.1072989  0.0283600   3.783
## POSPlayedMostSS         0.0652769  0.0311159   2.098
## UtilityPOSUTIL         -0.3094957  0.0154371 -20.049
## College                 0.0449063  0.0139296   3.224
## height                  0.0079875  0.0036149   2.210
## YearsExperience         0.1623170  0.0017876  90.803
## W                      -0.0009419  0.0006209  -1.517
## BPF                     0.0016842  0.0021848   0.771
## TotalMultiTeams        -0.1102374  0.0130506  -8.447
## RunsCreated             0.0132807  0.0002950  45.015
## IBB                     0.0179900  0.0022128   8.130
## TotalAllStarSelected    0.1655547  0.0285624   5.796
## TotalGoldGlove          0.2067810  0.0366641   5.640
## TotalRookieoftheYear   -0.3132058  0.0635626  -4.928
##                                   Pr(>|t|)
## (Intercept)          < 0.0000000000000002 ***
## teamIDFirstARI                   0.011322 *
## teamIDFirstATL                   0.090896 .
## teamIDFirstBAL                   0.309719
## teamIDFirstBOS                   0.657995
## teamIDFirstCAL                   0.007878 **
## teamIDFirstCHA                   0.251514
## teamIDFirstCHN                   0.149719
## teamIDFirstCIN                   0.002004 **
## teamIDFirstCLE                   0.014037 *
## teamIDFirstCOL                   0.003044 **
## teamIDFirstDET                   0.067328 .
## teamIDFirstFLO       0.000013424453105854 ***
## teamIDFirstHOU                   0.008056 **
## teamIDFirstKCA                   0.003853 **
## teamIDFirstLAA                   0.949063
## teamIDFirstLAN                   0.792709
## teamIDFirstMIA                   0.000148 ***
## teamIDFirstMIL                   0.023579 *
## teamIDFirstMIN                   0.020819 *
## teamIDFirstML4                   0.046186 *
## teamIDFirstMON                   0.000116 ***
## teamIDFirstNYA                   0.130890
## teamIDFirstNYN                   0.395638
## teamIDFirstOAK                   0.013632 *
## teamIDFirstPHI                   0.069167 .
## teamIDFirstPIT                   0.006243 **
## teamIDFirstSDN                   0.002033 **
## teamIDFirstSEA                   0.300653
## teamIDFirstSFN                   0.171223
## teamIDFirstSLN                   0.348793
## teamIDFirstTBA                   0.000148 ***
## teamIDFirstTEX                   0.001404 **
## teamIDFirstTOR                   0.221430
## teamIDFirstWAS                   0.030333 *
## yearIDBat            < 0.0000000000000002 ***
## batsL                            0.000362 ***
## batsR                            0.358438
## throwsR                          0.170002
## POSPlayedMost2B                  0.416908
## POSPlayedMost3B                  0.523925
## POSPlayedMostC       0.000015657802835226 ***
## POSPlayedMostCF      0.000070071023176836 ***
## POSPlayedMostDH      0.000004872993258926 ***
## POSPlayedMostLF                  0.008374 **
## POSPlayedMostRF                  0.000155 ***
## POSPlayedMostSS                  0.035937 *
## UtilityPOSUTIL       < 0.0000000000000002 ***
## College                          0.001268 **
## height                           0.027151 *
## YearsExperience      < 0.0000000000000002 ***
## W                                0.129293
## BPF                              0.440802
## TotalMultiTeams      < 0.0000000000000002 ***
## RunsCreated          < 0.0000000000000002 ***
## IBB                  0.000000000000000468 ***
## TotalAllStarSelected 0.000000006939486785 ***
## TotalGoldGlove       0.000000017372801593 ***
## TotalRookieoftheYear 0.000000843119948103 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7696 on 12959 degrees of freedom
## Multiple R-squared: 0.6979, Adjusted R-squared: 0.6966
## F-statistic: 516.2 on 58 and 12959 DF, p-value: < 0.00000000000000022
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = LinearModel)
##
##                        Value        p-value                   Decision
## Global Stat        187.91347 0.000000000000 Assumptions NOT satisfied!
## Skewness             1.03870 0.308123031028    Assumptions acceptable.
## Kurtosis           151.06812 0.000000000000 Assumptions NOT satisfied!
## Link Function       35.72387 0.000000002274 Assumptions NOT satisfied!
## Heteroscedasticity   0.08277 0.773571576173    Assumptions acceptable.

##                           Value           p-value
## Global Stat        187.91346778 0.000000000000000
## Skewness             1.03870421 0.308123031028374
## Kurtosis           151.06811930 0.000000000000000
## Link Function       35.72386937 0.000000002273612
## Heteroscedasticity   0.08277491 0.773571576172987
##                                      Decision
## Global Stat        Assumptions NOT satisfied!
## Skewness              Assumptions acceptable.
## Kurtosis           Assumptions NOT satisfied!
## Link Function      Assumptions NOT satisfied!
## Heteroscedasticity    Assumptions acceptable.
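The response is log(Salary), so a coefficient b means salary is multiplied by exp(b) when that term increases by one unit, with the other terms held fixed. That is a change of 100 × (exp(b) - 1) percent. The sketch below converts a few estimates copied from the lm summary above:

```r
# Estimates copied from the lm summary above (log-salary scale)
b <- c(yearIDBat            =  0.0696856,
       YearsExperience      =  0.1623170,
       UtilityPOSUTIL       = -0.3094957,
       TotalAllStarSelected =  0.1655547)

pct_change <- 100 * (exp(b) - 1)   # percent change in salary per one-unit change
print(round(pct_change, 1))
```

For example, the YearsExperience estimate of 0.1623170 corresponds to roughly 17.6% higher salary per additional year of experience under this model.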

Section 4: Create a Robust Model Using the Entire Data Set

A. Create the Robust Model and View Summary

RobustModel <- rlm(formula = as.formula(BatModSpecs), data = BatModel)
RobustSummaryModel <- summary(RobustModel)
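rlm() from MASS fits by iteratively reweighted least squares. With its defaults it uses Huber's psi function with k = 1.345 and re-estimates the scale at each iteration as median(|residual|) / 0.6745. The sketch below writes that loop out in base R to show how the weights shrink the influence of large residuals. It is a simplified illustration with a coefficient-based stopping rule and no standard errors, not MASS's code. huber_irls() is a hypothetical helper and the contaminated data are simulated:

```r
# Huber M-estimation by iteratively reweighted least squares (illustrative sketch)
huber_irls <- function(X, y, k = 1.345, maxit = 50, tol = 1e-8) {
  beta <- lm.fit(X, y)$coefficients                # start from least squares
  for (i in seq_len(maxit)) {
    r <- drop(y - X %*% beta)
    s <- median(abs(r)) / 0.6745                   # MAD-type scale estimate
    if (s == 0) stop("scale estimate is zero")
    w <- pmin(1, k / abs(r / s))                   # weight 1 inside k, k/|u| outside
    beta_new <- lm.wfit(X, y, w)$coefficients
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  r <- drop(y - X %*% beta)
  list(coefficients = beta,
       weights = pmin(1, k / abs(r / s)),          # weights at the final fit
       scale = s, iterations = i)
}

set.seed(2015)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
top <- order(x, decreasing = TRUE)[1:10]
y[top] <- y[top] + 30                              # contaminate the 10 largest-x rows
X <- cbind(Intercept = 1, x = x)

ols <- lm.fit(X, y)$coefficients
rob <- huber_irls(X, y)

print(rbind(OLS = ols, Huber = rob$coefficients))
print(summary(rob$weights[top]))                   # contaminated rows get small weights
```

With MASS installed, MASS::rlm(y ~ x) on the same simulated data should give coefficients close to rob$coefficients. Small differences can come from MASS's different stopping rule.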

B. Perform ANOVA

(anova(RobustModel))

## Analysis of Variance Table
##
## Response: log(Salary)
##                      Df Sum Sq Mean Sq F value Pr(>F)
## teamIDFirst          34 1099.2    32.3
## yearIDBat             1 3762.5  3762.5
## bats                  2    4.2     2.1
## throws                1   13.1    13.1
## POSPlayedMost         8  353.2    44.2
## UtilityPOS            1 1569.5  1569.5
## College               1   44.0    44.0
## height                1    9.4     9.4
## YearsExperience       1 7714.3  7714.3
## W                     1   49.3    49.3
## BPF                   1    2.7     2.7
## TotalMultiTeams       1  178.9   178.9
## RunsCreated           1 2654.6  2654.6
## IBB                   1   60.6    60.6
## TotalAllStarSelected  1   39.6    39.6
## TotalGoldGlove        1   15.3    15.3
## TotalRookieoftheYear  1   13.9    13.9
## Residuals               7700.6

E. Identify Potential Outliers

(outlierTest(RobustModel))

##        rstudent unadjusted p-value Bonferonni p
## 26179 -5.203332      0.00000019874    0.0025871
## 7624  -5.186704      0.00000021728    0.0028285

F. Identify Multicollinearity Issues

(vif(RobustModel))

##                           GVIF Df GVIF^(1/(2*Df))
## teamIDFirst          3.861708 34        1.020068
## yearIDBat            1.205054  1        1.097750
## bats                 1.642685  2        1.132110
## throws               1.624370  1        1.274508
## POSPlayedMost        2.065586  8        1.046382
## UtilityPOS           1.234240  1        1.110963
## College              1.065733  1        1.032343
## height               1.278291  1        1.130615
## YearsExperience      1.347075  1        1.160636
## W                    1.201951  1        1.096335
## BPF                  2.422080  1        1.556303
## TotalMultiTeams      1.253269  1        1.119495
## RunsCreated          2.134285  1        1.460919
## IBB                  1.842983  1        1.357565
## TotalAllStarSelected 1.540961  1        1.241355
## TotalGoldGlove       1.264041  1        1.124296
## TotalRookieoftheYear 1.104717  1        1.051055

(sqrt(vif(RobustModel)) > 2)

##                       GVIF    Df GVIF^(1/(2*Df))
## teamIDFirst          FALSE  TRUE           FALSE
## yearIDBat            FALSE FALSE           FALSE
## bats                 FALSE FALSE           FALSE
## throws               FALSE FALSE           FALSE
## POSPlayedMost        FALSE  TRUE           FALSE
## UtilityPOS           FALSE FALSE           FALSE
## College              FALSE FALSE           FALSE
## height               FALSE FALSE           FALSE
## YearsExperience      FALSE FALSE           FALSE
## W                    FALSE FALSE           FALSE
## BPF                  FALSE FALSE           FALSE
## TotalMultiTeams      FALSE FALSE           FALSE
## RunsCreated          FALSE FALSE           FALSE
## IBB                  FALSE FALSE           FALSE
## TotalAllStarSelected FALSE FALSE           FALSE
## TotalGoldGlove       FALSE FALSE           FALSE
## TotalRookieoftheYear FALSE FALSE           FALSE

G. Plot Normality of Residuals

sresid <- studres(RobustModel)
hist(sresid, freq = FALSE, main = "Distribution of Studentized Residuals")
xModel <- seq(min(sresid), max(sresid), length.out = 40)
yModel <- dnorm(xModel)
lines(xModel, yModel)

H. Evaluate Homoscedasticity with a Non-Constant Error Variance Test

(ncvTest(RobustModel))

## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 2197.184 Df = 1 p = 0

I. Test for Autocorrelated Errors


(durbinWatsonTest(RobustModel))

##  lag Autocorrelation D-W Statistic p-value
##    1       0.4439713       1.11203       0
##  Alternative hypothesis: rho != 0

J. More Diagnostic Plots

(plot.lmRob(RobustModel))

[Figure: diagnostic panels for RobustModel: Residuals vs. Fitted Values, Normal QQ Plot of Modified Residuals, Scale-Location (modified residuals), and Modified Residuals vs. Leverage. Observations 3156, 1185, and 3722 are labeled in each panel.]

## Call:
## rlm(formula = as.formula(BatModSpecs), data = BatModel)
## Converged in 6 iterations
##
## Coefficients:
##          (Intercept)       teamIDFirstARI       teamIDFirstATL
##      -128.2436598590        -0.2091091221        -0.1209566046
##       teamIDFirstBAL       teamIDFirstBOS       teamIDFirstCAL
##        -0.0898196200         0.0321250479        -0.2385107902
##       teamIDFirstCHA       teamIDFirstCHN       teamIDFirstCIN
##        -0.0662408849        -0.1168865090        -0.2352958910
##       teamIDFirstCLE       teamIDFirstCOL       teamIDFirstDET
##        -0.1788737920        -0.2498578845        -0.1737538577
##       teamIDFirstFLO       teamIDFirstHOU       teamIDFirstKCA
##        -0.3629384323        -0.2140456199        -0.2152045859
##       teamIDFirstLAA       teamIDFirstLAN       teamIDFirstMIA
##        -0.0115605236        -0.0292576292        -0.4866716336
##       teamIDFirstMIL       teamIDFirstMIN       teamIDFirstML4
##        -0.1955582629        -0.1764502116        -0.1937411671
##       teamIDFirstMON       teamIDFirstNYA       teamIDFirstNYN
##        -0.3097606940         0.0942372746        -0.0858051599
##       teamIDFirstOAK       teamIDFirstPHI       teamIDFirstPIT
##        -0.2159533180        -0.1560319741        -0.2166738066
##       teamIDFirstSDN       teamIDFirstSEA       teamIDFirstSFN
##        -0.2539300564        -0.0918286993        -0.1044125329
##       teamIDFirstSLN       teamIDFirstTBA       teamIDFirstTEX
##        -0.0931436058        -0.3169427376        -0.2440492661
##       teamIDFirstTOR       teamIDFirstWAS            yearIDBat
##        -0.0934758688        -0.1994119196         0.0699192303
##                batsL                batsR              throwsR
##        -0.0814708172        -0.0171295569         0.0362715591
##      POSPlayedMost2B      POSPlayedMost3B       POSPlayedMostC
##         0.0281920605        -0.0243023161        -0.1282680333
##      POSPlayedMostCF      POSPlayedMostDH      POSPlayedMostLF
##         0.1090328956        -0.2580475652         0.0763428729
##      POSPlayedMostRF      POSPlayedMostSS       UtilityPOSUTIL
##         0.1074938032         0.0654719517        -0.2994100738
##              College               height      YearsExperience
##         0.0406298717         0.0058110589         0.1697009085
##                    W                  BPF      TotalMultiTeams
##        -0.0002097312         0.0007611303        -0.1100364805
##          RunsCreated                  IBB TotalAllStarSelected
##         0.0132393078         0.0207097167         0.2118631512
##       TotalGoldGlove TotalRookieoftheYear
##         0.2210210758        -0.3239296289
##
## Degrees of freedom: 13018 total; 12959 residual
## Scale estimate: 0.686

K. Global Test of Model Assumptions

gvmodel <- gvlma(RobustModel)
(summary(gvmodel))

From here, the next round of modeling would correct the issues identified in this first round.
