ch4- model assessment - github pages

5
CH4- Model Assessment Amin Shirazi 2/11/2020 Assessing models When modeling, it’s important to assess the (1) validity and (2) usefulness of your model. These are model assumptions and we need to check them before making any decisions. If these assumptions are violated, we need to go back and review things and maybe refit the data using other methods to have a valid fit on the data. To assess the validity of the model (i.e. to see if model assumptions are met), we will look to the residuals. If the fitted equation is the good one, the residuals will be: Ptternless (cloud like, random scatter) Centered at zero Bell shaped distribution To check if these three things hold, we will use two plotting methods. A residual plot is a plot of the residuals, e = y ˆ y vs. x (or ˆ y in the case of multiple regression, Section 4.2). 20 30 40 50 10 20 30 40 50 x y 4 2 0 2 4 10 20 30 40 50 x e 1

Upload: others

Post on 05-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CH4- Model Assessment - GitHub Pages

CH4- Model Assessment

Amin Shirazi

2/11/2020

Assessing modelsWhen modeling, it’s important to assess the (1) validity and (2) usefulness of your model. These are modelassumptions and we need to check them before making any decisions. If these assumptions are violated, weneed to go back and review things and maybe refit the data using other methods to have a valid fit on thedata.

To assess the validity of the model (i.e. to see if model assumptions are met), we will look to the residuals. Ifthe fitted equation is the good one, the residuals will be:

• Ptternless (cloud like, random scatter)• Centered at zero• Bell shaped distribution

To check if these three things hold, we will use two plotting methods.

A residual plot is a plot of the residuals, e = y ≠ y vs. x (or y in the case of multiple regression,Section 4.2).

20

30

40

50

10 20 30 40 50x

y

−4

−2

0

2

4

10 20 30 40 50x

e

1

Page 2: CH4- Model Assessment - GitHub Pages

0

50

100

150

10 20 30 40 50x

y

−10

−5

0

5

10

15

10 20 30 40 50x

e

25

50

75

10 20 30 40 50x

y

−20

0

20

10 20 30 40 50x

e

2

Page 3: CH4- Model Assessment - GitHub Pages
Page 4: CH4- Model Assessment - GitHub Pages

Normality of residuals

• In addition to the residual versus predicted plot, there are other residual plots we can use to checkregression assumptions.

• A histogram of residuals and a normal probability plot (QQ-plot) of residuals can be used toevaluate whether our residuals are approximately normally distributed.

– However, unless the residuals are far from normal or have an obvious pattern, we generally don’tneed to be overly concerned about normality.

• Note that we check the residuals for normality. We don’t need to check for normality of the raw data.Our response and predictor variables do not need to be normally distributed in order to fit a linearregression model.

How to check the Normality assumptions of residuals? We are already familiar with theoreticalNormal QQ-plots and histograms. So, in order to check normality assumptions of residuals, we need to see if(1) the histogram of the residuals are bell shaped and (2) check their normal QQ-plot to see if the residualslie around a straight line.

Draw a histogram of the residuals (review the JMP toturial for histograms)

It seems the residuals are not normaly distributed in this example.The residuals have a left skewed distirbution.

3

Page 5: CH4- Model Assessment - GitHub Pages

• Again, the QQ-plot also confirms that the assumption of Normal distribution of residuals is violated tosome extend in this example.

• More examination is required to fix the issue or to find the problem.

4