Slides by John Loucks & Updated by Spiros Velianitis
DESCRIPTION
Slides by John Loucks & updated by Spiros Velianitis. Chapter 14 Simple Linear Regression. Simple Linear Regression Model. Least Squares Method. Coefficient of Determination. Model Assumptions. Testing for Significance. Using the Estimated Regression Equation.
TRANSCRIPT
1 Slide
© 2008 Thomson South-Western. All Rights Reserved
Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS
2 Slide
Chapter 14
Simple Linear Regression
• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Residual Analysis: Validating Model Assumptions
• Outliers and Influential Observations
3 Slide
Simple Linear Regression
• Managerial decisions often are based on the relationship between two or more variables.
• Regression analysis can be used to develop an equation showing how the variables are related.
• Variation in one variable is explained by another variable.
• The variable being predicted is called the dependent variable and is denoted by y.
• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.
4 Slide
Simple Linear Regression
• Simple linear regression involves one independent variable and one dependent variable.
• The relationship between the two variables is approximated by a straight line.
• Regression analysis involving two or more independent variables is called multiple regression.
5 Slide
Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
$y = \beta_0 + \beta_1 x + \epsilon$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\epsilon$ is a random variable called the error term.
6 Slide
Simple Linear Regression Equation
The simple linear regression equation is:
$E(y) = \beta_0 + \beta_1 x$
• The graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• $E(y)$ is the expected value of y for a given x value.
7 Slide
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; the slope $\beta_1$ is positive.]
8 Slide
Simple Linear Regression Equation
Negative Linear Relationship
[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; the slope $\beta_1$ is negative.]
9 Slide
Simple Linear Regression Equation
No Relationship
[Figure: horizontal regression line of $E(y)$ against x with intercept $\beta_0$; the slope $\beta_1$ is 0.]
10 Slide
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
$\hat{y} = b_0 + b_1 x$
• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.
11 Slide
Estimation Process
[Diagram: the estimation process as a cycle of four boxes]
• Regression Model: $y = \beta_0 + \beta_1 x + \epsilon$. Regression Equation: $E(y) = \beta_0 + \beta_1 x$. Unknown Parameters: $\beta_0$, $\beta_1$.
• Sample Data: $(x_1, y_1), \ldots, (x_n, y_n)$.
• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$. Sample Statistics: $b_0$, $b_1$.
• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.
12 Slide
Least Squares Method
The least squares method is a procedure for using sample data to find the estimated regression equation.
Least Squares Criterion
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
13 Slide
Least Squares Method
The slope for the estimated regression equation is derived using differential calculus and is:
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
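The slide gives only the slope. As a brief sketch of the calculus step it alludes to, the intercept follows from setting the derivative of the least squares criterion with respect to $b_0$ to zero. The intercept formula below is the standard least squares result and is not stated on the slides; $S$ is a name introduced here for the criterion.

```latex
S(b_0, b_1) = \sum (y_i - b_0 - b_1 x_i)^2
\qquad
\frac{\partial S}{\partial b_0} = -2 \sum (y_i - b_0 - b_1 x_i) = 0
\;\Longrightarrow\;
b_0 = \bar{y} - b_1 \bar{x}
```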
14 Slide
Simple Linear Regression
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
$\sum x = 10$           $\sum y = 100$
$\bar{x} = 2$           $\bar{y} = 20$
15 Slide
Scatter Diagram and Trend Line
[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4) with the trend line y = 5x + 10.]
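The trend line y = 5x + 10 can be reproduced from the five Reed Auto observations using the least squares slope formula. The following is a minimal Python sketch; the variable names are illustrative, and the intercept formula $b_0 = \bar{y} - b_1 \bar{x}$ is the standard least squares result rather than something stated on the slides.

```python
# Reed Auto sample data from the slides: TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n  # mean of x
y_bar = sum(y) / n  # mean of y

# Slope: b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx

# Intercept (assumed standard formula, not on the slides): b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

print(f"y_hat = {b0:g} + {b1:g}x")  # prints: y_hat = 10 + 5x
```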
16 Slide
Coefficient of Determination
How well does the estimated regression equation fit the data? The coefficient of determination provides a measure of goodness of fit for the estimated regression equation. SSE, the sum of squares due to error, sums the squared residuals (errors).
Relationship Among SST, SSR, SSE
SST = SSR + SSE
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
The coefficient of determination is:
$r^2$ = SSR/SST
17 Slide
Coefficient of Determination
$r^2$ = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
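The values 100 and 114 can be checked directly from the Reed Auto data. This is a small Python sketch that assumes the usual definitions of the three sums of squares (SST from deviations of y about its mean, SSR from deviations of the fitted values about the mean, SSE from the residuals); the slides name these quantities but do not write out the formulas.

```python
# Reed Auto data and the estimated regression equation y_hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # due to error

r2 = ssr / sst  # coefficient of determination
print(sst, ssr, sse, round(r2, 4))
```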
18 Slide
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
The correlation coefficient is a descriptive measure of the strength of the linear association between two variables x and y. Values of the correlation coefficient are always between -1 (negative or inverse relation) and +1 (positive relation). A value of zero, or close to zero, indicates no linear relationship.
19 Slide
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".
$r_{xy} = +\sqrt{.8772}$
$r_{xy} = +.9366$
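The same calculation as a short Python sketch. `math.copysign` is used here only as a convenient way to attach the sign of the slope to the square root; the variable names are illustrative.

```python
import math

b1 = 5.0        # slope of the estimated regression equation
r2 = 100 / 114  # coefficient of determination, SSR/SST

# r_xy = (sign of b1) * sqrt(r^2)
r_xy = math.copysign(math.sqrt(r2), b1)
print(round(r_xy, 4))
```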
20 Slide
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero. If $\beta_1$ is zero, we would conclude that the two variables are not related; if $\beta_1$ is not zero, the two variables are related.
Two tests are commonly used: the t Test and the F Test.
Both the t test and F test require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.
21 Slide
Testing for Significance
An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
$s^2$ = MSE = SSE/(n - 2)
where:
SSE = sum of squares due to error
22 Slide
Testing for Significance: t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Rejection Rule
Reject $H_0$ if p-value $\leq \alpha$ or $t \leq -t_{\alpha/2}$ or $t \geq t_{\alpha/2}$
where:
$t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom
23 Slide
Testing for Significance: t Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
2. Specify the level of significance.
$\alpha = .05$
3. Select the test statistic.
$t = \frac{b_1}{s_{b_1}}$
4. State the rejection rule.
Reject $H_0$ if p-value $\leq .05$ or |t| > 3.182 (with 3 degrees of freedom)
24 Slide
Testing for Significance: t Test
5. Compute the value of the test statistic.
$t = \frac{b_1}{s_{b_1}} = \frac{5}{1.08} = 4.63$
6. Determine whether to reject $H_0$.
With 3 degrees of freedom, t = 4.541 provides an area of .01 in the upper tail. Because t = 4.63 exceeds 4.541, the two-tailed p-value is less than 2(.01) = .02. (Also, t = 4.63 > 3.182.) We can reject $H_0$.
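The value $s_{b_1} = 1.08$ is used above without being derived. The sketch below recomputes it under the assumption that the standard error of the slope is $s / \sqrt{\sum (x_i - \bar{x})^2}$ with $s = \sqrt{\text{MSE}}$; that formula is the standard one for simple linear regression but is not written out in the slides. The critical value 3.182 is taken from the slides rather than computed.

```python
import math

x = [1, 3, 2, 1, 3]  # TV ads, Reed Auto sample
n = len(x)
b1 = 5.0             # estimated slope
sse = 14.0           # sum of squares due to error

# Estimate of sigma: s = sqrt(MSE) = sqrt(SSE / (n - 2))
s = math.sqrt(sse / (n - 2))

# Assumed standard formula: s_b1 = s / sqrt(sum((xi - x_bar)^2))
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
s_b1 = s / math.sqrt(sxx)

t = b1 / s_b1
t_crit = 3.182  # t_.025 with 3 degrees of freedom, from the slides
reject_h0 = abs(t) > t_crit

print(round(s_b1, 2), round(t, 2), reject_h0)
```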
25 Slide
Confidence Interval for $\beta_1$
• We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
• $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
26 Slide
Confidence Interval for $\beta_1$
The form of a confidence interval for $\beta_1$ is:
$b_1 \pm t_{\alpha/2} s_{b_1}$
where:
$b_1$ is the point estimator,
$t_{\alpha/2} s_{b_1}$ is the margin of error,
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom
27 Slide
Confidence Interval for $\beta_1$
Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2} s_{b_1}$ = 5 +/- 3.182(1.08) = 5 +/- 3.44
or 1.56 to 8.44
Conclusion
0 is not included in the confidence interval. Reject $H_0$.
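The interval arithmetic and the rejection rule can be written as a few lines of Python. This sketch takes $b_1$, $s_{b_1}$, and the critical value directly from the slides; the variable names are illustrative.

```python
b1 = 5.0        # point estimator of the slope
s_b1 = 1.08     # standard error of b1, as reported on the slides
t_crit = 3.182  # t_.025 with 3 degrees of freedom, from the slides

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

# Reject H0: beta_1 = 0 when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(round(margin, 2), round(lower, 2), round(upper, 2), reject_h0)
```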
28 Slide
Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
F = MSR/MSE
29 Slide
Testing for Significance: F Test
Rejection Rule
Reject $H_0$ if p-value $\leq \alpha$ or $F \geq F_\alpha$
where:
$F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator
30 Slide
Testing for Significance: F Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
2. Specify the level of significance.
$\alpha = .05$
3. Select the test statistic.
F = MSR/MSE
4. State the rejection rule.
Reject $H_0$ if p-value $\leq .05$ or $F \geq 10.13$ (with 1 d.f. in numerator and 3 d.f. in denominator)
31 Slide
Testing for Significance: F Test
5. Compute the value of the test statistic.
F = MSR/MSE = 100/4.667 = 21.43
6. Determine whether to reject $H_0$.
F = 17.44 provides an area of .025 in the upper tail. Because the F test is an upper-tail test, the p-value corresponding to F = 21.43 is less than .025, which is below $\alpha = .05$. Hence, we reject $H_0$.
The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
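A short Python sketch of step 5. It assumes MSR = SSR divided by the number of independent variables (1 here), which is the usual definition but is not spelled out in the slides; the critical value 10.13 is the one quoted in the rejection rule.

```python
ssr = 100.0  # sum of squares due to regression
sse = 14.0   # sum of squares due to error
n = 5        # sample size

msr = ssr / 1        # assumed: one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)  # n - 2 denominator degrees of freedom

f_stat = msr / mse
f_crit = 10.13       # F_.05 with (1, 3) degrees of freedom, from the slides
reject_h0 = f_stat >= f_crit

print(round(mse, 3), round(f_stat, 2), reject_h0)
```

With a single independent variable, the F statistic equals the square of the t statistic for the slope, which is why the two tests reach the same conclusion here.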
32 Slide
Some Cautions about the Interpretation of Significance Tests
• Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
• Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
33 Slide
Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y}$ = 10 + 5(3) = 25 cars
34 Slide
Confidence Interval for $E(y_p)$
The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
$\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$
25 +/- 3.1824(1.4491)
25 +/- 4.61
20.39 to 29.61 cars
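The standard error 1.4491 appears above without derivation. The sketch below reproduces it assuming the standard formula $s_{\hat{y}_p} = s\sqrt{1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, which the slides do not show; the critical value 3.1824 is taken from the slide.

```python
import math

x = [1, 3, 2, 1, 3]  # TV ads, Reed Auto sample
n = len(x)
b0, b1 = 10.0, 5.0   # estimated regression equation
sse = 14.0
x_p = 3              # number of TV ads of interest

s = math.sqrt(sse / (n - 2))
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

y_hat_p = b0 + b1 * x_p
# Assumed standard formula for the standard error of the estimated mean.
s_yhat_p = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.1824      # t_.025 with 3 degrees of freedom, from the slides
margin = t_crit * s_yhat_p
lower, upper = y_hat_p - margin, y_hat_p + margin

print(round(s_yhat_p, 4), round(lower, 2), round(upper, 2))
```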
35 Slide
Prediction Interval for $y_p$
The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
$\hat{y}_p \pm t_{\alpha/2} s_{ind}$
25 +/- 3.1824(2.6013)
25 +/- 8.28
16.72 to 33.28 cars
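The prediction interval is wider than the confidence interval for the mean because it also accounts for the variation of an individual week around the mean. A sketch assuming the standard formula $s_{ind} = s\sqrt{1 + 1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, which is not written out in the slides:

```python
import math

x = [1, 3, 2, 1, 3]  # TV ads, Reed Auto sample
n = len(x)
b0, b1 = 10.0, 5.0   # estimated regression equation
sse = 14.0
x_p = 3              # number of TV ads of interest

s = math.sqrt(sse / (n - 2))
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

y_hat_p = b0 + b1 * x_p
# Assumed standard formula; the extra "1 +" term is the individual-week variance.
s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)

t_crit = 3.1824      # t_.025 with 3 degrees of freedom, from the slides
margin = t_crit * s_ind
lower, upper = y_hat_p - margin, y_hat_p + margin

print(round(s_ind, 4), round(lower, 2), round(upper, 2))
```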
36 Slide
Residual Analysis
• If the assumptions about the error term $\epsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
• The residuals provide the best information about $\epsilon$.
• Residual for Observation i: $y_i - \hat{y}_i$
• Much of the residual analysis is based on an examination of graphical plots.
37 Slide
Residual Plot Against x
If the assumption that the variance of $\epsilon$ is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
38 Slide
Residual Plot Against x
[Figure: residual $y - \hat{y}$ plotted against x, centered on 0. Panel title: Good Pattern.]
39 Slide
Residual Plot Against x
[Figure: residual $y - \hat{y}$ plotted against x, centered on 0. Panel title: Nonconstant Variance.]
40 Slide
Residual Plot Against x
[Figure: residual $y - \hat{y}$ plotted against x, centered on 0. Panel title: Model Form Not Adequate.]
41 Slide
Residual Plot Against x
Residuals
Observation    Predicted Cars Sold    Residuals
1              15                     -1
2              25                     -1
3              20                     -2
4              15                      2
5              25                      2
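The table can be regenerated from the sample data and the estimated regression equation $\hat{y} = 10 + 5x$. A minimal Python sketch with illustrative variable names:

```python
x = [1, 3, 2, 1, 3]        # TV ads
y = [14, 24, 18, 17, 27]   # cars sold
b0, b1 = 10.0, 5.0         # estimated regression equation from the slides

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # y_i - y_hat_i

for i, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(i, yh, e)
```

The residuals of a least squares fit with an intercept sum to zero, which gives a quick sanity check on the table.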
42 Slide
Residual Plot Against x
[Figure: TV Ads Residual Plot. Residuals (vertical axis, -3 to 3) plotted against TV Ads (horizontal axis, 0 to 4).]
43 Slide
Standardized Residual Plot
• The standardized residual plot can provide insight about the assumption that the error term $\epsilon$ has a normal distribution.
• If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
44 Slide
Standardized Residual Plot
Standardized Residuals
Observation    Predicted Y    Residuals    Standard Residuals
1              15             -1           -0.535
2              25             -1           -0.535
3              20             -2           -1.069
4              15              2            1.069
5              25              2            1.069
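The slides do not say how the "Standard Residuals" column is computed. The values shown are reproduced by dividing each residual by the sample standard deviation of the residuals, $\sqrt{\text{SSE}/(n - 1)}$; this is an inference from the numbers, not a statement from the slides, and it differs from the leverage-adjusted standardized residual defined in some textbooks. A sketch under that assumption:

```python
import math

residuals = [-1.0, -1.0, -2.0, 2.0, 2.0]  # from the residuals table
n = len(residuals)

# Assumption: scale by the sample standard deviation of the residuals,
# sqrt(SSE / (n - 1)); the residuals have mean zero, so no centering is needed.
sse = sum(e ** 2 for e in residuals)
sd = math.sqrt(sse / (n - 1))

standard_residuals = [e / sd for e in residuals]
print([round(z, 3) for z in standard_residuals])
```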
45 Slide
Standardized Residual Plot
RESIDUAL OUTPUT (spreadsheet columns A to D)
Observation    Predicted Y    Residuals    Standard Residuals
1              15             -1           -0.534522
2              25             -1           -0.534522
3              20             -2           -1.069045
4              15              2            1.069045
5              25              2            1.069045
[Figure: Standardized Residual Plot. Standard Residuals (vertical axis, -1.5 to 1.5) plotted against Cars Sold (horizontal axis, 0 to 30).]
46 Slide
Standardized Residual Plot
All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that $\epsilon$ has a normal distribution.
47 Slide
Outliers and Influential Observations
Detecting Outliers
• An outlier is an observation that is unusual in comparison with the other data.
• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
• This rule's shortcoming can be circumvented by using studentized deleted residuals.
• The |ith studentized deleted residual| will be larger than the |ith standardized residual|.