bb abs 10th univariate and bivariate corr sr mr 81pages final
TRANSCRIPT
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
1/80
Business Statistics
ABS-Bangalore
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
2/80
Dr. R. Venkatamuni ReddyAssociate Professor
Contact: 09632326277, 080-30938181
mailto:[email protected]:[email protected]:[email protected]:[email protected] -
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
3/80
Correlation, Simple andMultiple Regression
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
4/80
Learning Objectives
In this chapter
How to use regression analysis to predict the value of a
dependent variable based on an independent variable
The meaning of the regression coefficients b0 and b1
How to evaluate the assumptions of regression analysis and
know what to do if the assumptions are violated
To make inferences about the slope and correlation coefficient
To estimate mean values and predict individual values
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
5/80
Correlation vs. Regression
A scatter diagram can be used to show therelationship between two variables
Correlation analysis is used to measure strengthof the association (linear relationship)between two variables
Correlation is only concerned with strength of therelationship
No causal effect is implied with correlation
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
6/80
Introduction toRegression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value
of at least one independent variable
Explain the impact of changes in an independent variable on
the dependent variable
Dependent variable: the variable we wish to predict
or explainIndependent variable: the variable used to explain
the dependent variable
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
7/80
Simple Linear RegressionModel
Only one independent variable, X
Relationship between X and Y is described by
a linear function
Changes in Y are assumed to be caused by
changes in X
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
8/80
Types of Relationships
YY
XX
YY
XX
YY
YY
XX
XX
Linear relationshipsLinear relationships Curvilinear relationshipsCurvilinear relationships
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
9/80
Types of Relationships
YY
XX
YY
XX
YY
YY
XX
XX
Strong relationshipsStrong relationships Weak relationshipsWeak relationships
(continued)(continued)
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
10/80
Types of Relationships
YY
XX
YY
XX
No relationshipNo relationship
(continued)(continued)
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
11/80
ii10i XY ++=Linear componentLinear component
Simple Linear RegressionModel
PopulationPopulation
Y interceptY intercept
PopulationPopulation
SlopeSlope
CoefficientCoefficient
RandomRandom
ErrorError
termtermDependentDependent
VariableVariable
IndependentIndependent
VariableVariable
Random ErrorRandom Error
componentcomponent
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
12/80
(continued)(continued)
Random Error forRandom Error for
this Xthis Xii valuevalue
YY
XX
Observed ValueObserved Value
of Y for Xof Y for Xii
Predicted ValuePredicted Value
of Y for Xof Y for Xii
ii10i XY ++=
XXii
Slope =Slope = 11
Intercept =Intercept = 00
ii
Simple Linear RegressionModel
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
13/80
i10i XbbY +=
The simple linear regression equation provides anThe simple linear regression equation provides an estimateestimate ofof
the population regression linethe population regression line
Simple Linear RegressionEquation (Prediction Line)
Estimate of theEstimate of theregressionregression
interceptintercept
Estimate of theEstimate of theregression sloperegression slope
Estimated (orEstimated (or
predicted) Ypredicted) Yvalue forvalue for
observation iobservation i
Value of X forValue of X for
observation iobservation i
The individual random error terms eThe individual random error terms eii have a mean of zerohave a mean of zero
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
14/80
Least Squares Method
b0 and b1 are obtained by finding the values of b0 and b1
that minimize the sum of the squared differences
between Y and :
2
i10i
2
ii ))Xb(b(Ymin)Y(Ymin +=
Y
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
15/80
Finding the Least SquaresEquation
The coefficients b0
and b1, and other regression
results will be found using Excel or SPSS or
STATA
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
16/80
b0is the estimated average value of Y
when the value of X is zero
b1is the estimated change in the average
value of Y as a result of a one-unit changein X
Interpretation of theSlope and the Intercept
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
17/80
Simple Linear RegressionExample
A real estate agent wishes to examine the relationship
between the selling price of a home and its size
(measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
18/80
Sample Data for HousePrice Model
House Price in $1000s(Y)
Square Feet(X)
245 1400
312 1600
279 1700308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
19/80
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000
Square Fee
HousePrice($1000s)
Graphical Presentation
House price model: scatter plot
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
20/80
Regression Using ExcelTools / Data Analysis / Regression
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
21/80
Excel OutputRegression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
The regression equation is:The regression equation is:
feet)(square0.1097798.24833pricehouse +=
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
22/80
0
50100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000
Square Fee
HousePrice($1000s)
Graphical PresentationHouse price model: scatter plot andregression line
feet)(square0.1097798.24833pricehouse +=
SlopeSlope
= 0.10977= 0.10977
InterceptIntercept= 98.248= 98.248
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
23/80
Interpretation of theIntercept, b0
b0
is the estimated average value of Y when the
value of X is zero (if X = 0 is in the range of
observed X values)Here, no houses had 0
square feet, so b0 = 98.24833just indicates that, for houses within the range
of sizes observed, $98,248.33 is the portion of
the house price not explained by square feet
feet)(square0.1097798.24833pricehouse +=
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
24/80
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
25/80
317.85
0)0.1098(20098.25
(sq.ft.)0.109898.25pricehouse
=
+=
+=
Predict the price for a housePredict the price for a house
with 2000 square feet:with 2000 square feet:
The predicted price for a house with 2000The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850square feet is 317.85($1,000s) = $317,850
Predictions usingRegression Analysis
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
26/80
0
50
100
150
200
250
300
350
400
450
0 500 1000 1500 2000 2500 3000
HousePrice($1
000s)
Interpolation vs. Extrapolation
When using a regression model for prediction,only predict within the relevant range of data
Relevant range forRelevant range for
interpolationinterpolation
Do not try toDo not try to
extrapolate beyondextrapolate beyond
the range ofthe range of
observed Xsobserved Xs
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
27/80
Measures of Variation
Total variation is made up of two parts:
SSESSRSST +=Total Sum ofTotal Sum of
SquaresSquaresRegression Sum ofRegression Sum of
SquaresSquaresError Sum ofError Sum of
SquaresSquares
=
2
i )YY(SST =
2
ii )Y
Y(SSE=
2
i )YY
(SSRwhere:where:
= Average value of the dependent variable= Average value of the dependent variable
YYii = Observed values of the dependent variable= Observed values of the dependent variable
ii = Predicted value of Y for the given X= Predicted value of Y for the given Xii valuevalueY
Y
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
28/80
Measures of Variation
SST = Total sum of squares
Measures the variation of the Yi values around their mean Y
SSR = regression sum of squares
Explained variation attributable to the relationship between X
and Y
SSE = error sum of squares
Variation attributable to factors other than the relationship
between X and Y
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
29/80
(continued)(continued)
XXii
YY
XX
YYii
SSTSST ==(Y(Yii-- YY))22SSESSE == (Y(Yii-- YYii ))22
SSR =SSR = ((YYii-- YY))22
__
__
__
YY
YY
YY
__YY
Measures of Variation
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
30/80
The coefficient of determination is the portion of
the total variation in the dependent variable that is
explained by variation in the independent variable
The coefficient of determination is also called r-squared and is denoted as r2
Coefficient of Determination,r2
1r0 2 note:note:
squaresofsumtotalsquaresofsumregression
SSTSSRr2 ==
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
31/80
rr22 = 1= 1
Examples of Approximater2 Values
YY
XX
YY
XX
rr22 = 1= 1
rr22 = 1= 1
Perfect linear relationship betweenPerfect linear relationship betweenX and Y:X and Y:
100% of the variation in Y is100% of the variation in Y is
explained by variation in Xexplained by variation in X
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
32/80
Examples of Approximater2 Values
YY
XX
YY
XX
0 < r0 < r22 < 1< 1
Weaker linear relationshipsWeaker linear relationshipsbetween X and Y:between X and Y:
Some but not all of the variationSome but not all of the variation
in Y is explained by variation inin Y is explained by variation in
XX
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
33/80
Examples of Approximater2 Values
rr22 = 0= 0
No linear relationship between XNo linear relationship between Xand Y:and Y:
The value of Y does not dependThe value of Y does not depend
on X. (None of the variation in Yon X. (None of the variation in Y
is explained by variation in X)is explained by variation in X)
YY
XXrr22 = 0= 0
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
34/80
Excel OutputRegression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
58.08% of the variation in house58.08% of the variation in house
prices is explained by variation inprices is explained by variation in
square feetsquare feet
0.5808232600.500018934.9348
SSTSSRr2 ===
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
35/80
Standard Error of EstimateThe standard deviation of the variation of
observations around the regression line is estimated
by
2n
)YY(
2n
SSES
n
1i
2
ii
YX
=
==
WhereWhere
SSE = error sum of squaresSSE = error sum of squares
n = sample sizen = sample size
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
36/80
Excel OutputRegression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
41.33032SYX =
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
37/80
Comparing Standard Errors
YYYY
XX XXYXssmall YXslarge
SSYXYX is a measure of the variation of observed Y valuesis a measure of the variation of observed Y values
from the regression linefrom the regression line
The magnitude of SThe magnitude of SYXYX should always be judged relative to the size ofshould always be judged relative to the size of
the Y values in the sample datathe Y values in the sample data
i.e., Si.e., SYXYX = $41.33K is= $41.33K ismoderately small relative to house prices in themoderately small relative to house prices in the
$200 - $300K range$200 - $300K range
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
38/80
Assumptions of RegressionUse the acronym LINE:
Linearity
The underlying relationship between X and Y is linear
Independence of Errors Error values are statistically independent
Normality of Error Error values () are normally distributed for any given value of X
Equal Variance (Homoscedasticity) The probability distribution of the errors has constant variance
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
39/80
Residual Analysis
The residual for observation i, ei, is the difference between its
observed and predicted value
Check the assumptions of regression by examining the residuals Examine for linearity assumption
Evaluate independence assumption
Evaluate normal distribution assumption
Examine for constant variance for all levels of X (homoscedasticity)
Graphical Analysis of Residuals
Can plot residuals vs. X
iii YYe =
Resid al Anal sis for
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
40/80
Residual Analysis forLinearity
Not LinearNot Linear LinearLinear
xx
res
id
ual
s
res
id
ual
s
xx
YY
xx
YY
xx
res
id
ual
s
res
id
ual
s
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
41/80
Residual Analysis forResidual Analysis forIndependenceIndependence
Not IndependentNot Independent
IndependentIndependent
XX
XXres
id
ua
ls
res
id
ua
ls
res
id
ua
ls
res
id
ua
ls
XX
res
id
ua
ls
res
id
ua
ls
Residual Analysis for
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
42/80
Residual Analysis forNormality
PercentPercent
ResidualResidual
A normal probability plot of the residuals can be used toA normal probability plot of the residuals can be used to
check for normality:check for normality:
-3 -2 -1 0 1 2 3-3 -2 -1 0 1 2 3
00
100100
R id l A l i f
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
43/80
Residual Analysis forEqual Variance
Non-constant varianceNon-constant variance Constant varianceConstant variance
xx xx
YY
xx xx
YY
res
id
ual
s
res
id
ua
ls
res
id
ual
s
res
id
ua
ls
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
44/80
House Price Model Residual Pl
-60
-40
-20
0
20
40
60
80
0 1000 2000 300
Square Fee
Residuals
Excel Residual Output
RESIDUAL OUTPUT
Predicted
House Price
Residuals
1 251.92316 -6.923162
2 273.87671 38.123293 284.85348 -5.853484
4 304.06284 3.937162
5 218.99284 -19.99284
6 268.38832 -49.38832
7 356.20251 48.797498 367.17929 -43.17929
9 254.6674 64.33264
10 284.85348 -29.85348
Does not appear to violateDoes not appear to violate
any regression assumptionsany regression assumptions
easur ng u ocorre a on:
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
45/80
Used when data are collected over time
to detect if autocorrelation is present
Autocorrelation exists if residuals in one
time period are related to residuals in
another period
easur ng u ocorre a on:The Durbin-Watson
Statistic
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
46/80
AutocorrelationAutocorrelation is correlation of the errors (residuals) over time
Violates the regression assumption that residuals areViolates the regression assumption that residuals are
random and independentrandom and independent
Time (t) Residual Plot
-15
-10
-5
0
5
10
15
0 2 4 6 8
Time (t)
Residuals
Here, residuals show aHere, residuals show a
cyclic pattern, notcyclic pattern, not
random. Cyclical patternsrandom. Cyclical patterns
are a sign of positiveare a sign of positive
autocorrelationautocorrelation
Th D bi W t
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
47/80
The Durbin-WatsonStatistic
=
=
= n
1i
2
i
n
2i
2
1ii
e
)ee(
D
The possible range is 0The possible range is 0 D 4 D 4
D should be close to 2 if HD should be close to 2 if H00 is trueis true
D less than 2 may signal positiveD less than 2 may signal positive
autocorrelation, D greater than 2 mayautocorrelation, D greater than 2 may
signal negative autocorrelationsignal negative autocorrelation
The Durbin-Watson statistic is used to test for
autocorrelation
HH00: residuals are not correlated: residuals are not correlated
HH11: positive: positive autocorrelation is presentautocorrelation is present
T ti f P iti
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
48/80
Testing for PositiveAutocorrelation
Calculate the Durbin-Watson test statistic = DCalculate the Durbin-Watson test statistic = D
(The Durbin-Watson Statistic can be found using Excel or Minitab)(The Durbin-Watson Statistic can be found using Excel or Minitab)
Decision rule: reject HDecision rule: reject H00
if D < dif D < dLL
HH00: positive autocorrelation does not exist: positive autocorrelation does not exist
HH11::positive autocorrelation is presentpositive autocorrelation is present
00 ddUU 22ddLL
Reject HReject H00 Do not reject HDo not reject H00
Find the values dFind the values dLL and dand dUU from the Durbin-Watson tablefrom the Durbin-Watson table
(for sample size(for sample size nn and number of independent variablesand number of independent variables kk))
InconclusiveInconclusive
i f i i
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
49/80
Suppose we have the following time series data:
Is there autocorrelation?
y = 30.65 + 4.7038
R2
= 0.8976
0
20
40
60
80
100
120
140
160
0 5 10 15 20 25 30
Time
Sales
Testing for PositiveTesting for PositiveAutocorrelationAutocorrelation
(continued)(continued)
T ti f P iti
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
50/80
Example with n = 25:
Durbin-Watson Calculations
Sum of Squared
Difference of Residuals
3296.18
Sum of SquaredResiduals
3279.98
Durbin-Watson Statistic 1.00494
y = 30.65 + 4.7038
R2 = 0.8976
0
20
40
60
80
100
120
140
160
0 5 10 15 20 25 30
Time
Sales
Testing for PositiveAutocorrelation
(continued)(continued)
Excel/PHStat output:Excel/PHStat output:
1.004943279.98
3296.18
e
)e(e
Dn
1i
2
i
n
2i
2
1ii
==
=
=
=
T ti f P iti
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
51/80
Here, n = 25 and there is k = 1 one independent variable
Using the Durbin-Watson table, dL= 1.29 and d
U= 1.45
D = 1.00494 < dL= 1.29, soreject H
0and conclude that
significant positive autocorrelation existsTherefore the linear model is not the appropriate model to
forecast sales
Testing for PositiveAutocorrelation
(continued)(continued)
Decision:Decision: reject Hreject H00 sincesince
D = 1.00494 < dD = 1.00494 < dLL
00 ddUU=1.45=1.45 22ddLL=1.29=1.29Reject HReject H00 Do not reject HDo not reject H00InconclusiveInconclusive
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
52/80
Inferences About the Slope
The standard error of the regression slope
coefficient (b1) is estimated by
==
2
i
YXYXb
)X(X
S
SSX
SS1
where:where:
= Estimate of the standard error of the least squares slope= Estimate of the standard error of the least squares slope
= Standard error of the estimate= Standard error of the estimate
1bS
2n
SSESYX
=
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
53/80
Excel OutputRegression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
0.03297S1b=
C i St d d E
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
54/80
Comparing Standard Errorsof the Slope
YY
XX
YY
XX1bSsmall 1bSlarge
is a measure of the variation in the slope of regression lines fromis a measure of the variation in the slope of regression lines from
different possible samplesdifferent possible samples1b
S
Inference about the Slope:
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
55/80
Inference about the Slope:t Test
t test for a population slopeIs there a linear relationship between X and Y?
Null and alternative hypotheses
H0: 1 = 0 (no linear relationship)
H1: 1 0 (linear relationship does exist)
Test statistic
1b
11
S
bt
=
2nd.f. =
where:where:
bb11 = regression slope= regression slopecoefficientcoefficient
11 = hypothesized slope= hypothesized slope
SSbb = standard= standard
error of the slopeerror of the slope11
Inference about the Slope:
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
56/80
House Price in$1000s
(y)
Square Feet(x)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
(sq.ft.)0.109898.25pricehouse +=
Simple Linear Regression Equation:Simple Linear Regression Equation:
The slope of this model is 0.1098The slope of this model is 0.1098
Does square footage of the house affectDoes square footage of the house affect
its sales price?its sales price?
Inference about the Slope:t Test
(continued)(continued)
Inferences about the Slope:
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
57/80
Inferences about the Slope:tTest Example
H0: 1 = 0
H1: 1 0
From Excel output:From Excel output:
Coefficients Standard Error t Stat P-value
Intercept 98.24833 58.03348 1.69296 0.12892
Square Feet 0.10977 0.03297 3.32938 0.01039
1bS
tt
bb11
32938.303297.0
010977.0
S
b
t1b
11
=
=
=
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
58/80
Inferences about the Slope:
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
59/80
Inferences about the Slope:tTest Example
H0: 1 = 0
H1: 1 0
P-value =P-value = 0.010390.01039
There is sufficient evidenceThere is sufficient evidence
that square footage affectsthat square footage affects
house pricehouse price
From Excel output:From Excel output:
Reject HReject H00
Coefficients Standard Error t Stat P-value
Intercept 98.24833 58.03348 1.69296 0.12892
Square Feet 0.10977 0.03297 3.32938 0.01039
P-valueP-value
Decision:Decision: P-value 3.329)+P(t < -3.329) =P(t > 3.329)+P(t < -3.329) =
0.010390.01039
(for 8 d.f.)(for 8 d.f.)
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
60/80
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
61/80
Excel OutputRegression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
11.08481708.1957
18934.9348
MSE
MSRF ===
With 1 and 8 degrees ofWith 1 and 8 degrees of
freedomfreedom P-value forP-value forthe F Testthe F Test
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
62/80
H0: 1 = 0
H1: 1 0
= .05
df1= 1 df2 = 8
Test Statistic:Test Statistic:
Decision:Decision:
Conclusion:Conclusion:
Reject HReject H00 atat = 0.05= 0.05
There is sufficient evidence that houseThere is sufficient evidence that house
size affects selling pricesize affects selling price00
= .05= .05
FF.05.05 = 5.32= 5.32
Reject HReject H00Do notDo notreject Hreject H00
11.08MSE
MSRF ==
CriticalCritical
Value:Value:
FF = 5.32= 5.32
F Test for Significance(continued)(continued)
FF
Confidence Interval Estimate
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
63/80
Confidence Interval Estimatefor the Slope
Confidence Interval Estimate of the Slope:Confidence Interval Estimate of the Slope:
Excel Printout for House Prices:Excel Printout for House Prices:
At 95% level of confidence, the confidence interval for the slope isAt 95% level of confidence, the confidence interval for the slope is
(0.0337, 0.1858)(0.0337, 0.1858)
1b2n1Stb
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
d.f. = n - 2d.f. = n - 2
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
64/80
t Test for a Correlation
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
65/80
t Test for a CorrelationCoefficient
Hypotheses
H0: = 0 (no correlation between X and Y)
HA: 0 (correlation exists)
Test statistic
(with n 2 degrees of freedom)
2n
r1-rt 2
=
0bifrr
0bifrr
where
1
2
1
2
+=
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
66/80
Example: House PricesIs there evidence of a linear relationship betweenIs there evidence of a linear relationship between
square feet and house price at the .05 level ofsquare feet and house price at the .05 level of
significance?significance?
HH00:: = 0 (No correlation)= 0 (No correlation)HH11:: 0 (correlation exists)0 (correlation exists)
=.05 , df=.05 , df==10 - 2 = 810 - 2 = 8
3.329
210
.76210.762
2n
r1rt
22=
=
=
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
67/80
Example: Test Solution
Conclusion:Conclusion:ThereThere is evidenceis evidence ofof
a linear associationa linear association
at the 5% level ofat the 5% level of
significancesignificance
Decision:Decision:
Reject HReject H00
Reject HReject H00Reject HReject H00
/2=.025/2=.025
-t-t/2/2Do not reject HDo not reject H00
00tt/2/2
/2=.025/2=.025
-2.3060-2.3060 2.30602.30603.3293.329
d.f. = 10-2 = 8d.f. = 10-2 = 8
3.329
210
.7621
0.762
2n
r1
rt
22=
=
=
Estimating Mean Values and
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
68/80
Estimating Mean Values andPredicting Individual Values
YY
XX XXii
Y = bY = b00+b+b11XXii
ConfidenceConfidence
Interval forInterval for
thethe meanmean ofofY, given XY, given Xii
Prediction IntervalPrediction Interval
for anfor an individualindividual YY,,
given Xgiven Xii
Goal: Form intervals around Y to express uncertaintyGoal: Form intervals around Y to express uncertaintyabout the value of Y for a given Xabout the value of Y for a given Xii
YY
Confidence Interval for
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
69/80
Confidence Interval forthe Average Y, Given X
Confidence interval estimate for theConfidence interval estimate for themean value of Ymean value of Y given a particular Xgiven a particular Xii
Size of interval varies according toSize of interval varies according to
distance away from mean,distance away from mean,XX
iYX2n
XX|Y
hStY
:forintervalConfidencei
=
+=
+=2
i
2
i
2
ii
)X(X
)X(X
n
1
SSX
)X(X
n
1h
Prediction Interval for
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
70/80
Prediction Interval foran Individual Y, Given X
Confidence interval estimate for anConfidence interval estimate for anIndividual value of YIndividual value of Y given a particular Xgiven a particular Xii
This extra term adds to the interval width to reflect theThis extra term adds to the interval width to reflect the
added uncertainty for an individual caseadded uncertainty for an individual case
iYX2n
XX
h1StY
:YforintervalConfidencei
+
=
Estimation of Mean Values:
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
71/80
Estimation of Mean Values:Example
Find the 95% confidence interval for the mean price ofFind the 95% confidence interval for the mean price of
2,000 square-foot houses2,000 square-foot houses
Predicted Price YPredicted Price Yii = 317.85 ($1,000s)= 317.85 ($1,000s)
Confidence Interval Estimate forConfidence Interval Estimate forY|X=XY|X=X
37.12317.85
)X(X
)X(X
n
1StY
2
i
2
iYX2-n =
+
The confidence interval endpoints are 280.66 and 354.90, or fromThe confidence interval endpoints are 280.66 and 354.90, or from
$280,660 to $354,900$280,660 to $354,900
ii
Estimation of Individual
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
72/80
Estimation of IndividualValues: Example
Find the 95% prediction interval for an individual houseFind the 95% prediction interval for an individual house
with 2,000 square feetwith 2,000 square feet
Predicted Price YPredicted Price Yii = 317.85 ($1,000s)= 317.85 ($1,000s)
Prediction Interval Estimate for YPrediction Interval Estimate for YX=XX=X
102.28317.85
)X(X
)X(X
n
11StY
2
i
2
iYX1-n =
++
The prediction interval endpoints are 215.50 and 420.07, or fromThe prediction interval endpoints are 215.50 and 420.07, or from
$215,500 to $420,070$215,500 to $420,070
ii
Finding Confidence and
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
73/80
Finding Confidence andPrediction Intervals in Excel
In Excel, use
PHStat | regression | simple linear regression
Check the
confidence and prediction interval for X=
box and enter the X-value and confidence leveldesired
Finding Confidence and
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
74/80
Input values
Finding Confidence andPrediction Intervals in Excel
(continued)(continued)
Confidence Interval Estimate forConfidence Interval Estimate forY|X=XiY|X=Xi
Prediction Interval Estimate for YPrediction Interval Estimate for YX=XiX=Xi
YY
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
75/80
P itfalls of Regression
AnalysisLacking an awareness of the assumptionsunderlying least-squares regression
Not knowing how to evaluate the assumptions
Not knowing the alternatives to least-squares
regression if a particular assumption is violated
Using a regression model without knowledge of
the subject matter
Extrapolating outside the relevant range
Strategies for Avoiding
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
76/80
Strategies for Avoidingthe Pitfalls of Regression
Start with a scatter diagram of X vs. Y to
observe possible relationship
Perform residual analysis to check the
assumptions
Plot the residuals vs. X to check for violations of
assumptions such as homoscedasticity
Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the
residuals to uncover possible non-normality
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
77/80
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
78/80
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
79/80
Chapter Summary
Described inference about the slope
Discussed correlation -- measuring the
strength of the associationAddressed estimation of mean values and
prediction of individual values
Discussed possible pitfalls in regression andrecommended strategies to avoid them
(continued)(continued)
-
8/6/2019 BB ABS 10th Univariate and Bivariate Corr SR MR 81Pages Final
80/80
Thank You