![Page 1: Part 17: Regression Residuals 17-1/38 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/1.jpg)
Part 17: Regression Residuals
Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
![Page 2](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/2.jpg)
Statistics and Data Analysis
Part 17 – The Linear Regression Model
![Page 3](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/3.jpg)
Regression Modeling
- Theory behind the regression model
- Computing the regression statistics
- Interpreting the results
- Application: Statistical Cost Analysis
![Page 4](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/4.jpg)
A Linear Regression
Predictor: Box Office = -14.36 + 72.72 Buzz
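A minimal Python sketch of using the sample line as a predictor. The two coefficients are the slide's estimates; the function name `predict_box_office` is illustrative, and the units of Box Office and Buzz are not stated in these notes.

```python
# Sample regression line from the slide: Box Office = -14.36 + 72.72 Buzz
A_HAT = -14.36  # estimated intercept, a
B_HAT = 72.72   # estimated slope, b


def predict_box_office(buzz: float) -> float:
    """Return the fitted (predicted) Box Office for a given Buzz value."""
    return A_HAT + B_HAT * buzz


if __name__ == "__main__":
    # Evaluate the line at a few illustrative Buzz values.
    for buzz in (0.2, 0.5, 0.8):
        print(f"Buzz = {buzz:.1f} -> predicted Box Office = {predict_box_office(buzz):.2f}")
```

The prediction is only the explained part of Box Office; actual movies scatter around this line.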
![Page 5](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/5.jpg)
Data and Relationship
We suggested that the relationship between box office sales and internet buzz is Box Office = -14.36 + 72.72 Buzz.
Box Office is not exactly equal to -14.36 + 72.72 × Buzz. How do we reconcile the equation with the data?
![Page 6](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/6.jpg)
Modeling the Underlying Process
A model that explains the process producing the data we observe:
Observed outcome = the sum of two parts
(1) Explained: the regression line
(2) Unexplained (noise): the remainder
Internet Buzz is not the only thing that explains Box Office, but it is the only variable in the equation.
Regression model: The “model” is the statement that part (1) is the same process from one observation to the next.
![Page 7](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/7.jpg)
The Population Regression
THE model:
(1) Explained: Explained Box Office = α + β Buzz
(2) Unexplained: The rest is “noise,” ε. Random ε has certain characteristics.
Model statement: Box Office = α + β Buzz + ε
Box Office is related to Buzz, but is not exactly equal to α + β Buzz.
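To see how the model generates data that do not sit exactly on the line, here is a small simulation sketch. The "true" values of α, β and σ are hypothetical choices for illustration, and drawing ε from a normal distribution is an added assumption: the slides only say ε is random noise with mean zero and variance σ².

```python
import random

# Hypothetical "true" parameters for illustration only. In practice alpha,
# beta and sigma are unknown; the sample gives estimates a, b and s_e.
ALPHA, BETA, SIGMA = -14.0, 72.0, 13.0


def simulate_box_office(buzz_values, seed=0):
    """Generate Box Office = ALPHA + BETA*Buzz + eps for each Buzz value.

    eps is drawn from Normal(0, SIGMA). Normality is an assumption of this
    sketch; the slides only state mean zero and variance sigma^2.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    data = []
    for buzz in buzz_values:
        explained = ALPHA + BETA * buzz  # part (1): the regression function
        eps = rng.gauss(0.0, SIGMA)      # part (2): the unexplained noise
        data.append((buzz, explained + eps))
    return data


if __name__ == "__main__":
    for buzz, box_office in simulate_box_office([0.3, 0.5, 0.7]):
        line = ALPHA + BETA * buzz
        print(f"Buzz={buzz:.1f}  alpha+beta*Buzz={line:7.2f}  observed={box_office:7.2f}")
```

Each simulated outcome equals the line value plus a random draw, which is exactly the two-part decomposition in the model statement.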
![Page 8](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/8.jpg)
The Data Include the Noise
![Page 9](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/9.jpg)
What explains the noise?
What explains the variation in fuel bills?
[Figure: Scatterplot of FUELBILL vs ROOMS]
![Page 10](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/10.jpg)
Noisy Data?
What explains the variation in milk production other than number of cows?
![Page 11](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/11.jpg)
Assumptions
(Regression) The equation linking “Box Office” and “Buzz” is stable:
E[Box Office | Buzz] = α + β Buzz
Another sample of movies, say 2012, would obey the same fundamental relationship.
![Page 12](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/12.jpg)
Model Assumptions
$y_i = \alpha + \beta x_i + \varepsilon_i$
$\alpha + \beta x_i$ is the “regression function.”
$\varepsilon_i$ is the “disturbance.” It is the unobserved random component.
The Disturbance is Random Noise
- Mean zero: the regression is the mean of $y_i$, and $\varepsilon_i$ is the deviation from the regression.
- Variance $\sigma^2$.
![Page 13](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/13.jpg)
We will use the data to estimate α and β.
Sample: a + b Buzz
![Page 14](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/14.jpg)
We also want to estimate $\sigma = \sqrt{E[\varepsilon_i^2]}$.
Sample: a + b Buzz
Residual: e = y - a - b Buzz
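A short sketch of computing residuals from the sample line. The coefficients are the slide's estimates; the three Buzz/Box Office pairs are hypothetical values chosen only to show the arithmetic.

```python
# Sample line from the slides; the data in the usage example are hypothetical.
A_HAT, B_HAT = -14.36, 72.72


def residuals(xs, ys, a=A_HAT, b=B_HAT):
    """Return e_i = y_i - a - b*x_i for each observation."""
    return [y - a - b * x for x, y in zip(xs, ys)]


if __name__ == "__main__":
    buzz = [0.4, 0.5, 0.6]           # hypothetical Buzz values
    box_office = [15.0, 24.0, 28.0]  # hypothetical observed Box Office
    for x, y, e in zip(buzz, box_office, residuals(buzz, box_office)):
        print(f"Buzz={x:.1f}  observed={y:5.1f}  fitted={A_HAT + B_HAT * x:6.3f}  e={e:6.3f}")
```

A residual e is observable because it uses the estimates a and b; the disturbance ε uses the unknown α and β and is never observed.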
![Page 15](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/15.jpg)
Standard Deviation of the Residuals
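The standard deviation of $\varepsilon_i = y_i - \alpha - \beta x_i$ is $\sigma$.
$\sigma = \sqrt{E[\varepsilon_i^2]}$ (because the mean of $\varepsilon_i$ is zero).
The sample values a and b estimate α and β. The residual $e_i = y_i - a - b x_i$ estimates $\varepsilon_i$.
Use $\sqrt{\frac{1}{N-2}\sum_{i=1}^N e_i^2}$ to estimate σ.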
$$s_e = \sqrt{\frac{\sum_{i=1}^N e_i^2}{N-2}} = \sqrt{\frac{\sum_{i=1}^N (y_i - a - b x_i)^2}{N-2}}$$
Why N - 2? Two parameters (α and β) were estimated before the residuals could be computed. This is the same reason N - 1 was used to compute a sample variance, where one parameter (the mean) is estimated first.
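The formula for $s_e$ can be sketched directly from raw data: fit the least squares line, form the residuals, and divide their sum of squares by N - 2. The function name `fit_line_and_se` and the four data points are hypothetical.

```python
import math


def fit_line_and_se(xs, ys):
    """Least squares fit of y on x; returns (a, b, s_e).

    s_e = sqrt(sum(e_i^2) / (N - 2)): two parameters (a and b) are estimated
    before the residuals can be formed, so N - 2 degrees of freedom remain.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need matching x and y lists with at least 3 points")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    return a, b, math.sqrt(sse / (n - 2))


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
    ys = [2.0, 4.0, 5.0, 8.0]
    a, b, s_e = fit_line_and_se(xs, ys)
    print(f"a={a:.3f}  b={b:.3f}  s_e={s_e:.4f}")
```

For these numbers the fit is b = 1.9 and a = 0, the residuals are 0.1, 0.2, -0.7 and 0.4, and $s_e = \sqrt{0.7/2} \approx 0.592$.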
![Page 16](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/16.jpg)
Residuals
![Page 17](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/17.jpg)
Summary: Regression Computations
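The same 5 statistics (with N) are still needed:
$\sum_{i=1}^N x_i$, $\sum_{i=1}^N y_i$, $\sum_{i=1}^N x_i^2$, $\sum_{i=1}^N y_i^2$, $\sum_{i=1}^N x_i y_i$
N = 62 complete observations.
$\bar{y} = \frac{1}{N}\sum_{i=1}^N y_i = 20.721$
$\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i = 0.48242$
$\text{Var}(x) = s_x^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})^2 = 0.02453$
$\text{Var}(y) = s_y^2 = \frac{1}{N-1}\sum_{i=1}^N (y_i - \bar{y})^2 = 305.985$
$\text{Cov}(x,y) = s_{xy} = \frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y}) = 1.784$
$b = \frac{s_{xy}}{s_x^2} = 72.72$
$a = \bar{y} - b\bar{x} = -14.36$
$s_e = \sqrt{\frac{(N-1)(s_y^2 - b^2 s_x^2)}{N-2}} = 13.386$
(for later...) $R^2 = \frac{b^2 s_x^2}{s_y^2} = 0.424$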
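The summary computations can be reproduced from the reported moments alone. This sketch plugs in the slide's rounded values; the function name `regression_from_moments` is hypothetical.

```python
import math

# Summary statistics reported on the slide (N = 62 complete observations).
N = 62
YBAR, XBAR = 20.721, 0.48242
VAR_X, VAR_Y, COV_XY = 0.02453, 305.985, 1.784


def regression_from_moments(n, xbar, ybar, var_x, var_y, cov_xy):
    """Return (b, a, s_e, R^2) from sample means, variances and covariance."""
    b = cov_xy / var_x          # slope: s_xy / s_x^2
    a = ybar - b * xbar         # intercept: the line passes through (xbar, ybar)
    explained = b ** 2 * var_x  # b^2 s_x^2, the variance explained by the line
    s_e = math.sqrt((n - 1) * (var_y - explained) / (n - 2))
    r_squared = explained / var_y
    return b, a, s_e, r_squared


if __name__ == "__main__":
    b, a, s_e, r2 = regression_from_moments(N, XBAR, YBAR, VAR_X, VAR_Y, COV_XY)
    print(f"b = {b:.2f}, a = {a:.2f}, s_e = {s_e:.3f}, R^2 = {r2:.3f}")
```

These inputs give a ≈ -14.36, s_e ≈ 13.386 and R² ≈ 0.424; b comes out near 72.73 rather than 72.72 because the moments on the slide are themselves rounded.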
![Page 18](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/18.jpg)
Using $s_e$ to identify outliers
Remember the empirical rule: about 95% of observations lie within the mean ± 2 standard deviations. We show $(a + bx) \pm 2s_e$ below.
This point is 2.2 standard deviations from the regression.
Only 2 of the 62 observations (3.2%) lie outside the bounds. (We will refine this later.)
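The ± 2 s_e screen can be sketched as a residual measured in units of s_e. The line and s_e are the slide's estimates; the three observations are hypothetical, and k = 2 is only the rough empirical-rule cutoff, not a formal test.

```python
def flag_outliers(xs, ys, a, b, s_e, k=2.0):
    """Return (index, z) pairs for points more than k*s_e from the line a + b*x.

    z = e_i / s_e is the residual measured in estimated standard deviations.
    """
    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        z = (y - a - b * x) / s_e
        if abs(z) > k:
            flagged.append((i, z))
    return flagged


if __name__ == "__main__":
    # Slide estimates for the line and s_e; the three observations are hypothetical.
    a, b, s_e = -14.36, 72.72, 13.386
    buzz = [0.5, 0.5, 0.7]
    box_office = [51.45, 30.0, 10.0]
    for i, z in flag_outliers(buzz, box_office, a, b, s_e):
        print(f"observation {i}: {z:+.2f} standard deviations from the line")
```

Here only the first point (about 2.2 standard deviations above the line) is flagged; the third sits just inside the lower bound.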
![Page 19](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/19.jpg)
![Page 20](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/20.jpg)
Linear Regression
Sample Regression Line
![Page 21](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/21.jpg)
![Page 22](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/22.jpg)
![Page 23](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/23.jpg)
Results to Report
![Page 24](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/24.jpg)
The Reported Results
![Page 25](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/25.jpg)
Estimated equation
![Page 26](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/26.jpg)
Estimated coefficients a and b
![Page 27](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/27.jpg)
S = $s_e$ = estimated standard deviation of ε
![Page 28](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/28.jpg)
Square of the sample correlation between x and y
![Page 29](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/29.jpg)
N-2 = degrees of freedom
N-1 = sample size minus 1
![Page 30](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/30.jpg)
Sum of squared residuals, $\sum_i e_i^2$
![Page 31](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/31.jpg)
$S^2 = s_e^2$
![Page 32](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/32.jpg)
Total Variation $= \sum_{i=1}^N (y_i - \bar{y})^2$
![Page 33](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/33.jpg)
Coefficient of Determination $R^2 = \frac{b^2 \sum_{i=1}^N (x_i - \bar{x})^2}{\sum_{i=1}^N (y_i - \bar{y})^2} = \frac{\text{RegressionSS}}{\text{TotalSS}}$
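A sketch of the variance decomposition behind R²: for a least squares line, RegressionSS plus the sum of squared residuals equals TotalSS, and R² equals the squared sample correlation of x and y. The function name and the four data points are hypothetical.

```python
import math


def r_squared_parts(xs, ys):
    """Return (RegressionSS, ResidualSS, TotalSS, R^2, r) for a least squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)  # TotalSS: total variation of y
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    regression_ss = b ** 2 * sxx            # b^2 * sum (x_i - xbar)^2
    residual_ss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)          # sample correlation of x and y
    return regression_ss, residual_ss, syy, regression_ss / syy, r


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
    ys = [2.0, 4.0, 5.0, 8.0]
    reg_ss, res_ss, tot_ss, r2, r = r_squared_parts(xs, ys)
    print(f"{reg_ss:.2f} + {res_ss:.2f} = {tot_ss:.2f};  R^2 = {r2:.4f};  r^2 = {r * r:.4f}")
```

For these data RegressionSS = 18.05, the residual sum of squares is 0.70 and TotalSS = 18.75, so R² = 18.05/18.75 ≈ 0.963.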
![Page 34](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/34.jpg)
The Model
- Constructed to provide a framework for interpreting the observed data
  - What is the meaning of the observed relationship (assuming there is one)?
- How it’s used
  - Prediction: What reason is there to assume that we can use sample observations to predict outcomes?
  - Testing relationships
![Page 35](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/35.jpg)
A Cost Model
Electricity.mpj
Total cost in $Million
Output in Million KWH
N = 123 American electric utilities
Model: Cost = α + βKWH + ε
![Page 36](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/36.jpg)
Cost Relationship
[Figure: Scatterplot of Cost vs Output]
![Page 37](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/37.jpg)
Sample Regression
![Page 38](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/38.jpg)
Interpreting the Model
Cost = 2.44 + 0.00529 Output + e
Cost is in $Million; Output is in Million KWH.
Fixed Cost = Cost when Output = 0 = $2.44 Million.
Marginal cost = change in cost / change in output = 0.00529 $Million/Million KWH = 0.00529 $/KWH = 0.529 cents/KWH.
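The interpretation above can be checked with a few lines of arithmetic. The coefficients are the slide's estimates; the output level of 10,000 Million KWH used in the usage example is a hypothetical value inside the range of the scatterplot.

```python
# Estimated cost function from the slide: Cost = 2.44 + 0.00529 Output
# Cost is in $Million, Output is in Million KWH.
FIXED_COST = 2.44        # $Million, the cost when Output = 0
MARGINAL_COST = 0.00529  # $Million per Million KWH


def predicted_cost(output_million_kwh: float) -> float:
    """Predicted total cost in $Million for a given output in Million KWH."""
    return FIXED_COST + MARGINAL_COST * output_million_kwh


def marginal_cost_cents_per_kwh() -> float:
    """Convert $Million per Million KWH to cents per KWH."""
    dollars_per_kwh = MARGINAL_COST * 1_000_000 / 1_000_000  # the millions cancel
    return dollars_per_kwh * 100                              # 100 cents per dollar


if __name__ == "__main__":
    print(f"Fixed cost: ${predicted_cost(0):.2f} Million")
    print(f"Marginal cost: {marginal_cost_cents_per_kwh():.3f} cents/KWH")
    # Hypothetical output level, chosen for illustration.
    print(f"Predicted cost at 10,000 Million KWH: ${predicted_cost(10_000):.2f} Million")
```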
![Page 39](https://reader031.vdocument.in/reader031/viewer/2022032604/56649e665503460f94b615f6/html5/thumbnails/39.jpg)
Summary
- Linear regression model
  - Assumptions of the model
  - Residuals and disturbances
- Estimating the parameters of the model
  - Regression parameters
  - Disturbance standard deviation
- Computation of the estimated model