Happiness comes not from material wealth but from less desire.
Applied Statistics Using SAS and SPSS
Topic: Simple linear regression
By Prof Kelly Fan, Cal State Univ, East Bay
Example: Computer Repair
A company markets and repairs small computers. How fast (Time) an electronic component (Computer Unit) can be repaired is very important to the company's efficiency. The variables in this example are Time and Units.
Hmm… How long will it take me to repair this unit?
Goal: to predict the length of repair Time for a given number of computer Units
Computer Repair Data
Units  Minutes  |  Units  Minutes
1      23       |  6      97
2      29       |  7      109
3      49       |  8      119
4      64       |  9      149
4      74       |  9      145
5      87       |  10     154
6      96       |  10     166
Graphical Summary of Two Quantitative Variables
Scatterplot of the response variable against the explanatory variable:
What is the overall (average) pattern?
What is the direction of the pattern?
How much do the data points vary from the overall (average) pattern?
Are there any potential outliers?
Summary for Computer Repair Data
Scatterplot (Time vs Units): Some Simple Conclusions
Time is linearly related to the number of computer Units.
(The length of) Time increases as (the number of) Units increases.
The data points are close to the line.
There are no potential outliers.
Numerical Summary of Two Quantitative Variables
Regression equation
Correlation
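Review: Math Equation for a Line
Y: the response variable; X: the explanatory variable
$Y = b_0 + b_1 X$
[Figure: a straight line in the X-Y plane, with $b_0$ marked as the intercept and $b_1$ as the rise of the line over a run of 1 unit in X.]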
Regression Equation
The regression line models the relationship between X and Y on average.
The math equation of a regression line is called the regression equation.
The Use of the Regression Equation
Predict the value of Y for a given X value. E.g., how long will it take to repair 3 computer units?
General Notation
$\hat{Y} = b_0 + b_1 X$
$\hat{Y}$ is called "predicted Y," pronounced "y hat." It estimates the average Y value for a specified X value.
E.g., the predicted repair time for a given number of units:
$\hat{Y} = 4.16 + 15.51X$
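For simple linear regression, the least-squares estimates are $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. Here $S_{xy}$ is the sum of cross-products of deviations from the means and $S_{xx}$ is the sum of squared deviations of X. The Python sketch below illustrates the arithmetic; it is not the SAS/SPSS procedure used in these slides. It applies the formulas to the computer repair data and answers the question of how long 3 units should take. The function name `fit_line` is our own.

```python
# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(units, minutes)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")                       # Y-hat = 4.16 + 15.51 X
print(f"Predicted time for 3 units: {b0 + b1 * 3:.1f} min")   # 50.7 min
```

The predicted value (about 50.7 minutes) is an estimate of the average repair time for jobs with 3 units, not a guarantee for any single job.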
The Limitation of the Regression Equation
The regression equation should not be used to predict Y for X values that are (far) beyond the range in which the data were observed.
E.g., the predicted weight (WT) for a given height (HT):
$\hat{Y} = -205 + 5X$
Given a height of 40 inches, the regression equation gives a weight of $-205 + 5 \times 40 = -5$ pounds!
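One simple way to respect this limitation is to refuse predictions outside the observed X range. The sketch below uses a hypothetical helper, `predict_within_range`, which is our own illustration and not a SAS or SPSS feature. It first reproduces the height/weight arithmetic. It then applies the guard to the repair-time equation, using coefficients rounded from the SPSS output (4.16 and 15.51) and the observed Units range of 1 to 10. The height range of the weight data is not given in the slides, so the guard is not applied there.

```python
def predict_within_range(b0, b1, x, x_min, x_max):
    """Return b0 + b1 * x, but refuse X values outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"X = {x} is outside the observed range [{x_min}, {x_max}]")
    return b0 + b1 * x


# Weight-from-height equation on the slide: WT-hat = -205 + 5 * HT
print(-205 + 5 * 40)  # -5, an impossible weight

# Repair-time equation (coefficients rounded from the SPSS output);
# Units were observed only from 1 to 10.
print(round(predict_within_range(4.16, 15.51, 3, 1, 10), 2))  # 50.69
try:
    predict_within_range(4.16, 15.51, 50, 1, 10)
except ValueError as err:
    print("Refused:", err)
```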
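The Unpredicted Part
The value $Y - \hat{Y}$ is the part that the regression equation (model) cannot predict; it is called the "residual."
[Figure: a data point and the regression line, with the vertical gap between them labeled "residual."]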
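The residual for each case is simply the observed Y minus its predicted value $\hat{Y}$. The Python sketch below computes every residual for the computer repair data. It fits the least-squares line inline so the block stands alone. It also checks a known property of least-squares lines with an intercept: the residuals sum to zero, up to floating-point rounding.

```python
# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

# Least-squares fit
n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / sum(
    (x - x_bar) ** 2 for x in units
)
b0 = y_bar - b1 * x_bar

# Residual = observed Y minus predicted Y-hat
fitted = [b0 + b1 * x for x in units]
residuals = [y - y_hat for y, y_hat in zip(minutes, fitted)]

for x, y, e in zip(units, minutes, residuals):
    print(f"Units={x:2d}  Time={y:3d}  residual={e:6.2f}")

# For a least-squares line with an intercept, the residuals sum to (numerically) zero
print(f"Sum of residuals: {sum(residuals):.2e}")
```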
Correlation between X and Y
X and Y may be related to each other in many ways, for example linearly or along a curve.
Examples of Different Levels of Correlation
[Figure: four scatterplots of y against x, with x from 0.0 to 1.0]
r = .98: strong linearity
r = .71: moderate linearity
r = -.09: nearly uncorrelated
r = .00: curved (nonlinear) relationship
(Pearson) Correlation Coefficient of X and Y
A measure of the strength of the “LINEAR” association between X and Y.
Let $s_x$ be the standard deviation of the data values in X and $s_y$ the standard deviation of the data values in Y. The correlation coefficient of X and Y is:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$
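The formula translates directly into code. The Python sketch below is a minimal illustration using the standard-library `statistics` module for the means and sample standard deviations (divisor $n-1$, matching the formula). It computes r for the computer repair data. The function name `pearson_r` is our own.

```python
import statistics

# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def pearson_r(x, y):
    """Pearson r = sum((x_i - x_bar)(y_i - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs (divisor n - 1)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


r = pearson_r(units, minutes)
print(f"r = {r:.3f}")  # r = 0.994
```

The result agrees with the R value (.994) in the SPSS Model Summary. On Python 3.10 or later, `statistics.correlation(units, minutes)` should give the same value and can serve as a cross-check.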
Correlation Coefficient of X and Y
$-1 \le r \le 1$. The magnitude of r measures the strength of the linear association between X and Y.
The sign of r indicates the direction of the association: “-” for a negative association, “+” for a positive association.
Goodness of Fit
$R^2$ is the proportion of the variance in Y explained (accounted for) by the model used to fit the data.
When there is only one X (simple linear regression), $R^2 = r^2$.
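The identity $R^2 = r^2$ for a single predictor can be checked numerically. The Python sketch below computes $R^2 = 1 - \text{SSE}/\text{SST}$ from a least-squares fit to the repair data and compares it with the square of the correlation coefficient.

```python
# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_yy = sum((y - y_bar) ** 2 for y in minutes)   # total sum of squares (SST)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual sum of squares (SSE): variation the line does not explain
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))
r_squared = 1 - sse / s_yy          # proportion of Y variation explained
r = s_xy / (s_xx * s_yy) ** 0.5     # Pearson correlation

print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")  # R^2 = 0.987, r^2 = 0.987
```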
SPSS Output
Analyze >> Regression >> Linear
Model Summary (b)
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .994 (a)  .987      .986               5.39172
a. Predictors: (Constant), units
b. Dependent Variable: time
ANOVA (b)
Model 1     Sum of Squares  df  Mean Square  F        Sig.
Regression  27419.509       1   27419.509    943.201  .000 (a)
Residual    348.848         12  29.071
Total       27768.357       13
a. Predictors: (Constant), units
b. Dependent Variable: time
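The entries in these two SPSS tables follow from a few sums of squares. The Python sketch below recomputes them from the repair data as an illustration of the arithmetic; it is not SPSS's own code. With one predictor, the regression has 1 df and the residual has $n - 2$ df. $F = \text{MSR}/\text{MSE}$, and the "Std. Error of the Estimate" is $\sqrt{\text{MSE}}$.

```python
# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sums of squares: Total = Regression + Residual
sst = sum((y - y_bar) ** 2 for y in minutes)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))
ssr = sst - sse

df_reg, df_res = 1, n - 2          # one predictor; n - 2 residual df
msr = ssr / df_reg
mse = sse / df_res
f_stat = msr / mse
std_err_est = mse ** 0.5           # "Std. Error of the Estimate"
r_sq = ssr / sst
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / df_res

print(f"SSR={ssr:.3f}  SSE={sse:.3f}  SST={sst:.3f}")
print(f"MSE={mse:.3f}  F={f_stat:.3f}  s={std_err_est:.5f}")
print(f"R^2={r_sq:.3f}  adjusted R^2={adj_r_sq:.3f}")
```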
Confidence Intervals
Coefficients (a)
Model 1     Unstandardized B  Std. Error  Standardized Beta  t       Sig.  95% CI for B: Lower Bound  Upper Bound
(Constant)  4.162             3.355                          1.240   .239  -3.148                     11.472
units       15.509            .505        .994               30.712  .000  14.409                     16.609
a. Dependent Variable: time
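Each 95% confidence interval has the form $B \pm t_{0.975,\,n-2} \times \text{SE}(B)$. For simple regression, $\text{SE}(b_1) = s/\sqrt{S_{xx}}$ and $\text{SE}(b_0) = s\sqrt{1/n + \bar{x}^2/S_{xx}}$, where $s$ is the standard error of the estimate. The Python sketch below recomputes the Coefficients table. To stay dependency-free, it hard-codes the t critical value for 12 df (about 2.1788, from a t table). With SciPy available, `scipy.stats.t.ppf(0.975, n - 2)` would compute it instead.

```python
# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

T_CRIT = 2.1788  # t critical value for df = 12 (n = 14), 95% two-sided; from a t table

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))
s = (sse / (n - 2)) ** 0.5               # standard error of the estimate

# Standard errors of the coefficients
se_b1 = s / s_xx ** 0.5
se_b0 = s * (1 / n + x_bar ** 2 / s_xx) ** 0.5

# t statistics (for H0: coefficient = 0) and 95% confidence intervals
t_b0, t_b1 = b0 / se_b0, b1 / se_b1
ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(f"(Constant): B={b0:.3f} SE={se_b0:.3f} t={t_b0:.3f} CI=({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
print(f"units:      B={b1:.3f} SE={se_b1:.3f} t={t_b1:.3f} CI=({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
```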
Check for Normality
Check for Equal Variances: scatterplot of ZRESID (standardized residuals) against ZPRED (standardized predicted values)
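The values behind this scatterplot can be computed directly. The Python sketch below assumes SPSS's definitions as we understand them. ZRESID is each residual divided by the standard error of the estimate. ZPRED is each predicted value standardized to mean 0 and standard deviation 1. Plotting is omitted to avoid extra dependencies, so the sketch prints the (ZPRED, ZRESID) pairs sorted by ZPRED. Roughly equal variances show up as residuals scattered in a band of similar width across all ZPRED values.

```python
import statistics

# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in units]
residuals = [y - f for y, f in zip(minutes, fitted)]
s = (sum(e * e for e in residuals) / (n - 2)) ** 0.5   # standard error of the estimate

# Assumed SPSS-style definitions
zresid = [e / s for e in residuals]
pred_mean, pred_sd = statistics.mean(fitted), statistics.stdev(fitted)
zpred = [(f - pred_mean) / pred_sd for f in fitted]

for zp, zr in sorted(zip(zpred, zresid)):
    print(f"ZPRED={zp:6.2f}  ZRESID={zr:6.2f}")
```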
The Influence of Outliers
[Figure: scatterplot of Y3 against X3]
The slope becomes smaller (the line is pulled toward the outlier).
The r value becomes smaller (less linear).
The Influence of Outliers
[Figure: scatterplot of y vs x]
The slope becomes clearer (the line is pulled toward the outlier).
The |r| value becomes larger (more linear: 0.159 → 0.935).
Identify Outliers using Residual Plots
Use “standardized” residuals!
Cases whose standardized residuals have a magnitude of 3 or more are flagged as outliers.
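The rule above can be applied mechanically. The Python sketch below standardizes each residual by dividing it by the standard error of the estimate. We assume this matches the slides' "standardized residual" (SPSS's ZRESID); other software may use studentized residuals instead. The sketch then flags cases with $|z| \ge 3$. For the computer repair data, no case is flagged, which is consistent with the earlier conclusion that there are no potential outliers.

```python
# Computer repair data: (Units, Minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(units, minutes)]
s = (sum(e * e for e in residuals) / (n - 2)) ** 0.5   # standard error of the estimate
std_resid = [e / s for e in residuals]

# Flag cases whose standardized residual has magnitude 3 or more
outliers = [
    (x, y, round(z, 2))
    for x, y, z in zip(units, minutes, std_resid)
    if abs(z) >= 3
]
print(outliers)  # []
```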