• We use sample data to
  • estimate a population mean ($\mu$) or ($\mu_1 - \mu_2$)
  • estimate a population proportion ($p$) or ($p_1 - p_2$)
  • test a hypothesis about $\mu$ or ($\mu_1 - \mu_2$)
  • test a hypothesis about $p$ or ($p_1 - p_2$).
• Now we want to use sample data to investigate the relationships among a group of variables and to create a mathematical model that can be used to predict the value of one of those variables in the future.
• The process of finding a mathematical model (an equation) that best fits the data is known as regression analysis.
Introduction to Regression Analysis
• The variable to be predicted (or modeled), y, is called the dependent variable.
• The variables used to predict (or model) y are called independent variables and are denoted by the symbols x1, x2, x3, etc.
• General form of the probabilistic model in regression:
$$y = \mu_{y|x_1,x_2,\ldots,x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where
$y$ = dependent variable
$\mu_{y|x_1,x_2,\ldots,x_k}$ = mean or expected value of y, the deterministic component
$\varepsilon$ = unexplainable, or random error, component
• Estimation/prediction equation:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
Form of the Simple Linear Regression Model
$$y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$$
$\mu_{y|x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable y when the value of the independent variable is x
$\beta_0$ is the y-intercept, the mean of y when x is 0 (meaningful only when values of x near 0 have been observed)
$\beta_1$ is the slope, the change in the mean of y per unit change in x (over the range of sample x-values)
$\varepsilon$ is an error term that describes the effect on y of all factors other than x
The Simple Linear Regression Model Illustrated
Regression Terms
• β0 and β1 are called regression parameters
• β0 is the y-intercept and β1 is the slope
• We do not know the true values of these parameters
• So, we must use sample data to estimate them
• b0 is the estimate of β0 and b1 is the estimate of β1
The Least Squares Point Estimates
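Estimation/prediction equation:
$$\hat{y} = b_0 + b_1 x$$
Slope:
$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$
y-intercept:
$$b_0 = \bar{y} - b_1 \bar{x}$$
where
$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\,\bar{x}\,\bar{y}$$
$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\,\bar{x}^2$$
$$\bar{x} = \frac{\sum x_i}{n}, \qquad \bar{y} = \frac{\sum y_i}{n}, \qquad n = \text{sample size}$$
MS EXCEL: =SLOPE(y range, x range) and =INTERCEPT(y range, x range)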
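The least squares formulas can be checked by hand on a small data set. The sketch below is only an illustration: the five (x, y) pairs are made up, and the function name `least_squares` is a hypothetical helper, not something from the course material.

```python
def least_squares(x, y):
    """Return (b0, b1) using the SSxy / SSxx shortcut formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SSxy = sum(x_i * y_i) - n * x_bar * y_bar
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    # SSxx = sum(x_i^2) - n * x_bar^2
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    b1 = ss_xy / ss_xx          # slope
    b0 = y_bar - b1 * x_bar     # y-intercept
    return b0, b1


# Hypothetical sample data, chosen only to keep the arithmetic simple.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For this data, SSxy = 6 and SSxx = 10, so b1 = 0.6 and b0 = 4 - 0.6(3) = 2.2.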
An Estimator of $\sigma^2$
$$s^2 = \frac{SSE}{n-2}$$
where
n = sample size
s = standard deviation of error = standard error of estimate
$$SSE = \sum (y_i - \hat{y}_i)^2 = SS_{yy} - b_1 SS_{xy} = \sum y_i^2 - n\,\bar{y}^2 - b_1 SS_{xy}$$
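A short sketch of the SSE shortcut and the resulting estimate of the error variance, again on made-up data. The function name `error_variance` is a hypothetical helper introduced only for this illustration.

```python
import math


def error_variance(x, y):
    """Return (SSE, s^2, s) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy    # shortcut form of sum((y_i - y_hat_i)^2)
    s2 = sse / (n - 2)          # n - 2 degrees of freedom
    return sse, s2, math.sqrt(s2)


# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sse, s2, s = error_variance(x, y)
print(f"SSE = {sse:.3f}, s^2 = {s2:.3f}, s = {s:.3f}")
```

Here SSyy = 6, b1 = 0.6 and SSxy = 6, so SSE = 6 - 0.6(6) = 2.4 and s² = 2.4 / 3 = 0.8.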
A 100(1 − α)% confidence interval for the simple linear regression slope $\beta_1$
$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$
where
$$s_{b_1} = \frac{s}{\sqrt{SS_{xx}}}$$
and $t_{\alpha/2}$ is based on (n − 2) degrees of freedom
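The interval for the slope can be sketched as follows. To stay within the standard library, the critical value $t_{\alpha/2}$ is passed in as a number read from a t table rather than computed. The data and the function name `slope_confidence_interval` are assumptions made for this illustration.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * s_b1.

    t_crit is t_(alpha/2) with n - 2 degrees of freedom, taken from a t table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b1 = ss_xy / ss_xx
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))   # standard error of estimate
    s_b1 = s / math.sqrt(ss_xx)                     # standard error of the slope
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1


# Hypothetical sample data; n = 5, so df = 3 and t_0.025 = 3.182 (t table).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With so few points the interval is wide and contains 0, so this sample alone does not show that the slope differs from 0.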
Testing the Significance of the Slope
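Test statistic (both tests):
$$t = \frac{b_1}{s_{b_1}}$$
One-Tailed Test
• H0: $\beta_1 = 0$
• Ha: $\beta_1 < 0$ (or $\beta_1 > 0$)
• Rejection region: $t < -t_\alpha$ (or $t > t_\alpha$)
• where $t_\alpha$ is based on (n − 2) degrees of freedom
Two-Tailed Test
• H0: $\beta_1 = 0$
• Ha: $\beta_1 \neq 0$
• Rejection region: $|t| > t_{\alpha/2}$
• where $t_{\alpha/2}$ is based on (n − 2) degrees of freedom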
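A minimal sketch of the two-tailed test, assuming the critical value is looked up in a t table and supplied by the caller. The data and the function name `slope_t_test` are hypothetical.

```python
import math


def slope_t_test(x, y, t_crit):
    """Return (t, reject) for H0: beta1 = 0 against Ha: beta1 != 0.

    t_crit is t_(alpha/2) with n - 2 degrees of freedom, taken from a t table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b1 = ss_xy / ss_xx
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))
    s_b1 = s / math.sqrt(ss_xx)
    t = b1 / s_b1                   # test statistic
    return t, abs(t) > t_crit       # two-tailed rejection region


# Hypothetical sample data; df = 3, alpha = 0.05, so t_0.025 = 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t, reject = slope_t_test(x, y, t_crit=3.182)
print(f"t = {t:.3f}, reject H0: {reject}")
```

For a one-tailed test, the comparison would instead be `t < -t_crit` or `t > t_crit`, with `t_crit` set to $t_\alpha$.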
The 100(1 − α)% confidence interval for the mean value of y for x = xp
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$
where $t_{\alpha/2}$ is based on (n − 2) degrees of freedom
The 100(1 − α)% prediction interval for an individual y for x = xp
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$
where $t_{\alpha/2}$ is based on (n − 2) degrees of freedom
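The confidence interval for the mean of y and the prediction interval for an individual y differ only by the extra 1 under the square root. The sketch below computes both half-widths for one value of xp. The data, the choice xp = 4, and the function name `intervals_at` are assumptions for illustration; the critical value comes from a t table.

```python
import math


def intervals_at(x, y, x_p, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x = x_p."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))
    y_hat = b0 + b1 * x_p
    distance = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(distance)        # mean value of y
    pi_half = t_crit * s * math.sqrt(1 + distance)    # individual y
    return y_hat, ci_half, pi_half


# Hypothetical sample data; df = 3, so t_0.025 = 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

y_hat, ci_half, pi_half = intervals_at(x, y, x_p=4, t_crit=3.182)
print(f"mean of y:    {y_hat:.2f} +/- {ci_half:.3f}")
print(f"individual y: {y_hat:.2f} +/- {pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it includes the variation of an individual observation around the mean as well as the uncertainty in the estimated line.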
Simple Coefficient of Determination
$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
About 100(r2)% of the sample variation in y can be explained by using x to predict y in the simple linear regression model.
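[Figure: for an observation (xi, yi), the total variation is split into explained variation and unexplained variation; labels shown: $\bar{y}$, $x_i$, $y_i$, $\hat{y}_i$.]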
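The ratio of explained to total variation can be computed directly from the fitted values. This is a sketch on made-up data; the function name `coefficient_of_determination` is hypothetical.

```python
def coefficient_of_determination(x, y):
    """Return r^2 = explained variation / total variation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]                  # fitted values
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)  # sum((y_hat_i - y_bar)^2)
    total = sum((yi - y_bar) ** 2 for yi in y)          # sum((y_i - y_bar)^2)
    return explained / total


# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = coefficient_of_determination(x, y)
print(f"About {100 * r2:.0f}% of the sample variation in y is explained by x")
```

For this data the explained variation is 3.6 and the total variation is 6, so r² = 0.6.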
The coefficient of correlation
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$
where
$$SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\,\bar{y}^2$$
• r is used for the sample and $\rho$ (rho) for the population
• $-1 \le r \le 1$
• r > 0 means that y increases as x increases
• r < 0 means that y decreases as x increases
• r ≈ 0 means little or no linear relationship between y and x
• The closer r is to 1 or −1, the stronger the linear relationship
• High correlation does not imply causality; it shows only that a linear trend may exist between x and y
• $r = +\sqrt{r^2}$ when $b_1 > 0$, and $r = -\sqrt{r^2}$ when $b_1 < 0$
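Both routes to r, the SS formula and the signed square root of r², can be compared on a small example. The data and the function name `correlation` are assumptions made for this sketch.

```python
import math


def correlation(x, y):
    """Return (r, r_from_r2): r by the SS formula and by the signed root of r^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    b1 = ss_xy / ss_xx
    r2 = (b1 * ss_xy) / ss_yy       # explained variation / total variation
    # r takes the sign of the slope b1.
    r_from_r2 = math.copysign(math.sqrt(r2), b1)
    return r, r_from_r2


# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, r_from_r2 = correlation(x, y)
print(f"r = {r:.4f}, signed sqrt(r^2) = {r_from_r2:.4f}")
```

Reversing the order of the y-values in this example flips the sign of b1, and with it the sign of r, while r² stays the same.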
Exercise
• What is the range of values that the coefficient of determination can assume? ___
• If the value of r is -0.96, what does this indicate about the dependent variable as the independent variable increases? __
• If the correlation between sales and advertising is +0.6, what percent of the variation in sales can be attributed to advertising? __
• What does the coefficient of determination equal if r = 0.89?
Exercise
• In the regression equation, what does the letter "b" represent?
• What is the null hypothesis to test the significance of the slope in a regression equation?
• The regression equation is Ŷ = 29.29 - 0.96X, the sample size is 8, and the standard error of the slope is 0.22. What is the test statistic to test the significance of the slope?
Exercise
• Page 488 no. 26
• Page 494 no. 31
• Page 500 no. 38
• Page 502 no. 46
• Page 506 no. 56