1 relationships we have examined how to measure relationships between two categorical variables...
Post on 22-Dec-2015
TRANSCRIPT
1
Relationships
We have examined how to measure relationships between
two categorical variables (chi-square)
one categorical variable and one measurement variable (t-test, F-test)
Now we look at relationships between two measurement variables
2
Interval variable relations
We want to describe the relationship in terms of
form
strength
We want to make inferences to the population
3
Our Tools
Correlation to measure strength of relationship
Regression to measure form of relationship
4
Regression
Begin with a scatterplot of two measurement variables, X and Y
Let X be the independent variable
Let Y be the dependent variable
Plot each case as we have done before at the beginning of the course.
5
Scatterplot
[Figure: scatterplot. X axis: percent families below poverty (0.00 to 15.00). Y axis: homicides/100,000 people (0.00 to 30.00).]
6
Note the outlier: Dallas
7
Relationships
Each city is represented by an X score (percent poor) and a Y score (homicide rate)
We are asking about the relationship between poverty and homicide
Does homicide change as percent poor changes? If so, in what way and how much?
8
Looking at the scatterplot
We see that as percent poor (poverty) increases (from left to right on the graph), the homicide rate increases (from low to high on the graph)
9
Scatterplot
[Figure: scatterplot. X axis: percent families below poverty (0.00 to 15.00). Y axis: homicides/100,000 people (0.00 to 30.00).]
10
Representing relationships
We represent the relationship with a straight line that goes through the middle of the points on the graph
This line is the regression line
It shows the average homicide rate for every level of poverty.
11
Regression Line
[Figure: regression line drawn through the scatterplot. X axis ticks 0.00 to 15.00, Y axis ticks 0.00 to 30.00.]
12
Regression Line
Every line is represented by a formula
The regression line has the following general formula
$\hat{y} = a + bx$
‘a’ represents the intercept of the line
‘b’ represents the slope of the line
y-hat is the predicted value of y for a given x value
13
Regression of homicide on poverty
$\hat{y} = -.815 + .944x$
a = -.815, b = .944
x is percent poor
y is homicide rate
14
Slope, the value of b
The slope of the regression line is positive: the line goes from the lower left to the upper right.
The slope measures the amount of change in the dependent variable for every unit change in the independent variable
b = .944. There is an increase of .944 units in y for every increase of 1.0 in x
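As a small worked example of the slope interpretation, take a change of 5 units in x (the 5-unit "run" is an illustrative choice). The predicted change in y is the slope times the change in x:

```latex
\Delta \hat{y} = b \, \Delta x = .944 \times 5 = 4.72
```

So two cities that differ by 5 percentage points in poverty are predicted to differ by about 4.72 homicides per 100,000 people.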
15
Regression Line, slope
[Figure: regression line with a "run" of 5 units along the X axis (percent families below poverty) and the corresponding "rise" of 5 x .944 units along the Y axis.]
16
Intercept, the value of a
The intercept is the point where the regression line crosses the Y axis
This point is the value of Y when X is zero
a = -.815. The predicted rate of homicide is -.815 when there is zero poverty
17
Calculate b
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
18
Calculate a
First calculate b, then
$a = \bar{y} - b\bar{x}$
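The two calculations (b first, then a) can be sketched in Python as follows. This is a minimal illustration, not the software used for the slides: the function name `fit_line` is an assumption, and the five data points are made up for the example rather than taken from the city data.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-products of deviations over sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: computed from b and the two means
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, NOT the poverty/homicide data from the slides
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(xs, ys)
print("a =", round(a, 3), "b =", round(b, 3))
```

Because a is computed from the means, the fitted line always passes through the point (mean of x, mean of y).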
19
Calculate predicted y
After calculating a and b, one can use the regression line formula to calculate predicted values of y for every actual value of x
$\hat{y} = -.815 + .944x$
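A short Python sketch of this step, using the a and b values reported on the slides. The function name `predict` and the x values in the loop are illustrative assumptions; they are not actual cities from the data.

```python
A = -0.815  # intercept reported on the slides
B = 0.944   # slope reported on the slides


def predict(x, a=A, b=B):
    """Predicted homicide rate (y-hat) for a given percent poor (x)."""
    return a + b * x


# Illustrative x values spanning the range of the plotted X axis
for x in (0.0, 5.0, 10.0, 15.0):
    print(x, round(predict(x), 3))
```

At x = 0 the prediction is simply the intercept, and each additional unit of x adds b to the prediction.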
20
Prediction errors
Prediction errors are the difference between the actual value of y and the predicted value of y
$e_i = y_i - \hat{y}_i$
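As a sketch, a prediction error can be computed in Python as actual minus predicted, using the coefficients reported on the slides. The function name `residual` and the example city (10 percent poor, homicide rate 12.0) are made up for illustration.

```python
A, B = -0.815, 0.944  # coefficients reported on the slides


def residual(x, y, a=A, b=B):
    """Prediction error: actual y minus predicted y."""
    return y - (a + b * x)


# Hypothetical city (made-up values): 10 percent poor, homicide rate of 12.0
e = residual(10.0, 12.0)
print(round(e, 3))
```

A positive error means the actual point lies above the regression line; a negative error means it lies below.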
21
Prediction errors
[Figure: regression line with actual and predicted values marked. Errors are actual minus predicted.]
22
Ordinary Least Squares: OLS
The regression line is the “best fitting” line through the data points in the graph
It is the line that minimizes the sum of the squared error terms, hence “least squares”
Minimize: $\sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
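The least-squares property can be checked numerically with a small Python sketch. The data are hypothetical (not the data from the slides), and the function name `sse` is an assumption. For this toy data the least-squares coefficients work out to a = 2.2 and b = 0.6, and moving either one away from those values makes the sum of squared errors larger.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, NOT the data from the slides
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least-squares coefficients for this toy data
best = sse(2.2, 0.6, xs, ys)

# Nudging either coefficient away from the least-squares values increases the SSE
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(2.2 + da, 0.6 + db, xs, ys) > best

print(round(best, 3))
```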
23
Sums of Squared Errors
[Figure: surface plot of the sum of squared errors (SS, 460.0 to 640.0) over values of a (-1.0 to -0.6) and b (0.7 to 1.1).]
24
Sum of Squared Errors
Sum of squared errors for values of a (columns) and b (rows)

b \ a    -1.0    -0.9    -0.8    -0.7    -0.6
0.7     638.7   630.2   622.0   614.3   607.0
0.8     572.9   567.6   562.8   558.3   554.3
0.9     537.9   535.9   534.3   533.2   532.4
1.0     533.7   535.0   536.7   538.8   541.3
1.1     560.4   565.0   569.9   575.3   581.1

Minimum is 531.57 when a=-.815, b=.944
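A table of this kind can be produced by evaluating the sum of squared errors over a grid of candidate a and b values. The Python sketch below does this for hypothetical data (not the data from the slides); the names `sse`, `a_values`, and `b_values` and the grid ranges are assumptions chosen for the example.

```python
def sse(a, b, xs, ys):
    """Sum of squared errors for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, NOT the data from the slides
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Candidate intercepts and slopes (illustrative grid)
a_values = [2.0, 2.1, 2.2, 2.3, 2.4]
b_values = [0.4, 0.5, 0.6, 0.7, 0.8]

# Evaluate the SSE at every (a, b) combination
grid = {(a, b): sse(a, b, xs, ys) for b in b_values for a in a_values}
best_a, best_b = min(grid, key=grid.get)

# Print with b down the rows and a across the columns
print("b \\ a " + "".join(f"{a:8.1f}" for a in a_values))
for b in b_values:
    print(f"{b:5.1f}  " + "".join(f"{grid[(a, b)]:8.2f}" for a in a_values))
print("grid minimum at a =", best_a, "b =", best_b)
```

A grid only finds the best of the candidate values it tries. In the table from the slides, the smallest tabulated value is 532.4, while the true minimum of 531.57 lies between grid points at a=-.815, b=.944; the formulas for b and a give that minimum directly.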