Social Statistics: Linear regression
This week
How to predict, and how prediction is used in the social and behavioral sciences
How to judge the accuracy of predictions
The INTERCEPT and SLOPE functions
Multiple regression
Prediction
Based on the correlation between two variables, you can predict the value of one from the value of the other.
Using previously collected data, calculate the correlation between the two variables, then use that correlation and the value of X to predict Y.
The higher the absolute value of the correlation coefficient, the more accurate the prediction of one variable from the other.
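The link between the size of the correlation and the accuracy of prediction can be made concrete. The identity below is not on the slide; it is a standard result for least-squares regression with one predictor, added here as a supporting step. It shows that the squared prediction errors shrink as $r^2$ grows:

```latex
\sum (Y - Y')^2 = (1 - r^2) \sum (Y - \bar{Y})^2
```

Here $Y'$ is the predicted value of Y, $\bar{Y}$ is the mean of Y, and $r$ is the Pearson correlation coefficient. With $r = \pm 1$ the errors vanish; with $r = 0$ predicting from X does no better than always predicting the mean of Y.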
Logic of prediction
Prediction is an activity that computes future outcomes from present ones.
To predict one variable from another, we first need to compute the correlation between the two variables.
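As a minimal sketch of that first step, the Python below computes the Pearson correlation for the ten GPA pairs used in the example later in these slides. The function name `pearson_r` and the variable names are illustrative choices, not something the slides define, and the deviation-score form of the formula is one common way to write it.

```python
import math

# GPA pairs from the slides' example.
high_school = [3.5, 2.5, 4.0, 3.8, 2.8, 1.9, 3.2, 3.7, 2.7, 3.3]  # X
college = [3.3, 2.2, 3.5, 2.7, 3.5, 2.0, 3.1, 3.4, 1.9, 3.7]      # Y


def pearson_r(xs, ys):
    """Pearson correlation using deviations from the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations.
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(high_school, college)
print(f"r = {r:.3f}")
```

A positive value well above zero suggests that high school GPA carries real information about first-year college GPA, which is what makes the regression in the following slides worthwhile.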
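Types of regression
Linear regression
One independent variable: $y = ax + b$
Multiple independent variables: $y = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b$
Non-linear regression (power, exponential, quadratic, cubic, etc.)
Power: $y = ax^b$
Exponential: $y = e^x$
Quadratic: $y = ax^2 + bx + c$
Cubic: $y = ax^3 + bx^2 + cx + d$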
Example
High school GPA (X)   First-year college GPA (Y)
3.5                   3.3
2.5                   2.2
4.0                   3.5
3.8                   2.7
2.8                   3.5
1.9                   2.0
3.2                   3.1
3.7                   3.4
2.7                   1.9
3.3                   3.7
[Scatterplot: high school GPA (x-axis) against first-year college GPA (y-axis)]
Regression line (line of best fit): Y’ = bX + a
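Regression line
Y’ = bX + a
$b = \dfrac{\sum XY - (\sum X \sum Y / n)}{\sum X^2 - ((\sum X)^2 / n)}$
$a = \dfrac{\sum Y - b \sum X}{n}$
For the GPA example: Y’ = 0.704X + 0.720
Y’ (read "Y prime") is the predicted value of Y.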
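The two formulas for b and a can be checked by hand on the GPA example. The sketch below applies them term by term; `fit_line` and the intermediate variable names are hypothetical helpers introduced for illustration.

```python
# GPA pairs from the slides' example.
xs = [3.5, 2.5, 4.0, 3.8, 2.8, 1.9, 3.2, 3.7, 2.7, 3.3]  # high school GPA (X)
ys = [3.3, 2.2, 3.5, 2.7, 3.5, 2.0, 3.1, 3.4, 1.9, 3.7]  # first-year college GPA (Y)


def fit_line(xs, ys):
    """Return (b, a) for Y' = bX + a using the raw-sum formulas (illustrative helper)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = [sum(XY) - (sum(X) * sum(Y) / n)] / [sum(X^2) - ((sum(X))^2 / n)]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = [sum(Y) - b * sum(X)] / n
    a = (sum_y - b * sum_x) / n
    return b, a


b, a = fit_line(xs, ys)
print(f"Y' = {b:.3f}X + {a:.3f}")
```

Run on these ten pairs, the slope and intercept should agree with the values reported for this example to within rounding.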
Excel
Y’ = bX + a
b = SLOPE(known_y's, known_x's)
a = INTERCEPT(known_y's, known_x's)
For the ten GPA pairs from the earlier example (high school GPA as X, first-year college GPA as Y):
Slope (b): 0.703893443
Intercept (a): 0.71977459
For a high school GPA of X = 3.25, the predicted first-year college GPA is Y’ = 3.007428279.
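For readers working outside Excel, the sketch below mimics SLOPE and INTERCEPT in plain Python, keeping Excel's argument order (Y values first, then X values). The lowercase functions `slope` and `intercept` are stand-ins written for this illustration, not part of any library, and they use the deviation-from-the-mean form of the least-squares formulas.

```python
def slope(known_ys, known_xs):
    """Least-squares slope; argument order mirrors Excel's SLOPE (illustrative stand-in)."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    denominator = sum((x - mean_x) ** 2 for x in known_xs)
    return numerator / denominator


def intercept(known_ys, known_xs):
    """Least-squares intercept; argument order mirrors Excel's INTERCEPT (illustrative stand-in)."""
    mean_x = sum(known_xs) / len(known_xs)
    mean_y = sum(known_ys) / len(known_ys)
    return mean_y - slope(known_ys, known_xs) * mean_x


# GPA pairs from the slides' example.
high_school = [3.5, 2.5, 4.0, 3.8, 2.8, 1.9, 3.2, 3.7, 2.7, 3.3]
college = [3.3, 2.2, 3.5, 2.7, 3.5, 2.0, 3.1, 3.4, 1.9, 3.7]

b = slope(college, high_school)
a = intercept(college, high_school)

# Predict first-year college GPA for a high school GPA of 3.25.
predicted = b * 3.25 + a
print(f"b = {b:.6f}, a = {a:.6f}, Y' at X = 3.25: {predicted:.6f}")
```

Passing the Y values first is easy to get backwards; swapping the two lists fits the regression of X on Y instead, which gives a different line.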
How good is our prediction?
Error of estimate: the difference between the predicted Y (Y’) and the actual Y.
Standard error of estimate: very similar to the standard deviation, it measures the typical size of the errors of estimate.
Example
You are a talent scout looking for new boxers to train. For a group of 6 pro boxers, you record each boxer's reach (in inches) and percentage of wins (wins / total × 100) over his career. Create a regression equation to predict the success of a boxer given his reach.
Boxer   Reach in inches (X)   Win percentage (Y)
A       68                    40
B       80                    85
C       76                    64
D       82                    94
E       65                    30
Making predictions from our equation
What winning percentage would you predict for “T-rex Arms” Timmy, who has a reach of 62 inches?
We would predict 18.44% of Timmy’s fights to be wins.
What winning percentage would you predict for “Ape-Arms” Al, who has a reach of 84 inches?
We would predict 98.08% of Al’s fights to be wins.
Standard Error of Estimate
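Only the title of this slide survives, so the formula used below is an assumption: a common textbook definition, which is also what Excel's STEYX function computes, takes the square root of the sum of squared errors of estimate divided by n − 2. The sketch applies it to the GPA example, reusing the slope and intercept reported on the Excel slide; the variable names are illustrative.

```python
import math

# GPA pairs from the slides' example.
xs = [3.5, 2.5, 4.0, 3.8, 2.8, 1.9, 3.2, 3.7, 2.7, 3.3]  # high school GPA (X)
ys = [3.3, 2.2, 3.5, 2.7, 3.5, 2.0, 3.1, 3.4, 1.9, 3.7]  # first-year college GPA (Y)

# Slope and intercept as reported on the Excel slide.
b = 0.703893443
a = 0.71977459

# Error of estimate for each student: actual Y minus predicted Y'.
errors = [y - (b * x + a) for x, y in zip(xs, ys)]

# Assumed definition: sqrt(sum of squared errors / (n - 2)).
n = len(xs)
standard_error = math.sqrt(sum(e ** 2 for e in errors) / (n - 2))

print(f"standard error of estimate = {standard_error:.3f}")
```

The result is in GPA points, so it can be read as roughly how far a typical prediction for these students falls from the actual first-year college GPA.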
Exercise
For a variety of reasons, a larger percentage of people are concerned today about the state of the environment than in years past. This has led to the formation of environmental action groups that attempt to alter environmental policies nationally and around the globe. A large number of environmental action groups subsist on the donations of concerned citizens. Based on the following eight countries, examine the data to determine the extent of the relationship between simply being concerned about the environment and actually giving money to environmental groups.
Could you construct a scatterplot of the two variables, placing Percentage Concerned on the X-axis and Percentage Donating Money on the Y-axis?
Does the relationship between the two variables seem linear? Could you model it?
Find the value of the Pearson correlation coefficient that measures the association between the two variables and offer an interpretation.
Country         Percentage Concerned   Percentage Donating Money
Austria         35.5                   27.8
Denmark         27.2                   22.3
Netherlands     30.1                   44.8
Philippines     50.1                   6.8
Russia          29.0                   1.6
Slovenia        50.3                   10.7
Spain           35.9                   7.4
United States   33.8                   22.8
Source: International Social Survey Programme, 2000.
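One possible starting point for the last question is sketched below. It computes the Pearson coefficient for the eight countries with the raw-score formula, r = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)], and also reports r². The variable names are illustrative, and the interpretation is left to the reader.

```python
import math

# (country, percentage concerned, percentage donating money) from the table above.
data = [
    ("Austria", 35.5, 27.8),
    ("Denmark", 27.2, 22.3),
    ("Netherlands", 30.1, 44.8),
    ("Philippines", 50.1, 6.8),
    ("Russia", 29.0, 1.6),
    ("Slovenia", 50.3, 10.7),
    ("Spain", 35.9, 7.4),
    ("United States", 33.8, 22.8),
]

concerned = [row[1] for row in data]  # X
donating = [row[2] for row in data]   # Y

n = len(data)
sum_x = sum(concerned)
sum_y = sum(donating)
sum_xy = sum(x * y for x, y in zip(concerned, donating))
sum_x2 = sum(x * x for x in concerned)
sum_y2 = sum(y * y for y in donating)

# Raw-score form of the Pearson correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

The sign of r indicates the direction of the relationship and r² the share of variance the two variables have in common; with only eight countries, a single unusual case can move both noticeably, so the scatterplot is worth inspecting before settling on an interpretation.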