Chapter 14: Correlation and Regression
PowerPoint Lecture Slides
Essentials of Statistics for the Behavioral Sciences Eighth Edition
by Frederick J. Gravetter and Larry B. Wallnau
Chapter 14 Learning Outcomes
1. Understand Pearson r as a measure of the relationship between two variables
2. Compute Pearson r using the definitional or computational formula
3. Use and interpret Pearson r; understand its assumptions and limitations
4. Test a hypothesis about the population correlation (ρ) with the sample r
5. Understand the concept of a partial correlation
Chapter 14 Learning Outcomes (continued)
6. Explain and compute the Spearman correlation coefficient (for ranks)
7. Explain and compute the point-biserial correlation coefficient (one dichotomous variable)
8. Explain and compute the phi-coefficient for two dichotomous variables
9. Explain and compute the linear regression equation to predict Y values
10. Evaluate the significance of a regression equation
Tools You Will Need
• Sum of squares (SS) (Chapter 4)
  – Computational formula
  – Definitional formula
• z-Scores (Chapter 5)
• Hypothesis testing (Chapter 8)
• Analysis of Variance (Chapter 12)
  – MS values and F-ratios
14.1 Introduction to Correlation
• Measures and describes the relationship between two variables
• Characteristics of relationships
  – Direction (negative or positive; indicated by the sign, + or –, of the correlation coefficient)
  – Form (linear is most common)
  – Strength or consistency (varies from 0 to 1)
• The three characteristics are independent of one another
14.2 The Pearson Correlation
• Measures the degree and the direction of the linear relationship between two variables
• Perfect linear relationship
  – Every change in X has a corresponding change in Y
  – Correlation will be –1.00 or +1.00
$r = \dfrac{\text{covariability of } X \text{ and } Y}{\text{variability of } X \text{ and } Y \text{ separately}}$
Sum of Products (SP)
• Similar to SS (sum of squared deviations)
• Measures the amount of covariability between two variables
• SP definitional formula:
$SP = \sum (X - M_X)(Y - M_Y)$
SP – Computational formula
• Definitional formula emphasizes SP as the sum of the products of two deviation scores
• Computational formula results in easier calculations
• SP computational formula:
$SP = \sum XY - \dfrac{\sum X \sum Y}{n}$
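As a quick check that the two SP formulas agree, here is a minimal Python sketch. The function names and the four pairs of scores are hypothetical, chosen only so the arithmetic is easy to follow by hand (M_X = 3, M_Y = 5).

```python
def sp_definitional(xs, ys):
    """Sum of products of paired deviation scores: sum of (X - M_X)(Y - M_Y)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """Computational form: sum(XY) - (sum X)(sum Y) / n, using raw totals only."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy - sum(xs) * sum(ys) / n


# Hypothetical scores chosen for illustration (not from the slides).
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
print(sp_definitional(X, Y))   # 6.0
print(sp_computational(X, Y))  # 6.0
```

The definitional version needs the means first; the computational version only needs ΣX, ΣY, and ΣXY, which is why it is easier by hand when the means are not whole numbers.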
Pearson Correlation Calculation
• Ratio comparing the covariability of X and Y (numerator) with the variability of X and Y separately (denominator)
$r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}}$
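A minimal sketch of this ratio in Python, assuming the definitional SP and SS formulas; `pearson_r` and the example scores are illustrative, not from the text. For these scores SP = 6, SS_X = 10, and SS_Y = 10, so r = 6 / 10.

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y), using definitional SP and SS."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariability
    ss_x = sum((x - mx) ** 2 for x in xs)                  # variability of X
    ss_y = sum((y - my) ** 2 for y in ys)                  # variability of Y
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical scores: SP = 6, SS_X = 10, SS_Y = 10.
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
r = pearson_r(X, Y)
print(r)  # 0.6
```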
Pearson Correlation and z-Scores
• Pearson correlation formula can be expressed as a relationship of z-scores.
Sample: $r = \dfrac{\sum z_X z_Y}{n - 1}$
Population: $\rho = \dfrac{\sum z_X z_Y}{N}$
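The z-score versions can be checked numerically. This sketch (hypothetical helper names and data) standardizes each variable, using the sample standard deviation with n – 1 for the sample form and N for the population form; both routes should reproduce the value from the SP/SS formula for the same scores.

```python
import math


def z_scores(values, sample=True):
    """z = (score - mean) / SD; sample=True uses s = sqrt(SS/(n-1)), else sigma = sqrt(SS/N)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    sd = math.sqrt(ss / (n - 1)) if sample else math.sqrt(ss / n)
    return [(v - mean) / sd for v in values]


def r_from_z(xs, ys, sample=True):
    """Sum of z_X * z_Y divided by n - 1 (sample) or N (population)."""
    zx = z_scores(xs, sample)
    zy = z_scores(ys, sample)
    total = sum(a * b for a, b in zip(zx, zy))
    return total / (len(xs) - 1 if sample else len(xs))


X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
print(round(r_from_z(X, Y, sample=True), 6))   # 0.6
print(round(r_from_z(X, Y, sample=False), 6))  # 0.6
```

The divisor must match the SD used for the z-scores; mixing n – 1 in one place and N in the other would give the wrong value.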
Learning Check
• A scatterplot shows a set of data points that fit very loosely around a line that slopes down to the right. Which of the following values would be closest to the correlation for these data?
  A. 0.75
  B. 0.35
  C. -0.75
  D. -0.35
Learning Check - Answer
• D. -0.35: the loose fit indicates a weak relationship, and the line slopes down to the right, so the correlation is negative
Learning Check
• Decide if each of the following statements is True or False
• A set of n = 10 pairs of X and Y scores has ΣX = ΣY = ΣXY = 20. For this set of scores, SP = –20 (T/F)
• If the Y variable decreases when the X variable decreases, their correlation is negative (T/F)
Learning Check - Answers
• True: $SP = \sum XY - \dfrac{\sum X \sum Y}{n} = 20 - \dfrac{(20)(20)}{10} = 20 - 40 = -20$
• False: the variables change in the same direction, which is a positive correlation
14.3 Using and Interpreting the Pearson Correlation
• Correlations used for:
  – Prediction
  – Validity
  – Reliability
  – Theory verification
Interpreting Correlations
• Correlation describes a relationship but does not demonstrate causation
• Establishing causation requires an experiment in which one variable is manipulated and others carefully controlled
• Example 14.4 (and Figure 14.5) demonstrates the fallacy of attributing causation after observing a correlation
Correlations and Restricted Range of Scores
• Correlation coefficient value (size) will be affected by the range of scores in the data
• Severely restricted range may provide a very different correlation than would a broader range of scores
• To be safe, never generalize a correlation beyond the sample range of data
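To see the restricted-range effect, the sketch below simulates a linear relationship and then recomputes r using only a narrow band of X values. Everything here is an assumption for illustration: the seed, the sample size, the noise level, and the 40 to 60 band. With these settings the full-range r should come out near 0.8 and the restricted-range r much smaller, near 0.3, though the exact values depend on the simulated noise.

```python
import math
import random


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


random.seed(14)  # arbitrary seed so the run is repeatable
# Simulated data: X spans 0 to 100, and Y = X plus random noise (SD 20).
xs = [random.uniform(0, 100) for _ in range(2000)]
ys = [x + random.gauss(0, 20) for x in xs]
full_r = pearson_r(xs, ys)

# Keep only the individuals whose X scores fall in a narrow band (40 to 60).
kept = [(x, y) for x, y in zip(xs, ys) if 40 <= x <= 60]
restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```

The relationship itself is identical in both subsets; only the spread of X changed, which is why a correlation from a restricted sample should not be generalized beyond that range.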
Correlations and Outliers
• An outlier is an extremely deviant individual in the sample
• Characterized by a much larger (or smaller) score than all the others in the sample
• In a scatter plot, the point is clearly different from all the other points
• Outliers produce a disproportionately large impact on the correlation coefficient
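A small hypothetical data set shows how much one outlier can move r: five points with no linear relationship at all (SP = 0, so r = 0), plus a single individual who is far above the others on both X and Y. The data are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data with no linear relationship.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 3, 4, 2]
r_without = pearson_r(X, Y)

# Add one extreme individual, far above all the others on both variables.
r_with = pearson_r(X + [14], Y + [14])

# prints: without outlier r = 0.00, with outlier r = 0.94
print(f"without outlier r = {r_without:.2f}, with outlier r = {r_with:.2f}")
```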
Correlations and the Strength of the Relationship
• A correlation coefficient measures the degree of relationship on a scale from 0 to 1.00
• It is easy to mistakenly interpret this decimal number as a percent or proportion
• Correlation is not a proportion
• Squared correlation may be interpreted as the proportion of shared variability
• Squared correlation is called the coefficient of determination
Coefficient of Determination
• Coefficient of determination measures the proportion of variability in one variable that can be determined from the relationship with the other variable (shared variability)
Coefficient of determination = $r^2$
14.4 Hypothesis Tests with the Pearson Correlation
• Pearson correlation is usually computed for sample data, but used to test hypotheses about the relationship in the population
• Population correlation shown by Greek letter rho (ρ)
• Non-directional: H0: ρ = 0 and H1: ρ ≠ 0
• Directional: H0: ρ ≤ 0 and H1: ρ > 0, or H0: ρ ≥ 0 and H1: ρ < 0
Correlation Hypothesis Test
• Sample correlation r used to test population ρ
• Degrees of freedom (df) = n – 2
• Hypothesis test can be computed using either t or F; only t shown in this chapter
• Use t table to find critical value with df = n – 2
$t = \dfrac{r}{\sqrt{(1 - r^2)/(n - 2)}}$
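A sketch of the t statistic for a correlation, using the values from the In the Literature example (r = -0.76, n = 48). The critical value still has to come from a t table with df = n – 2; this snippet only computes t and df, and `t_for_r` is a hypothetical helper name.

```python
import math


def t_for_r(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with df = n - 2."""
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df


# Values from the "In the Literature" example: r = -0.76, n = 48.
t, df = t_for_r(-0.76, 48)
print(f"t({df}) = {t:.2f}")  # t(46) = -7.93
```

For a two-tailed test, |t| is compared with the tabled critical value for df = 46.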
In the Literature
• Report
  – Whether it is statistically significant
• Concise test results
  – Value of correlation
  – Sample size
  – p-value or level
  – Type of test (one- or two-tailed)
• E.g., r = -0.76, n = 48, p < .01, two tails
Partial Correlation
• A partial correlation measures the relationship between two variables while mathematically controlling the influence of a third variable by holding it constant
$r_{xy \cdot z} = \dfrac{r_{xy} - r_{xz} \, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$
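The partial correlation formula translates directly into code. The input correlations below are hypothetical; the second call shows the case where the X-Y relationship is entirely accounted for by Z (r_xy equals r_xz × r_yz), so the partial correlation is zero up to floating-point rounding.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the third variable Z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical correlations, chosen only for illustration.
print(round(partial_r(0.50, 0.50, 0.50), 3))  # 0.333

# Here r_xy = r_xz * r_yz, so controlling Z removes the whole relationship.
fully_explained = partial_r(0.48, 0.80, 0.60)
print(abs(fully_explained) < 1e-9)  # True
```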
14.5 Alternatives to the Pearson Correlation
• Pearson correlation has been developed
  – For data having linear relationships
  – With data from interval or ratio measurement scales
• Other correlations have been developed
  – For data having non-linear relationships
  – With data from nominal or ordinal measurement scales
Spearman Correlation
• Spearman (rs) correlation formula is used with data from an ordinal scale (ranks)
  – Used when both variables are measured on an ordinal scale
  – Also may be used if the measurement scale is interval or ratio when the relationship is consistently directional but may not be linear
Ranking Tied Scores
• Tied scores need ranks for Spearman correlation
• Method for assigning rank
  – List scores in order from smallest to largest
  – Assign a rank to each position in the list
  – When two (or more) scores are tied, compute the mean of their ranked positions, and assign this mean value as the final rank for each score
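The tie-handling rule above can be sketched as follows; `average_ranks` is a hypothetical helper name, and positions are counted from 1 so the ranks match the hand method.

```python
def average_ranks(scores):
    """Rank scores from smallest to largest; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # indices in score order
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' across every position holding the same (tied) score.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2  # mean of 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


print(average_ranks([3, 1, 4, 1, 5]))  # [3.0, 1.5, 4.0, 1.5, 5.0]
```

The two scores of 1 occupy positions 1 and 2, so each receives rank (1 + 2) / 2 = 1.5.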
Special Formula for the Spearman Correlation
• The ranks for the scores are simply integers
• Calculations can be simplified
  – Use D as the difference between the X rank and the Y rank for each individual to compute the rs statistic
$r_s = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$
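A sketch of the special formula, assuming the scores have already been converted to ranks with no ties. Because the shortcut is algebraically equivalent to the Pearson formula applied to untied ranks, the two results should match; with tied ranks the shortcut is only an approximation, so computing Pearson r on the ranks is the safer route there. The ranks are hypothetical.

```python
import math


def spearman_from_d(rank_x, rank_y):
    """r_s = 1 - 6 * sum(D^2) / (n(n^2 - 1)), where D = X rank - Y rank."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Regular Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical ranks for five individuals, with no ties (sum of D^2 = 4).
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
print(spearman_from_d(rank_x, rank_y))      # 0.8
print(round(pearson_r(rank_x, rank_y), 6))  # 0.8
```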
Point-Biserial Correlation
• Measures relationship between two variables
  – One variable has only two values (called a dichotomous or binomial variable)
• Effect size for the independent-measures t test in Chapter 10 can be measured by r²
  – Point-biserial r² has the same value as the r² computed from the t statistic
  – t statistic tests the significance of the mean difference
  – r statistic measures the size of the correlation
Point-Biserial Correlation
• Applicable in the same situation as the independent-measures t test in Chapter 10
  – Code one group 0 and the other 1 (or any two digits) as the Y score
  – t statistic evaluates the significance of the mean difference
  – Point-biserial r measures correlation magnitude
  – r² quantifies effect size
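To illustrate the link between the point-biserial correlation and the independent-measures t test, this sketch codes two hypothetical groups as 0 and 1, computes Pearson r on the scores and codes, and compares r² with t² / (t² + df), the r² computed from the t statistic. The group scores and helper names are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Regular Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def independent_t(group0, group1):
    """Independent-measures t with pooled variance; returns (t, df)."""
    n0, n1 = len(group0), len(group1)
    m0 = sum(group0) / n0
    m1 = sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    df = n0 + n1 - 2
    pooled_variance = (ss0 + ss1) / df
    standard_error = math.sqrt(pooled_variance / n0 + pooled_variance / n1)
    return (m1 - m0) / standard_error, df


group0 = [2, 3, 4]  # hypothetical scores for the group coded 0
group1 = [5, 6, 7]  # hypothetical scores for the group coded 1
scores = group0 + group1
codes = [0] * len(group0) + [1] * len(group1)  # dichotomous variable coded 0/1

r = pearson_r(scores, codes)
t, df = independent_t(group0, group1)
print(f"{r ** 2:.4f} {t ** 2 / (t ** 2 + df):.4f}")  # 0.7714 0.7714
```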
Phi Coefficient
• Both variables (X and Y) are dichotomous
  – Both variables are re-coded to values 0 and 1 (or any two digits)
  – The regular Pearson formula is used to calculate r
  – r² (coefficient of determination) measures effect size (proportion of variability in one score predicted by the other)
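For the phi coefficient, the sketch below simply runs the regular Pearson formula on 0/1 codes for two hypothetical dichotomous variables. As a cross-check it also uses the standard 2 × 2 table shortcut, phi = (ad − bc) / sqrt((a + b)(c + d)(a + c)(b + d)), which is not part of these slides; a, b, c, d are the four cell counts.

```python
import math


def pearson_r(xs, ys):
    """Regular Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical 0/1 codes for eight individuals on two dichotomous variables.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 1, 1, 1, 0]
phi = pearson_r(x, y)

# Cross-check with the 2 x 2 table shortcut (rows: X = 0/1, columns: Y = 0/1).
a = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
b = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
c = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
d = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
phi_table = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(phi, phi_table)  # 0.5 0.5
```

Squaring phi (0.25 here) gives the effect size described above.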
Learning Check
• Participants were classified as “morning people” or “evening people” and then measured on a 50-point conscientiousness scale. Which correlation should be used to measure the relationship?
  A. Pearson correlation
  B. Spearman correlation
  C. Point-biserial correlation
  D. Phi-coefficient
Learning Check - Answer
• C. Point-biserial correlation: one variable (morning vs. evening person) is dichotomous, and the other (conscientiousness score) is a numerical score
Learning Check
• Decide if each of the following statements is True or False
• The Spearman correlation is used with dichotomous data (T/F)
• In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero (T/F)
Learning Check - Answers
• False: the Spearman correlation uses ordinal (ranked) data
• True: the null hypothesis assumes no relationship; ρ = 0 indicates no relationship in the population
14.6 Introduction to Linear Equations and Regression
• The Pearson correlation measures a linear relationship between two variables
• Figure 14.13 makes the relationship obvious
• The line through the data
  – Makes the relationship easier to see
  – Shows the central tendency of the relationship
  – Can be used for prediction
• Regression analysis precisely defines the line
Linear Equations
• General equation for a line
  – Equation: Y = bX + a
  – X and Y are variables
  – a and b are fixed constants (b is the slope; a is the Y-intercept)
Regression
• Regression is a method of finding an equation describing the best-fitting line for a set of data
• How to define a “best fitting” straight line when there are many possible straight lines?
• The answer: the line that best fits the actual data by minimizing prediction errors
Regression
• Ŷ is the value of Y predicted by the regression equation (regression line) for each value of X
• (Y- Ŷ) is the distance each data point is from the regression line: the error of prediction
• The regression procedure produces a line that minimizes total squared error of prediction
• This method is called the least-squared-error solution
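The least-squares idea can be illustrated by brute force: evaluate the total squared error for many candidate lines and keep the smallest. This is only a teaching sketch, not how regression is actually computed (the slope and intercept formulas on the Regression Equations slide give the exact answer directly). The four data points and the 0.1-step grid of candidate slopes and intercepts are hypothetical.

```python
def total_squared_error(b, a, xs, ys):
    """Sum of (Y - Y_hat)^2 for the line Y_hat = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]

# Candidate lines: slopes 0.0 to 3.0 and intercepts -2.0 to 2.0, in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(0, 31) for j in range(-20, 21)]
best_b, best_a = min(candidates, key=lambda line: total_squared_error(line[0], line[1], xs, ys))

print(best_b, best_a)                                         # 1.6 0.5
print(round(total_squared_error(best_b, best_a, xs, ys), 4))  # 0.2
```

None of the four points lies exactly on the winning line; it is simply the line whose total squared error is smallest.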
Regression Equations
• Regression line equation: Ŷ = bX + a
• The slope of the line, b, can be calculated:
$b = \dfrac{SP}{SS_X}$ or $b = r \, \dfrac{s_Y}{s_X}$
• The line goes through (M_X, M_Y); therefore:
$a = M_Y - b M_X$
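A minimal sketch of the slope and intercept formulas on hypothetical data; it also checks that b = SP / SS_X and b = r(s_Y / s_X) agree. Function names are illustrative.

```python
import math


def regression_line(xs, ys):
    """Least-squares slope b = SP / SS_X and intercept a = M_Y - b * M_X."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    a = my - b * mx  # forces the line through (M_X, M_Y)
    return b, a


def slope_from_r(xs, ys):
    """Alternative slope b = r * (s_Y / s_X), with sample SDs s = sqrt(SS / (n - 1))."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    r = sp / math.sqrt(ss_x * ss_y)
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    return r * s_y / s_x


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
b, a = regression_line(xs, ys)
print(b, a)                            # 1.6 0.5
print(round(slope_from_r(xs, ys), 6))  # 1.6
```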
Standard Error of Estimate
• Regression equation makes a prediction
• Precision of the estimate is measured by the standard error of estimate (SEoE)
$\text{SEoE} = \sqrt{\dfrac{SS_{residual}}{df}} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$
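A sketch of the standard error of estimate for hypothetical data whose least-squares line (from the formulas on the Regression Equations slide) is Ŷ = 1.6X + 0.5. Here Σ(Y – Ŷ)² = 0.2 and df = 2, so SEoE = √0.1.

```python
import math


def standard_error_of_estimate(xs, ys, b, a):
    """sqrt(SS_residual / df), with SS_residual = sum of (Y - Y_hat)^2 and df = n - 2."""
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    df = len(xs) - 2
    return math.sqrt(ss_residual / df)


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
see = standard_error_of_estimate(xs, ys, b=1.6, a=0.5)
print(round(see, 3))  # 0.316
```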
Relationship Between Correlation and Standard Error of Estimate
• As the magnitude of r goes from 0 to 1, SEoE decreases to 0
• Predicted variability in Y scores: $SS_{regression} = r^2 SS_Y$
• Unpredicted variability in Y scores: $SS_{residual} = (1 - r^2) SS_Y$
• Standard Error of Estimate based on r:
$\text{SEoE} = \sqrt{\dfrac{SS_{residual}}{df}} = \sqrt{\dfrac{(1 - r^2) SS_Y}{n - 2}}$
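A numerical check of these relationships on hypothetical data: SS_residual = (1 – r²)SS_Y should equal the directly computed Σ(Y – Ŷ)², and SS_regression plus SS_residual should add back to SS_Y. The helper name is illustrative.

```python
import math


def partition_ss(xs, ys):
    """Split SS_Y into predicted (r^2 SS_Y) and unpredicted ((1 - r^2) SS_Y) parts."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    r = sp / math.sqrt(ss_x * ss_y)
    b = sp / ss_x
    a = my - b * mx
    ss_regression = r ** 2 * ss_y        # predicted variability
    ss_residual = (1 - r ** 2) * ss_y    # unpredicted variability
    ss_residual_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return ss_y, ss_regression, ss_residual, ss_residual_direct


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
ss_y, ss_reg, ss_res, ss_res_direct = partition_ss(xs, ys)
print(round(ss_reg, 4), round(ss_res, 4), round(ss_res_direct, 4))  # 12.8 0.2 0.2

# SEoE computed from r, matching sqrt(sum of (Y - Y_hat)^2 / (n - 2)).
see = math.sqrt(ss_res / (len(xs) - 2))
```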
Testing Regression Significance
• Analysis of Regression
  – Similar to Analysis of Variance
  – Uses an F-ratio of two Mean Square values
  – Each MS is a SS divided by its df
• H0: the slope of the regression line (b or beta) is zero
Mean Squares and F-ratio
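$MS_{residual} = \dfrac{SS_{residual}}{df_{residual}}$, with $df_{residual} = n - 2$
$MS_{regression} = \dfrac{SS_{regression}}{df_{regression}}$, with $df_{regression} = 1$
$F = \dfrac{MS_{regression}}{MS_{residual}}$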
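A sketch of the analysis of regression F-ratio, assuming a single predictor so that df_regression = 1 and df_residual = n – 2. For one predictor, F also equals the square of the t statistic from the correlation test, so the sketch prints both. The data and the helper name are hypothetical.

```python
import math


def regression_f(xs, ys):
    """Analysis of regression for one predictor: F = MS_regression / MS_residual."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    r_squared = sp ** 2 / (ss_x * ss_y)
    ss_regression = r_squared * ss_y
    ss_residual = (1 - r_squared) * ss_y
    df_regression, df_residual = 1, n - 2
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual, df_regression, df_residual, r_squared


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
F, df1, df2, r_squared = regression_f(xs, ys)

# t statistic from the correlation test, squared: r^2 / ((1 - r^2) / (n - 2)).
t_squared = r_squared / ((1 - r_squared) / (len(xs) - 2))
print(f"F({df1}, {df2}) = {F:.2f}, t^2 = {t_squared:.2f}")  # F(1, 2) = 128.00, t^2 = 128.00
```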
Learning Check
• A linear regression has b = 3 and a = 4. What is the predicted Y (Ŷ) for X = 7?
  A. 14
  B. 25
  C. 31
  D. Cannot be determined
Learning Check - Answer
• B. 25: Ŷ = bX + a = 3(7) + 4 = 25
Learning Check
• Decide if each of the following statements is True or False
• It is possible for the regression equation to place none of the actual data points on the regression line (T/F)
• If r = 0.58, the linear regression equation predicts about one third of the variance in the Y scores (T/F)
Learning Check - Answers
• True: the line estimates where points should be, but there are almost always prediction errors
• True: when r = .58, r² = .336 (≈ 1/3)