basic concepts of correlation. definition a correlation exists between two variables when the values...

35
Basic Concepts of Correlation

Upload: adrian-rich

Post on 13-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Basic Concepts of Correlation

Page 2: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Definition

A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way.

Page 3: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Definition

The linear correlation coefficient r measures the strength of the linear relationship between the paired quantitative x- and y-values in a sample.

Page 4: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Exploring the Data

We can often see a relationship between two variables by constructing a scatterplot.

Figure 10-2 following shows scatterplots with different characteristics.

Page 5: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Scatterplots of Paired Data

Figure 10-2

Page 6: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Scatterplots of Paired Data

Figure 10-2

Page 7: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Scatterplots of Paired Data

Figure 10-2

Page 8: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Requirements

1. The sample of paired (x, y) data is a simple random sample of quantitative data.

2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern.

3. The outliers must be removed ONLY if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.

Page 9: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Notation for the Linear Correlation Coefficient

n = number of pairs of sample data

denotes the addition of the items indicated.

x denotes the sum of all x-values.

x2 indicates that each x-value should be squared and then those squares added.

(x)2 indicates that the x-values should be added and then the total squared.

Page 10: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Notation for the Linear Correlation Coefficient

xy indicates that each x-value should be first multiplied by its corresponding y-value. After obtaining all such products, find their sum.

r = linear correlation coefficient for sample data.

= linear correlation coefficient for population data.

Page 11: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Formula 10-1

nxy – (x)(y)

n(x2) – (x)2 n(y2) – (y)2r =

The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample.

Computer software or calculators can compute r

Formula

Page 12: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Interpreting r

Using Table A-6: If the absolute value of the computed value of r, denoted |r|, exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation.

Using Software: If the computed P-value is less than or equal to the significance level, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation.

Page 13: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

- Ken Carroll, 1975

Correlation of incidence of death due to breast cancer with animal fat intake

Page 14: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

- Ken Carroll, 1975

Correlation of incidence of death due to breast cancer with vegetable fat intake

Page 15: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

From Francis Anscombe , 1973… each plot has the same mean (y=7.5), SD (4.12), r, (0.816), and linear regression line (y = 3 + 0.5x)…

Page 16: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

p p p p r2

df 0.10 0.05 0.02 0.01…8 0.549 0.632 0.716 0.765 0.5859 0.521 0.602 0.685 0.735 0.54010 0.497 0.576 0.658 0.708 0.50111 0.476 0.553 0.634 0.684 0.46712 0.458 0.532 0.612 0.661 0.43713 0.441 0.514 0.592 0.641 0.41014 0.426 0.497 0.574 0.623 0.38815 0.412 0.482 0.558 0.606 0.36716 0.400 0.468 0.542 0.590 0.34817 0.389 0.456 0.528 0.575 0.33118 0.378 0.444 0.516 0.561 0.31519 0.369 0.433 0.503 0.549 0.28820 0.360 0.423 0.492 0.537 0.65130 0.296 0.349 0.409 0.449 0.20240 0.257 0.304 0.358 0.393 0.15450 0.231 0.273 0.322 0.354 0.12560 0.211 0.250 0.295 0.325 0.10680 0.183 0.217 0.256 0.283 0.080100 0.164 0.195 0.230 0.254 0.065

From the previous slide: note that with df of 9r = 0.816 is significant at p < 0.01

A meaningless correlation in anyone’s book!!!

Page 17: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Interpreting the Linear Correlation Coefficient r

Critical Values from Table A-6 and the Computed Value of r

Page 18: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Interpreting r:Explained Variation

The value of r2 is the proportion of the variation in y that is explained by the linear relationship between x and y.

Page 19: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies causality.

2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.

3. Linearity: There may be some relationship between x and y even when there is no linear correlation.

Page 20: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Part 1: Basic Concepts of Regression

Page 21: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Regression

The typical equation of a straight liney = mx + b is expressed in the formy = b0 + b1x, where b0 is the y-intercept and b1 is the slope.

^

The regression equation expresses a relationship between x (called the explanatory variable, predictor variable or independent variable), and y (called the response variable or dependent variable).

^

Page 22: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Definitions Regression Equation

Given a collection of paired data, the regression equation

Regression Line

The graph of the regression equation is called the regression line (or line of best fit, or least squares line).

y = b0 + b1x^

algebraically describes the relationship between the two variables.

Page 23: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Notation for Regression Equation

y-intercept of regression equation

Slope of regression equation

Equation of the regression line

PopulationParameter

SampleStatistic

0 b0

1 b1

y = 0 + 1 x y = b0 + b1x

^

Page 24: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Requirements

1. The sample of paired (x, y) data is a random sample of quantitative data.

2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern.

3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.

Page 25: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Formulas for b0 and b1

Formula 10-3 (slope)

(y-intercept)Formula 10-4

calculators or computers can compute these values

b0y b

1x

b

1r

sy

sx

Page 26: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

The regression line fits the sample points best.

Special Property

Page 27: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Rounding the y-intercept b0 and the Slope b1

Round to three significant digits.

If you use the formulas 10-3 and 10-4, do not round intermediate values.

Page 28: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

- Ken Carroll, 1975

From Francis Anscombe , 1973… each plot has the same mean (y=7.5), SD (4.12), r, (0.816), and linear regression line (y = 3 + 0.5x)…

Page 29: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

We should know that the regression equation is an estimate of the true regression equation. This estimate is based on one particular set of sample data, but another sample drawn from the same population would probably lead to a slightly different equation.

Page 30: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

1. Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well?????.

Using the Regression Equation for Predictions

2. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables?????

Think… r2

Page 31: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

A

% Fat % Fat Based on actual BD Based on predicted BD

A 25.96% 13.92%

B 7.91% 13.92%B

Page 32: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of
Page 33: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Be VERY skeptical of individual “results” based on linear regression equations being used to make multiple serial predictions

Page 34: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of

Summary

- correlation is used to calculate the strength of a linear relationship between paired variables from a sample

- r value > or < 0 indicates a non-zero association between the 2 variables

- a scatter plot is used to visualize the relationship between the 2 variables

- r CANNOT indicate a causal relationship

- interpret correlation by using r2… indicating the proportion of variation in variable A accounted for by the variation in variable B “significant r only means NOT a “0 association”

- regression equation describes the association between the 2 variables

- regression line is a graph of the regression equation describing the best fit line of the scatter plot

- only with confidence limits can you properly interpret a linear regression line

Page 35: Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of