Correlation – Pearson’s

Upload: jonathan-wheeler

Post on 31-Dec-2015


Page 1: Correlation – Pearson’s. What does it do? Measures straight-line correlation – how close plotted points are to a straight line Takes values between –1

Correlation – Pearson’s


What does it do?

• Measures straight-line correlation – how close plotted points are to a straight line

• Takes values between –1 and 1

[Scale: –1 = perfect negative correlation, 0 = no correlation, +1 = perfect positive correlation]


Planning to use it?

Make sure that…

• You have continuous data (e.g. lengths, weights…) – it isn’t valid otherwise

• You have at least 5 data pairs (more is better)

• You want to use Pearson’s rather than rank correlation – does the scatter diagram look close to a straight line?


How does it work?

• You assume (null hypothesis) there is no correlation

• The test involves calculating totals from your data and substituting into a formula. This works out how far off a straight line your points are

• The calculation can be done automatically on a spreadsheet, and on many graphic calculators


Doing the test

These are the stages in doing the test:

1. Write down your hypotheses

2. Work out the totals needed for the formula

3. Use the formula to get a value for the correlation

4. Look at the tables

5. Make a decision


Hypotheses

H0: r = 0 (there is no correlation)

For H1, you have a choice, depending on what alternative you were looking for.

H1: r > 0 (positive correlation)
or H1: r < 0 (negative correlation)
or H1: r ≠ 0 (some correlation)

If you have a good scientific reason for expecting a particular kind of correlation, use one of the first two. If not, use H1: r ≠ 0.


Totals

• Get your data in table form like this, and complete the extra columns shown

x     y     x²    y²    xy
1     5     1     25    5
2     7     4     49    14
4     6     16    36    24
6     11    36    121   66

• Total each column. This gives you Σx, Σy, Σx², Σy², and Σxy
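The extra columns and their totals can also be built programmatically. A minimal Python sketch for the small table above; the variable names are illustrative and not part of the original slides:

```python
# Data pairs from the example table
xs = [1, 2, 4, 6]
ys = [5, 7, 6, 11]

# Build the extra columns: x squared, y squared, and x times y
x_sq = [x * x for x in xs]
y_sq = [y * y for y in ys]
xy = [x * y for x, y in zip(xs, ys)]

# Total each column
totals = {
    "sum_x": sum(xs),
    "sum_y": sum(ys),
    "sum_x2": sum(x_sq),
    "sum_y2": sum(y_sq),
    "sum_xy": sum(xy),
}
print(totals)
```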


Formula

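$$ r = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sqrt{\left(\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right)\left(\sum y^2 - \frac{1}{n}\left(\sum y\right)^2\right)}} $$

n = number of data pairs

Σx = sum of x-values, Σy = sum of y-values, etc.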
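As a rough illustration, the formula translates almost directly into code. The function name and argument names below are assumptions made for this sketch; the totals passed in are those of the four-pair table from the Totals slide (13, 29, 57, 231, 109):

```python
import math

def pearson_r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r computed from the column totals."""
    top = sum_xy - sum_x * sum_y / n          # top of the fraction
    bottom_x = sum_x2 - sum_x ** 2 / n        # first bracket under the square root
    bottom_y = sum_y2 - sum_y ** 2 / n        # second bracket under the square root
    return top / math.sqrt(bottom_x * bottom_y)

# Totals of the four-pair example table (n = 4)
r = pearson_r_from_totals(4, 13, 29, 57, 231, 109)
print(round(r, 3))
```

Working from totals mirrors the hand calculation, so each intermediate value can be checked against a calculator.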


Tables

[Image: a Pearson’s correlation coefficient table, annotated to show where to find your number of pairs and the significance levels (e.g. 0.05 = 5%)]


Make a decision

• If your value is bigger than the tables value (ignoring signs), then you can reject the null hypothesis. Otherwise you cannot reject it.

• Make sure you choose the right tables value – it depends whether your test is 1 or 2 tailed:
  If you are using H1: r > 0 or H1: r < 0, you are doing a 1-tailed test
  If you are using H1: r ≠ 0, you are doing a 2-tailed test
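The comparison "ignoring signs" amounts to taking the absolute value of r. A small sketch, assuming the tables value has already been looked up for the right number of pairs and tails; the function name and the numbers used are hypothetical:

```python
def reject_null(r, tables_value):
    """Return True if |r| is bigger than the tables value."""
    return abs(r) > tables_value

# Hypothetical values, for illustration only
print(reject_null(-0.62, 0.55))  # True: 0.62 is bigger than 0.55
print(reject_null(0.40, 0.55))   # False: 0.40 is not bigger than 0.55
```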


Soil Salinity & Plant Height

The data below were collected on soil salinity and plant height.

Soil Salinity        28    12    15    16    2     5
Plant Height (mm)    10    40    40    52    75    48

Hypotheses:

H0: r = 0 (no correlation)

H1: r ≠ 0 (some correlation)


Totals

Soil Salinity (x)    28     12     15     16     2      5
Plant Height (y)     10     40     40     52     75     48
x²                   784    144    225    256    4      25
y²                   100    1600   1600   2704   5625   2304
xy                   280    480    600    832    150    240

Σx = 78, Σy = 265, Σx² = 1438, Σy² = 13933, Σxy = 2582

NB: You HAVE to work out Σy² by squaring all the values and adding up. You CAN’T work out the sum of y, then square.
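The difference between the two orders of operation is easy to see numerically. A short sketch using the plant heights from the example (variable names are illustrative):

```python
ys = [10, 40, 40, 52, 75, 48]  # plant heights (mm) from the example

sum_of_squares = sum(y * y for y in ys)  # square each value, then add up
square_of_sum = sum(ys) ** 2             # add up first, then square (NOT what the formula needs)

print(sum_of_squares)  # 13933
print(square_of_sum)   # 70225
```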


Formula

We now put all the totals into the formula:

$$ r = \frac{2582 - \frac{1}{6}(78)(265)}{\sqrt{\left(1438 - \frac{1}{6}\,78^2\right)\left(13933 - \frac{1}{6}\,265^2\right)}} = -0.8878 $$



Pearson’s on the Calculator

• First check if the calculator is “scientific”, that is, whether it automatically does multiplication before addition. Try 2 + 4 × 3. If you get 14, it does multiplication first. If you get 18, it doesn’t.

• Work out the top of the fraction. For a scientific calculator, put it in exactly as shown ((78)(265) means 78 × 265). For a non-scientific calculator, put in brackets: 2582 - (1/6 × 78 × 265). (Answer: -863)

• Work out each part of the bottom of the fraction. Non-scientific calculator: 1438 - (1/6 × (78²)). (Answers: 424 and 2228.833)

• Multiply the two parts from the bottom together (945025.333)

• Take the square root of the previous answer and keep the answer in memory (972.124)

• Divide the top of the fraction by the previous answer
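The same stages can be followed in code, which gives a way to check each intermediate answer quoted in brackets. This is only a sketch; the variable names are invented for illustration:

```python
import math

# Precedence check from the first step: multiplication before addition gives 14
assert 2 + 4 * 3 == 14
# Working strictly left to right would give 18 instead
assert (2 + 4) * 3 == 18

top = 2582 - (1 / 6 * 78 * 265)          # top of the fraction
bottom_x = 1438 - (1 / 6 * 78 ** 2)      # first part of the bottom
bottom_y = 13933 - (1 / 6 * 265 ** 2)    # second part of the bottom
product = bottom_x * bottom_y            # multiply the two parts together
root = math.sqrt(product)                # take the square root
r = top / root                           # divide top by the previous answer

for value in (top, bottom_x, bottom_y, product, root):
    print(round(value, 3))
```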


The test

We have used H1: r ≠ 0, so it is a 2-tailed test

Tables value (5% level): 0.8114

Our value: -0.8878

So we can reject H0 – there is some correlation


Calculating a Best-Fit Line

• If Pearson’s is significant, then it’s valid to calculate a best fit (regression) line

• The line has equation y = a + bx, where a and b can be calculated

• This lets you make predictions of the height of a plant given the soil salinity, by putting values of x into the equation


Finding the Line

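The line has equation y = a + bx, where

$$ b = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}, \qquad a = \bar{y} - b\,\bar{x} $$

For the soil salinity data:

$$ b = \frac{2582 - \frac{1}{6}(78)(265)}{1438 - \frac{1}{6}\,78^2} = -2.035 $$

$$ a = 44.167 - (-2.035)(13) = 70.622 $$

So the equation is: y = 70.622 - 2.035x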
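A brief sketch of the same calculation in Python, including a prediction. The function name `predict_height` and the input salinity of 10 are hypothetical choices for illustration. Because the code keeps b at full precision rather than rounding it to -2.035 first, a comes out near 70.627 rather than 70.622:

```python
salinity = [28, 12, 15, 16, 2, 5]
height = [10, 40, 40, 52, 75, 48]
n = len(salinity)

sum_x = sum(salinity)
sum_y = sum(height)
sum_x2 = sum(x * x for x in salinity)
sum_xy = sum(x * y for x, y in zip(salinity, height))

# Slope and intercept of the best-fit line y = a + bx
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n   # mean of y minus b times mean of x

def predict_height(x):
    """Predicted plant height (mm) for soil salinity x."""
    return a + b * x

print(round(b, 3), round(a, 3))
print(round(predict_height(10), 1))  # hypothetical salinity of 10
```

Predictions are most trustworthy for salinity values inside the range of the collected data (2 to 28 here).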