Post on 31-Dec-2015

Correlation – Pearson’s

What does it do?

• Measures straight-line correlation: how close the plotted points are to a straight line

• Takes values between -1 and +1

[Figure: scale running from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation)]

Planning to use it? Make sure that…

• You have continuous data (eg lengths, weights…); it isn’t valid otherwise

• You have at least 5 data pairs (more is better)

• You want to use Pearson’s rather than rank correlation: does the scatter diagram look close to a straight line?

How does it work?

• You assume (null hypothesis) there is no correlation

• The test involves calculating totals from your data and substituting into a formula. This works out how far off a straight line your points are

• The calculation can be done automatically on a spreadsheet, and on many graphic calculators

Doing the test

These are the stages in doing the test:

1. Write down your hypotheses

2. Work out the totals needed for the formula

3. Use the formula to get a value for the correlation

4. Look at the tables

5. Make a decision

Hypotheses

H0: r = 0 (there is no correlation)

For H1, you have a choice, depending on what alternative you were looking for.

H1: r > 0 (positive correlation)

or H1: r < 0 (negative correlation)

or H1: r ≠ 0 (some correlation)

If you have a good scientific reason for expecting a particular kind of correlation, use one of the first two. If not, use r ≠ 0.

Totals

• Get your data in table form like this, and complete the extra columns shown

| x | y | x² | y² | xy |
| 1 | 5 | 1 | 25 | 5 |
| 2 | 7 | 4 | 49 | 14 |
| 4 | 6 | 16 | 36 | 24 |
| 6 | 11 | 36 | 121 | 66 |

• Total each column. This gives you Σx, Σy, Σx², Σy², and Σxy
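As a rough illustration of the table and its column totals, the extra columns could be built and summed in Python along the following lines. The function name `column_totals` and the dictionary keys are assumptions made for this sketch; they do not come from the slides.

```python
def column_totals(xs, ys):
    """Return n and the five column totals needed for Pearson's formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),   # square each value, then add
        "sum_y2": sum(y * y for y in ys),   # square each value, then add
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


# The four data pairs shown in the table
totals = column_totals([1, 2, 4, 6], [5, 7, 6, 11])
print(totals)
```

For these four pairs the totals should be Σx = 13, Σy = 29, Σx² = 57, Σy² = 231 and Σxy = 109.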

Formula

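$$ r = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sqrt{\left(\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right)\left(\sum y^2 - \frac{1}{n}\left(\sum y\right)^2\right)}} $$

n = number of data pairs

Σx = sum of x-values, Σy = sum of y-values, etc.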
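The formula can be sketched as a small Python function, which may help when checking a hand calculation. The name `pearson_r` and the error handling are choices made for this sketch and are not part of the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the column totals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    top = sum_xy - sum_x * sum_y / n          # top of the fraction
    part_x = sum_x2 - sum_x ** 2 / n          # first bracket under the root
    part_y = sum_y2 - sum_y ** 2 / n          # second bracket under the root
    if part_x == 0 or part_y == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return top / math.sqrt(part_x * part_y)


# Points lying exactly on a rising straight line
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Points on a rising straight line should give a value of 1, and points on a falling line a value of -1.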

Tables

[Figure: a Pearson’s correlation coefficient table, with labels pointing out your number of pairs and the significance levels (eg 0.05 = 5%)]

Make a decision

• If your value is bigger than the tables value (ignoring signs), then you can reject the null hypothesis. Otherwise you cannot reject it.

• Make sure you choose the right tables value. It depends whether your test is 1-tailed or 2-tailed:

If you are using H1: r > 0 or H1: r < 0, you are doing a 1-tailed test

If you are using H1: r ≠ 0, you are doing a 2-tailed test
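The decision rule might be sketched in Python as below. The function name `reject_null` and the labels `"two-sided"`, `"positive"` and `"negative"` are assumptions for this sketch. The tables value is passed in rather than computed, so it still has to be read from tables for the right number of pairs, significance level and number of tails. For a 1-tailed test the sketch also requires the sign of r to match H1, a standard refinement that the slides do not spell out.

```python
def reject_null(r, tables_value, alternative="two-sided"):
    """Return True if H0 (no correlation) can be rejected.

    tables_value is looked up by the user; it is not computed here.
    """
    if alternative == "two-sided":     # H1: r != 0
        return abs(r) > tables_value
    if alternative == "positive":      # H1: r > 0
        return r > tables_value
    if alternative == "negative":      # H1: r < 0
        return r < -tables_value
    raise ValueError("alternative must be 'two-sided', 'positive' or 'negative'")


# Figures from the soil salinity example: r = -0.8878, tables value 0.8114
print(reject_null(-0.8878, 0.8114))   # True
```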

Soil Salinity & Plant Height

The data below were collected on soil salinity and plant height.

| Soil Salinity | 28 | 12 | 15 | 16 | 2 | 5 |
| Plant Height (mm) | 10 | 40 | 40 | 52 | 75 | 48 |

Hypotheses:

H0: r = 0 (no correlation)

H1: r ≠ 0 (some correlation)

Totals

| Soil Salinity (x) | 28 | 12 | 15 | 16 | 2 | 5 |
| Plant Height (y) | 10 | 40 | 40 | 52 | 75 | 48 |
| x² | 784 | 144 | 225 | 256 | 4 | 25 |
| y² | 100 | 1600 | 1600 | 2704 | 5625 | 2304 |
| xy | 280 | 480 | 600 | 832 | 150 | 240 |

Σx = 78, Σy = 265, Σx² = 1438, Σy² = 13933, Σxy = 2582

NB: You HAVE to work out Σy² by squaring all the values and adding up. You CAN’T work out the sum of y, then square.
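A quick check in Python shows how different the two quantities are for the plant heights. The variable names are chosen for this sketch only.

```python
heights = [10, 40, 40, 52, 75, 48]   # plant heights (y) from the example

sum_of_squares = sum(y ** 2 for y in heights)   # square each value, then add
square_of_sum = sum(heights) ** 2               # add first, then square

print(sum_of_squares)   # 13933
print(square_of_sum)    # 70225
```

Only the first of these is the Σy² that goes into the formula.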

Formula

We now put all the totals into the formula:

$$ r = \frac{2582 - \frac{1}{6}(78)(265)}{\sqrt{\left(1438 - \frac{1}{6}(78)^2\right)\left(13933 - \frac{1}{6}(265)^2\right)}} \approx -0.8878 $$


Pearson’s on the Calculator

• First check if the calculator is “scientific”, that is, it automatically does multiplication before addition

Try 2 + 4 × 3. If you get 14, it does multiplication first. If you get 18, it doesn’t

• Work out the top of the fraction.

For a scientific calculator, put it in exactly as shown ((78)(265) means 78 × 265)

For a non-scientific calculator, put in brackets: 2582 - (1/6 × 78 × 265)

(Answer: -863)

• Work out each part of the bottom of the fraction.

Non-scientific calculator: 1438 - (1/6 × (78²)) (the two parts are 424 and 2228.833)

• Multiply the two parts from the bottom together (945025.333)

• Take the square root of the previous answer and keep the answer in memory (972.124)

• Divide the top of the fraction by the previous answer
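The same stages can be followed in Python as a way of checking each intermediate answer. This is only a sketch; the variable names are assumptions, and the totals are those of the soil salinity example.

```python
import math

print(2 + 4 * 3)   # 14: multiplication is done before addition

n = 6
sum_x, sum_y = 78, 265
sum_x2, sum_y2, sum_xy = 1438, 13933, 2582

top = sum_xy - sum_x * sum_y / n        # top of the fraction
part_x = sum_x2 - sum_x ** 2 / n        # first part of the bottom
part_y = sum_y2 - sum_y ** 2 / n        # second part of the bottom
product = part_x * part_y               # multiply the two parts together
root = math.sqrt(product)               # square root of the product
r = top / root                          # divide top by the square root

for label, value in [("top", top), ("part_x", part_x), ("part_y", part_y),
                     ("product", product), ("root", root), ("r", r)]:
    print(f"{label}: {value:.3f}")
```

The intermediate values should agree with the figures quoted in the steps, and r should come out close to -0.888.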

The test

We have used H1: r ≠ 0, so it is a 2-tailed test

Tables value (5% level): 0.8114

Our value: -0.8878

Ignoring signs, 0.8878 is bigger than 0.8114, so we can reject H0: there is some correlation

Calculating a Best-Fit Line

• If Pearson’s is significant, then it’s valid to calculate a best fit (regression) line

• The line has equation y = a + bx, where a and b can be calculated

• This lets you make predictions of the height of a plant given the soil salinity, by putting values of x into the equation

Finding the Line

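The line has equation y = a + bx, where

$$ b = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x} $$

So for the soil salinity data:

$$ b = \frac{2582 - \frac{1}{6}(78)(265)}{1438 - \frac{1}{6}(78)^2} = -2.035 $$

$$ a = 44.167 - (-2.035)(13) = 70.622 $$

So the equation is: y = 70.622 - 2.035x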
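The best-fit calculation could be sketched in Python as follows. The function name `best_fit_line` is an assumption for this sketch, and the salinity value of 10 used for the prediction is a hypothetical input, not a figure from the slides.

```python
def best_fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    spread_x = sum_x2 - sum_x ** 2 / n
    if spread_x == 0:
        raise ValueError("the slope is undefined when all x values are equal")

    b = (sum_xy - sum_x * sum_y / n) / spread_x   # slope
    a = sum_y / n - b * sum_x / n                 # intercept: mean(y) - b * mean(x)
    return a, b


salinity = [28, 12, 15, 16, 2, 5]
height = [10, 40, 40, 52, 75, 48]

a, b = best_fit_line(salinity, height)
print(f"y = {a:.3f} + ({b:.3f})x")

predicted = a + b * 10   # hypothetical soil salinity of 10
print(f"predicted height: {predicted:.1f} mm")
```

Because the sketch does not round b before working out a, the intercept comes out slightly higher (about 70.63) than the 70.622 obtained by hand with b rounded to -2.035.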
