Correlation – Pearson’s
What does it do?
• Measures straight-line correlation – how close plotted points are to a straight line
• Takes values between –1 and 1
• –1 = perfect negative correlation, 0 = no correlation, +1 = perfect positive correlation
Planning to use it?
Make sure that…
• You have continuous data (eg lengths, weights…) – it isn’t valid otherwise
• You have at least 5 data pairs (more is better)
• You want to use Pearson’s rather than rank correlation – does the scatter diagram look close to a straight line?
How does it work?
• You assume (null hypothesis) there is no correlation
• The test involves calculating totals from your data and substituting into a formula. This works out how far off a straight line your points are
• The calculation can be done automatically on a spreadsheet, and on many graphic calculators
Doing the test
These are the stages in doing the test:
1. Write down your hypotheses
2. Work out the totals needed for the formula
3. Use the formula to get a value for the correlation
4. Look at the tables
5. Make a decision
Hypotheses
H0: r = 0 (there is no correlation)
For H1, you have a choice, depending on what alternative you were looking for.
H1: r > 0 (positive correlation)
or H1: r < 0 (negative correlation)
or H1: r ≠ 0 (some correlation)
If you have a good scientific reason for expecting a particular kind of correlation, use one of the first two. If not, use H1: r ≠ 0.
Totals
• Get your data in table form like this, and complete the extra columns shown

x    y    x²    y²    xy
1    5    1     25    5
2    7    4     49    14
4    6    16    36    24
6    11   36    121   66

• Total each column. This gives you Σx, Σy, Σx², Σy², and Σxy
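The extra columns and their totals can also be built in a few lines of code. The following is a minimal Python sketch using the four data pairs from the table above; the variable and key names are illustrative choices, not part of the original method.

```python
# Data pairs from the table above
xs = [1, 2, 4, 6]
ys = [5, 7, 6, 11]

# Extra columns: x squared, y squared, and x times y for each pair
x_sq = [x * x for x in xs]
y_sq = [y * y for y in ys]
xy = [x * y for x, y in zip(xs, ys)]

# Column totals needed by the formula (key names are illustrative)
totals = {
    "sum_x": sum(xs),
    "sum_y": sum(ys),
    "sum_x2": sum(x_sq),
    "sum_y2": sum(y_sq),
    "sum_xy": sum(xy),
}
print(totals)
```

For these four pairs the totals come out as 13, 29, 57, 231 and 109.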
Formula
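$$r = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sqrt{\left(\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right)\left(\sum y^2 - \frac{1}{n}\left(\sum y\right)^2\right)}}$$
n = number of data pairs
Σx = sum of x-values, Σy = sum of y-values, etc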
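As a rough illustration, the formula can be written as a small Python function that takes the column totals. The function name `pearson_r` and its parameter names are hypothetical; the check at the end uses three made-up points lying exactly on the rising line y = 2x, which should give perfect positive correlation.

```python
import math

def pearson_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from the column totals (hypothetical helper)."""
    top = sum_xy - sum_x * sum_y / n        # top of the fraction
    spread_x = sum_x2 - sum_x ** 2 / n      # first bracket under the root
    spread_y = sum_y2 - sum_y ** 2 / n      # second bracket under the root
    return top / math.sqrt(spread_x * spread_y)

# Made-up check: x = 1, 2, 3 and y = 2, 4, 6 lie exactly on y = 2x
r = pearson_r(n=3, sum_x=6, sum_y=12, sum_x2=14, sum_y2=56, sum_xy=28)
print(r)  # 1.0
```

Note that this sketch divides by zero if all the x-values (or all the y-values) are the same, because one of the brackets under the square root is then 0.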
Tables
• Use a Pearson’s correlation coefficient table
• Find the entry for your number of pairs and your significance level (eg 0.05 = 5%)
Make a decision
• If your value is bigger than the tables value (ignoring signs), then you can reject the null hypothesis. Otherwise you cannot reject it.
• Make sure you choose the right tables value – it depends whether your test is 1- or 2-tailed:
   If you are using H1: r > 0 or H1: r < 0, you are doing a 1-tailed test
   If you are using H1: r ≠ 0, you are doing a 2-tailed test
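The decision rule can be sketched in Python as below. The function name `reject_null` and the labels for the alternative are hypothetical, and the tables value still has to be looked up by hand for the right number of pairs and the right number of tails. One assumption goes beyond the text: for a 1-tailed test this sketch also requires the sign of r to match the direction stated in H1.

```python
def reject_null(r, tables_value, alternative="two-sided"):
    """Return True if H0 (no correlation) can be rejected.

    tables_value: critical value read from the tables (1- or 2-tailed
    as appropriate). The 'alternative' labels are illustrative.
    """
    if alternative == "two-sided":    # H1: r != 0, ignore the sign
        return abs(r) > tables_value
    if alternative == "greater":      # H1: r > 0 (assumes r must be positive)
        return r > tables_value
    if alternative == "less":         # H1: r < 0 (assumes r must be negative)
        return r < -tables_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Made-up numbers, purely for illustration
print(reject_null(-0.9, 0.8))             # True: |r| is bigger than 0.8
print(reject_null(-0.9, 0.8, "greater"))  # False: wrong direction for H1
print(reject_null(0.5, 0.8))              # False: not bigger than 0.8
```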
Soil Salinity & Plant Height
The data below were collected on soil salinity and plant height.
Soil Salinity       28   12   15   16   2    5
Plant Height (mm)   10   40   40   52   75   48
Hypotheses:
H0: r = 0 (no correlation)
H1: r ≠ 0 (some correlation)
Totals
Soil Salinity (x)   28    12     15     16     2      5
Plant Height (y)    10    40     40     52     75     48
x²                  784   144    225    256    4      25
y²                  100   1600   1600   2704   5625   2304
xy                  280   480    600    832    150    240
Σx = 78, Σy = 265, Σx² = 1438, Σy² = 13933, Σxy = 2582
NB: You HAVE to work out Σy² by squaring all the values and adding up. You CAN’T work out the sum of y, then square.
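The difference between the two orders of working is easy to see with the plant heights from this example. A short Python sketch (variable names are illustrative):

```python
# Plant heights (y) from the example
ys = [10, 40, 40, 52, 75, 48]

# Correct: square each value first, then add up
sum_of_squares = sum(y * y for y in ys)

# Wrong for this total: add up first, then square
square_of_sum = sum(ys) ** 2

print(sum_of_squares)  # 13933
print(square_of_sum)   # 70225
```

Only the first figure, 13933, is the total for the y² column; the second, 265 squared, appears separately inside the formula.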
Formula
We now put all the totals into the formula:
$$r = \frac{2582 - \frac{1}{6}(78)(265)}{\sqrt{\left(1438 - \frac{1}{6}(78)^2\right)\left(13933 - \frac{1}{6}(265)^2\right)}} = -0.8877$$
Pearson’s on the Calculator
• First check if the calculator is “scientific” – that is, it automatically does multiplication before addition
   Try 2 + 4 × 3. If you get 14, it does multiplication first
   If you get 18, it doesn’t
• Work out the top of the fraction (-863)
   For a scientific calculator, put it in exactly as shown ((78)(265) means 78 × 265)
   For a non-scientific calculator, put in brackets: 2582 – (1/6 × 78 × 265)
• Work out each part of the bottom of the fraction (424, 2228.833)
   Non-scientific calculator: 1438 – (1/6 × (78²))
• Multiply the two parts from the bottom together (945025.333)
• Take the square root of the previous answer – keep the answer in memory (972.124)
• Divide the top of the fraction by the previous answer
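The same stages can be followed in Python, which makes it easy to check each intermediate value against the figures in brackets. This is a sketch using the totals from the soil salinity example; the variable names are illustrative.

```python
import math

# Totals from the soil salinity example
n = 6
sum_x, sum_y = 78, 265
sum_x2, sum_y2, sum_xy = 1438, 13933, 2582

# Top of the fraction
top = sum_xy - sum_x * sum_y / n

# Each part of the bottom of the fraction
bottom_x = sum_x2 - sum_x ** 2 / n
bottom_y = sum_y2 - sum_y ** 2 / n

# Multiply the two parts, then take the square root
product = bottom_x * bottom_y
root = math.sqrt(product)

# Divide the top by the square root
r = top / root

print(top)                 # -863.0
print(bottom_x)            # 424.0
print(round(bottom_y, 3))  # 2228.833
print(round(product, 3))   # 945025.333
print(round(root, 3))      # 972.124
print(round(r, 3))         # -0.888
```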
The test
We have used H1: r ≠ 0 – so it is a 2-tailed test
Tables value (5% level): 0.8114
Our value: -0.8877
Ignoring the sign, 0.8877 is bigger than 0.8114, so we can reject H0 – there is some correlation
Calculating a Best-Fit Line
• If Pearson’s is significant, then it’s valid to calculate a best fit (regression) line
• The line has equation y = a + bx, where a and b can be calculated
• This lets you make predictions of the height of a plant given the soil salinity, by putting values of x into the equation
Finding the Line
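The line has equation y = a + bx, where
$$b = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x}$$
For the soil salinity data:
$$b = \frac{2582 - \frac{1}{6}(78)(265)}{1438 - \frac{1}{6}(78)^2} = -2.035$$
The means are x̄ = 78/6 = 13 and ȳ = 265/6 = 44.167, so
$$a = 44.167 - (-2.035)(13) = 70.622$$
So the equation is: y = 70.622 – 2.035x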
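The slope, intercept and a prediction can be sketched in Python as follows. The function name `predict_height` and the salinity value of 10 are hypothetical, chosen only to show how the line is used. Because the code keeps b at full precision rather than rounding it to -2.035 first, the intercept comes out as about 70.627 rather than 70.622.

```python
# Totals from the soil salinity example
n = 6
sum_x, sum_y = 78, 265
sum_x2, sum_xy = 1438, 2582

# Slope b and intercept a of the best-fit line y = a + bx
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)

def predict_height(salinity):
    """Predicted plant height (mm) for a given soil salinity (hypothetical helper)."""
    return a + b * salinity

print(round(b, 3))                   # -2.035
print(round(a, 3))                   # 70.627
print(round(predict_height(10), 1))  # 50.3
```

Predictions are most trustworthy for salinity values inside the range of the original data (2 to 28 here).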