computing in archaeology session 11. correlation and regression analysis © richard haddlesey

Posted on 27-Mar-2015


Computing in Archaeology

Session 11. Correlation and regression analysis

© Richard Haddlesey www.medievalarchitecture.net

Lecture aims

To introduce correlation and regression techniques

The scattergram

In correlation, we are always dealing with paired scores, and so values of the two variables taken together will be used to make a scattergram

Example

Quantities of New Forest pottery recovered from sites at varying distances from the kilns

Site    Distance (km)    Quantity
1       4                98
2       20               60
3       32               41
4       34               47
5       24               62

Negative correlation

Here we can see that the quantity of pottery decreases as distance from the source increases

Positive correlation

Here we see that the taller a pot, the wider the rim

Curvilinear monotonic relation

Again, the further from the source, the smaller the quantity of artefacts, but here the decline follows a curve rather than a straight line

Arched relationship (non-monotonic)

Here we see that the first molar increases with age and is then worn down as the animal gets older, so the values first rise and then fall

Scattergram

This shows us that scattergrams are the most important means of studying relationships between two variables

REGRESSION

Regression differs from other techniques we have looked at so far in that it is concerned not just with whether or not a relationship exists, or the strength of that relationship, but with its nature

In regression analysis we use an independent variable to estimate (or predict) the values of a dependent variable

Regression equation

y = f(x)

y = y axis (in this case the dependent variable)

f = function (of x)

x = x axis (in this case the independent variable)

Examples: $y = x$, $y = 2x$, $y = x^2$

General linear equations

y = a + bx

Where y is the dependent variable, x is the independent variable, and the coefficients a and b are constants, i.e. they are fixed for a given data set

Therefore, if x = 0 the equation reduces to y = a, so a represents the point where the regression line crosses the y axis (the intercept)

The constant b defines the slope or gradient of the regression line

Thus, for pottery quantity in relation to distance from the source, b represents the change in pottery quantity for each additional kilometre from the source; it is negative because quantity falls as distance increases

[Slides: fitting the regression line y = a + bx by the least-squares method]

y = 102.64 – 1.8x
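A minimal Python sketch of the least-squares calculation, assuming the five sites in the New Forest pottery table are the complete data set behind this equation (the function name least_squares is illustrative, not from the lecture). It uses b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄.

```python
# Least-squares fit of y = a + bx to the New Forest pottery table.
# Assumption: the five sites listed are the whole data set.

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The fitted line passes through the means, so a = mean(y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b

distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]

a, b = least_squares(distance_km, quantity)
print(f"a = {a:.2f}, b = {b:.2f}")                  # a = 102.69, b = -1.80
print(f"predicted quantity at 10 km: {a + b * 10:.2f}")
```

Unrounded, this gives a ≈ 102.69 and b ≈ −1.80. The figure of 102.64 is what results if b is first rounded to −1.8 and a is then computed from the means (x̄ = 22.8 km, ȳ = 61.6), which suggests that is how the equation was derived.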

CORRELATION

1 correlation coefficient
• r
• ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation)

2 significance

Levels of measurement:

• nominal – in name only
• ordinal – forming a sequence
• interval – a sequence with fixed distances
• ratio – fixed distances with a datum point

Product-Moment Correlation Coefficient: interval and ratio data
Spearman’s Rank Correlation Coefficient: ordinal, interval and ratio data
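The pairing of coefficients with levels of measurement can be written as a small decision rule. This is a hypothetical helper (choose_coefficient and LEVELS are illustrative names), and it assumes the weaker of the two variables decides. Where both are interval or ratio it returns the product-moment coefficient, although Spearman’s coefficient would also be permissible there.

```python
# Levels of measurement, ordered from weakest to strongest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

def choose_coefficient(level_x: str, level_y: str) -> str:
    """Suggest a correlation coefficient from the weaker of two variables' levels."""
    weakest = min(LEVELS.index(level_x), LEVELS.index(level_y))
    if weakest >= LEVELS.index("interval"):
        # Both variables interval or ratio: product-moment applies
        # (Spearman's coefficient could also be used here).
        return "product-moment"
    if weakest == LEVELS.index("ordinal"):
        # At least one variable only ordinal: work with ranks instead.
        return "spearman"
    raise ValueError("neither coefficient covered here applies to nominal data")

print(choose_coefficient("ratio", "ratio"))     # e.g. spearhead length and width in cm
print(choose_coefficient("ordinal", "ratio"))
```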

The Product-Moment Correlation Coefficient

Sample: 20 bronze spearheads, length (cm) against width (cm); n = 20

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

For the spearheads, r = +0.67
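The spearhead measurements are not reproduced in this transcript, so this minimal sketch applies the same computational formula for r to the New Forest pottery table instead. The name product_moment_r is illustrative.

```python
import math

def product_moment_r(xs, ys):
    """Pearson's product-moment r via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]
print(round(product_moment_r(distance_km, quantity), 2))   # -0.97
```

For the pottery, r comes out at about −0.97, a strong negative correlation, matching the decline of quantity with distance.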

Test of product moment correlation coefficient

H0 : true correlation coefficient = 0

H1 : true correlation coefficient ≠ 0

Assumptions: both variables approximately random

Sample statistics needed: n and r

Test statistic: TS = r

Table: product moment correlation coefficient table

Spearheads: n = 20, r = 0.67, p < 0.01
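The lecture reads significance straight from a table of critical r values. As a cross-check, here is a sketch using the standard conversion t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The critical value 2.878 (two-tailed, p = 0.01, 18 d.f.) is copied from standard t tables and is an assumption of this sketch, not a figure from the lecture; with SciPy installed, scipy.stats.t.ppf(0.995, 18) gives approximately the same value. The function names are illustrative.

```python
import math

def r_to_t(r, n):
    """Convert a product-moment r from n pairs to t with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given the critical t for n - 2 d.f."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

n, r = 20, 0.67                 # spearhead sample from the lecture
T_CRIT = 2.878                  # assumed: two-tailed 1% point of t, 18 d.f.

t = r_to_t(r, n)
r_crit = critical_r(T_CRIT, n)
print(f"t = {t:.2f}, critical r = {r_crit:.3f}")   # t = 3.83, critical r = 0.561
print("p < 0.01" if abs(r) > r_crit else "not significant at the 0.01 level")
```

Since 0.67 exceeds the critical value of about 0.561, the sketch agrees with the table look-up of p < 0.01.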

Spearman’s Rank Correlation Coefficient (rs)

H0 : true correlation coefficient = 0

H1 : true correlation coefficient ≠ 0

Assumptions: both variables at least ordinal

Sample statistics needed: n and rs

Test statistic: TS = rs

Table: Spearman’s rank correlation coefficient table
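A minimal sketch of how rs is calculated: rank each variable, take the difference d between the two ranks for each case, and apply rs = 1 − 6Σd² / (n(n² − 1)). The function names are illustrative. Tied values receive their mean rank, but this formula is only exact when there are no ties. Applied to the New Forest pottery table, it gives rs = −0.8.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rs(xs, ys):
    """rs = 1 - 6Σd² / (n(n² - 1)); exact only when there are no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]
print(spearman_rs(distance_km, quantity))
```

Because rs uses only the order of the values, it suits ordinal data and curvilinear monotonic relationships, in line with the assumption that both variables are at least ordinal.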
