Chapter 7 - Part 1: Correlation

Post on 20-Dec-2015

Correlation Topics

Co-relationship between two variables.

Linear vs Curvilinear relationships

Positive vs Negative relationships

Strength of relationship

Mythical relationship between Baseball and Football performance

Name      Baseball skill   Football skill
Al        Very good        Very good
Ben       Very poor        Very poor
Chuck     Good             Good
David     Terrible         Terrible
Ed        Poor             Poor
Frank     Average          Average
George    Excellent        Excellent

Is this a linear relationship?

Baseball skill predicts football skill. Football skill predicts baseball skill. There is a strong relationship.

First we must arrange the scores in “order”

Name      Baseball skill   Football skill
David     Terrible         Terrible
Ben       Very Poor        Very Poor
Ed        Poor             Poor
Frank     Average          Average
Chuck     Good             Good
Al        Very Good        Very Good
George    Excellent        Excellent

Then we plot the scores

[Figure: scatterplot of Football Skill against Baseball Skill. Both axes run Terrible, Very Poor, Poor, Average, Good, Very Good, Excellent, with one point each for David, Ben, Ed, Frank, Chuck, Al, and George.]

This is definitely a linear relationship!

Let’s get more abstract

[Figure: scatterplot with the Baseball Skill and Football Skill axes (Terrible to Excellent) relabeled as X and Y, each running from -3 to 3.]

Linear or nonlinear? Let’s look at another set of values.

Name      Baseball skill   Football skill
David     Terrible         Terrible
Ben       Very Poor        Average
Ed        Poor             Average
Frank     Average          Very Good
Chuck     Good             Excellent
Al        Very Good        Good
George    Excellent        Poor

Is this a linear relationship?

[Figure: scatterplot of Football Skill against Baseball Skill for the second set of values. Both axes run from Terrible to Excellent, with one point each for David, Ben, Ed, Frank, Chuck, Al, and George.]

NO! It is best described by a curved line. It is a curvilinear relationship!
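The contrast between the two data sets can be checked numerically. The sketch below assumes the seven skill labels can be coded 1 (Terrible) through 7 (Excellent); that coding and the helper names `steps` and `is_straight_line` are illustrative choices, not part of the slides. With people sorted by baseball skill, a straight line means football skill changes by the same amount at every step.

```python
# Ordinal coding of the skill labels (an assumption made for illustration).
LEVELS = ["Terrible", "Very Poor", "Poor", "Average",
          "Good", "Very Good", "Excellent"]
RANK = {label: i for i, label in enumerate(LEVELS, start=1)}

# People in order of baseball skill: David, Ben, Ed, Frank, Chuck, Al, George.
# Baseball skill therefore rises by exactly one level per person.
football_first = list(LEVELS)  # first data set: same level as baseball
football_second = ["Terrible", "Average", "Average", "Very Good",
                   "Excellent", "Good", "Poor"]  # second data set


def steps(labels):
    """Change in coded football skill from each person to the next."""
    ranks = [RANK[label] for label in labels]
    return [b - a for a, b in zip(ranks, ranks[1:])]


def is_straight_line(labels):
    """True when every step is the same size, so the points lie on one line."""
    return len(set(steps(labels))) == 1


print(steps(football_first), is_straight_line(football_first))
# [1, 1, 1, 1, 1, 1] True
print(steps(football_second), is_straight_line(football_second))
# [3, 0, 2, 1, -2, -2] False
```

In the second data set the steps start positive and end negative, which is the rise-then-fall pattern that a curved line describes better than a straight one.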

Positive vs Negative relationships

In a positive relationship, as one value increases the other value tends to increase as well.

Example: The longer a sailboat is, the more it tends to cost. As length goes up, price tends to go up.

In a negative relationship, as one value increases, the other value tends to decrease.

Example: The older a sailboat is, the less it tends to cost. As years go up, price tends to go down.

Positive vs Negative scatterplot

[Figure: X and Y axes from -3 to 3, showing one negative relationship and one positive relationship.]

Correlation Characteristics

Linear vs Curvilinear

The strength of a relationship tells us approximately how the dots will fall around a best fitting line.

Perfect - scores fall exactly on a straight line.

Strong - most scores fall near the line.

Moderate - some are near the line, some not.

Weak - lots of scores fall close to the line, but many fall quite far from it.

Independent - the scores are not close to the line and form a circular or square pattern.

Strength of a relationship

[Figures: four scatterplots on X and Y axes from -3 to 3, labeled Perfect, Strong, Moderate, and Independent.]

What is this relationship?

[Figures: four unlabeled scatterplots on X and Y axes from -3 to 3, one per question.]

Comparing apples to oranges? Use t scores!

You can use correlation to look for the relationship between ANY two values that you can measure on a single subject.

However, there may not be any relationship (independent).

A correlation tells us if scores are consistently similar on two measures, consistently different from each other, or have no real pattern.

To compare scores on two different variables, you transform them into tX and tY scores.

tX and tY scores can be directly compared to each other to see whether they are consistently similar, consistently quite different, or show no consistent pattern of similarity or difference.

Similar tX and tY scores = positive correlation. Dissimilar = negative correlation. No pattern = independence.

When t scores are consistently more similar than different, we have a positive correlation.

When t scores are consistently more different than similar, we have a negative correlation.

When t scores show no consistent pattern of similarity or difference, we have independence.
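The transformation can be sketched in a few lines of Python. This assumes a t score is the deviation from the mean divided by the sample standard deviation (the one that divides by n - 1), which is how the worked example later in this chapter computes it. The function name `t_scores` is illustrative, and the X and Y values are the ones used in that example.

```python
from statistics import mean, stdev


def t_scores(scores):
    """Convert raw scores to t scores: (score - mean) / sample SD."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation, divides by n - 1
    return [(value - m) / s for value in scores]


# X and Y are on different scales (means 6 and 11, SDs 3.16 and 1.58).
X = [2, 4, 6, 8, 10]
Y = [9, 11, 10, 12, 13]

tX = t_scores(X)
tY = t_scores(Y)

print([round(t, 2) for t in tX])  # [-1.26, -0.63, 0.0, 0.63, 1.26]
print([round(t, 2) for t in tY])  # [-1.26, 0.0, -0.63, 0.63, 1.26]
```

After the transformation both variables have a mean of 0 and a standard deviation of 1, so each pair of t scores can be compared directly.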

Comparing variables

Anxiety symptoms, e.g., heartbeat, with number of hours driving to class.

Hat size with drawing ability.

Math ability with verbal ability.

Number of children with IQ.

Turn them all into t scores.

Pearson’s Correlation Coefficient

coefficient - noun, a number that serves as a measure of some property.

The correlation coefficient indexes the consistency and direction of a correlation.

Pearson’s rho (ρ) is the parameter that characterizes the strength and direction of a linear relationship (and only a linear relationship) between two population variables.

Pearson’s r is a least squares estimate of rho; it is nearly, though not exactly, unbiased.

r and rho vary from -1.000 to +1.000.

A negative value indicates a negative relationship; a positive value indicates a positive relationship.

Values of r close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) or independent relationship.

r, strength and direction

Perfect, positive      +1.00
Strong, positive       +.75
Moderate, positive     +.50
Weak, positive         +.25
Independent             .00
Weak, negative         -.25
Moderate, negative     -.50
Strong, negative       -.75
Perfect, negative     -1.00
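The table gives landmark values rather than cutoffs, so any rule for labeling an in-between r is a judgment call. The sketch below shows one possible reading: it assigns the label of the nearest landmark and reserves "Perfect" for exactly ±1. The function name `describe_r` and the nearest-landmark rule are assumptions made for illustration, not something the slides specify.

```python
import math

# Landmark magnitudes from the table (Perfect is handled separately).
LANDMARKS = [(0.75, "Strong"), (0.50, "Moderate"),
             (0.25, "Weak"), (0.00, "Independent")]


def describe_r(r):
    """Label r by its nearest landmark magnitude (an illustrative rule)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if math.isclose(size, 1.0):
        label = "Perfect"
    else:
        # min() keeps the first landmark on a tie, i.e. the stronger label.
        _, label = min(LANDMARKS, key=lambda item: abs(item[0] - size))
    if label == "Independent":
        return label
    direction = "positive" if r > 0 else "negative"
    return f"{label}, {direction}"


print(describe_r(0.90))   # Strong, positive
print(describe_r(-0.50))  # Moderate, negative
print(describe_r(0.05))   # Independent
```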

Calculating Pearson’s r

Select a random sample from a population; obtain scores on two variables, which we will call X and Y.

Convert all the scores into t scores.

First, subtract the tY score from the tX score in each pair.

Then square all of the differences and add them up, that is, $\sum (t_X - t_Y)^2$.

Estimate the average squared distance between tX and tY by dividing the sum of squared differences by (nP - 1), that is,

$\sum (t_X - t_Y)^2 / (n_P - 1)$

To turn this estimate into Pearson’s r, use the formula

$r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$

Note seeming exception

Usually we divide a sum of squared deviations around a mean by df to estimate the variance.

Here the sum of squares is not around a mean and we are not estimating a variance.

So you divide $\sum (t_X - t_Y)^2$ by $(n_P - 1)$.

nP - 1 is not df for correlation and regression (dfREG = nP - 2).
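One way to see where the divisor nP - 1 comes from is the short derivation below. It rests on two assumptions not stated in the slides: that t scores are computed with the sample standard deviation, so that the squared t scores of each variable sum to nP - 1, and that r can equivalently be written as the sum of cross-products of t scores divided by nP - 1.

```latex
\begin{aligned}
\sum (t_X - t_Y)^2 &= \sum t_X^2 + \sum t_Y^2 - 2 \sum t_X t_Y \\
                   &= (n_P - 1) + (n_P - 1) - 2 (n_P - 1)\, r \\[4pt]
\frac{\sum (t_X - t_Y)^2}{n_P - 1} &= 2 - 2r \\[4pt]
r &= 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}
\end{aligned}
```

Under these assumptions, identical t scores give a sum of zero and r = +1, while exactly opposite t scores give a sum of 4(nP - 1) and r = -1.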

Example: Calculate t scores for X

$X$    $X - \bar{X}$   $(X - \bar{X})^2$   $t_X = (X - \bar{X}) / s_X$
 2     -4              16                  -1.26
 4     -2               4                  -0.63
 6      0               0                   0.00
 8      2               4                   0.63
10      4              16                   1.26

$\sum X = 30$   $N = 5$   $\bar{X} = 6.00$

SSW = 40.00   MSW = 40.00/(5-1) = 10.00   $s_X = 3.16$

Calculate t scores for Y

$Y$    $Y - \bar{Y}$   $(Y - \bar{Y})^2$   $t_Y = (Y - \bar{Y}) / s_Y$
 9     -2               4                  -1.26
11      0               0                   0.00
10     -1               1                  -0.63
12     +1               1                   0.63
13     +2               4                   1.26

$\sum Y = 55$   $N = 5$   $\bar{Y} = 11.00$

SSW = 10.00   MSW = 10.00/(5-1) = 2.50   $s_Y = 1.58$

Calculate r

$t_X$    $t_Y$    $t_X - t_Y$   $(t_X - t_Y)^2$
-1.26   -1.26     0.00          0.00
-0.63    0.00    -0.63          0.40
 0.00   -0.63     0.63          0.40
 0.63    0.63     0.00          0.00
 1.26    1.26     0.00          0.00

$\sum (t_X - t_Y)^2 = 0.80$

$\sum (t_X - t_Y)^2 / (n_P - 1) = 0.80 / 4 = 0.200$

$r = 1.000 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$

$r = 1.000 - (\frac{1}{2} \times 0.200) = 1.000 - 0.100 = 0.900$

This is a very strong, positive relationship.
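The hand calculation can be reproduced with a short script. This is a minimal sketch of the chapter's formula, with t scores computed using the sample standard deviation; the function names are illustrative. As a cross-check it also computes r from the sum of cross-products of deviations, a standard equivalent form that the slides do not show.

```python
from math import sqrt
from statistics import mean, stdev


def t_scores(scores):
    """(score - mean) / sample standard deviation for each score."""
    m, s = mean(scores), stdev(scores)
    return [(value - m) / s for value in scores]


def pearson_r(xs, ys):
    """r = 1 - 1/2 * sum((tX - tY)^2) / (nP - 1), as in this chapter."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of scores")
    n_pairs = len(xs)
    sum_sq_diff = sum((tx - ty) ** 2
                      for tx, ty in zip(t_scores(xs), t_scores(ys)))
    return 1 - 0.5 * sum_sq_diff / (n_pairs - 1)


def pearson_r_crossproduct(xs, ys):
    """Cross-check: sum of cross-products over sqrt(SS_X * SS_Y)."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


X = [2, 4, 6, 8, 10]
Y = [9, 11, 10, 12, 13]

r = pearson_r(X, Y)
r_check = pearson_r_crossproduct(X, Y)
print(round(r, 3), round(r_check, 3))  # 0.9 0.9
```

Both routes give .900 for these data, matching the result above.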