Correlation & Regression

Post on 22-Dec-2015


TRANSCRIPT

Page 1: Correlation & Regression

Page 2: Correlation

T-tests and ANOVA examine the mean differences between two or more levels of one or more IVs on a DV
– i.e. differences between males and females (2 levels of the IV “gender”) on exam scores

What if, instead of average differences, we were more interested in the relationship between two variables?
– “relationship” = how one variable changes as a function of another variable

Page 3: Correlation

– i.e. the relationship between anxiety prior to a medical procedure and the patient’s post-op recovery

This type of question concerns what is called a correlation
– Correlation = relationship between two variables

NOTE: if we were looking at average post-op recovery (the DV) in groups both high and low in pre-op anxiety (2 levels of the IV anxiety), we would be looking at mean differences, and an ANOVA would be more appropriate than correlation

Page 4: Correlation

The easiest means of representing this relationship/correlation is via the use of a scatterplot
– Scatterplot = a graph in which the individual data points are plotted in two dimensions

[Figure: scatterplot of Pessimism (y-axis, 0 to 8) against Depression (x-axis, -10 to 50)]

Page 5: Correlation

[Figure: scatterplot of Pessimism (y-axis, 0 to 8) against Depression (x-axis, -10 to 50)]

Predictor Variable = traditionally the variable on the x-axis (in this case “Depression”)

Criterion Variable = traditionally the variable on the y-axis (in this case “Pessimism”)

Best-Fit Line/Regression Line = the line that lies as close as possible to all of the data points, i.e. the line that best represents the data

Page 6: Correlation

Regression Line
– “Best” fit = line that minimizes the average distance from all data points, i.e. the residuals (strictly, it minimizes the sum of the squared residuals)
– Residual = amount that a data point deviates from this line
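To make the idea of a best-fit line and its residuals concrete, here is a minimal sketch of an ordinary least-squares fit. The scores are made up for illustration and the function names are hypothetical; they are not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Residual = observed y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical scores, invented for this example
depression = [0, 10, 20, 30, 40]
pessimism = [3, 4, 4, 6, 7]

slope, intercept = fit_line(depression, pessimism)
res = residuals(depression, pessimism, slope, intercept)
print(slope, intercept)
print(res)
```

The residuals of a least-squares line always sum to zero (up to rounding), which is one way to check a fit done by hand.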

Page 7: Correlation

It is important to note that although the predictor is usually the variable on the x-axis and the criterion the variable on the y-axis, these definitions are often not adhered to and the variables are named arbitrarily

Also, calling one variable the predictor does not mean that it “predicts” the criterion in the sense that it can tell you what the criterion is before it occurs
– i.e. to say that depression predicts pessimism does not mean that depression comes first and causes you to be pessimistic!

Page 8: Correlation

Correlation does not equal causation!
– the only way that you can say that one variable predicts another in time is through the design of your experiment
    if depression were assessed in January and pessimism were assessed in December, and the two were found to be related, then you can say that one predicts the other in time
    statistical prediction ≠ “prediction”
– if the two variables were measured at the same time, we do not know which one caused the other

Page 9: Correlation

– to determine causation (that one variable caused another) we need to show several things:
    A. that the predictor preceded the criterion in time (this also shows that the criterion did not cause the predictor)
    B. that other variables did not cause both the criterion and the predictor at the same time, resulting in their relationship

[Diagram: Var 1, IV, DV]

Page 10: Correlation

– i.e. if we were studying the relationship (correlation) between two variables: the length of grass and ice cream consumption
    If they were measured simultaneously, it would be impossible to tell which caused which
    If both were measured at two time points, July and December, we would find that they both increase and decrease at the same time (i.e. one does not seem to cause the other): no causation
    If we measured temperature as well, we would find that both are correlated because increases in temperature cause both, which explains why they increase and decrease at the same time

Page 11: Correlation

Correlation is represented by the Pearson Product-Moment Correlation Coefficient (r)
– can range from -1 to 1, where 1 represents a perfect (the strongest possible) positive relationship, -1 a perfect negative relationship, and 0 no relationship between the two variables
    both strong positive and strong negative relationships are, nonetheless, robust relationships and are generally meaningful; a negative relationship is not bad
– only used when the two variables are continuous/dimensional

Page 12: Correlation

Positive Relationship (r = .82)
– As BDI2TOT increases, MASQGDD also increases

[Figure: scatterplot of MASQGDD (y-axis, 0 to 70) against BDI2TOT (x-axis, -10 to 50)]

Page 13: Correlation

Negative Relationship (r = -.679)
– As MASQAD increases, TMMSREP decreases

[Figure: scatterplot of TMMSREP (y-axis, 1 to 6) against MASQAD (x-axis, 20 to 120)]

Page 14: Correlation

No Relationship (r = .00)
– Information about Explanatory Flexibility tells you nothing about Emotional Insight

[Figure: scatterplot of ASIS - Emotional Insight (y-axis, 1 to 8) against Explanatory Flexibility (x-axis, -.5 to 3.5)]

Page 15: Correlation

Pearson’s r is heavily reliant on the covariance

$\mathrm{cov}_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}$

If variance = $\dfrac{\sum (x - \bar{x})^2}{N - 1} = \dfrac{\sum (x - \bar{x})(x - \bar{x})}{N - 1}$

…then cov is just average variability in both x and y
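The parallel between variance and covariance can be checked numerically. The sketch below assumes the usual sample formulas with N - 1 in the denominator; the data and function names are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: average cross-product of deviations (N - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    """Sample variance is the covariance of a variable with itself."""
    return covariance(xs, xs)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y))  # shared variability of x and y
print(variance(x))       # variability of x alone
print(variance(y))       # variability of y alone
```

Defining `variance` through `covariance` is a deliberate choice here: it mirrors the point that the covariance formula is the variance formula with one of the two deviation terms swapped for the other variable.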

Page 16: Correlation

Error variance = average (squared) amount each point deviates from the best-fit line; its square root is the standard error of the estimate = $s_{y.x}$

$s_{y.x} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{N - 2}}$

If Ŷ is a point on the best-fit line (predicted value of Y), then $s_{y.x}$ = standard deviation of the residuals, and $s_{y.x}^2$ = variance of the residuals/error = error variance

Page 17: Correlation

Pearson’s r = $\mathrm{cov}_{xy} / (s_x s_y)$

Correlation = amount of shared variability / √(total variability)
– Since it’s like a %, the size of r ranges from 0 to 1.00, and its sign can be positive or negative
– In fact, squaring r ($r^2$) gives the % of variability that is shared between x and y
    Previous example of BDI2 and MASQGDD: r = .82; $r^2$ = .67, so 67% of variance in BDI2 is predicted by MASQGDD
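As a worked illustration of r as covariance divided by the product of the two standard deviations, here is a small sketch. The data are made up; only the r = .82 figure comes from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r = cov_xy / (s_x * s_y); the N - 1 terms cancel out."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # proportion of variability shared by x and y
print(r, r_squared)

# The slide's example: r = .82 corresponds to about 67% shared variance
shared = round(0.82 ** 2, 2)
print(shared)
```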

Page 18: Correlation

Hypotheses in Correlation:
– $H_0$: ρ = 0
    ρ (rho) = correlation in population (parameter)
– $H_1$: ρ ≠ 0

Page 19: Correlation

Assumptions of Correlation (Pearson’s r)

1. Nonlinear/Curvilinear Relationships
    If the relationship between the two variables is not linear, and is instead U-shaped or bell-shaped (like our normal distribution), our attempts at finding a best-fit line will fail, and it will seem as though our two variables are unrelated (r will approximate 0), when in fact the relationship exists, but is nonlinear

Page 20: Correlation

[Figure: scatterplot of VAR00002 (y-axis, 0 to 10) against VAR00001 (x-axis, 0 to 12) with best-fit line]

Above is an example of a curvilinear relationship: although the two variables are clearly related, their correlation is only r = -.205
– Note how the best-fit line does not represent the data points well
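A quick way to see why a curvilinear relationship can yield an r near 0 is to correlate a variable with its own square over a symmetric range. This is a sketch with made-up data, not the dataset plotted on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# y is completely determined by x, but the relationship is U-shaped
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_curved = pearson_r(x, y)
print(r_curved)  # essentially zero despite a perfect (nonlinear) relationship
```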

Page 21: Correlation

Assumptions of Correlation (Pearson’s r)

2. Normality
– Both variables must be normally distributed, otherwise the correlation will appear smaller than it is
– If our data are non-normal, correlation coefficients other than r can be used

Page 22: Correlation

We can also calculate a correlation if our data are ordinal instead of continuous/dimensional
– Remember: data on an ordinal scale are ranked, which means that we can tell that one number is higher than another, but not how much higher (interval scales have this), and there is no zero point (ratio scales have this); i.e. 1st place, 2nd place, etc. = ordinal data
– Correlation here is represented by Spearman’s $r_s$
    The difference between r and $r_s$ is that $r_s$ requires that the data be monotonic, or constantly rising or falling: if data are arranged in rank order, they can only go up or down; you can’t go from 1st place to 9th place to 2nd place if the places are arranged in order
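One common way to compute Spearman's coefficient, sketched below, is to convert each variable to ranks and then apply Pearson's formula to the ranks. The helper names and data are hypothetical, and the slides do not say which computational formula the course uses.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(xs, ys):
    """Spearman's r_s = Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Monotonic but nonlinear made-up data: y = x cubed
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(r, rs)  # r_s reaches 1 because the ordering is perfectly preserved
```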

Page 23: Correlation

Other correlation coefficients
– The Point Biserial Correlation coefficient ($r_{pb}$): if one variable is continuous/dimensional and the other dichotomous (a nominal scale where the variable can take only two possible values)
    Dichotomous variables: e.g. Gender (Male/Female), Yes/No answers, Race (if it is coded as Caucasian or Minority), etc.
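The point-biserial coefficient is numerically the same as Pearson's r when the dichotomous variable is coded 0/1. The sketch below checks this against the textbook mean-difference form, $r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{pq}$, where $s_n$ is the standard deviation computed with N in the denominator. The data and names are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def point_biserial(groups, scores):
    """Mean-difference form of r_pb; groups must be coded 0 or 1."""
    n = len(scores)
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / n           # proportion coded 1
    q = 1 - p                   # proportion coded 0
    grand_mean = sum(scores) / n
    sd_n = sqrt(sum((s - grand_mean) ** 2 for s in scores) / n)
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / sd_n * sqrt(p * q)


# Hypothetical data: group membership (0/1) and a continuous score
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 5, 6, 7]

r_from_pearson = pearson_r(group, score)
r_from_means = point_biserial(group, score)
print(r_from_pearson, r_from_means)
```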

Page 24: Correlation

Other correlation coefficients
– Phi (Φ): when both variables are dichotomous

Page 25: Correlation

Factors that bias correlation coefficients:

1. Range Restriction
– Typically, restricting range reduces correlations

[Figure, two panels: scatterplots of MASQGDD (y-axis) against BDI2TOT (x-axis). Left: Full Dataset (r = .82), BDI2TOT from -10 to 50. Right: Only BDI > 30 (r = .490), BDI2TOT from 30 to 50]
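The effect of range restriction can be reproduced with synthetic data. The sketch below builds a noisy linear relationship (the alternating ±8 "noise" is an arbitrary, deterministic choice made for this example) and compares r over the full range with r for scores above 30 only. It does not use the BDI/MASQ data from the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic data: y follows x, plus a fixed-size error that alternates in sign
x_full = list(range(50))
y_full = [x + (8 if x % 2 == 0 else -8) for x in x_full]

# Restrict the range: keep only cases with x above 30
kept = [(x, y) for x, y in zip(x_full, y_full) if x > 30]
x_restricted = [x for x, _ in kept]
y_restricted = [y for _, y in kept]

r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_restricted, y_restricted)
print(r_full, r_restricted)  # the restricted correlation is noticeably smaller
```

The error around the line is the same size in both cases; what shrinks is the spread of x, so the line explains a smaller share of the total variability.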

Page 26: Correlation

– However, restricting range increases correlations if the relationship is curvilinear, because it makes the relationship linear

[Figure, two panels: scatterplots of VAR00002 (y-axis, 0 to 10) against VAR00001 (x-axis). Left: Full Dataset (r = -.205), VAR00001 from 0 to 12. Right: Only Var1 ≥ 5 (r = -.982), VAR00001 from 4 to 11]

Page 27: Correlation

– Problems of range restriction are common in psychological research, because researchers want their groups to be as different from each other as possible to increase the effect sizes that they obtain
    Remember: the formula for the effect size when comparing two group means (Cohen’s d) is the mean for Group 1 minus the mean for Group 2, divided by $s_p$
– To get highly different groups, researchers sample those high and low on a particular variable
    I.e. comparing those highest on aggression to those lowest on aggression
    This is comparable to only looking at BDI2 scores higher than 30; when looking at the full range of scores, correlations will be more accurate

Page 28: Correlation

Factors that bias correlation coefficients:

2. Heterogeneous Subsamples
– This is a problem when there is an interaction present (i.e. our age by gender interaction mentioned in the discussion of Factorial ANOVA)

Page 29: Correlation

– If males’ performance increases as they age, and women’s performance remains the same, then when the two genders are combined and age and performance are correlated regardless of gender, the correlation will be smaller
– Strong correlation of age and performance for males + weak correlation of age and performance for females = biased correlation when the two are combined

[Figure: Performance (y-axis, 0 to 40) by AGE (Young, Medium, Old), grouped by GENDER (Male, Female)]

Page 30: Correlation

Factors that bias correlation coefficients:

3. Outliers

[Figure, two panels: scatterplots of VAR00002 (y-axis) against VAR00001 (x-axis, 0 to 12). Left: No Outliers (r = .989), VAR00002 from 0 to 50. Right: Outlier (r = .522), VAR00002 from -1000000000 to 5000000000]
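A single extreme point can change r substantially. The sketch below uses made-up data (not the dataset in the figure): ten points that fall exactly on a line, then the same points plus one case far from the line.

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Ten hypothetical points lying exactly on the line y = 2x
x_clean = list(range(1, 11))
y_clean = [2 * x for x in x_clean]

# Same data plus one outlying case at (11, 0)
x_outlier = x_clean + [11]
y_outlier = y_clean + [0]

r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_outlier, y_outlier)
print(r_clean, r_outlier)  # one added point cuts the correlation in half
```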

Page 31: Correlation

Testing correlations for significance
– just like t- and F-statistics, r-statistics can be tested for significance
– just like t- and F-statistics, with increasing sample size (n), smaller correlations (r’s) will be significant
    with 25 people, r ≥ .396 is significant at p < .05; with 1000 people you only need an r ≥ .062 (see Table E.2, page 515 in your text)
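The critical values quoted above can be reproduced from the usual t-test for a correlation, $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with $N - 2$ degrees of freedom, which rearranges to $r_{crit} = t_{crit}/\sqrt{t_{crit}^2 + df}$. In the sketch below the two-tailed critical t values (2.069 for df = 23, 1.962 for df = 998) are hard-coded from a standard t table; that is an assumption of this example rather than something computed here.

```python
from math import sqrt


def critical_r(t_critical, n):
    """Smallest |r| that reaches significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_critical / sqrt(t_critical ** 2 + df)


# Two-tailed critical t values at alpha = .05, taken from a standard t table
T_CRIT_DF_23 = 2.069    # n = 25
T_CRIT_DF_998 = 1.962   # n = 1000

r_crit_25 = critical_r(T_CRIT_DF_23, 25)
r_crit_1000 = critical_r(T_CRIT_DF_998, 1000)
print(round(r_crit_25, 3), round(r_crit_1000, 3))
```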

Page 32: Correlation

Testing correlations for significance
– the r-statistic is also its own, built-in effect size statistic
    Cohen’s conventions for r: .1 = small, .3 = medium, and .5 = large effects
– by squaring r ($r^2$), you also get a relatively unbiased effect size estimate that is interpreted identically to $\eta^2$ and $\omega^2$
    Remember: $\eta^2$ and $\omega^2$ represent the percent of variability in one variable accounted for by the other

Page 33: Correlation

Testing correlations for significance

Therefore, if:
– r = .5, p = .00001, you can state that your two variables are strongly (effect size) and reliably (p-value) related
– r = .5, p = .65, you can conclude that your two variables appear strongly related, but that you probably didn’t have enough subjects for this to be reflected in your p-value
– r = .1, p = .00001, you can conclude that the large sample size drove your p-value down, and that your variables are at most weakly related
– r = .1, p = .65, you can conclude that your two variables are neither strongly nor reliably related

Page 34: Regression

The best-fit line allows us to make educated guesses about what a score is on one variable given a score on the other

Extrapolate = make an educated guess about a score that is either higher or lower than any actual score obtained

Interpolate = make an educated guess about a score that is in the range of the scores obtained, but that was not actually obtained

Page 35: Regression

– Range of scores on Depression = 0 – 49
– Range of scores on Pessimism = 1 – 7
– Extrapolation: what pessimism score would be associated with a depression score of 50? (~6.8)
– Interpolation: what pessimism score would be associated with a depression score of 45? (~5.5)

[Figure: scatterplot of Pessimism (y-axis, 0 to 8) against Depression (x-axis, -10 to 50)]

Page 36: Regression

Interested in linear relationship between 2 variables = use correlation

Interested in linear relationship(s) between 3+ dimensional variables = regression
– DV = Symptoms of paranoia
– IV = Treatment vs. Control groups → ANOVA
    IV discrete (dichotomous/polychotomous)
– IV = # of sessions of treatment → Regression
    IV dimensional/continuous

Page 37: Regression

– DV = Criterion, IVs = Predictors
– Criterion = $b_1x_1 + b_2x_2 + b_3x_3 + \dots + a$
    $x_1$ = predictor #1; $b_1$ = slope of $x_1$ and DV; a = intercept
    Slope = rate of change
    – b = .75 = 1 pt. increase in IV associated with .75 pt. increase in DV
    – I.e. for every 1 pt. increase in pessimism, Dep increases .75 pt.

[Figure: scatterplot of Pessimism (y-axis, 0 to 8) against Depression (x-axis, -10 to 50)]
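The regression equation, criterion = $b_1x_1 + b_2x_2 + \dots + a$, translates directly into code. The sketch below uses hypothetical slopes, an assumed intercept, and invented predictor scores purely to show the arithmetic.

```python
def predict(slopes, predictor_scores, intercept):
    """Predicted criterion = sum of (slope * predictor score) plus the intercept."""
    if len(slopes) != len(predictor_scores):
        raise ValueError("need exactly one slope per predictor")
    return sum(b * x for b, x in zip(slopes, predictor_scores)) + intercept


# Hypothetical model with two predictors
slopes = [0.75, -0.2]   # b1, b2 (made up)
intercept = 3           # a (made up)
scores = [4, 10]        # x1, x2 for one person

predicted = predict(slopes, scores, intercept)
print(predicted)

# Raising x1 by one point changes the prediction by exactly b1
shifted = predict(slopes, [5, 10], intercept)
print(shifted - predicted)
```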

Page 38: Regression

Slope
– Slope w/ raw data = b
    I.e. b = .45 in prediction of GPA from IQ → 1 pt. increase in IQ associated with about ½ pt. increase in GPA
– Slope w/ standardized data = β
    Standardize data (i.e. convert to z-scores) to compare slopes between experiments
    $\beta = b \cdot s_x / s_y$
    I.e. β = .53 → 1 s.d. increase in IQ associated with about ½ s.d. increase in GPA
    b is more interpretable if the scale of the variables is meaningful

Intercept = value of DV when IV = 0
– In previous ex., Pess = ~3 when Dep = 0, so a = ~3
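The link between the raw slope b and the standardized slope β can be checked numerically using the standard conversion $\beta = b \cdot s_x / s_y$. With a single predictor, β also equals Pearson's r, which the sketch below confirms on made-up data.

```python
from math import sqrt


def summary(xs, ys):
    """Return (b, s_x, s_y, r) for a one-predictor regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                  # raw-score slope
    s_x = sqrt(sxx / (n - 1))      # standard deviation of the predictor
    s_y = sqrt(syy / (n - 1))      # standard deviation of the criterion
    r = sxy / sqrt(sxx * syy)      # Pearson correlation
    return b, s_x, s_y, r


# Hypothetical predictor and criterion scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, s_x, s_y, r = summary(x, y)
beta = b * s_x / s_y  # standardized slope
print(b, beta, r)
```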

Page 39: Regression

Regression can test:
– The overall ability of all of your IVs to predict your criterion (overall model/omnibus $R^2$)
– The ability of each IV to predict your criterion (b or β)
    Each of these statistics is associated with a p-value & tested for significance
– Can also be used to make predictions based on best-fit/regression line (less common)

Page 40: Regression

Hypotheses in Regression:
– $H_0$: b/β/$R^2$ (in population) = 0
– $H_1$: b/β/$R^2$ (in population) ≠ 0

Page 41: Regression

Assumptions of Regression
– Linearity of Regression
    Variables linearly related to one another
– Normality in Arrays
    Actual values of DV normally distributed around predicted values (i.e. regression line), AKA regression line is a good approximation of the population parameter
– Homogeneity of Variance in Arrays
    Assumes that variance of criterion is equal for all levels of predictor(s)
    Sound familiar?
    – Variance of DV equal for all levels of IV(s)

Page 42: Correlation/Regression

Correlation & Regression can also answer other kinds of questions:
– Can test difference between 2 independent r’s/b’s
    $r_{a \& b} > r_{c \& d}$
    Is the correlation between depression and anxiety using the BDI and BAI larger than the same correlation using the MASQ-AD and MASQ-AA subscales?
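One standard way to compare two correlations from independent samples is Fisher's r-to-z transformation; the slides do not say which procedure the course uses, so treat this as an illustrative sketch. The correlations and sample sizes below are hypothetical.

```python
from math import atanh, erfc, sqrt


def compare_independent_rs(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations.

    Returns (z, two-tailed p). Each r comes from a separate sample.
    """
    z1, z2 = atanh(r1), atanh(r2)            # Fisher r-to-z transform
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = erfc(abs(z) / sqrt(2))               # two-tailed p from the normal curve
    return z, p


# Hypothetical values: r = .82 in one sample of 100, r = .68 in another sample of 100
z, p = compare_independent_rs(0.82, 100, 0.68, 100)
print(z, p)

# Identical correlations give z = 0 and p = 1
z_same, p_same = compare_independent_rs(0.5, 50, 0.5, 50)
print(z_same, p_same)
```

This applies only to correlations from separate samples. Dependent correlations, such as two correlations that share a variable and are measured in the same people, need a different test.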

Page 43: Correlation/Regression

– Can test difference between 2 dependent r’s/b’s
    $r_{a \& b} > r_{b \& c}$
    Is the correlation between rumination and depression as high as between rumination and generalized anxiety?
    Is the correlation between rumination and depression @ Time 1 the same at Time 2, 4 weeks later?

Don’t worry about how to do calculations by hand