
Lecture 7: Two Independent Variable Regression

Why do regression with two (or more) independent variables?

Theoretical Reasons

1. To assess the relationship of Y to an X while statistically controlling for the effects of other X(s).

This is often referred to as the assessment of the unique relationship of Y to an X.

Why? Simple correlations of Y with X are potentially contaminated by X’s relationship with other variables and by Y’s relationship with other variables.

So multiple regression assesses the relationship of Y to X while “statistically holding the other Xs constant.” Thanks, Mathematicians!!

Hospitalist Study Example:

We wanted to determine whether there were differences in Charges and in Lengths of Stay of patients of Hospitalists vs. Nonhospitalists.

The question: Was the use of hospitalists more cost effective than the old way?

But there were likely many factors that differed between the Hospitalist group and the Nonhospitalist group, such as different diseases in the two groups, different patient ages, and different severity of illness. If we just performed a t-test comparing outcome means between the hospitalist and nonhospitalist groups, any difference we found might have been due to those other factors. So we conducted a multiple regression of each outcome onto a variable representing the two groups, including (and therefore controlling for) 20+ other variables such as age, gender, ethnic group, type of illness, and severity.

In doing so, we reduced the possibility that any difference in the outcome found between the two groups was due to the other factors, supporting the conclusion that the difference was due uniquely to type of physician and not to reasons associated with the controlled variables.

Freshman Seminar Example:

UTC was interested in determining whether or not requiring students to attend a 1-hour freshman seminar course would be worthwhile. We wished to compare overall GPA and persistence into the 2nd semester of students who had taken the Freshman Seminar with students who had not taken it. But the two groups differed potentially in terms of academic preparation and other factors. For that reason, we made the comparison controlling for a number of factors, including ACT, HSGPA, gender, ethnicity, and others. After controlling for differences in these other factors, any difference in GPA (about .15) and persistence (several percentage points) could be attributed as being most likely due to the seminar.


2. To develop more refined explanations of variation of a dependent variable.

Cassie Lane’s thesis

Doctors typically use a child’s age as an indicator of whether or not the child will be able to understand a consent form.

In fact, a multiple regression allowed Cassie Lane to conclude that when you control for cognitive ability (CA), understanding is not related at all to the age of the child. She found that virtually all the variation was due to CA.

Cassie’s thesis suggested that it is the child’s cognitive ability rather than age that is most predictive of understanding of the issues involving a consent form. Understanding is more strongly related to CA than it is to age.

Practical Reason

3. To simply increase predictability of a dependent variable.

Our Validation study used for the I-O program

We could use just UGPA as an admission criterion. But UGPA’s correlation with graduate performance is pretty low.

So we added GRE scores as predictors along with UGPA, simply to increase validity.

No, we did not add the GRE because we get kick-backs from ETS.

GRE and UGPA do OK, but leave about 75% of the variance in grades unexplained.

Technical Reasons

4. Representing categorical variables in regression analyses.

Simple regression canNOT be used when the independent variable is a categorical variable with 3 or more categories. But categorical variables can be used as independent variables if they’re represented in special ways in multiple regression analyses. These group-coding techniques are covered later on.

5. Representing nonlinear relationships using linear regression programs.

Nonlinear relationships can be represented using garden-variety regression programs using special techniques that aren’t very hard to implement. Perhaps more on these later.


Some Issues we’ll consider this semester. Start here on 3/1/16. Start here on 2/27/18.

1. Understanding the differences between simple relationships and partialled relationships.

A simple relationship is that between one IV and a DV. Every time you compute a Pearson r, you assess the strength and direction of the simple linear relationship between the two variables.

Problem: Simple relationships may be contaminated by covariation of X with other variables which also influence Y. As X varies, so do many other variables, some of which may affect Y. Examining the simple relationship does not take into account the changes in those other variables and their possible effects on Y. The result is that a simple relationship provides ambiguous information regarding whether X truly explains or even predicts Y.

A partialled relationship is the relationship between Y and X while statistically holding other IVs constant, i.e., uncontaminated by the other IVs. Partialled relationships are nearly always different from simple relationships. They provide clearer information on whether X explains or predicts Y.

2. Determining the relative importance of predictors.

Until fairly recently there was no universally agreed-upon way to make such a determination. Recently, a method called dominance analysis has shown promise of providing answers to this question.

3. Dealing with high intercorrelations among predictors.

This is the problem called multicollinearity. For example, Wonderlic Personnel Test (WPT) scores and ACT scores are nearly multicollinear as predictors of GPA. We’ll look at the results of multicollinearity later.

4. Evaluating the significance of sets of independent variables.

This is a technical issue whose solution is quite straightforward. For example, what is the effect of adding three GRE scores (Verbal, Quantitative, and Analytic Writing) to our prediction of graduate school performance? What will be the effect of adding all five of the Big Five to our prediction equation?

5. Determining the ideal subset of predictors.

Having too many predictors in an analysis may lead to inaccurate estimates of the unique relationships. Too few may lead to lack of predictability. So there are techniques for determining just the right number of predictors.

6. Cross validation.

Generalizing results across samples. Any regression analysis will be influenced to some extent by the unique characteristics of the sample on which the analysis was performed. Many investigators use a separate sample to evaluate how much the results will generalize across samples. This technique is called cross validation.
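For concreteness, here is a minimal R sketch of split-sample cross validation using simulated data (the simulated data frame dat and its variable names are mine, not from any of the studies above):

set.seed(123)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))   # simulated predictors
dat$y <- .5 * dat$x1 + .5 * dat$x2 + rnorm(100)       # simulated criterion
train <- sample(nrow(dat), size = 50)                 # derivation half of the sample
fit   <- lm(y ~ x1 + x2, data = dat[train, ])         # estimate the weights in the derivation half
pred  <- predict(fit, newdata = dat[-train, ])        # apply those weights to the holdout half
cor(dat$y[-train], pred)                              # cross-validated R, typically a bit below the derivation R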


Notation involved in Two Independent Variables Regression

Definition of multiple regression analysis: The development of a combination rule relating a single DV to two or more IVs so as to maximize the “correspondence” between the DV and the combination of the IVs.

Correspondence: Least squares criterion

Maximizing the correspondence involves minimizing the sum of squared differences between observed Y’s and Y’s predicted from the combination. This is called Ordinary Least Squares (OLS) analysis.

Our estimated prediction formula is written in its full glory as

Predicted Y = aY.12 + bY1.2*X1 + bY2.1*X2 ARGH!!

Shorthand version

Predicted Y = a + b1*X1 + b2*X2

Many textbooks write the equation in the following way:

Predicted Y = B0 + B1*X1 + B2*X2 We’ll use this.

I may, in haste, forget to subscript, leading to

Predicted Y = B0 + B1*X1 + B2*X2 This is synonymous with the immediately preceding.


An artificial data example

SUPPOSE A COMPANY IS ATTEMPTING TO PREDICT 1ST YEAR SALES.

TWO PREDICTORS ARE AVAILABLE.

THE FIRST IS A TEST OF VERBAL ABILITY. SCORES RANGE FROM 0 -100.

THE SECOND IS A MEASURE OF EXTRAVERSION. SCORES RANGE FROM 0 - 150.

THE DEPENDENT VARIABLE IS 1ST YEAR SALES IN 1000'S.

BELOW ARE THE DATA FOR 25 HYPOTHETICAL CURRENT EMPLOYEES. THE QUESTION IS: WHAT IS THE BEST LINEAR COMBINATION OF THE Xs (TESTS) FOR PREDICTION OF 1ST YEAR SALES.

(WE'LL SEE THAT OUR BEST LINEAR COMBINATION OF THE TWO PREDICTORS CAN BE A COMBINATION WHICH EXCLUDES ONE OF THEM.)

Note that one of the predictors is an ability and the other predictor is a personality characteristic.

THE DATA

ID  SALES  VERBAL  EXTRAV
 1    722      45      92
 2    910      38      90
 3   1021      43      70
 4    697      46      79
 5    494      47      61
 6    791      41     100
 7   1025      44     113
 8   1425      58      86
 9   1076      37      98
10   1065      51     115
11    877      53     111
12    815      45      92
13   1084      38     114
14   1034      56     114
15    887      54      99
16    886      40     117
17   1209      45     126
18    782      48      66
19    854      37      80
20    489      33      61
21   1214      52     103
22   1528      66     125
23   1148      74     134
24   1015      58      87
25   1128      60      95

Here, the form of the equation would be Predicted SALES = B0 + B1*VERBAL + B2*EXTRAV


First – Two simple regressions . . .

VERBAL

Coefficientsa
Model            B         Std. Error   Beta    t      Sig.
1 (Constant)   303.597     214.578              1.415  .171
  VERBAL        13.719       4.351      .549    3.153  .004
a. Dependent Variable: SALES

So, if only VERBAL is the predictor, the prediction equation is

Predicted SALES = 303.597 + 13.719* VERBAL.

EXTRAV

Coefficientsa
Model            B         Std. Error   Beta    t      Sig.
1 (Constant)   238.843     195.816              1.220  .235
  EXTRAV         7.498       1.975      .621    3.797  .001
a. Dependent Variable: SALES

So, if only EXTRAV is the predictor, the prediction equation is

Predicted SALES = 238.843 + 7.498 * EXTRAV.

What if we want BOTH predictors at the same time?

The equation will be

Predicted SALES = Constant + Slope1*VERBAL + Slope2*EXTRAV

So, what will be Constant? What will be Slope1? What will be Slope2?

Could we average the two individual Constants? Could we simply use each simple regression slope?

Is there any way to compute the two-predictor parameters from the simple regression parameters? There is no easy way without a computer.
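For readers who prefer R, here is a sketch that enters the hypothetical data above and fits the simple and two-predictor regressions (the data frame name salesdat is mine; the numbers are transcribed from the listing above):

salesdat <- data.frame(
  SALES  = c(722, 910, 1021, 697, 494, 791, 1025, 1425, 1076, 1065, 877, 815, 1084,
             1034, 887, 886, 1209, 782, 854, 489, 1214, 1528, 1148, 1015, 1128),
  VERBAL = c(45, 38, 43, 46, 47, 41, 44, 58, 37, 51, 53, 45, 38, 56, 54, 40, 45, 48,
             37, 33, 52, 66, 74, 58, 60),
  EXTRAV = c(92, 90, 70, 79, 61, 100, 113, 86, 98, 115, 111, 92, 114, 114, 99, 117,
             126, 66, 80, 61, 103, 125, 134, 87, 95))

summary(lm(SALES ~ VERBAL, data = salesdat))            # simple regression on VERBAL
summary(lm(SALES ~ EXTRAV, data = salesdat))            # simple regression on EXTRAV
summary(lm(SALES ~ VERBAL + EXTRAV, data = salesdat))   # the two-predictor regression shown next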


The two-predictor SPSS Analysis


Estimates and Model fit are already checked in SPSS by default.

You should check Descriptives and Part and partial correlations.

The SPSS Output

Regression

Descriptive Statistics
          Mean      Std. Deviation   N
SALES    967.04     246.772          25
VERBAL    48.36       9.882          25
EXTRAV    97.12      20.429          25

Correlations
                              SALES    VERBAL   EXTRAV
Pearson Correlation  SALES    1.000     .549     .621
                     VERBAL    .549    1.000     .423
                     EXTRAV    .621     .423    1.000
Sig. (1-tailed)      SALES       .      .002     .000
                     VERBAL    .002       .      .018
                     EXTRAV    .000     .018       .
N                    25 for every variable

Variables Entered/Removeda
Model   Variables Entered   Variables Removed   Method
1       EXTRAV, VERBALb     .                   Enter
a. Dependent Variable: SALES
b. All requested variables entered.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .697a   .486       .439                184.859
a. Predictors: (Constant), EXTRAV, VERBAL

ANOVAa
Model          Sum of Squares    df   Mean Square   F        Sig.
1 Regression      709710.045      2    354855.022   10.384   .001b
  Residual        751798.915     22     34172.678
  Total          1461508.960     24
a. Dependent Variable: SALES
b. Predictors: (Constant), EXTRAV, VERBAL

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES


This output is obtained by checking the “Descriptives” box in the REGRESSION dialog box.

Use it to make sure the data don’t have outliers or other problems.

Two key output tables in more detail.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .697a   .486       .439                184.859
a. Predictors: (Constant), EXTRAV, VERBAL

R The value under R is the multiple R – the correlation between Y’s and the combination of X’s. Since it’s the correlation between Y and the combination of multiple X’s, it’s called the multiple correlation.

In most textbooks, multiple correlations are typically printed as R while simple correlations are typically printed as r. In SPSS, however, all correlations are printed as R.

R Square

R square is also called the coefficient of determination. That probably refers to the fact that it’s a coefficient which measures the extent to which variation in Y is determined by variation in the combination of Xs.

It’s also the proportion of variance in Y linearly related to the combination of multiple predictors.

The coefficient of determination ranges from 0 to 1.
0: Y is not related to the linear combination of X’s.
1: Y is perfectly linearly related to the combination of X’s.

Adjusted R Square

R square made slightly smaller to compensate for chance factors that increase as the number of predictors increases. More on this later.

Std. Error of the Estimate

This is the standard deviation of the residuals. It’s a measure of how poorly predicted the Ys are. The larger the value of this statistic, the more poorly predicted they are.

It’s roughly a measure of the average size of the residuals.

It varies inversely with R.
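As a check on this interpretation, the standard error of the estimate can be computed directly from the residuals (a sketch continuing the salesdat example introduced earlier):

fit <- lm(SALES ~ VERBAL + EXTRAV, data = salesdat)
n <- nrow(salesdat)                         # 25 cases
k <- 2                                      # 2 predictors
sqrt(sum(residuals(fit)^2) / (n - k - 1))   # standard error of the estimate, about 184.86
summary(fit)$sigma                          # the same quantity as reported by R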


Adjusted R Square – ad nauseam – skipped in 2018 lecture

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .697a   .486       .439                184.859
a. Predictors: (Constant), EXTRAV, VERBAL

It is an estimate of the population R2 adjusted downward to compensate for spurious upward bias due to the number of predictors. The more predictors, the greater the downward adjustment.

Rationale: For a given sample, as the number of predictors increases, holding sample size, N, constant, the value of regular R2 will increase simply due to chance factors alone.

You can try this at home. Take numbers from whatever sources you can find. Make them predictors of a criterion. R2 will increase with each set of random predictors you add.

In fact, you can generate perfect prediction of any criterion using random predictors. All you need is N-1 predictors, where N is sample size.

So, if I have 25 sales values, I could predict them perfectly with 24 different random predictors. If I have 100 GPAs, I could predict them perfectly with 99 different random predictors.

This is just not right. It’s not. That’s why we have the adjusted R2.

The adjustment formula thus reduces (shrinks) R2 to compensate for this capitalization on chance. The greater the number of predictors for a given sample, the greater the adjustment.


The adjustment formula. - skipped in 2018 lecture

Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - K - 1)

Suppose R2 were .81. The adjusted R2 for various numbers of predictors, K, is given below. N = 20 for this example.

R SQUARE    N    K    ADJ R SQ
  .81      20    0      .81
  .81      20    1      .80
  .81      20    2      .79
  .81      20    3      .77
  .81      20    4      .76
  .81      20    5      .74
  .81      20    6      .72
  .81      20    7      .70
  .81      20    8      .67
  .81      20    9      .64
  .81      20   10      .60
  .81      20   11      .55

Use: I typically make sure that R2 and Adjusted R2 are “close” to each other. If there is a noticeable difference, say > 10%, then I ask myself – “Is my sample size too small for the number of predictors I’m using?” The answer for this small problem is almost always, “Yes”.
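The adjustment formula is easy to verify; this R snippet reproduces the ADJ R SQ column of the table above:

R2 <- .81; N <- 20; K <- 0:11
adjR2 <- 1 - (1 - R2) * (N - 1) / (N - K - 1)   # the adjustment formula applied to each K
round(adjR2, 2)                                  # .81 .80 .79 .77 .76 .74 .72 .70 .67 .64 .60 .55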

Using Adjusted R2 to compare regression analyses with different numbers of predictors.

Christiansen, N. D., & Robie, C. (2011). Further consideration of the use of narrow trait scales. Canadian Journal of Behavioral Science, 43, 183-194.

Compared simple regressions involving individual Big Five domain scale scores with multiple regressions involving 4 Big Five facet scores.

Since the facet regressions involved 4 predictors while the domain scale score regressions involved only 1 predictor, it would be expected that the facet regressions would have larger R2 values simply due to the difference in # of predictors.

So the authors compared adjusted R2 values which controlled for the differences due to # of predictors, and were able to conclude that “ . . . on average the sets of facet scores explained an additional 9% of the variance in performance beyond summated composites . . .”


[Figure: plot of the adjusted R2 values (ADJRSQ, y-axis from about .55 to .81) against the number of predictors, K (0 through 11), titled "Downward Adjustment." Adjusted R2 falls steadily as K increases.]

The Coefficients Table – the 2nd major output table

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES

Interpretation of the regression parameters, called partial regression coefficients

B0, the intercept parameter of the equation.

B0 Expected value of Y when all X’s are 0.

The Slopes, B1, B2, B3, etc. Called the partial regression coefficients.

Three equivalent interpretations of the Slopes

Bi Among persons equal on all the other iv’s, it’s the expected difference in Y between two people who differ by 1 on Xi

Bi Holding constant the other X values, it’s the expected difference in Y between two people who differ by 1 on Xi

Bi Partialling out the effects of the other X values, it’s the expected difference in Y between two people who differ by 1 on Xi

So, Predicted SALES = -9.887 + 8.726*VERBAL + 5.714*EXTRAV

B0: If VERBAL were 0 and EXTRAV were 0, we’d expect sales to be -9.887. (The person would owe the company.)

B1: Among persons equal on EXTRAV, we’d expect an 8.726 difference in SALES between two persons who differed by 1 point on VERBAL.

B2: Among persons equal on VERBAL, we’d expect a 5.714 difference in SALES between two persons who differed by 1 point on EXTRAV.

Don’t compare B values to determine which predictor is more important or efficacious. Test question.


The Standard Errors

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES

Each standard error is the amount by which the estimate would be expected to vary from sample to sample if the regression were repeated many times.

We can use the Standard error to get an idea of how far from 0 an estimate is.

Consider B0 = -9.887.

If the regression were replicated many times, we’d expect different values of B0. They would form a distribution. The standard deviation of that distribution would be about 219.017.

So if the distribution has a standard deviation of 219, then -9.887 is only about .045 of a standard deviation from 0.

Consider, B1 = 8.726.

If the regression were replicated many times, we’d expect different values of B1. They would form a distribution. The standard deviation of that distribution would be about 4.213.

So if the distribution has standard deviation of 4.2, then 8.726 is about 2 standard deviations from 0.

Consider B2 = 5.714

The estimated standard deviation of B2 on repeated regressions is 2.038.

So the value of 5.714 is about 2.8 standard deviations from 0.

As we’ll see below, the t values in the Coefficients box tell us how far each estimate is from 0 in units of the Std. Error.


Standardized Partial Regression coefficients, also called the Betas, the βs.

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES

If all variables could be converted to Z-scores, the βs would be the slopes of the regression.

The Y-intercept would be 0

Predicted ZY = β1*ZX1 + β2*ZX2

Predicted ZSALES = .349*ZVERBAL + .473*ZEXTRAV

Interpretation

Betai: Among persons equal on the other X’s, it’s the expected difference in ZY between two people who differ by one SD on Xi.

That is, the number of standard deviations difference in Y between two people who differ by one SD on Xi among persons equal on the other X's.
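A small R sketch (again using the salesdat data frame from earlier) showing two ways to obtain the Betas:

zdat <- as.data.frame(scale(salesdat))             # z-score every variable
coef(lm(SALES ~ VERBAL + EXTRAV, data = zdat))     # intercept near 0; slopes are the Betas, about .349 and .473

# Equivalently, rescale each unstandardized B by the ratio of standard deviations
8.726 * sd(salesdat$VERBAL) / sd(salesdat$SALES)   # about .349
5.714 * sd(salesdat$EXTRAV) / sd(salesdat$SALES)   # about .473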

Use of the Beta’s

The Betas represent one (now outmoded) possible way of comparing the “importance” of predictors: the larger the β, the greater the variation in Y for “statistical” variation in X. But note that the issue of “importance” of predictors is much debated, and the method of dominance analysis supersedes previous views, of which this is one.

So, while it used to be acceptable to compare beta values to determine predictor importance, that practice has been replaced by the use of dominance analysis. (See last lecture.)


t values

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES

Each t value is the B value divided by the Std. Error: B / Std. Error.

tB0 = -9.887 / 219.017 = -.045.

tB1 = 8.726 / 4.213 = 2.071.

tB2 = 5.714 / 2.038 = 2.804.
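For example, the t and its two-tailed p for VERBAL can be reproduced by hand in R (degrees of freedom are N - K - 1 = 25 - 2 - 1 = 22):

t.verbal <- 8.726 / 4.213            # B divided by its standard error, about 2.071
2 * pt(-abs(t.verbal), df = 22)      # two-tailed p, about .05, matching the Sig. column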

Each t is a test of the “significance” of the difference of partial regression coefficient from 0.

The t for the Constant simply tests the null hypothesis that in the population the intercept is 0.

Null Hypothesis:

Version 1: In the population, the partial regression coefficient is 0. That is, in the population, the relationship of Y to X, holding the other Xs constant, is 0.

Version 2: In the population the predictor does not add significantly to predictability.

Version 3: In the population, the increment to R2 associated with adding the predictor to an equation containing the other predictors is 0.

Version 4: In the population, among persons equal on the other predictors, there is no relationship of Y to X.

The above are all ways of describing what the t-test in the output tests. All different conceptualizations of the null are rejected or retained simultaneously.

Test question: Which of the following is the appropriate definition of the t of 2.071 in the coefficients table?

a. It tests the significance of the relationship of SALES to VERBAL.
b. It tests the significance of the relationship of SALES to VERBAL among persons equal on EXTRAV.


Part (also called semipartial) correlations . . . Symbolized as sr.

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES

sri: The correlation of Y with variation in Xi that is unique from variation in the other Xs.

sri: The correlation of Y with the unique variation in Xi.

Answers the question: If I got rid of the contamination associated with the other variables, what would be the correlation of Y with the pure X variation?

Part correlations are correlations in which the effects of the other predictors have been removed from X.
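A minimal R sketch of this idea for VERBAL, using the salesdat data frame from earlier: regress VERBAL on EXTRAV, keep the residuals (the unique part of VERBAL), and correlate them with SALES.

verbal.unique <- residuals(lm(VERBAL ~ EXTRAV, data = salesdat))   # VERBAL with EXTRAV removed
cor(salesdat$SALES, verbal.unique)                                 # about .317, the Part value for VERBAL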


Partial correlations . . . Skipped in lecture in 2018

Coefficientsa
Model             B        Std. Error   Beta    t       Sig.   Zero-order   Partial   Part
1 (Constant)    -9.887     219.017              -.045   .964
  VERBAL         8.726       4.213      .349    2.071   .050   .549         .404      .317
  EXTRAV         5.714       2.038      .473    2.804   .010   .621         .513      .429
a. Dependent Variable: SALES

Partial correlations are correlations between the unique aspects of Y with the unique aspects of X. Symbolized as pr.

pri: The correlation of that variation in Y that is independent of the other Xs with the variation in Xi that is independent of variation in the other Xs.

pri: The correlation of the unique variation in Y with the unique variation in Xi.

Partial correlations (they should be called full partial correlations) are correlations in which the effects of other predictors have been removed from both X and Y.

Answers the question: If I got rid of the contamination associated with the other variables from both Y and X, what would be the correlation of the pure Y variation with the pure X variation?
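Continuing the same R sketch, the partial correlation for VERBAL removes EXTRAV from both SALES and VERBAL before correlating:

sales.unique  <- residuals(lm(SALES  ~ EXTRAV, data = salesdat))   # SALES with EXTRAV removed
verbal.unique <- residuals(lm(VERBAL ~ EXTRAV, data = salesdat))   # VERBAL with EXTRAV removed
cor(sales.unique, verbal.unique)                                   # about .404, the Partial value for VERBAL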


Examples of Regression Involving 2 Independent Variables

Example I. Uncorrelated Predictors. X1 and X2 are uncorrelated. X1 and X2 jointly contribute to the effect.

It’s not often that independent variables are uncorrelated, but occasionally it happens. Pray for such situations, because the independence of the Xs facilitates both analysis and interpretation.

Example. College GPA predicted by Cognitive Ability (X1) and Conscientiousness (X2)

We all know that college grade points are determined at least partly by cognitive ability – how smart a student is. But we all know many smart people who don’t have GPAs as high as we think they should. And we all know students who seem to be performing beyond their potential – overachievers. Recent work on personality theory has led to the identification of at least one personality characteristic, called conscientiousness, that is essentially uncorrelated with cognitive ability but which is positively correlated with college GPA.

Representation of “uncorrelatedness” of IVs. Note how the fact that variables are uncorrelated is represented in each of the above diagrams. In the path diagram on the left, there is no connecting arrow between Cognitive Ability and Conscientiousness.

When IVs are uncorrelated: Multiple R2 vs. sum of r2’s.

In the very special case in which the two IVs are uncorrelated with each other, the multiple R2 is equal to the sum of the simple regression r2’s.

R2Y.X1X2 = r2YX1 + r2YX2


[Path diagram: Cognitive Ability and Conscientiousness each with an arrow to College GPA; no double-headed arrow connects the two predictors.]

Note that there is no double-headed arrow between Cognitive Ability and Conscientiousness. It’s been left out of the diagram to show that there is zero correlation between the two Xs.

Example of essentially uncorrelated predictors from Reddock, Biderman, & Nguyen, IJSA, 2011.

Prediction of End-of-semester GPA from Wonderlic and Conscientiousness.

The fact that when predictors are uncorrelated, the r2s add up to multiple R2 is one reason that data analysts prefer r2 over r as a measure of strength of relationship. It’s one of the few additive things you’ll find in psychology.

I challenge you to find two psychological characteristics that can be added together to achieve a result that can be viewed as the sum of the two.

We have so, so far to go as a science.

That additivity means that when you combine predictors (as long as they’re uncorrelated with each other) you’ll know just what the result of that combination will be.

Important: The predictors must be uncorrelated with each other for this additivity to work.

By the way – this example illustrates two predictors that are generally uncorrelated – cognitive ability (intelligence) and conscientiousness.


[From the output: the simple r2s, .075 and .064, sum to about .14, nearly equal to the multiple R2 of .145. The predictors are not completely uncorrelated.]

These computations are included to show that when predictors are uncorrelated, the sum of the simple r2s is equal to the multiple R2. In this case they’re almost equal.

So, to summarize the uncorrelated predictors example . . .

Simple Regression with just conscientiousness as a predictor

Predicted eosgpa = 1.900 + .211 * forcon    R = .274; R2 = .075

Simple Regression with just wpt as a predictor

Predicted eosgpa = 2.395 + .025 * wpt R = .252; R2 = .063

Multiple Regression with both forcon and wpt as predictors

Predicted eosgpa = 1.253 + .221 * forcon + .027 * wpt R = .381; R2 = .145

In general . . .

1. The multiple regression equation is kind of predictable from the simple regression equations.

If the predictors are perfectly uncorrelated (these are not), the slopes will be the same. The multiple regression intercept will not be the same as either of the simple regression intercepts, however.

2. The multiple regression multiple R2 is predictable from the simple r2s.

If the predictors are perfectly uncorrelated, the simple r2s will sum to the multiple R2.
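A simulated R illustration of that additivity (these are made-up data, not the eosgpa study): when the predictors are essentially uncorrelated, the simple r2s sum to approximately the multiple R2.

set.seed(42)
x1 <- rnorm(1000)
x2 <- rnorm(1000)                         # generated independently of x1
y  <- .3 * x1 + .25 * x2 + rnorm(1000)
cor(y, x1)^2 + cor(y, x2)^2               # sum of the simple r-squareds
summary(lm(y ~ x1 + x2))$r.squared        # multiple R-squared, nearly the same value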


Example II. Somewhat Correlated Predictors. Start here on 3/6/17.

X1 and X2 are noncausally correlated. X1 and X2 jointly contribute to the effect.

The situation illustrated here is probably the modal situation.

Quantitative ability and Verbal ability and their effect on statistics course grades. In a statistics course with word problems, both quantitative ability and verbal ability would contribute to performance in the course. These two abilities are positively correlated; they’re both aspects of the general characteristic called cognitive ability.

Note that the only difference between this path diagram and the first one is that the independent variables are correlated. This is indicated by the presence of an arrow between them in the path diagram.

Multiple R2 vs. sum of r2’s. Alas, adding a correlated predictor does not add the “full” r2 associated with that predictor.

This means that the R2 for the two predictors won’t be as large as the sum of the two individual r2s.

R2Y.X1X2 <= r2YX1 + r2YX2, and usually R2Y.X1X2 < r2YX1 + r2YX2

This is because each simple r2 has a little bit of the r2 associated with the other variable in it. So adding them adds the overlap twice.


[Path diagram: Quantitative Ability Score and Verbal Ability Score, connected by a double-headed arrow, each with an arrow to Grade in Stat Course with Word Problems.]

Example of correlated predictors from Validation of Formula Score data. (Valdat09).

Of course, here the interpretation of the predictors is important . . .

grev: Among persons equal on greq, the relationship of p511g to grev is significantly positive.

greq: Among persons equal on grev, the relationship of p511g to greq is significantly positive.

We could also give the B values as expected changes in p511g for unit increases in grev or greq. But in psychology, we’re often mainly interested in the direction and significance of the relationship, not the actual values of the partial regression parameters.


[From the output: the simple r2s are .108 and .151; their sum is .259. The predictors are significantly positively correlated, and the multiple R2 (.211) is less than the sum of the simple r2s.]

So, to summarize the somewhat correlated predictors example . . .

Simple Regression with just grev as a predictor

Predicted p511g = 0.709 + .000341 * grev R = .329; R2 = .108

Simple Regression with just greq as a predictor

Predicted p511g = 0.665 + .000371 * greq R = .389; R2 = .151

Multiple Regression with both greq and grev as predictors

Predicted p511g = 0.573 + .000261 * grev + .000315 * greq R = .460; R2 = .211

In general . . .

1. The multiple regression equation is not predictable from the simple regression equations.

2. The multiple regression multiple R2 is unpredictably less than the sum of simple r2s.


Doing Multiple Regression using Rcmdr.

R: Load Packages -> Rcmdr

In Rcmdr: Data -> Import data -> From SPSS file

Mike - The data are valdat09 in the Validation folder.


Statistics -> Fit Models -> Linear Model


The R Output

> LinearModel.1 <- lm(p511g ~ greq + grev, data=validation)
> summary(LinearModel.1)

Call:
lm(formula = p511g ~ greq + grev, data = validation)

Residuals:
      Min        1Q    Median        3Q       Max
-0.235009 -0.043379  0.002829  0.049125  0.201680

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.729e-01  3.089e-02  18.549  < 2e-16 ***
greq        3.148e-04  4.661e-05   6.755 5.99e-11 ***
grev        2.611e-04  5.049e-05   5.171 3.93e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07205 on 350 degrees of freedom
  (87 observations deleted due to missingness)
Multiple R-squared: 0.2112, Adjusted R-squared: 0.2067
F-statistic: 46.87 on 2 and 350 DF,  p-value: < 2.2e-16

The SPSS Output, for comparison . . .

Since the SPSS and R results are identical, I will use SPSS to illustrate most of the following examples, since its output is easier to read than that of R.


Example III. Mediation - the search for real causes.

Causation typically involves chains of variables.

For example, the relationship of Exposure to cold weather to Frequency of colds

Cold weather -> Indoor heating -> Dry air -> Drying of mucous membranes -> Greater opportunity for rhinoviruses to invade -> More rhinoviruses -> Greater Frequency of colds

In the example, Cold Weather is the distal (distant) cause of Colds. It’ll be symbolized as X here.

But the presence of rhinoviruses is the proximal (proximate) cause. It’ll be symbolized as M.

When the proximal cause comes between the distal cause and the outcome, we say that the proximal cause is a mediator of the relationship between the distal cause and the outcome.

Knowing the causal chain gives us more opportunity to understand and to predict and control the outcome. Knowing that it’s not the cold temperature but the dryness of the air that is a key factor, we could install humidifiers to make the air less dry. For example, suppose we don’t have control over the dryness of the air. We might use our knowledge of the causal chain to control colds by making sure the mucous membranes were not dry (if that were possible).

Another, recently supported example Start here on 3/6/18.

Conscientiousness leads to higher performance.

Conscientiousness is the distal cause of better performance.

A mediator: Time on task. (Time on task is the proximal cause.)

A mediation theory of the effect of conscientiousness on performance

Conscientiousness -> More Time on task -> Higher performance

Because time on task comes between conscientiousness and performance, we say that it mediates the conscientiousness -> performance relationship.


[Diagrams: X: Conscientiousness -> Y: Academic Performance (the direct relationship), and X: Conscientiousness -> M: Time on task -> Y: Academic Performance (the mediated chain).]

Identifying Mediation

Two classic articles on mediation are

Baron, R.M., & Kenny D. A. (1986). The moderator-mediator variables distinction in social psychological research: conceptual, strategic and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.

Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36, 717-731.

The Preacher & Hayes procedures have become the standard procedure for investigating mediation. However, the Baron & Kenny procedure is better for understanding what we’re doing, so I’ll focus on it in this lecture.

Following Baron & Kenny, mediation is identified by a specific pattern of correlations among 3 variables.

The variables are

Y: The dependent variable, the effect or outcome
X: The distal “cause” of the dependent variable, Y.
M: The mediator, the proximal cause of Y

Loosely speaking, if there is mediation: X causes M which in turn causes Y.

The pattern of correlations consistent with mediation is as follows:

1. X is correlated significantly with Y.
2. X is correlated significantly with M.
3. M is correlated significantly with Y when controlling for X.

If the above 3 hold, then we say that M mediates the X-to-Y relationship to some extent.

4. X is not correlated with Y when controlling for M.

If 4 holds in addition to 1-3, then we say that M mediates the X-to-Y relationship completely. If 4 does not hold, then we say that M partially mediates the X-Y relationship.

Diagrammatically (the red arrows are the relationships in question):

[Four path diagrams of X, M, and Y, one for each of steps 1-4, with the arrow being tested marked in red.]

Steps 1-3 determine if M mediates the X-Y relationship to some extent.

Steps 1-4 determine if M completely mediates the X-Y relationship.
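A minimal R sketch of these Baron & Kenny steps on simulated data (the data frame d and its variable names x, m, and y are mine; with real data you would substitute your own variables, and the Preacher & Hayes procedures would normally be used to test the indirect effect itself):

set.seed(7)
d <- data.frame(x = rnorm(200))
d$m <- .6 * d$x + rnorm(200)           # X causes M
d$y <- .6 * d$m + rnorm(200)           # M causes Y; no direct X -> Y path, so full mediation is built in

summary(lm(y ~ x,     data = d))       # Step 1: is Y related to X? (is there anything to mediate?)
summary(lm(m ~ x,     data = d))       # Step 2: is M related to X?
summary(lm(y ~ x + m, data = d))       # Steps 3 and 4: M should be significant controlling for X,
                                       # and X should no longer be significant controlling for M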

Example

Biderman, M., Sebren, J., & Nguyen, N. (2008). Time on task mediates the conscientiousness – performance relationship. Personality and Individual Differences, 44, 887-897.

Y: Scores on 1st Test in PSY 101
X: Conscientiousness factor scores.
M: Scores on a “How much time did you study” scale composed of 9 items.

Step 1: Test of relationship of criterion to independent variable. Is there anything to mediate?

Variable              Standardized Coefficient      t       p
X: C Factor Score               .20                2.36    < .05

Step 2: Test of relationship of conscientiousness (X) to time-on-task (M).

Variable              Standardized Coefficient      t       p
X: C Factor Scores              .20                2.40    < .05

Step 3: Test of relationship of study time (M) to test performance (Y) controlling for conscientiousness – some mediation.

Variable              Standardized Coefficient      t       p
M: Time-on-task                 .37                4.53    < .001
X: C Factor Scores              .13                1.55    NS

Step 4: Test of relationship of criterion to distal cause controlling for mediator – full mediation.

Variable              Standardized Coefficient      t       p
M: Time-on-task                 .37                4.53    < .001
X: C Factor Scores              .13                1.55    NS

So, the conclusion here is that Time-on-task completely mediates the Conscientiousness -> Test Performance relationship. This is called full mediation.

Conscientiousness leads to more studying. More studying leads to better test scores. Conscientiousness has no direct relationship to test performance.


Example IV. Multicollinearity: Highly Correlated Predictors. A situation which can sometimes lead to embarrassment on the part of regression analysts is one in which the IV’s in an analysis are highly correlated with each other.

In such cases, the multiple regression results may be very different from the simple regression results. It is here that the training of the MR analyst is put to a crucial test. For only through understanding of the key difference between simple regression (in which no attempt is made to control for / adjust for / hold constant other variables) and multiple regression (in which all other variables in the analysis are statistically controlled for) can the differences be interpreted.

Suppose two predictors of Sales in an organization were available - a measure of Extraversion and a measure of Gregariousness. Suppose that, unbeknownst to the investigator, the two measures are essentially tapping the same construct. Suppose that this construct is positively related to Sales. Hypothetical data follow.

data list free /id greg EXTRAV sales.
begin data.
 1 39 35 370
 2 59 60 580
 3 50 50 640
 4 55 54 480
 5 57 52 360
 6 58 51 570
 7 62 58 530
 8 40 49 490
 9 36 30 390
10 52 54 420
11 41 42 460
12 49 46 250
13 44 45 800
14 62 68 680
15 58 65 600
16 51 49 510
17 49 48 580
18 70 67 760
19 47 47 500
20 40 50 360
21 52 47 520
22 46 49 330
23 25 26 160
24 58 54 530
25 49 55 620
end data.

In multiple regression, the test of significance of each variable can be thought of as a test of the extent to which that variable adds to prediction of Y over and above the ability of the other variable(s) to predict Y.

When two variables are highly correlated, they're practically identical. That means neither gives us much information about Y that cannot be gotten by the other. So the test of each may be not significant, meaning that neither adds significantly to the prediction of Y over and above the ability of the other. The example here illustrates this possibility.
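A parallel R sketch of this example (the data are re-entered from the SPSS listing above; the object names are mine):

greg   <- c(39, 59, 50, 55, 57, 58, 62, 40, 36, 52, 41, 49, 44, 62, 58, 51, 49, 70, 47, 40, 52, 46, 25, 58, 49)
extrav <- c(35, 60, 50, 54, 52, 51, 58, 49, 30, 54, 42, 46, 45, 68, 65, 49, 48, 67, 47, 50, 47, 49, 26, 54, 55)
sales  <- c(370, 580, 640, 480, 360, 570, 530, 490, 390, 420, 460, 250, 800, 680, 600, 510, 580, 760, 500, 360, 520, 330, 160, 530, 620)

summary(lm(sales ~ greg))             # simple regression: greg is significant
summary(lm(sales ~ extrav))           # simple regression: extrav is significant
summary(lm(sales ~ greg + extrav))    # multiple regression: neither partial coefficient is significant

# A variance inflation factor computed by hand: regress one predictor on the other
r2.ge <- summary(lm(greg ~ extrav))$r.squared
1 / (1 - r2.ge)                       # VIF, roughly 5 here; large values flag multicollinearity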


[Path diagram: Greg and Extrav, joined by a large curved double-headed arrow, each with an arrow to Sales (which also receives an Error arrow).]

Note - the curved arrow between Greg and EXTRAV was made large for pedagogical reasons. In practice, it’s not customary to use the size of the symbol to represent the size of correlations.

Simple Regression involving Greg

Coefficientsa
Model            B        Std. Error   Beta    t      Sig.
1 (Constant)    47.153    129.366              .364   .719
  GREG           9.056      2.542      .596    3.562  .002
a. Dependent Variable: SALES

Simple regression involving EXTRAV (also spelled EXTROV in the following)

Coefficientsa
Model            B        Std. Error   Beta    t      Sig.
1 (Constant)    34.882    124.355              .281   .782
  EXTROV         9.287      2.439      .622    3.808  .001
a. Dependent Variable: SALES

Multiple regression of Extrav and Greg

Correlations
Pearson Correlation    SALES    GREG     EXTROV
SALES                  1.000    .596      .622
GREG                    .596   1.000      .889
EXTROV                  .622    .889     1.000

Coefficientsa
Model            B        Std. Error   Beta    t      Sig.
1 (Constant)    15.382    130.758              .118   .907
  GREG           3.142      5.503      .207    .571   .574
  EXTROV         6.540      5.411      .438   1.209   .240
a. Dependent Variable: SALES

When E is in, adding G does not increase R2 significantly. When G is in, adding E does not increase R2 significantly.

What’s going on? Both simple regressions WERE significant!!! Why are both partial coefficients not significant?? Have I lost my mind???!!! (Don’t answer that.)


[Annotations from the output: Greg by itself, r2 = .355 - so Greg by itself significantly predicts Sales. EXTRAV by itself, r2 = .383 - so EXTRAV by itself significantly predicts Sales. Both predictors together, R2 = .396. The correlation between EXTRAV and GREG is about .90: EXTRAV and GREG are nearly the same variable. This means that when you hold one constant, there is no significant relationship of Y to the other.]

The Answer: There is no contradiction.

1. The simple correlations. Each simple correlation was significant because variation in each X was accompanied by variation in the other X.

2. The partial correlations. But when we controlled for the other variable, because they’re so highly correlated, there was no meaningful variation remaining in the X to correlate with Y.

So, the simple correlations give correct results – there is significant correlation in the relationship of each X with Y when the other X is allowed to covary.

And the multiple regression gives correct results – there is no significant correlation of either X with Y when holding the other X constant.

What to do in a situation like this.

For prediction.

You could use both Xs, but why would you? Use the one with the largest simple correlation – EXTRAVersion, in this case.

For theory

Both Xs are probably measuring the same thing. Get one of them out of your theory or combine them into a single measure.

Most important point

Don’t do just the multiple regression analysis. Doing so might have caused us to arrive at a terribly incorrect conclusion.


Example V. Spurious Correlation between X2 and Y due to a third variable.

Do churches cause crime? The simple correlation of number of crimes in cities with number of churches in the same cities is positive, leading to the suggestion that having more churches leads to more crimes. This is a well-known example that illustrates how reliance on simple regression analysis could lead to the wrong inference about causality. A second independent variable, size of the city, helps unravel the mystery. Population size, a third variable, is found to be the culprit: cities with larger populations have both more churches and more crime.

The issue is that two variables that would not normally be correlated are correlated due to the influence of a third variable on both.

In order to unmask the real relationship, we perform the following regression

We estimate the relationships represented by both single-headed arrows. Note, though, that in this analysis, the relationship of Number of Churches -> No. of Crimes is a partial relationship, which will be Not Significant in the analysis since Population Size has been controlled for.

Why the double-headed arrow between the predictors, Population Size and Number of Churches? Even though it’s clear that Population Size causes Number of Churches, there is a double-headed, not a single-headed arrow connecting them in the analysis diagram. That’s because it’s not common to attempt to indicate causality between predictors in the diagram of a multiple regression analysis – they’re treated as being just correlated.


Naive Analysis: [X2: Number of Churches -> Y: No. of Crimes]

Reality: [X1: Population Size with arrows to both X2: Number of Churches and Y: No. of Crimes. A dashed line is used between Number of Churches and No. of Crimes to indicate that the correlation, while observed, is spurious, a result of a 3rd variable causing both.]

Analysis to reveal reality: [Regression of Y: No. of Crimes onto X1: Population Size and X2: Number of Churches, with a double-headed arrow between the two predictors.]
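A hypothetical simulation of the churches-and-crime logic in R (all numbers invented): population size drives both variables, producing a sizable simple correlation that disappears once population is controlled.

set.seed(1)
population <- runif(200, 1e4, 1e6)                        # city population, the third variable
churches   <- .001 * population + rnorm(200, sd = 50)     # more people, more churches
crimes     <- .02  * population + rnorm(200, sd = 2000)   # more people, more crime

cor(churches, crimes)                         # large simple correlation
summary(lm(crimes ~ churches))                # churches "predicts" crime by itself
summary(lm(crimes ~ churches + population))   # controlling for population, churches should no longer be significant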

Example of spurious correlations: Evaluative content hidden in Big Five items biases relationships of Big 5 scale scores to Self-esteem.

Here are correlations of Big 5 scale scores to the Rosenberg Self-esteem scale from a study conducted in 2010. N = 206.

The simple correlations in red are significantly different from 0 (p < .05).

Big Five Scales:                              Ext    Agr    Con    Sta    Opn
Correlations of Big 5 scales w Self-esteem    .28    .19    .38    .24    .36

Each correlation is significantly different from 0.

The problem here is that it really doesn’t make sense that ALL of the Big Five characteristics would be correlated with Self-esteem. It might make sense that Extraversion, perhaps, would be correlated positively with Self-esteem. But Agreeableness? Conscientiousness? Openness to Experience?

It’s possible, since all of these are from self-report scales, that some factor influencing ALL THE RESPONSES may have inflated the correlations, as described in Lecture 3 on Scale Construction (p. 13 ff). We’ve since discovered that evaluative content common to all items is responded to differently by persons who are happy vs. those who are unhappy when filling out the questionnaire. The personal characteristic represented by such differences is often called affect, representing the level of happiness of the respondent at the time the questionnaire was completed. We figured out how to measure affect from the responses to the Big Five questionnaire. Thanks, Factor Analysis.

So, we measured Affect from the Big 5 data using a Confirmatory Factor Analysis model (covered in the Advanced SPSS course), added the Affect scores to the data file containing the Big 5 and self-esteem scores, and performed multiple regression analyses in which the relationship of Self-esteem to each Big Five score was estimated controlling for Affect.

The model we investigated was

Since we don’t assume causal relations between predictors in regression models, the actual regression model applied was


[Two path diagrams of Affect, the Big 5 scale score, and Self-Esteem: the model investigated, and the regression model actually applied, in which Affect and the Big 5 scale score are treated as correlated predictors of Self-Esteem.]

Here are the Coefficients tables from the analyses involving Self-esteem (rberg in the tables). The Part correlations on the right are correlations in which the effect of Affect has been removed.

Coefficientsa (Ext)
Model           B       Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
1 (Constant)  5.666     .371                 15.271   .000
  eorig50     -.003     .077        -.003     -.038   .970   .285         -.003     -.002
  Affect       .925     .211         .403     4.392   .000   .401          .295      .282
a. Dependent Variable: rberg Rosenberg Self Esteem Scale
(eorig50 = E scale score of original 50-item questionnaire)

Coefficientsa (Agr)
Model           B       Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
1 (Constant)  6.241     .509                 12.259   .000
  aorig50     -.111     .095        -.095    -1.164   .246   .188         -.081     -.075
  Affect      1.054     .187         .459     5.646   .000   .401          .368      .362
a. Dependent Variable: rberg Rosenberg Self Esteem Scale

Coefficientsa (Con)
Model           B       Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
1 (Constant)  4.247     .293                 14.475   .000
  corig50      .307     .063         .305     4.870   .000   .381          .323      .296
  Affect       .758     .143         .331     5.286   .000   .401          .348      .322
a. Dependent Variable: rberg Rosenberg Self Esteem Scale

Coefficientsa (Sta)
Model           B       Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
1 (Constant)  5.629     .302                 18.635   .000
  sorig50      .005     .070         .006      .077   .939   .242          .005      .005
  Affect       .911     .183         .397     4.980   .000   .401          .330      .320
a. Dependent Variable: rberg Rosenberg Self Esteem Scale

Coefficientsa (Opn)
Model           B       Std. Error   Beta    t        Sig.   Zero-order   Partial   Part
1 (Constant)  4.292     .337                 12.748   .000
  oorig50      .281     .069         .264     4.092   .000   .359          .276      .253
  Affect       .743     .148         .324     5.011   .000   .401          .332      .310
a. Dependent Variable: rberg Rosenberg Self Esteem Scale

I would say that those relationships that were NOT reduced to nearly 0 are relationships that should be investigated further. So only TWO of the Big 5 are related to Self-esteem – C and O!!! (????) Why is self-esteem related to openness-to-experience?


By the way – these results have been replicated with 6 datasets involving the NEO-FFI-3, HEXACO PI-R, and Big Five Inventory (BFI) original and revised questionnaires.


Example VI. Ruling Out a Predictor. Discovering that a predictor does not predict when controlling for another IV. Blood pressure vs. Age and Weight:

The example here is from real data. The data are blood pressure measures of kids. Blood pressure is the criterion. Child weight and age are the independent variables. Obviously, weight and age are positively correlated. We’ll see that in this case, the MR result is quite different from the results from simple regressions. Two scatterplots will suffice for the simple regression analyses.

Systolic BP vs. Age

[Scatterplot of sys1 (60 to 160) against age (2.5 to 17.5); R Sq Linear = 0.194.]

Systolic BP vs. Weight

[Scatterplot of sys1 (60 to 160) against wt (0 to 150); R Sq Linear = 0.259.]

Both simple relationships are positive and significant. From this we might conclude that older kids tend to have higher BPs than younger kids. We might also conclude that heavier kids tend to have higher BPs than kids not so heavy. Wow – these two results suggest that older kids who are heavy might have incredibly high systolic blood pressure.


Simple r = .440. Based on this result, we might conclude that systolic blood pressure is influenced by age. Many personality psychologists would kill for .44 correlations.

Simple r = .509. Based on this result, we might conclude that systolic blood pressure is influenced by weight. Again, a .509 correlation is very large in behavioral data.

Regression

Descriptive Statistics
        Mean     Std. Deviation   N
sys1   115.53    13.006           880
age      8.72     4.165           880
wt      37.740   21.3649          880

Correlations
Pearson Correlation    sys1     age      wt
sys1                   1.000    .440     .509
age                     .440   1.000     .828
wt                      .509    .828    1.000
Sig. (1-tailed): all .000 (off-diagonal); N = 880 for every variable.

Variables Entered/Removedb
Model   Variables Entered   Variables Removed   Method
1       wt, agea            .                   Enter
a. All requested variables entered.
b. Dependent Variable: sys1

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .510a   .260       .259                11.198
a. Predictors: (Constant), wt, age

ANOVAb
Model          Sum of Squares    df    Mean Square   F         Sig.
1 Regression      38715.077       2     19357.538    154.360   .000a
  Residual       109980.373     877       125.405
  Total          148695.450     879
a. Predictors: (Constant), wt, age
b. Dependent Variable: sys1


1. The Multiple R is .510. Note that this value is only .001 larger than the simple r with WT of .509. So weight correlates .509 with BP. Weight + Age correlates .510 with BP. Hmm, this incredibly small difference between the multiple R and the simple r of one of the predictors is also a red flag.

Nothing unusual in the univariate descriptive statistics.

1. The simple r of SYS1 (blood pressure) and AGE is .440 (p<.001) as we would expect from the above scatterplot.

2. The simple r of SYS1 and WT is .509 (p<.001), also consistent with the above.

3. The high correlation of AGE with WT (.828) is a red flag. It indicates that we should not expect the MR results to mirror the simple regression findings. And they don’t.

Relationship of SYS1 to the combination of WT and AGE is statistically significant. But that’s because the simple correlation of sys1 with wt is so large that useless age can be included in the combination and the combination will remain significant.

Coefficientsa
Model             B        Std. Error   Beta    t         Sig.   Zero-order   Partial   Part
1 (Constant)    103.336    .877                 117.860   .000
  age              .187    .162         .060      1.155   .248   .440         .039      .034
  wt               .280    .032         .460      8.873   .000   .509         .287      .258
a. Dependent Variable: sys1

The multiple regression results give a different picture than that which we would have obtained from considering only the simple regression results.

1. AGE: Among persons equal on WT, there is NO relationship of sys1 to age (t = 1.155, p > .05).

This means that controlling for WT eliminated the relationship of SYS1 to AGE. Older kids do NOT have higher BPs than younger kids if those kids have the same WT. Said another way: There appears to be no UNIQUE relationship of BP to AGE. Instead the BP~age relationship that we observed occurred because variation in Age is contaminated by variation in weight.

2. WT: Among persons equal on AGE, there is a positive relationship of BP to WT. Heavier kids tend to have higher systolic blood pressure than kids not so heavy when they have the same age. This holds when AGE is allowed to vary with WT in the simple correlation and also when AGE is controlled. So this conclusion is gathering credibility.

3. These two results – no unique relationship to age + a unique relationship to weight – suggest (but do not prove) that, of these two variables, only weight is a determiner of blood pressure.

This result is an example of one in which the multiple regression results really don’t mirror the simple regression results. It is something to be expected when the independent variables are correlated with each other, as they were here.

Implications from this example . . .

1. The results of multiple regressions may not reflect the results of simple regressions.

2. The results of a multiple regression may appear to be at odds with the results of simple regressions.

3. The results of multiple regressions must be interpreted correctly – as results that hold only among persons equal on the other independent variables.

This is an example of the use of multiple regression to control or statistically hold constant a 2nd variable. If you were able to pick people of equal weight, there would be no relationship of BP to their ages.
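Before moving to the next example, here is a minimal sketch of how this two-IV regression could be reproduced outside SPSS, using Python and the statsmodels package. The file name is hypothetical; the column names sys1, age, and wt are taken from the output above.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file name; assumed to contain columns sys1, age, and wt.
df = pd.read_csv("bp_kids.csv")

# Two-independent-variable regression of systolic BP on age and weight.
model = smf.ols("sys1 ~ age + wt", data=df).fit()
print(model.summary())   # gives B, Std. Error, t, and Sig.; standardized Betas
                         # would require standardizing the variables first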


Example VII. Incremental validity. Part of an ongoing UTC research program has been the examination of personality as a predictor of academic performance. The interest is in 1) whether or not personality, by itself, is a predictor and 2) whether or not it adds to the validity of other predictors, such as ACT. This second question concerns what is called the incremental validity of personality as a predictor.

Incremental validity: Validity (significance) of a predictor when controlling for other predictors.

We usually say incremental validity with respect to (or over) the other predictors. E.g., the incremental validity of personality over ACT as a predictor of GPA.

We’re not the only ones asking this question. Here’s a reference to a fairly recent study asking the same question about a measure of perceptual speed and accuracy and about conscientiousness.

Mount, M. K., Oh, I., & Burns, M. (2008). Incremental validity of perceptual speed and accuracy over general mental ability. Personnel Psychology, 61, 113-139.

They had five hypotheses, two of which were hypotheses involving incremental validity.

Hypothesis 1: General Mental Ability (GMA) will positively predict warehouse workers’ task performance.

Hypothesis 2: Conscientiousness will positively predict all three dimensions of warehouse workers’ performance: task performance, Org Citizenship Behaviors (OCBs), and Rules Compliance (RC).

Hypothesis 3: The Number Correct on the Perceptual Speed test will positively predict warehouse workers’ task performance.

Hypothesis 4: The Number Correct on the Perceptual Speed test will show incremental validity in task performance after controlling for the effects of GMA.

Hypothesis 5: Conscientiousness will show incremental validity in task performance after controlling for the effects of both GMA and Number Correct on the PS test.


Our Question: Incremental Validity of Conscientiousness over ACT

If UTC, which requires the ACT, added a measure of Conscientiousness, would that measure increase our ability to predict who would do well and who would do poorly?

1. Is ACT, by itself, a predictor of academic performance? Jus’ askin’.

Coefficients(a)

             Unstandardized Coefficients    Standardized Coefficients
Model 1      B        Std. Error            Beta                        t       Sig.
(Constant)   .862     .288                                              2.997   .003
ACtComp      .097     .013                  .482                        7.394   .000

a. Dependent Variable: eosgpa

So ACT IS a valid predictor of eosgpa (end-of-semester GPA). FYI, r2 = .232 (r = .482).

2. Is personality, by itself, a predictor of academic performance? Again, jus’ askin’.

Coefficients(a)

             Unstandardized Coefficients    Standardized Coefficients
Model 1      B        Std. Error            Beta                        t       Sig.
(Constant)   1.562    .307                                              5.088   .000
forcon        .279    .060                  .325                        4.618   .000

a. Dependent Variable: eosgpa

The answer is, Yes, Conscientiousness is a valid predictor of academic performance, as measured by GPA.

So – ACT by itself is a significant predictor of GPA. And Conscientiousness, by itself, is also a significant predictor.

The previous example involving age, weight, and blood pressure should have sensitized us to the possibility that two variables with significant simple correlations with a criterion may not both have unique relationships with that criterion. So the next question is:

Will conscientiousness add significantly to the prediction already afforded by ACT? That is, will conscientiousness add to validity of predictions already achieved by ACT - will it show incremental validity after controlling for ACT?

Why bother with conscientiousness? It could be argued, for example, that conscientiousness is part of what the ACT score reflects – that ACT leads to conscientiousness AND to gpa. If that were the case, conscientiousness would be merely a by-product of ACT and wouldn't add anything to the prediction of GPA over what we can get using ACT scores alone. This is a testable hypothesis, the best kind.
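Here is a minimal sketch, in Python with statsmodels, of the hierarchical comparison this question calls for: fit ACT alone, then ACT plus Conscientiousness, and test whether the addition improves prediction. The file name is hypothetical; the column names eosgpa, ACtComp, and forcon are taken from the output that follows.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("gpa_study.csv")                              # hypothetical file name

reduced = smf.ols("eosgpa ~ ACtComp", data=df).fit()           # ACT only
full    = smf.ols("eosgpa ~ ACtComp + forcon", data=df).fit()  # ACT + Conscientiousness

print(full.rsquared - reduced.rsquared)   # the increase in R-square
print(anova_lm(reduced, full))            # F test of that increase (incremental validity)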


3. Does conscientiousness have incremental validity?

Correlations

                              eosgpa    ACtComp    forcon
Pearson Correlation  eosgpa   1.000     .482       .325
                     ACtComp   .482    1.000       .168
                     forcon    .325     .168      1.000

Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .541(a)  .293       .285                .580324

a. Predictors: (Constant), forcon, ACtComp

Coefficients(a)

             Unstandardized Coefficients    Standardized Coefficients                      Correlations
Model 1      B        Std. Error            Beta                        t       Sig.      Zero-order   Partial   Part
(Constant)   -.035    .358                                              -.099   .921
ACtComp       .088    .013                  .439                        6.912   .000      .482         .458      .433
forcon        .215    .055                  .251                        3.943   .000      .325         .282      .247

a. Dependent Variable: eosgpa

The answer is, Yes, Conscientiousness yields incremental validity in the prediction of GPA over and above ACT comprehensive scores.

When controlling for ACT, Conscientiousness adds significantly to prediction of gpa.

r2 for ACT only is .232 (see previous page).

R2 for ACT + FORCON is .293, an increase of .061.

The significance of the increase in R2 is tested by the t-score on the FORCON line in the Coefficients Table. t=3.943, p < .01.
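For the record, the t on the FORCON line is equivalent to the F test for the change in R2 when a single predictor is added (F = t2, and 3.943 squared is about 15.5). A minimal sketch of that R2-change test follows; the sample size is not shown in this output, so it is left as a parameter rather than filled in.

def f_change(r2_full, r2_reduced, n, k_full, k_added=1):
    """F test for the increase in R-square when k_added predictors are added
    to a model that ends up with k_full predictors and n cases."""
    df_resid = n - k_full - 1
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df_resid)

# With the values from this analysis (R2 = .293 vs. .232) and the study's N
# (not printed here), f_change(0.293, 0.232, n=N, k_full=2) would reproduce
# 3.943**2, about 15.5.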

The prediction equation, although it would probably not be used here, is

Predicted GPA = -0.035 + 0.088*ACTCOMP + 0.215*FORCON.
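Just to show the arithmetic of the equation, a minimal sketch; the input values (an ACT composite of 24 and a Conscientiousness score of 4.0) are hypothetical and assume forcon is scored on roughly a 1-to-5 scale.

def predicted_gpa(act_comp, forcon):
    # Prediction equation from the Coefficients table above.
    return -0.035 + 0.088 * act_comp + 0.215 * forcon

print(round(predicted_gpa(24, 4.0), 2))   # -0.035 + 2.112 + 0.860 = about 2.94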


What have we learned from the examples?

I. Uncorrelated predictors.

If predictors are uncorrelated, simple r2s (not rs) are additive – the R2 of the multiple regression is equal to the sum of the r2s of the simple regressions.

II. Somewhat Correlated Predictors.

All bets are off regarding the relationship of the multiple regression to the simple correlations.

III. Mediation – the search for real causes.

Appropriately chosen multiple regression analyses can be used to identify mediation.

IV. Multicollinearity – Highly correlated predictors

This example demonstrates that two or more predictors can have significant simple correlations with a criterion while none of them has a significant unique correlation with the same criterion when controlling for the others.

V. Spurious correlations between X2 and Y due to a third variable.

Appropriately framed multiple regression analysis can identify spurious correlations.

VI. Ruling out a predictor.

This example shows that a variable’s relationship to a criterion can completely disappear when another variable is controlled for. In the example, a simple relationship of blood pressure to age was found, but when weight was held constant, there was no unique relationship of BP to age – the relationship was completely due to age’s relationship to weight, and it was weight that was the true determiner of blood pressure. The simple relationship was due to the fact that older kids are heavier and have slightly higher BPs because of their greater weight, not because of their greater age.

VII. Incremental Validity – Conscientiousness over ACT scores

This example shows that in many cases, we’re not only interested in the absolute relationship of a predictor to a criterion, but also interested in whether or not it adds to our ability to predict over and above other predictors. The amount that a predictor adds to our ability to predict is called the incremental validity of the predictor.
