
CHAPTER 4

REGRESSION ANALYSIS AND STATISTICAL CONTROL

4.1 INTRODUCTION

Bivariate regression involves one predictor and one quantitative outcome variable. Adding a second predictor shows how statistical control works in regression analysis. The previous chapter described two ways to understand statistical control; there, the outcome variable was denoted Y, the predictor of interest was denoted X1, and the control variable was called X2.

    1. We can control for an X2 variable by dividing data into groups on the basis of X2 scores and then analyzing the X1, Y relationship separately within these groups. Results are rarely reported this way in journal articles; however, examining data this way makes it clear that the nature of an X1, Y relationship can change in many ways when you control for an X2 variable.

    2. Another way to control for an X2 variable is obtaining a partial correlation between X1 and Y, controlling for X2. This partial correlation is denoted r1Y.2. Partial correlations are not often reported in journal articles either. However, thinking about them as correlations between residuals helps you understand the mechanics of statistical control. A partial correlation between X1 and Y, controlling for X2, can be understood as a correlation between the parts of the X1 scores that are not related to X2, and the parts of the Y scores that are not related to X2.

    This chapter introduces the method of statistical control that is most widely used and reported. This method involves using both X1 and X2 as predictors of Y in a multiple linear regression. This analysis provides information about the way X1 is related to Y, controlling for X2, and also about the way X2 is related to Y, controlling for X1. This is called “multiple” regression because there are multiple predictor variables. Later chapters discuss analyses with more than two predictors. It is called “linear” because all pairs of variables must be linearly related. The equation to predict a raw score for the Y outcome variable from raw scores on X1 and X2 is as follows:

    Y′ = b0 + b1X1 + b2X2. (4.1)

    There is also a standardized (or unit-free) form of this predictive equation to predict z scores for Y from z scores on X1 and X2:


    z′Y = β1zX1 + β2zX2. (4.2)

Equation 4.2 corresponds to the path model in Figure 4.1. The information from the sample that is used for this regression is the set of bivariate correlations among the predictors and the outcome: r12, r1Y, and r2Y. The values of the coefficients for paths from zX1 and zX2 to zY (denoted β1 and β2 in Figure 4.1) are initially unknown. Their values can be found from the set of three bivariate correlations, as you will see in this chapter. The β1 path coefficient represents the strength of prediction of zY from zX1, controlling for zX2. The β2 path coefficient represents the strength of prediction of zY from zX2, controlling for zX1. A regression analysis that includes zX1 and zX2 as predictors of zY, as shown in Equation 4.2, provides estimates for these β coefficients. In regression, the predictive contribution of each independent variable (e.g., zX1) is represented by a β coefficient, and the strengths of associations are assessed while statistically controlling for all other independent variables (in this example, controlling for zX2).

    This analysis provides information that is relevant to the following questions:

    1. How well does the entire set of predictor variables (X1 and X2 together) predict Y? Both a statistical significance test and an effect size are provided.

2. How much does each individual predictor variable (X1 alone, X2 alone) contribute to prediction of Y? Each predictor variable has a significance test to evaluate whether its b slope coefficient differs significantly from zero, along with effect size information: the percentage of variance in Y that can be predicted by X1 alone, controlling for X2, and the percentage of variance in Y that can be predicted by X2 alone, controlling for X1.

The b1 and b2 regression coefficients in Equation 4.1 are partial slopes. That is, b1 represents the number of units of change in Y that are predicted for each one-unit increase in X1 when X2 is statistically controlled or partialled out of X1. In many research situations, X1 and X2 are partly redundant (or correlated) predictors of Y; in such situations, we need to control for, or partial out, the part of X1 that is correlated with or predictable from X2 to avoid "double counting" the information that is contained in both the X1 and X2 variables.

Figure 4.1 Path Model: Standardized Regression to Predict zY From Correlated Predictors zX1 and zX2 (paths β1 from zX1 to zY and β2 from zX2 to zY, with correlation r12 between zX1 and zX2)

    To understand why this is so, consider a trivial prediction problem. Suppose that you want to predict people’s total height in inches (Y) from two measurements that you make using a yardstick: distance from hip to top of head (X1) and distance from waist to floor (X2). You cannot predict Y by summing X1 and X2, because X1 and X2 contain some duplicate information (the distance from waist to hip). The X1 + X2 sum would overestimate Y because it includes the waist-to-hip distance twice. When you perform a multiple regression of the form shown in Equation 4.1, the b coefficients are adjusted so that information included in both variables is not double counted. Each variable’s contribution to the prediction of Y is estimated using computations that partial out other predictor variables; this corrects for, or removes, any information in the X1 score that is predictable from the X2 score (and vice versa).

To compute coefficients for the bivariate regression equation Y′ = b0 + bX, we need the correlation between X and Y (rXY), as well as the means and standard deviations of X and Y. In regression analysis with two predictor variables, we need the means and standard deviations of Y, X1, and X2 and the correlation between each predictor variable and the outcome variable Y (r1Y and r2Y). We also need to know about (and adjust for) the correlation between the predictor variables (r12).

Multiple regression is a frequently reported analysis that includes statistical control. Most published regression analyses include more than two predictor variables. Later chapters discuss analyses that include larger numbers of predictors. All techniques covered later in this book incorporate similar forms of statistical control for correlation among multiple predictors (and later, correlations among multiple outcome variables).

    4.2 HYPOTHETICAL RESEARCH EXAMPLE

Suppose that a researcher measures age (X1) and weight (X2) and uses these two variables to predict blood pressure (Y). Data are in the file ageweightbp.sav. In this situation, it would be reasonable to expect that the predictor variables would be correlated with each other to some extent (e.g., as people get older, they often tend to gain weight). It is plausible that both predictor variables might contribute unique information toward the prediction of blood pressure. For example, weight might directly cause increases in blood pressure, but in addition, there might be other mechanisms through which age causes increases in blood pressure; for example, age-related increases in artery blockage might also contribute to increases in blood pressure. In this analysis, we might expect to find that the two variables together are strongly predictive of blood pressure and that each predictor variable contributes significant unique predictive information. Also, we would expect that both coefficients would be positive (i.e., as age and weight increase, blood pressure should also tend to increase).

Many outcomes are possible when two variables are used as predictors in a multiple regression. The overall regression analysis can be either significant or not significant, and each predictor variable may or may not make a statistically significant unique contribution. As we saw in the discussion of partial correlation, the assessment of the contribution of an individual predictor variable controlling for another variable can lead to the conclusion that a predictor provides useful information even when another variable is statistically controlled. Conversely, a predictor can become nonsignificant when another variable is statistically controlled. The same types of interpretations (e.g., spuriousness, possible mediated relationships) described for partial correlation outcomes can be considered possible explanations for multiple regression results. In this chapter, we will examine the two-predictor situation in detail; comprehension of the two-predictor situation is extended to regression analyses with more than two predictors in later chapters.


When we include two (or more) predictor variables in a regression, we sometimes choose one or more of the predictor variables because we hypothesize that they might be causes of the Y variable or at least useful predictors of Y. On the other hand, sometimes rival predictor variables are included in a regression because they are correlated with, confounded with, or redundant with a primary explanatory variable; in some situations, researchers hope to demonstrate that a rival variable completely "accounts for" the apparent correlation between the primary variable of interest and Y, while in other situations, researchers hope to show that rival variables do not completely account for any correlation of the primary predictor variable with the Y outcome variable. Sometimes a well-chosen X2 control variable can be used to partial out sources of measurement error in another X1 predictor variable (e.g., verbal ability is a common source of measurement error when written tests are used to assess skills that are largely nonverbal, such as playing tennis or mountain survival). An X2 variable may also be included as a predictor because the researcher suspects that the X2 variable may "suppress" the relationship of another X1 predictor variable with the Y outcome variable.

    4.3 GRAPHIC REPRESENTATION OF REGRESSION PLANE

    For bivariate (one-predictor) regression, a two-dimensional graph (the scatterplot of Y values for each value of X) is sufficient. The regression prediction equation Y′ = b0 + bX corresponds to a line on this scatterplot. If the regression fits the data well, most actual Y scores fall relatively close to the regression line. The b coefficient represents the slope of this line (for a one-unit increase in X, the regression equation predicts a b-unit increase in Y′).

Figure 4.2 Three-Dimensional Graph of Multiple Regression Plane With X1 and X2 as Predictors of Y (axes: Y = blood pressure, X1 = age, X2 = weight)

Source: Reprinted with permission from Palmer, M., http://ordination.okstate.edu/plane.jpg.


When we add a second predictor variable, X2, we need a three-dimensional graph to represent the pattern of scores for three variables. Imagine a cube with X1, X2, and Y dimensions; the data points form a cluster in this three-dimensional space. For a good fit, we need a regression plane that has the actual points clustered close to it in this three-dimensional space. See Figure 4.2 for a graphic representation of a regression plane.

A more concrete way to visualize this situation is to imagine the X1, X2 points as locations on a tabletop (where X1 represents the location of a point relative to the longer side of the table and X2 represents the location along the shorter side). You could draw a grid on the top of the table to show the location of each subject's X1, X2 pair of scores on the flat plane represented by the tabletop. When you add a third variable, Y, you need to add a third dimension to show the location of the Y score that corresponds to each particular pair of X1, X2 score values; the Y values can be represented by points that float in space above the top of the table. For example, X1 can be age, X2 can be weight, and Y can be blood pressure. The regression plane can then be represented by a piece of paper held above the tabletop, oriented so that it is centered within the cluster of data points that float in space above the table. The b1 slope represents the degree of tilt in the paper in the X1 direction, parallel to the width of the table (i.e., the slope to predict blood pressure from age for a specific weight). The b2 slope represents the slope of the paper in the X2 direction, parallel to the length of the table (i.e., the slope to predict blood pressure from weight at some specific age).

Thus, the partial slopes b1 and b2, described earlier, can be understood in terms of this graph. The b1 partial slope (in the regression equation Y′ = b0 + b1X1 + b2X2) has the following verbal interpretation: For a one-unit increase in scores on X1, the best fitting regression equation makes a b1-point increase in the predicted Y′ score (controlling for or partialling out any changes associated with the other predictor variable, X2).
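To make the idea of a regression plane with two partial slopes concrete, here is a minimal sketch in Python that fits Y′ = b0 + b1X1 + b2X2 by ordinary least squares. The age, weight, and blood pressure values are simulated for illustration only (the chapter's ageweightbp.sav file is not reproduced here), so the fitted coefficients will not match the textbook example.

```python
# A minimal sketch of fitting the regression plane Y' = b0 + b1*X1 + b2*X2 by
# ordinary least squares. The data are simulated; coefficient values will differ
# from the chapter's age/weight/blood pressure example.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 30
age = rng.uniform(30, 80, size=n)                           # X1
weight = 100 + 0.8 * age + rng.normal(0, 15, n)             # X2, correlated with age
bp = 40 + 1.5 * age + 0.4 * weight + rng.normal(0, 20, n)   # Y

# Design matrix with a column of 1s for the intercept b0.
X = np.column_stack([np.ones(n), age, weight])
b, *_ = np.linalg.lstsq(X, bp, rcond=None)
b0, b1, b2 = b

# b1 is a partial slope: the predicted change in Y for a one-unit increase in
# age, holding weight constant (b2 is the analogous slope for weight).
print(f"Y' = {b0:.2f} + {b1:.2f}*age + {b2:.2f}*weight")
```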

    4.4 SEMIPARTIAL (OR “PART”) CORRELATION

The previous chapter described how to calculate and interpret a partial correlation between X1 and Y, controlling for X2. One way to obtain rY1.2 (the partial correlation between X1 and Y, controlling for X2) is to perform a simple bivariate regression to predict X1 from X2, run another regression to predict Y from X2, and then correlate the residuals from these two regressions (X*1 and Y*). This correlation is denoted by r1Y.2, which is read as "the partial correlation between X1 and Y, controlling for X2." This partial r tells us how X1 is related to Y when X2 has been removed from or partialled out of both the X1 and the Y variables. The squared partial correlation, r²Y1.2, can be interpreted as the proportion of variance in Y that can be predicted from X1 when all the variance that is linearly associated with X2 is removed from both the X1 and the Y variables.

Partial correlations are sometimes reported in studies where the researcher wants to assess the strength and nature of the X1, Y relationship with the variance that is linearly associated with X2 completely removed from both variables. This chapter introduces a slightly different statistic (the semipartial or part correlation) that provides information about the partition of variance between predictor variables X1 and X2 in regression in a more convenient form. A semipartial correlation is calculated and interpreted slightly differently from the partial correlation, and a different notation is used. The semipartial (or "part") correlation between X1 and Y, controlling for X2, is denoted by rY(1.2). Another common notation for the semipartial correlation is sri, where Xi is the predictor variable. In this notation for semipartial correlation, it is implicit that the outcome variable is Y; the predictive association between Xi and Y is assessed while removing the variance from Xi that is shared with any other predictor variables in the regression equation. The parentheses around 1.2 indicate that X2 is partialled out of only X1. It is not partialled out of Y, which is outside the parentheses.

To obtain this semipartial correlation, we remove the variance that is associated with X2 from only the X1 predictor (and not from the Y outcome variable). For example, to obtain rY(1.2), the semipartial correlation that describes the strength of the association between Y and X1 when X2 is partialled out of X1, do the following:

    1. First, run a simple bivariate regression to predict X1 from X2. Obtain the residuals (X*1) from this regression. X*1 represents the part of the X1 scores that is not predictable from or correlated with X2.

    2. Then, correlate X*1 with Y to obtain the semipartial correlation between X1 and Y, controlling for X2. Note that X2 has been partialled out of, or removed from, only the other predictor variable, X1; the variance associated with X2 has not been partialled out of or removed from Y, the outcome variable.

    This is called a semipartial correlation because the variance associated with X2 is removed from only one of the two variables (and not removed entirely from both X1 and Y as in partial correlation analysis).

    It is also possible to compute the semipartial correlation, rY(1.2), directly from the three bivariate correlations (r12, r1Y, and r2Y):

rY(1.2) = (r1Y – r2Y × r12) / √(1 – r²12). (4.3)

In many data sets, the partial and semipartial correlations (between X1 and Y, controlling for X2) yield similar values. The squared semipartial correlation has a simpler interpretation than the squared partial correlation when we want to describe the partitioning of variance among predictor variables in a multiple regression. The squared semipartial correlation between X1 and Y, controlling for X2—that is, r²Y(1.2) or sr²1—is equivalent to the proportion of the total variance of Y that is predictable from X1 when the variance that is shared with X2 has been partialled out of X1. It is more convenient to report squared semipartial correlations (instead of squared partial correlations) as part of the results of regression analysis.
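The residual-based procedure described above, and the formula in Equation 4.3, can both be sketched in a few lines of Python. The data below are simulated for illustration; the point is only that the correlation of Y with the residualized X1 reproduces the value given by Equation 4.3, while removing X2 from both variables gives the (different) partial correlation.

```python
# A small sketch contrasting the partial correlation (X2 removed from both X1
# and Y) with the semipartial correlation (X2 removed from X1 only), using the
# residual method and the correlation-based formula in Equation 4.3.
import numpy as np

rng = np.random.default_rng(seed=2)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)            # X1 correlated with X2
y = 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

def residuals(target, predictor):
    """Residuals from the bivariate regression of target on predictor."""
    X = np.column_stack([np.ones(len(predictor)), predictor])
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coefs

r = np.corrcoef(np.column_stack([x1, x2, y]), rowvar=False)
r12, r1y, r2y = r[0, 1], r[0, 2], r[1, 2]

x1_star = residuals(x1, x2)   # part of X1 not related to X2
y_star = residuals(y, x2)     # part of Y not related to X2

partial = np.corrcoef(x1_star, y_star)[0, 1]       # X2 removed from both
semipartial = np.corrcoef(x1_star, y)[0, 1]        # X2 removed from X1 only
semipartial_eq43 = (r1y - r2y * r12) / np.sqrt(1 - r12**2)  # Equation 4.3

print(partial, semipartial, semipartial_eq43)  # last two values agree
```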

    4.5 PARTITION OF VARIANCE IN Y IN REGRESSION WITH TWO PREDICTORS

In multiple regression analysis, one goal is to obtain a partition of variance for the dependent variable Y (blood pressure) into variance that can be accounted for or predicted by each of the predictor variables, X1 (age) and X2 (weight), taking into account the overlap or correlation between the predictors. Overlapping circles can be used to represent the proportion of shared variance (r²) for each pair of variables in this situation, as shown in Figure 4.3. Each circle has a total area of 1 (this represents the total variance of zY, for example). For each pair of variables, such as X1 and Y, the squared correlation between X1 and Y (i.e., r²Y1) corresponds to the proportion of the total variance of Y that overlaps with X1, as shown in Figure 4.3.

The total variance of the outcome variable (such as Y, blood pressure) corresponds to the entire circle in Figure 4.3, with sections that are labeled a, b, c, and d. We will assume that the total area of this circle corresponds to the total variance of Y and that Y is given in z-score units, so the total variance or total area a + b + c + d in this diagram corresponds to a value of 1.0. As in earlier examples, overlap between circles that represent different variables corresponds to squared correlation; the total area of overlap between X1 and Y (which corresponds to the sum of Areas a and c) is equal to r²1Y, the squared correlation between X1 and Y. One goal of multiple regression is to obtain information about the partition of variance in the outcome variable into the following components. Area d in the diagram corresponds to the proportion of variance in Y that is not predictable from either X1 or X2. Area a in this diagram corresponds to the proportion of variance in Y that is uniquely predictable from X1 (controlling for or partialling out any variance in X1 that is shared with X2). Area b corresponds to the proportion of variance in Y that is uniquely predictable from X2 (controlling for or partialling out any variance in X2 that is shared with the other predictor, X1). Area c corresponds to a proportion of variance in Y that can be predicted by either X1 or X2. We can use results from a multiple regression analysis that predicts Y from X1 and X2 to deduce the proportions of variance that correspond to each of these areas, labeled a, b, c, and d, in this diagram.

We can interpret squared semipartial correlations as information about variance partitioning in regression. We can calculate zero-order correlations among all these variables by running Pearson correlations of X1 with Y, X2 with Y, and X1 with X2. The overall squared zero-order bivariate correlations between X1 and Y and between X2 and Y correspond to the areas that show the total overlap of each predictor variable with Y as follows:

a + c = r²Y1,

b + c = r²Y2.

The squared partial correlations and squared semipartial r's can also be expressed in terms of areas in the diagram in Figure 4.3. The squared semipartial correlation between X1 and Y, controlling for X2, corresponds to Area a in Figure 4.3; the squared semipartial correlation sr²1 can be interpreted as "the proportion of the total variance of Y that is uniquely predictable from X1." In other words, sr²1 (or r²Y[1.2]) corresponds to Area a in Figure 4.3.

Figure 4.3 Partition of Variance of Y in a Regression With Two Predictor Variables, X1 and X2 (overlapping circles for Y, X1, and X2, with regions labeled a, b, c, and d)

Note: The areas a, b, c, and d correspond to the following proportions of variance in Y, the outcome variable: Area a = sr²1, the proportion of variance in Y that is predictable uniquely from X1 when X2 is statistically controlled or partialled out; Area b = sr²2, the proportion of variance in Y that is predictable uniquely from X2 when X1 is statistically controlled or partialled out; Area c, the proportion of variance in Y that could be explained by either X1 or X2 (Area c can be obtained by subtraction, e.g., c = 1 – [a + b + d]); Area a + b + c = R²Y.12, the overall proportion of variance in Y predictable from X1 and X2 combined; Area d = 1 – R²Y.12, the proportion of variance in Y that is not predictable from either X1 or X2.


The squared partial correlation has a somewhat less convenient interpretation; it corresponds to a ratio of areas in the diagram in Figure 4.3. When a partial correlation is calculated, the variance that is linearly predictable from X2 is removed from the Y outcome variable, and therefore, the proportion of variance that remains in Y after controlling for X2 corresponds to the sum of Areas a and d. The part of this remaining variance in Y that is uniquely predictable from X1 corresponds to Area a; therefore, the squared partial correlation between X1 and Y, controlling for X2, corresponds to the ratio a/(a + d). In other words, pr²1 (or r²Y1.2) corresponds to a ratio of areas, a/(a + d).

We can reconstruct the total variance of Y, the outcome variable, by summing Areas a, b, c, and d in Figure 4.3. Because Areas a and b correspond to the squared semipartial correlations of X1 and X2 with Y, it is more convenient to report squared semipartial correlations (instead of squared partial correlations) as effect size information for a multiple regression. Area c represents variance that could be explained equally well by either X1 or X2.

    In multiple regression, we seek to partition the variance of Y into components that are uniquely predictable from individual variables (Areas a and b) and areas that are explainable by more than one variable (Area c). We will see that there is more than one way to interpret the variance represented by Area c. The most conservative strategy is not to give either X1 or X2 credit for explaining the variance that corresponds to Area c in Figure 4.3. Areas a, b, c, and d in Figure 4.3 correspond to proportions of the total variance of Y, the outcome variable, as given in the table below the overlapping circles diagram.

In words, then, we can divide the total variance of scores on the Y outcome variable into four components when we have two predictors: the proportion of variance in Y that is uniquely predictable from X1 (Area a, sr²1), the proportion of variance in Y that is uniquely predictable from X2 (Area b, sr²2), the proportion of variance in Y that could be predicted from either X1 or X2 (Area c, obtained by subtraction), and the proportion of variance in Y that cannot be predicted from either X1 or X2 (Area d, 1 – R²Y.12).

    Note that the sum of the proportions for these four areas, a + b + c + d, equals 1 because the circle corresponds to the total variance of Y (an area of 1.00). In this chapter, we will see that information obtained from the multiple regression analysis that predicts scores on Y from X1 and X2 can be used to calculate the proportions that correspond to each of these four areas (a, b, c, and d). When we write up results, we can comment on whether the two variables combined explained a large or a small proportion of variance in Y; we can also note how much of the variance was predicted uniquely by each predictor variable.
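As a small worked illustration of this partition, the sketch below plugs in the values that are reported for the age/weight/blood pressure example later in this chapter (part correlations of .488 for age and .281 for weight, and overall R² = .690, from the SPSS output in Figure 4.10) to recover Areas a, b, c, and d.

```python
# A sketch of the Figure 4.3 variance partition, computed from the squared
# semipartial (part) correlations and the overall R squared reported in this
# chapter's SPSS output (Figure 4.10).
sr1_sq = 0.488 ** 2        # Area a: uniquely predictable from X1 (age)
sr2_sq = 0.281 ** 2        # Area b: uniquely predictable from X2 (weight)
r_sq_total = 0.690         # Areas a + b + c: predictable from X1 and X2 together

area_a = sr1_sq
area_b = sr2_sq
area_c = r_sq_total - area_a - area_b   # shared: explainable by either predictor
area_d = 1 - r_sq_total                 # not predictable from X1 or X2

print(f"a = {area_a:.3f}, b = {area_b:.3f}, c = {area_c:.3f}, d = {area_d:.3f}")
# The four areas sum to 1.0 because Y is treated in z-score (unit variance) form.
```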

If X1 and X2 are uncorrelated with each other, then there is no overlap between the circles that correspond to the X1 and X2 variables in this diagram and Area c is 0. However, in most applications of multiple regression, X1 and X2 are correlated with each other to some degree; this is represented by an overlap between the circles that represent the variances of X1 and X2. When some types of suppression are present, the value obtained for Area c by taking 1.0 – Area a – Area b – Area d can actually be a negative value; in such situations, the overlapping circle diagram may not be the most useful way to think about variance partitioning. The partition of variance that can be made using multiple regression allows us to assess the total predictive power of X1 and X2 when these predictors are used together and also to assess their unique contributions, so that each predictor is assessed while statistically controlling for the other predictor variable.

In regression, as in many other multivariable analyses, the researcher can evaluate results in relation to several different questions. The first question is, Are the two predictor variables together significantly predictive of Y? Formally, this corresponds to the following null hypothesis:

    H0: RY.12 = 0. (4.4)

In Equation 4.4, an explicit notation is used for R (with subscripts that specifically indicate the dependent and independent variables). That is, RY.12 denotes the multiple R for a regression equation in which Y is predicted from X1 and X2. In this subscript notation, the variable to the left of the period in the subscript is the outcome or dependent variable; the numbers to the right of the period represent the subscripts for each of the predictor variables (in this example, X1 and X2). This explicit notation is used when it is needed to make it clear exactly which outcome and predictor variables are included in the regression.

    In most reports of multiple regression, these subscripts are omitted, and it is understood from the context that R2 stands for the proportion of variance explained by the entire set of predictor variables that are included in the analysis. Subscripts on R and R2 are generally used only when it is necessary to remove possible ambiguity. Thus, the formal null hypothesis for the overall multiple regression can be written more simply as follows:

    H0: R = 0. (4.5)

    Recall that multiple R refers to the correlation between Y and Y′ (i.e., the correlation between observed scores on Y and the predicted Y′ scores that are formed by summing the weighted scores on X1 and X2, Y′ = b0 + b1X1 + b2X2).

A second set of questions that can be addressed using multiple regression involves the unique contribution of each individual predictor. Sometimes, data analysts do not test the significance of individual predictors unless the F for the overall regression is statistically significant. Requiring a significant F for the overall regression before testing the significance of individual predictor variables used to be recommended as a way to limit the increased risk for Type I error that arises when many predictors are assessed; however, the requirement of a significant overall F for the regression model as a condition for conducting significance tests on individual predictor variables probably does not provide much protection against Type I error in practice.

    For each predictor variable in the regression—for instance, for Xi—the null hypothesis can be set up as follows:

    H0: bi = 0, (4.6)

where bi represents the unknown population raw-score slope that is estimated by the sample slope. If the bi coefficient for predictor Xi is statistically significant, then there is a significant increase in predicted Y values that is uniquely associated with Xi (and not attributable to other predictor variables).

It is also possible to ask whether X1 is more strongly predictive of Y than X2 (by comparing β1 and β2). However, comparisons between regression coefficients must be interpreted very cautiously; factors that artifactually influence the magnitude of correlations can also artifactually increase or decrease the magnitude of slopes.

    4.6 ASSUMPTIONS FOR REGRESSION WITH TWO PREDICTORS

For the simplest possible multiple regression with two predictors, as given in Equation 4.1, the assumptions that should be satisfied are basically the same as the assumptions for Pearson correlation and bivariate regression. Ideally, all the following conditions should hold:

1. The Y outcome variable should be a quantitative variable with scores that are approximately normally distributed. Possible violations of this assumption can be assessed by looking at the univariate distributions of scores on Y. The X1 and X2 predictor variables should be normally distributed and quantitative, or one or both of the predictor variables can be dichotomous (or dummy) variables. If the outcome variable, Y, is dichotomous, then a different form of analysis (binary logistic regression) should be used.

    2. The relations among all pairs of variables (X1, X2), (X1, Y), and (X2, Y) should be linear. This assumption of linearity can be assessed by examining bivariate scatterplots for all possible pairs of these variables. Scatterplots should not have any extreme bivariate outliers.

    3. There should be no interactions between variables, such that the slope that predicts Y from X1 differs across groups that are formed on the basis of scores on X2. An alternative way to state this assumption is that the regressions to predict Y from X1 should be homogeneous across levels of X2. This can be qualitatively assessed by grouping subjects on the basis of scores on the X2 variable and running a separate X1, Y scatterplot or bivariate regression for each group; the slopes should be similar across groups. If this assumption is violated and if the slope relating Y to X1 differs across levels of X2, then it would not be possible to use a flat plane to represent the relation among the variables as in Figure 4.2. Instead, you would need a more complex surface that has different slopes to show how Y is related to X1 for different values of X2. (Chapter 7, on moderation, demonstrates how to include interaction terms in regression models and how to test for the statistical significance of interactions between predictors.)

    4. Variance in Y scores should be homogeneous across levels of X1 (and levels of X2); this assumption of homogeneous variance can be assessed in a qualitative way by examining bivariate scatterplots to see whether the range or variance of Y scores varies across levels of X. Formal tests of homogeneity of variance are possible, but they are rarely used in regression analysis. In many real-life research situations, researchers do not have a sufficiently large number of scores for each specific value of X to set up a test to verify whether the variance of Y is homogeneous across values of X.

As in earlier analyses, possible violations of these assumptions can generally be assessed reasonably well by examining the univariate frequency distribution for each variable and the bivariate scatterplots for all pairs of variables. Many of these problems can also be identified by graphing the standardized residuals from regression, that is, the zY – z′Y prediction errors. Some problems with assumptions can be detected by examining plots of residuals in bivariate regression; the same issues should be considered when examining plots of residuals for regression analyses that include multiple predictors. That is, the mean and variance of these residuals should be fairly uniform across levels of z′Y, and there should be no pattern in the residuals (there should not be a linear or curvilinear trend). Also, there should not be extreme outliers in the plot of standardized residuals. Some of the problems that are detectable through visual examination of residuals can also be noted in univariate and bivariate data screening; however, examination of residuals may be uniquely valuable as a tool for the discovery of multivariate outliers. A multivariate outlier is a case that has an unusual combination of values of scores for variables such as X1, X2, and Y (even though the scores on the individual variables may not, by themselves, be outliers). A more extensive discussion of the use of residuals for the assessment of violations of assumptions and the detection and possible removal of multivariate outliers is provided in Chapter 4 of Tabachnick and Fidell (2018). Multivariate or bivariate outliers can have a disproportionate impact on estimates of b or β slope coefficients (just as they can have a disproportionate impact on estimates of Pearson's r). That is, sometimes omitting a few extreme outliers results in drastic changes in the size of b or β coefficients. It is undesirable to have the results of a regression analysis depend to a great extent on the values of a few extreme or unusual data points.
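The kind of residual screening described above can be sketched as follows. The data, the planted unusual case, and the |z| > 3 cutoff are all illustrative assumptions, not values or rules taken from this chapter.

```python
# A rough sketch of residual screening: fit the two-predictor regression,
# standardize the residuals, and flag cases with unusually large values as
# possible (multivariate) outliers. Simulated data; |z| > 3 is a common rule
# of thumb used here for illustration.
import numpy as np

rng = np.random.default_rng(seed=3)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 * x1 + 0.7 * x2 + rng.normal(size=n)
y[10] += 6.0   # plant one case with an unusual Y value for its X1, X2 combination

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Standardize residuals (divide by their standard deviation, using the
# residual degrees of freedom n - number of estimated coefficients).
z_resid = resid / resid.std(ddof=X.shape[1])

flagged = np.where(np.abs(z_resid) > 3)[0]
print("cases flagged as possible outliers:", flagged)
```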

If extreme bivariate or multivariate outliers are identified in preliminary data screening, it is necessary to decide whether the analysis is more believable with these outliers included, with the outliers excluded, or using a data transformation (such as log of X) to reduce the impact of outliers on slope estimates. If outliers are identified and modified or removed, the rationale and decision rules for the handling of these cases should be clearly explained in the write-up of results.

The hypothetical data for this example consist of data for 30 cases on three variables (in the file ageweightbp.sav): blood pressure (Y), age (X1), and weight (X2). Before running the multiple regression, scatterplots for all pairs of variables were examined, descriptive statistics were obtained for each variable, and zero-order correlations were computed for all pairs of variables using the methods described in previous chapters. It is also a good idea to examine histograms of the distribution of scores on each variable to assess whether scores on continuous predictor variables are reasonably normally distributed without extreme outliers.

A matrix of scatterplots for all possible pairs of variables was obtained through the SPSS menu sequence Graphs → Legacy Dialogs → Scatter/Dot, followed by clicking on the "Matrix Scatter" icon, shown in Figure 4.4. The names of all three variables (age, weight, and blood pressure) were entered in the dialog box for matrix scatterplots, which appears in Figure 4.5. The SPSS output shown in Figure 4.6 shows the matrix scatterplots for all pairs of variables: X1 with Y, X2 with Y, and X1 with X2. Examination of these scatterplots suggested that relations between all pairs of variables were reasonably linear and there were no bivariate outliers. Variance of blood pressure appeared to be reasonably homogeneous across levels of the predictor variables. The bivariate Pearson correlations for all pairs of variables appear in Figure 4.7.

On the basis of preliminary data screening (including histograms of scores on age, weight, and blood pressure that are not shown here), it was judged that scores were reasonably normally distributed, relations between variables were reasonably linear, and there were no outliers extreme enough to have a disproportionate impact on the results. Therefore, it seemed appropriate to perform a multiple regression analysis on these data; no cases were dropped, and no data transformations were applied.

If there appear to be curvilinear relations between any variables, then the analysis needs to be modified to take this into account. For example, if Y shows a curvilinear pattern across levels of X1, one way to deal with this is to recode scores on X1 into group membership codes (e.g., if X1 represents income in dollars, this could be recoded as three groups: low, middle, and high income levels); then, an analysis of variance (ANOVA) can be used to see whether means on Y differ across these groups (on the basis of low, medium, or high X scores). Another possible way to incorporate nonlinearity into a regression analysis is to include X² (and perhaps higher powers of X, such as X³) as a predictor of Y in a regression equation of the following form:

Y′ = b0 + b1X + b2X² + b3X³ + ···. (4.7)
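A minimal sketch of this approach, using simulated data, is shown below: the squared term is simply entered as an additional predictor, and the fitted coefficients recover the curved relation.

```python
# A minimal sketch of the polynomial idea in Equation 4.7: add a squared term so
# the fitted curve can bend (U-shaped or inverse U-shaped). Simulated data.
import numpy as np

rng = np.random.default_rng(seed=4)
n = 200
x = rng.uniform(-3, 3, size=n)
y = 2.0 + 1.0 * x - 0.8 * x**2 + rng.normal(0, 1, n)   # true relation is curved

# Predictors: X and X squared (plus the intercept column).
X = np.column_stack([np.ones(n), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Y' = {b[0]:.2f} + {b[1]:.2f}*X + {b[2]:.2f}*X^2")
```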

    Figure 4.4 SPSS Dialog Box to Request Matrix Scatterplots


Figure 4.5 SPSS Scatterplot Matrix Dialog Box

Note: This generates a matrix of all possible scatterplots between pairs of listed variables (e.g., age with weight, age with blood pressure, and weight with blood pressure).

Figure 4.6 Matrix of Scatterplots for Age, Weight, and Blood Pressure (panels show each pairing of age, weight, and blood pressure)


Figure 4.7 Bivariate Correlations Among Age, Weight, and Blood Pressure

Correlations
                                  Age       Weight    Blood Pressure
Age              Pearson r        1         .563**    .782**
                 Sig. (2-tailed)            .001      .000
                 N                30        30        30
Weight           Pearson r        .563**    1         .672**
                 Sig. (2-tailed)  .001                .000
                 N                30        30        30
Blood Pressure   Pearson r        .782**    .672**    1
                 Sig. (2-tailed)  .000      .000
                 N                30        30        30

**Correlation is significant at the 0.01 level (2-tailed).

In practice, it is rare to encounter situations where powers of X higher than X², such as X³ or X⁴ terms, are needed. Curvilinear relations that correspond to a U-shaped or inverse U-shaped graph (in which Y is a function of X and X²) are more common.

Finally, if an interaction between X1 and X2 is detected, it is possible to incorporate one or more interaction terms into the regression equation using methods that will be described in later chapters. A regression equation that does not incorporate an interaction term when there is in fact an interaction between predictors can produce misleading results. When we do an ANOVA, most programs automatically generate interaction terms to represent interactions among all possible pairs of predictors. However, when we do regression analyses, interaction terms are not generated automatically; if we want to include interactions in our models, we must add them explicitly. The existence of possible interactions among predictors is therefore easy to overlook when regression analysis is used.

    4.7 FORMULAS FOR REGRESSION WITH TWO PREDICTORS

    4.7.1 Computation of Standard-Score Beta Coefficients

The coefficients to predict z′Y from zX1, zX2 (z′Y = β1zX1 + β2zX2) can be calculated directly from the zero-order Pearson's r's among the three variables Y, X1, and X2, as shown in Equations 4.8 and 4.9. In a subsequent section, a simple path model is used to show how these formulas were derived:

β1 = (r1Y – r12 × r2Y) / (1 – r²12), (4.8)

and

β2 = (r2Y – r12 × r1Y) / (1 – r²12). (4.9)
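Applying Equations 4.8 and 4.9 to the correlations reported in Figure 4.7 for the age/weight/blood pressure example (r1Y = .782, r2Y = .672, r12 = .563) gives beta coefficients that match the standardized coefficients in the SPSS output shown later in Figure 4.10, apart from rounding.

```python
# A sketch of Equations 4.8 and 4.9, using the three bivariate correlations
# reported in Figure 4.7 for the age/weight/blood pressure example.
r1y, r2y, r12 = 0.782, 0.672, 0.563

beta1 = (r1y - r12 * r2y) / (1 - r12**2)   # Equation 4.8
beta2 = (r2y - r12 * r1y) / (1 - r12**2)   # Equation 4.9

print(round(beta1, 3), round(beta2, 3))
# Approximately .591 and .339, matching the standardized coefficients
# (.590 and .340) reported in Figure 4.10.
```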


    4.7.2 Formulas for Raw-Score (b) Coefficients

    Given the beta coefficients and the means (MY, MX1, and MX2) and standard deviations (SDY, SDX1, and SDX2) of Y, X1, and X2, respectively, it is possible to calculate the b coefficients for the raw-score prediction equation shown in Equation 4.1 as follows:

b1 = β1 × (SDY / SDX1), (4.10)

and

b2 = β2 × (SDY / SDX2). (4.11)

Note that these equations are analogous to the formula for the computation of b from r (or β) in a bivariate regression, where b = (SDY/SDX)rXY. To obtain b from β, we need to restore the information about the scales on which Y and the predictor variable are measured (information that is not contained in the unit-free beta coefficient). As in bivariate regression, a b coefficient is a rescaled version of β, that is, rescaled so that the coefficient can be used to make predictions from raw scores rather than z scores.

    Once we have estimates of the b1 and b2 coefficients, we can compute the intercept b0:

    b0 = MY – b1 MX1 – b2 MX2. (4.12)

This is analogous to the way the intercept was computed for a bivariate regression, where b0 = MY – bMX. There are other by-hand computational formulas to compute b from the sums of squares and sums of cross products for the variables; however, the formulas shown in the preceding equations make it clear how the b and β coefficients are related to each other and to the correlations among variables. In a later section of this chapter, you will see how the formulas to estimate the beta coefficients can be deduced from the correlations among the variables, using a simple path model for the regression. The computational formulas for the beta coefficients, given in Equations 4.8 and 4.9, can be understood conceptually: They are not just instructions for computation. These equations tell us that the values of the beta coefficients are influenced not only by the correlation between each X predictor variable and Y but also by the correlations between the X predictor variables.
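The rescaling in Equations 4.10 through 4.12 can be sketched as follows. The beta values are those implied by the correlations in Figure 4.7, but the means and standard deviations used here are illustrative stand-ins (they are not reported in this excerpt), so the resulting raw-score equation is only an approximation of the SPSS output.

```python
# A sketch of Equations 4.10 through 4.12: rescale each beta by SDY/SDXi to get
# the raw-score slope, then compute the intercept from the means.
beta1, beta2 = 0.591, 0.339            # standardized coefficients (Eqs. 4.8-4.9)
sd_y, sd_x1, sd_x2 = 63.6, 17.4, 44.0  # hypothetical SDs for Y, X1, X2 (illustrative)
m_y, m_x1, m_x2 = 177.3, 57.0, 162.0   # hypothetical means (illustrative)

b1 = beta1 * (sd_y / sd_x1)            # Equation 4.10
b2 = beta2 * (sd_y / sd_x2)            # Equation 4.11
b0 = m_y - b1 * m_x1 - b2 * m_x2       # Equation 4.12

print(f"Y' = {b0:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```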

4.7.3 Formulas for Multiple R and Multiple R²

    The multiple R can be calculated by hand. First of all, you could generate a predicted Y′ score for each case by substituting the X1 and X2 raw scores into the equation and computing Y′ for each case. Then, you could compute Pearson’s r between Y (the actual Y score) and Y′ (the predicted score generated by applying the regression equation to X1 and X2). Squaring this Pearson correlation yields R2, the multiple R squared; this tells you what proportion of the total variance in Y is predictable from X1 and X2 combined.

    Another approach is to examine the ANOVA source table for the regression (part of the SPSS output). As in the bivariate regression, SPSS partitions SStotal for Y into SSregression + SSresidual. Multiple R2 can be computed from these sums of squares:

R² = SSregression / SStotal. (4.13)


    A slightly different version of this overall goodness-of-fit index is called the “adjusted” or “shrunken” R2. This is adjusted for the effects of sample size (N) and number of predictors. There are several formulas for adjusted R2; Tabachnick and Fidell (2018) provided this example:

R²adj = 1 – (1 – R²) × [(N – 1) / (N – k – 1)], (4.14)

where N is the number of cases, k is the number of predictor variables, and R² is the squared multiple correlation given in Equation 4.13. R²adj tends to be smaller than R²; it is much smaller than R² when N is relatively small and k is relatively large. In some research situations where the sample size N is very small relative to the number of variables k, the value reported for R²adj is actually negative; in these cases, it should be reported as 0. For computations involving the partition of variance (as shown in Figure 4.14), the unadjusted R² was used rather than the adjusted R².
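Using the sums of squares from the ANOVA table in Figure 4.10, Equations 4.13 and 4.14 reproduce the R² and adjusted R² values reported in the model summary.

```python
# A sketch of Equations 4.13 and 4.14, using the sums of squares from the
# Figure 4.10 ANOVA table (SSregression = 80882.13, SStotal = 117231.9).
ss_regression = 80882.13
ss_total = 117231.9
n_cases, k = 30, 2

r_sq = ss_regression / ss_total                                # Equation 4.13
r_sq_adj = 1 - (1 - r_sq) * (n_cases - 1) / (n_cases - k - 1)  # Equation 4.14

print(round(r_sq, 3), round(r_sq_adj, 3))   # about .690 and .667, as in Figure 4.10
```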

    4.7.4 Test of Significance for Overall Regression: F Test for H0: R = 0

    As in bivariate regression, an ANOVA can be performed to obtain sums of squares that represent the proportion of variance in Y that is and is not predictable from the regression, the sums of squares can be used to calculate mean squares (MS), and the ratio MSregression/MSresidual provides the significance test for R. N stands for the number of cases, and k is the number of predictor variables. For the regression examples in this chapter, the number of predictor variables, k, equals 2.

F = (SSregression / k) / (SSresidual / (N – k – 1)), (4.15)

with (k, N – k – 1) degrees of freedom (df). If the obtained F ratio exceeds the tabled critical value of F for the predetermined alpha level (usually α = .05), then the overall multiple R is judged statistically significant.
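The same ANOVA sums of squares can be used to verify the overall F test in Figure 4.10 via Equation 4.15.

```python
# A sketch of Equation 4.15, again using the Figure 4.10 ANOVA values.
ss_regression, ss_residual = 80882.13, 36349.73
n_cases, k = 30, 2

ms_regression = ss_regression / k
ms_residual = ss_residual / (n_cases - k - 1)
f_ratio = ms_regression / ms_residual       # Equation 4.15, df = (2, 27)

print(round(f_ratio, 2))   # about 30.04, matching F(2, 27) in Figure 4.10
```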

4.7.5 Test of Significance for Each Individual Predictor: t Test for H0: bi = 0

    Recall that many sample statistics can be tested for significance by examining a t ratio of the following form; this kind of t ratio can also be used to assess the statistical significance of a b slope coefficient.

t = (sample statistic – hypothesized population parameter) / SEsample statistic.

The output from SPSS includes an estimated standard error (SEb) associated with each raw-score slope coefficient (b). This standard error term can be calculated by hand in the following way. First, you need to know SEest, the standard error of the estimate, which can be computed as

SEest = SDY × √[(1 – R²) × (N – 1) / (N – k – 1)]. (4.16)

SEest describes the variability of the observed or actual Y values around the regression prediction at each specific value of the predictor variables. In other words, it gives us some idea of the typical magnitude of a prediction error when the regression equation is used to generate a Y′ predicted value. Using SEest, it is possible to compute an SEb term for each b coefficient, to describe the theoretical sampling distribution of the slope coefficient. For predictor Xi, the equation for SEbi is as follows:

SEbi = SEest / √[Σ(Xi – MXi)²]. (4.17)

The hypothesized value of each b slope coefficient is 0. Thus, the significance test for each raw-score bi coefficient is obtained by the calculation of a t ratio, bi divided by its corresponding SE term:

ti = bi / SEbi, with (N – k – 1) df. (4.18)

If the t ratio for a particular slope coefficient, such as b1, exceeds the tabled critical value of t for N – k – 1 df, then that slope coefficient can be judged statistically significant. Generally, a two-tailed or nondirectional test is used.

Some multiple regression programs provide an F test (with 1 and N – k – 1 df) rather than a t test as the significance test for each b coefficient. Recall that when the numerator has only 1 df, F is equivalent to t².

    4.7.6 Confidence Interval for Each b Slope Coefficient

A confidence interval (CI) can be set up around each sample bi coefficient, using SEbi. To set up a 95% CI, for example, use the t distribution table to look up the critical value of t for N – k – 1 df that cuts off the top 2.5% of the area, tcrit:

    Upper bound of 95% CI = bi + tcrit × SEbi. (4.19)

    Lower bound of 95% CI = bi – tcrit × SEbi. (4.20)
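The t ratios and 95% CIs reported in Figure 4.10 can be reproduced (within rounding, since the tabled b and SEb values are themselves rounded) from Equations 4.18 through 4.20. The sketch below uses scipy only to look up the critical t value and the two-tailed p value for 27 df; the b and SEb values are taken from the SPSS output.

```python
# A sketch of Equations 4.18-4.20 applied to the coefficients in Figure 4.10
# (age: b = 2.161, SEb = .475; weight: b = .490, SEb = .187), with 27 df.
from scipy import stats

df = 27
t_crit = stats.t.ppf(0.975, df)   # two-tailed critical value for a 95% CI

for name, b, se_b in [("age", 2.161, 0.475), ("weight", 0.490, 0.187)]:
    t_ratio = b / se_b                                   # Equation 4.18
    p_two_tailed = 2 * (1 - stats.t.cdf(abs(t_ratio), df))
    ci_lower = b - t_crit * se_b                         # Equation 4.20
    ci_upper = b + t_crit * se_b                         # Equation 4.19
    print(f"{name}: t = {t_ratio:.2f}, p = {p_two_tailed:.3f}, "
          f"95% CI [{ci_lower:.3f}, {ci_upper:.3f}]")
```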

    4.8 SPSS REGRESSION

To run the SPSS linear regression procedure and to save the predicted Y′ scores and the unstandardized residuals from the regression analysis, the following menu selections were made: Analyze → Regression → Linear.

In the SPSS Linear Regression dialog box (which appears in Figure 4.8), the name of the dependent variable (blood pressure) was entered in the box labeled "Dependent"; the names of both predictor variables were entered in the box labeled "Independent(s)." CIs for the b slope coefficients and values of the part and partial correlations were requested in addition to the default output by clicking the Statistics button and checking the boxes for CIs and for part and partial correlations. Note that the value that SPSS calls a "part" correlation is called the "semipartial" correlation by most textbook authors. The part correlations are needed to calculate the squared part or semipartial correlation for each predictor variable and to work out the partition of variance for blood pressure. Finally, the Plots button was clicked, and a graph of standardized residuals against standardized predicted scores was requested to evaluate whether assumptions for regression were violated. The resulting SPSS syntax was copied into the Syntax Editor by clicking the Paste button; this syntax appears in Figure 4.9.


The resulting output for the regression to predict blood pressure from both age and weight appears in Figure 4.10, and the plot of the standardized residuals for this regression appears in Figure 4.11.

    Figure 4.8 SPSS Linear Regression Dialog Box for a Regression to Predict Blood Pressure From Age and Weight

    Figure 4.9 Syntax for the Regression to Predict Blood Pressure From Age and Weight (Including Part and Partial Correlations and a Plot of Standardized Residuals)


Figure 4.11 Plot of Standardized Residuals From Linear Regression to Predict Blood Pressure From Age and Weight

[Scatterplot; dependent variable: blood pressure. Regression standardized residuals (vertical axis) are plotted against regression standardized predicted values (horizontal axis).]

Figure 4.10 Output From SPSS Linear Regression to Predict Blood Pressure From Age and Weight

Variables Entered/Removed (Dependent Variable: BloodPressure)
Model 1: Variables Entered = Weight, Age; Method = Enter. (All requested variables entered.)

Model Summary (Predictors: (Constant), Weight, Age; Dependent Variable: BloodPressure)
Model 1: R = .831, R Square = .690, Adjusted R Square = .667, Std. Error of the Estimate = 36.692.

ANOVA (Predictors: (Constant), Weight, Age; Dependent Variable: BloodPressure)
Regression: Sum of Squares = 80882.13, df = 2, Mean Square = 40441.066, F = 30.039, Sig. = .000.
Residual: Sum of Squares = 36349.73, df = 27, Mean Square = 1346.286.
Total: Sum of Squares = 117231.9, df = 29.

Coefficients (Dependent Variable: BloodPressure)
(Constant): B = –28.046, Std. Error = 27.985, t = –1.002, Sig. = .325, 95% CI for B [–85.466, 29.373].
Age: B = 2.161, Std. Error = .475, Beta = .590, t = 4.551, Sig. = .000, 95% CI for B [1.187, 3.135]; correlations: zero-order = .782, partial = .659, part = .488.
Weight: B = .490, Std. Error = .187, Beta = .340, t = 2.623, Sig. = .014, 95% CI for B [.107, .873]; correlations: zero-order = .672, partial = .451, part = .281.

Residuals Statistics (Dependent Variable: BloodPressure; N = 30)
Predicted Value: Minimum = 66.13, Maximum = 249.62, Mean = 177.27, Std. Deviation = 52.811.
Residual: Minimum = –74.752, Maximum = 63.436, Mean = .000, Std. Deviation = 35.404.
Std. Predicted Value: Minimum = –2.104, Maximum = 1.370, Mean = .000, Std. Deviation = 1.000.
Std. Residual: Minimum = –2.037, Maximum = 1.729, Mean = .000, Std. Deviation = .965.


the basis of a t value of 4.55 with p < .001. The corresponding effect size for the proportion of variance in blood pressure uniquely predictable from age was obtained by squaring the value of the part correlation of age with blood pressure to yield sr²age = .24. For the predictor variable weight, the raw-score slope b = .50 was statistically significant: t = 2.62, p = .014; the corresponding effect size was obtained by squaring the part correlation for weight, sr²weight = .08. The pattern of residuals that is shown in Figure 4.11 does not indicate any problems with the assumptions. These regression results are discussed and interpreted more extensively in the model "Results" section that appears near the end of this chapter.
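For readers working outside SPSS, a rough Python parallel to this analysis is sketched below using statsmodels. The data are synthetic stand-ins (the chapter's raw blood pressure data are not reproduced here), so the printed coefficients will not match Figure 4.10; the point is only the structure of a two-predictor linear regression.

# Sketch of a two-predictor regression in Python, with synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 30
age = rng.uniform(20, 70, n)
weight = 100 + 0.8 * age + rng.normal(0, 15, n)            # correlated with age
bp = 50 + 2.0 * age + 0.5 * weight + rng.normal(0, 35, n)  # outcome variable
df = pd.DataFrame({"bloodpressure": bp, "age": age, "weight": weight})

model = smf.ols("bloodpressure ~ age + weight", data=df).fit()
print(model.summary())   # b, SE, t, p, and 95% CIs for each predictor
print(model.rsquared)    # overall R-squared for the two-predictor model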

    4.9 CONCEPTUAL BASIS: FACTORS THAT AFFECT THE MAGNITUDE AND SIGN OF β AND b COEFFICIENTS IN MULTIPLE REGRESSION WITH TWO PREDICTORS

It may be intuitively obvious that the predictive slope of X1 depends, in part, on the value of the zero-order Pearson correlation of X1 with Y. It may be less obvious that the value of the slope coefficient for each predictor is also influenced by the correlation of X1 with other predictors, as you can see in Equations 4.8 and 4.9. Often, but not always, we will find that an X1 variable that has a large correlation with Y also tends to have a large beta coefficient; the sign of beta is often, but not always, the same as the sign of the zero-order Pearson's r. However, depending on the magnitudes and signs of the r12 and r2Y correlations, a beta coefficient (like a partial correlation) can be larger, smaller, or even opposite in sign compared with the zero-order Pearson's r1Y. The magnitude of a β1 coefficient, like the magnitude of a partial correlation pr1, is influenced by the size and sign of the correlation between X1 and Y; it is also affected by the size and sign of the correlation(s) of the X1 variable with other variables that are statistically controlled in the analysis.

In this section, we will examine a path diagram model of a two-predictor multiple regression to see how estimates of the beta coefficients are found from the correlations among all three pairs of variables involved in the model: r12, rY1, and rY2. This analysis will make several things clear. First, it will show how the sign and magnitude of the standard-score coefficient βi for each Xi variable are related to the size of rYi, the correlation of that particular predictor with Y, and also to the size of the correlation of Xi with all other predictor variables included in the regression (at this point, this is the single correlation r12).

    Second, it will explain why the numerator for the formula to calculate β1 in Equation 4.8 has the form rY1 – r12rY2. In effect, we begin with the “overall” relationship between X1 and Y, represented by rY1; we subtract from this the product r12 × rY2, which represents an indirect path from X1 to Y via X2. Thus, the estimate of the β1 coefficient is adjusted so that it only gives the X1 variable “credit” for any relationship to Y that exists over and above the indirect path that involves the association of both X1 and Y with the other predictor variable X2.

    Finally, we will see that the formulas for β1, pr1, and sr1 all have the same numerator: rY1 – r12rY2. All three of these statistics (β1, pr1, and sr1) provide somewhat similar information about the nature and strength of the relation between X1 and Y, controlling for X2, but they are scaled slightly differently (by using different divisors) so that they can be interpreted and used in different ways.

Consider the regression problem in which you are predicting z scores on Y from z scores on two independent variables, X1 and X2. We can set up a path diagram to represent how two predictor variables are related to one outcome variable (Figure 4.12).

    The path diagram in Figure 4.12 corresponds to this regression equation:

    z′Y = β1 zX1 + β2 zX2. (4.21)

Path diagrams depict hypothetical models (often called "causal" models, although we cannot prove causality from correlational analyses) that represent our hypotheses about the


nature of the relations between variables. In this example, the path model is given in terms of z scores (rather than raw X scores) because this makes it easier to see how we arrive at estimates of the beta coefficients. When two variables in a path model diagram are connected by a double-headed arrow, it represents a hypothesis that the two variables are correlated or confounded (but there is no hypothesized causal connection between the variables). Pearson's r between these predictors indexes the strength of this confounding or correlation. A single-headed arrow (X → Y) indicates a theorized causal relationship (such that X causes Y), or at least a directional predictive association between the variables. The "path coefficient" or regression coefficient (i.e., a beta coefficient) associated with it indicates the estimated strength of the predictive relationship through this direct path. If there is no arrow connecting a pair of variables, it indicates a lack of any direct association between the pair of variables, although the variables may be connected through indirect paths.

The path diagram that is usually implicit in a multiple regression analysis has the following general form: Each of the predictor (X) variables has a unidirectional arrow pointing from X to Y, the outcome variable. All pairs of X predictor variables are connected to each other by double-headed arrows that indicate correlation or confounding, but no presumed causal linkage, among the predictors. Figure 4.12 shows the path diagram for the standardized (z score) variables in a regression with two correlated predictor variables zX1 and zX2. This model corresponds to a causal model in which zX1 and zX2 are represented as "partially redundant" or correlated causes or predictors of zY.

    Our problem is to deduce the unknown path coefficients or standardized regression coefficients associated with the direct (or causal) path from each of the zX predictors, β1 and β2, in terms of the known correlations r12, rY1, and rY2. This is done by applying the tracing rule, as described in the following section.

    4.10 TRACING RULES FOR PATH MODELS

    The idea behind path models is that an adequate model should allow us to reconstruct the observed correlation between any pair of variables (e.g., rY1), by tracing the paths that lead from X1 to Y through the path system, calculating the strength of the relationship for each path, and then summing the contributions of all possible paths from X1 to Y.

    Kenny (1979) provided a clear and relatively simple statement about the way in which the paths in this causal model can be used to reproduce the overall correlation between each pair of variables:

    The correlation between Xi and Xj equals the sum of the product of all the path coefficients [these are the beta weights from a multiple regression] obtained from each

Figure 4.12 Path Diagram for Standardized Multiple Regression to Predict z′Y From zX1 and zX2

[Diagram: zX1 and zX2 are joined by a double-headed arrow labeled r12; single-headed arrows labeled β1 and β2 point from zX1 and zX2, respectively, to z′Y.]


    of the possible tracings between Xi and Xj. The set of tracings includes all possible routes from Xi to Xj given that (a) the same variable is not entered twice and (b) a variable is not entered through an arrowhead and left through an arrowhead. (p. 30)

    In general, the traced paths that lead from one variable, such as zX1, to another variable, such as z′Y, may include one direct path and also one or more indirect paths.

We can use the tracing rule to reconstruct exactly the observed correlation between any two variables in a path model from the correlations and the beta coefficients for each path. Initially, we will treat β1 and β2 as unknowns; later, we will be able to solve for the betas in terms of the correlations.

Now, let’s look in more detail at the multiple regression model with two independent variables (represented by the diagram in Figure 4.12). The path from zX1 to zX2 is simply r12, the observed correlation between these variables. We will use the labels β1 and β2 for the coefficients that describe the strength of the direct, or unique, relationship of X1 and X2, respectively, to Y. β1 indicates how strongly X1 is related to Y after we have taken into account, or partialled out, the indirect relationship of X1 to Y involving the path via X2. β1 is a partial slope: the number of standard deviation units of change in zY we predict for a 1-SD change in zX1 when we have taken into account, or partialled out, the influence of zX2. If zX1 and zX2 are correlated, we must somehow correct for the redundancy of information they provide when we construct our prediction of Y; we don’t want to double-count information that is included in both zX1 and zX2. That is why we need to correct for the correlation of zX1 with zX2 (i.e., take into account the indirect path from zX1 to zY via zX2) to get a clear picture of how much predictive value zX1 has that is unique to zX1 and not somehow related to zX2.

For each pair of variables (zX1 and zY, zX2 and zY), we need to work out all possible paths from zXi to zY; if the path has multiple steps, the coefficients along that path are multiplied with each other. After we have calculated the strength of association for each path, we sum the contributions across paths. For the path from zX1 to z′Y, in the diagram above, there is one direct path from zX1 to z′Y, with a coefficient of β1. There is also one indirect path from zX1 to z′Y via zX2, with two coefficients en route (r12 and β2); these are multiplied to give the strength of association represented by the indirect path, r12 × β2. Finally, we should be able to reconstruct the entire observed correlation between zX1 and zY (rY1) by summing the contributions of all possible paths from zX1 to z′Y in this path model. This reasoning based on the tracing rule yields the equation below:

    Total correlation = Direct path + Indirect path.

    rY1 = β1 + r12 × β2. (4.22)

    Applying the same reasoning to the paths that lead from zX2 to z′Y, we arrive at a second equation of this form:

    rY2 = β2 + r12 × β1. (4.23)

Equations 4.22 and 4.23 are called the normal equations for multiple regression; they show how the observed correlations (rY1 and rY2) can be perfectly reconstructed from the regression model and its parameter estimates β1 and β2. We can solve these equations for values of β1 and β2 in terms of the known correlations r12, rY1, and rY2 (these equations appeared earlier as Equations 4.8 and 4.9):

β1 = (rY1 – r12rY2)/(1 – r12²),


and

β2 = (rY2 – r12rY1)/(1 – r12²).
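A small numeric check may help here. The sketch below solves these two equations for β1 and β2 and then verifies that the tracing rule recovers the original correlations. The values rY1 = .782 and rY2 = .672 are the zero-order correlations from the chapter's example; r12 is not listed in this section, so a value of about .57 is assumed here because it approximately reproduces the reported beta coefficients (.59 and .34).

# Solve for beta1 and beta2 from the three correlations, then check the
# tracing rule (Equations 4.22-4.23). r12 = .57 is an assumed value.
r_y1, r_y2, r_12 = 0.782, 0.672, 0.57

beta1 = (r_y1 - r_12 * r_y2) / (1 - r_12**2)   # about .59
beta2 = (r_y2 - r_12 * r_y1) / (1 - r_12**2)   # about .34

# Direct path plus indirect path should reproduce each observed correlation.
print(round(beta1 + r_12 * beta2, 3))   # equals r_y1 = .782
print(round(beta2 + r_12 * beta1, 3))   # equals r_y2 = .672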

The numerator for the betas is the same as the numerator of the partial correlation. Essentially, we take the overall correlation between X1 and Y and subtract the correlation we would predict between X1 and Y due to the relationship through the indirect path via X2; whatever is left, we then attribute to the direct or unique influence of X1. In effect, we “explain” as much of the association between X1 and Y as we can by first looking at the indirect path via X2 and only attributing to X1 any additional relationship it has with Y that is above and beyond that indirect relationship. We then divide by a denominator that scales the result (as a partial slope or beta coefficient, in these two equations, or as a partial correlation, as in the previous chapter).

Note that if the value of β1 is zero, we can interpret it to mean that we do not need to include a direct path from X1 to Y in our model. If β1 = 0, then any statistical relationship or correlation that exists between X1 and Y can be entirely explained by the indirect path involving X2. Possible explanations for this pattern of results include the following: that X2 causes both X1 and Y and the X1, Y correlation is spurious, or that X2 is a mediating variable, and X1 influences Y only through its influence on X2. This is the basic idea that underlies path analysis or so-called causal modeling: If we find that we do not need to include a direct path between X1 and Y, then we can simplify the model by dropping a path. We will not be able to prove causality from path analysis; we can only decide whether a causal or theoretical model that has certain paths omitted is sufficient to reproduce the observed correlations and, therefore, is “consistent” with the observed pattern of correlations.

    4.11 COMPARISON OF EQUATIONS FOR β, b, pr, AND sr

By now, you may have recognized that β, b, pr, and sr are all slightly different indexes of how strongly X1 predicts Y when X2 is controlled. Note that the (partial) standardized slope or β coefficient, the partial r, and the semipartial r all have the same term in the numerator: They are scaled differently by dividing by different terms, to make them interpretable in slightly different ways, but generally, they are similar in magnitude. The numerators for partial r (pr), semipartial r (sr), and beta (β) are identical. The denominators differ slightly because they are scaled to be interpreted in slightly different ways (squared partial r as a proportion of variance in Y when X2 has been partialled out of Y; squared semipartial r as a proportion of the total variance of Y; and beta as a partial slope, the number of standard deviation units of change in Y for a one-unit SD change in X1). It should be obvious from looking at the formulas that sr, pr, and β tend to be similar in magnitude and must have the same sign. (These equations are all repetitions of equations given earlier, and therefore, they are not given new numbers here.)

    Standard-score slope coefficient β:

β1 = (rY1 – r12rY2)/(1 – r12²).

    Raw-score slope coefficient b (a rescaled version of the β coefficient):

b1 = β1 × (SDY/SDX1).


    Partial correlation to predict Y from X1, controlling for X2 (removing X2 completely from both X1 and Y):

pr1 (or r1Y.2) = (rY1 – rY2r12) / √[(1 – rY2²)(1 – r12²)].

    Semipartial (or part) correlation to predict Y from X1, controlling for X2 (removing X2 only from X1, as explained in this chapter):

sr1 (or rY(1.2)) = (rY1 – rY2r12) / √(1 – r12²).

Because these equations all have the same numerator (and they differ only in that the different divisors scale the information so that it can be interpreted and used in slightly different ways), it follows that your conclusions about how X1 is related to Y when you control for X2 tend to be fairly similar no matter which of these four statistics (b, β, pr, or sr) you use to describe the relationship. If any one of these four statistics exactly equals 0, then the other three also equal 0, and all these statistics must have the same sign. They are scaled or sized slightly differently so that they can be used in different situations (to make predictions from raw vs. standard scores and to estimate the proportion of variance accounted for relative to the total variance in Y or only the variance in Y that isn’t related to X2).

    The difference among the four statistics above is subtle: β1 is a partial slope (how much change in zY is predicted for a 1-SD change in zX1 if zX2 is held constant). The partial r describes how X1 and Y are related if X2 is removed from both variables. The semipartial r describes how X1 and Y are related if X2 is removed only from X1. In the context of multiple regression, the squared semipartial r (sr2) provides the most convenient way to estimate effect size and variance partitioning. In some research situations, analysts prefer to report the b (raw-score slope) coefficients as indexes of the strength of the relationship among variables. In other situations, standardized or unit-free indexes of the strength of relationship (such as β, sr, or pr) are preferred.
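The scaling differences are easy to see numerically. In the sketch below, the correlations are illustrative values (close to, but not exactly, those in the chapter's example) and the standard deviations are arbitrary assumptions; all four statistics share the numerator rY1 – r12rY2 and therefore must have the same sign.

# Compare beta, b, partial r, and semipartial r for X1 controlling for X2.
import math

r_y1, r_y2, r_12 = 0.78, 0.67, 0.57   # assumed correlations
sd_y, sd_x1 = 60.0, 14.0              # assumed standard deviations

num = r_y1 - r_12 * r_y2                                # shared numerator
beta1 = num / (1 - r_12**2)                             # standardized partial slope
b1 = beta1 * (sd_y / sd_x1)                             # raw-score partial slope
pr1 = num / math.sqrt((1 - r_y2**2) * (1 - r_12**2))    # partial correlation
sr1 = num / math.sqrt(1 - r_12**2)                      # semipartial (part) correlation
print(beta1, b1, pr1, sr1)   # same sign, similar magnitudes, different scaling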

    4.12 NATURE OF PREDICTIVE RELATIONSHIPS

    When reporting regression, it is important to note the signs of b and β coefficients, as well as their size, and to state whether these signs indicate relations that are in the predicted direction. Researchers sometimes want to know whether a pair of b or β coefficients differ significantly from each other. This can be a question about the size of b in two different groups of subjects: For instance, is the β slope coefficient to predict salary from years of job experience significantly different for male versus female subjects? Alternatively, it could be a question about the size of b or β for two different predictor variables in the same group of subjects (e.g., Which variable has a stronger predictive relation to blood pressure: age or weight?).

It is important to understand how problematic such comparisons usually are. Our estimates of β and b coefficients are derived from correlations; thus, any factors that artifactually inflate or deflate the correlations (so that they misrepresent the real strength of the association between variables) can also affect our estimates of β and b. For example, if women have a restricted range of scores on drug use (relative to men), a difference in Pearson’s r and the beta coefficient to predict drug use for women versus men might be artifactually due to a difference in the range of scores on the outcome variable for the two groups. Similarly, a difference in the reliability of measures for the two groups could create an artifactual difference in the size of Pearson’s r and regression coefficient


estimates. It is probably never possible to rule out all possible sources of artifact that might explain the different sizes of r and β coefficients (in different samples or for different predictors). If a researcher wants to interpret a difference between slope coefficients as evidence for a difference in the strength of the association between variables, the researcher should demonstrate that the two groups do not differ in range of scores, distribution shape of scores, reliability of measurement, existence of outliers, or other factors that may affect the size of correlations. However, no matter how many possible sources of artifact are considered, comparison of slopes and correlations remains problematic. Later chapters describe use of dummy variables and interaction terms to test whether two groups, such as women versus men, have significantly different slopes for the prediction of Y from some Xi variable. More sophisticated methods that can be used to test equality of specific model parameters, whether they involve comparisons across groups or across different predictor variables, are available within the context of structural equation modeling (SEM) analysis using programs such as Amos.

    4.13 EFFECT SIZE INFORMATION IN REGRESSION WITH TWO PREDICTORS

    4.13.1 Effect Size for Overall Model

The effect size for the overall model (that is, the proportion of variance in Y that is predictable from X1 and X2 combined) is estimated by computing R². This R² is shown in the SPSS output; it can be obtained either by computing the correlation between observed Y and predicted Y′ scores and squaring this correlation or by taking the ratio SSregression/SStotal:

R² = SSregression/SStotal. (4.24)

Note that this formula for the computation of R² is analogous to the formulas given in earlier chapters for eta squared (η² = SSbetween/SStotal for an ANOVA; R² = SSregression/SStotal for multiple regression). R² differs from η² in that R² assumes a linear relation between scores on Y and scores on the predictors. On the other hand, η² detects differences in mean values of Y across different values of X, but these changes in the value of Y do not need to be a linear function of scores on X. Both R² and η² are estimates of the proportion of variance in Y scores that can be predicted from independent variables. However, R² (as described in this chapter) is an index of the strength of linear relationship, while η² detects patterns of association that need not be linear.

For some statistical power computations, such as those presented by Green (1991), a different effect size for the overall regression equation, called f², is used:

f² = R²/(1 – R²). (4.25)
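For example, the sums of squares reported in the ANOVA table of Figure 4.10 can be plugged directly into Equations 4.24 and 4.25 (a minimal sketch using those reported values):

ss_regression = 80882.13
ss_total = 117231.9

r_squared = ss_regression / ss_total        # Equation 4.24: about .690
f_squared = r_squared / (1 - r_squared)     # Equation 4.25: about 2.2
print(round(r_squared, 3), round(f_squared, 3))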

    4.13.2 Effect Size for Individual Predictor Variables

The most convenient effect size to describe the proportion of variance in Y that is uniquely predictable from Xi is the squared semipartial correlation between Xi and Y, controlling for all other predictors. This semipartial (also called the part) correlation between each predictor and Y can be obtained from the SPSS regression procedure by checking the box for the part and partial correlations in the optional statistics dialog box. The semipartial or part correlation (sr) from the SPSS output can be squared by hand to yield an estimate of the proportion of uniquely explained variance for each predictor variable (sr²).


    If the part correlation is not requested, it can be calculated from the t statistic associated with the significance test of the b slope coefficient. It is useful to know how to calculate this by hand so that you can generate this effect size measure for published regression studies that don’t happen to include this information:

sr²i = (t²i/dfresidual)(1 – R²), (4.26)

where ti is the ratio bi/SEbi for the Xi predictor variable, dfresidual = N – k – 1, and R² is the multiple R² for the entire regression equation. The verbal interpretation of sr²i is the proportion of variance in Y that is uniquely predictable from Xi (when the variance due to other predictors is partialled out of Xi).

Some multiple regression programs do not provide the part or semipartial correlation for each predictor and instead report an F ratio for the significance of each b coefficient; this F ratio may be used in place of t²i to calculate the effect size estimate:

sr²i = (F/dfresidual)(1 – R²). (4.27)
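Applying Equation 4.26 to the t ratios reported in Figure 4.10 reproduces the effect sizes given earlier in the chapter (a minimal sketch using those reported values):

# Squared semipartial correlation for each predictor from its t ratio.
t_age, t_weight = 4.551, 2.623
df_residual = 27          # N - k - 1 = 30 - 2 - 1
r_squared = 0.690

sr2_age = (t_age**2 / df_residual) * (1 - r_squared)        # about .24
sr2_weight = (t_weight**2 / df_residual) * (1 - r_squared)  # about .08
print(round(sr2_age, 3), round(sr2_weight, 3))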

    4.14 STATISTICAL POWER

Tabachnick and Fidell (2018) discussed a number of issues that need to be considered in decisions about sample size; these include alpha level, desired statistical power, number of predictors in the regression equation, and anticipated effect sizes. They suggested the following simple guidelines. Let k be the number of predictor variables in the regression (in this chapter, k = 2). The effect size index used by Green (1991) was f², where f² = R²/(1 – R²); f² = .15 is considered a medium effect size. Assuming a medium effect size and α = .05, the minimum desirable N for testing the significance of multiple R is N > 50 + 8k, and the minimum desirable N for testing the significance of individual predictors is N > 104 + k. Tabachnick and Fidell recommended that the data analyst choose the larger number of cases required by these two decision rules. Thus, for the regression analysis with two predictor variables described in this chapter, assuming the researcher wants to detect medium-size effects, a desirable minimum sample size would be N = 106. (Smaller N’s are used in many of the demonstrations and examples in this textbook, however.) If there are substantial violations of assumptions (e.g., skewed rather than normal distribution shapes) or low measurement reliability, then the minimum N should be substantially larger; see Green for more detailed instructions. If N is extremely large (e.g., N > 5,000), researchers may find that even associations that are too weak to be of any practical or clinical importance turn out to be statistically significant.

To summarize, then, the guidelines described above suggest that a minimum N of about 106 should be used for multiple regression with two predictor variables to have reasonable power to detect the overall model fit that corresponds to approximately medium-size R² values. If more precise estimates of required sample size are desired, the guidelines given by Green (1991) may be used. In general, it is preferable to have sample sizes that are somewhat larger than the minimum values suggested by these decision rules. In addition to having a large enough sample size to have reasonable statistical power, researchers should also have samples large enough so that the CIs around the estimates of slope coefficients are reasonably narrow. In other words, we should try to have sample sizes that are large enough to provide reasonably precise estimates of slopes and not just samples that are large enough to yield “statistically significant” results.
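These two rules of thumb are simple enough to code directly; the small function below is a sketch of the decision rule described above (assuming a medium effect size and α = .05).

def minimum_n(k: int) -> int:
    """Larger of the two rule-of-thumb minimum sample sizes for k predictors."""
    n_overall_test = 50 + 8 * k       # test of the overall multiple R
    n_individual_tests = 104 + k      # tests of individual predictors
    return max(n_overall_test, n_individual_tests)

print(minimum_n(2))   # 106 for the two-predictor regression in this chapter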


    4.15 ISSUES IN PLANNING A STUDY

    4.15.1 Sample Size

A minimum N of at least 100 cases is desirable for a multiple regression with two predictor variables (the rationale for this recommended minimum sample size is given in Section 4.14 on statistical power). The examples presented in this chapter use fewer cases, so that readers who want to enter data by hand or perform computations by hand or in an Excel spreadsheet can replicate the analyses shown.

    4.15.2 Selection of Predictor and/or Control Variables

The researcher should have some theoretical rationale for the choice of independent variables. Often, the X1, X2 predictors are chosen because one or both of them are implicitly believed to be “causes” of Y (although a significant regression does not provide evidence of causality). In some cases, the researcher may want to assess the combined predictive usefulness of two variables or to judge the relative importance of two predictors (e.g., How well do age and weight in combination predict blood pressure? Is age a stronger predictor of blood pressure than weight?). In some research situations, one or more of the variables used as predictors in a regression analysis serve as control variables that are included to control for competing causal explanations or to control for sources of contamination in the measurement of other predictor variables.

Several variables are often used to control for contamination in the measurement of predictor variables. For example, many personality test scores are related to social desirability; if the researcher includes a good measure of social desirability response bias as a predictor in the regression model, the regression may yield a better description of the predictive usefulness of the personality measure. Alternatively, of course, controlling for social desirability could make the predictive contribution of the personality measure drop to zero. If this occurred, the researcher might conclude that any apparent predictive usefulness of that personality measure was due entirely to its social desirability component.

After making a thoughtful choice of predictors, the researcher should try to anticipate the possible different outcomes and the various possible interpretations to which these would lead. Selection of predictor variables on the basis of “data fishing”—that is, choosing predictors because they happen to have high correlations with the Y outcome variable in the sample of data in hand—is not recommended. Regression analyses that are set up in this way are likely to report “significant” predictive relationships that are instances of Type I error. It is preferable to base the choice of predictor variables on past research and theory rather than on sizes of correlations. (Of course, it is possible that a large correlation that turns up unexpectedly may represent a serendipitous finding; however, replication of the correlation with new samples should be obtained.)

    4.15.3 Collinearity (Correlation) Between Predictors

    Although multiple regression can be a useful tool for separating the unique predictive contributions of correlated predictor variables, it does not work well when predictor variables are extremely highly correlated (in the case of multiple predictors, high correlations among many predictors are referred to as multicollinearity). In the extreme case, if two predictors are perfectly correlated, it is impossible to distinguish their predictive contributions; in fact, regression coefficients cannot be calculated in this situation.

    To understand the nature of this problem, consider the partition of variance illustrated in Figure 4.13 for two predictors, X1 and X2, that are highly correlated with each other. When there is a strong correlation between X1 and X2, most of the explained variance cannot be


attributed uniquely to either predictor variable; in this situation, even if the overall multiple R is statistically significant, neither predictor may be judged statistically significant. The area (denoted as Area c in Figure 4.13) that corresponds to the variance in Y that could be predicted from either X1 or X2 tends to be quite large when the predictors are highly intercorrelated, whereas Areas a and b, which represent the pro