
Relationship between education level, income, and length of time out of school

Our new regression equation:

$\hat{Y} = a + b_1 X_1 + b_2 X_2$

where

• $\hat{Y}$ is the predicted value of the dependent variable (income)

• $X_1$ is the value of the first predictor variable (education level)

• $X_2$ is the value of the second predictor variable (time out of school)

• $a$ is the intercept, and $b_1$ and $b_2$ are the regression coefficients for the two predictor variables
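As a small illustration of how the equation is used, the sketch below plugs values into it. The function name `predict_income` and the example person are hypothetical; the default coefficients are the rounded values reported in the regression coefficients table later in this deck.

```python
def predict_income(education_years, work_years, a=-5.504, b1=0.495, b2=0.210):
    """Predicted monthly income in thousands: Y-hat = a + b1*X1 + b2*X2."""
    return a + b1 * education_years + b2 * work_years


# Hypothetical person: 12 years of education, 20 years in the workforce
print(round(predict_income(12, 20), 3))
```

Because the coefficients are rounded to three decimals, predictions from this sketch are approximate.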

The new regression equation allows us to:

• See whether my two predictor variables, combined, are significantly related to, or predictive of, my dependent variable, and how much of the variance in my dependent variable they explain

• Test whether each of my predictor variables is significantly related to my dependent variable when controlling for the other predictor variable

• See which of my two predictor variables is the stronger predictor of my dependent variable

• Test whether one predictor variable is related to my dependent variable after controlling for the other predictor variable, thus conducting a sort of ANCOVA

| Case | Education Level (X1) in years | Years Working (X2) | Monthly Income (Y) in thousands |
|---|---|---|---|
| Case 1 | 6 | 10 | 1 |
| Case 2 | 8 | 14 | 1.5 |
| Case 3 | 11 | 8 | 1 |
| Case 4 | 12 | 7 | 2 |
| Case 5 | 12 | 20 | 4 |
| Case 6 | 13 | 15 | 2.5 |
| Case 7 | 14 | 17 | 5 |
| Case 8 | 16 | 22 | 6 |
| Case 9 | 16 | 30 | 10 |
| Case 10 | 21 | 10 | 8 |
| Mean | 12.9 | 15.3 | 4.1 |
| Standard Deviation | 4.25 | 7.2 | 3.12 |
| Correlation Coefficient (with income) | r = 0.83 | r = 0.70 | |

The data presented in the table on the previous slide reveal that both years of education and years in the workforce are positively correlated with monthly income

Need to answer the following questions:

• How much of the variance in income can these two predictor variables explain together?

• Will years of education still predict income when we control for the effects of years in the workforce?

• Which of these two independent variables will be the stronger predictor of income?

• And will each make a unique contribution in explaining variance in income?

To answer these questions, we need to run multiple regression analyses

We begin by computing Pearson correlation coefficients for all three of the variables in the model:

| | Years of Education | Years in Workforce | Monthly Income |
|---|---|---|---|
| Years of Education | 1.00 | | |
| Years in Workforce | 0.310 | 1.00 | |
| Monthly Income | 0.826 | 0.695 | 1.00 |

These data reveal that level of education and years in the workforce are both correlated with monthly income, and with each other
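The correlation matrix can be recomputed from the ten cases in the data table. The sketch below uses only the Python standard library; the helper name `pearson_r` is an illustrative choice, not something from the slides.

```python
from math import sqrt

education = [6, 8, 11, 12, 12, 13, 14, 16, 16, 21]   # X1, years
work = [10, 14, 8, 7, 20, 15, 17, 22, 30, 10]        # X2, years
income = [1, 1.5, 1, 2, 4, 2.5, 5, 6, 10, 8]         # Y, thousands per month


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


variables = {"Education": education, "Workforce": work, "Income": income}
for row_name, row in variables.items():
    print(row_name, [round(pearson_r(row, col), 3) for col in variables.values()])
```

Each printed row holds that variable's correlations with education, workforce, and income, in that order.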

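In a multiple regression, we’ve got multiple predictor variables trying to explain variance in the dependent variable. For a predictor variable to explain variance in the dependent variable, it must be related to the dependent variable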

In our current example, both of our predictor variables are strongly correlated with our dependent variable, so this condition is met

In addition, for each of our predictor variables to explain a unique, or independent portion of the variance in the dependent variable, our two predictor variables cannot be too strongly related to each other

Strong correlation among predictor variables is called multicollinearity. It can cause problems in multiple regression analysis because it makes it difficult to identify the unique relation between each predictor variable and the dependent variable

Use Tolerance, VIF, and Collinearity diagnostics to detect collinearity issues
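For a model with only two predictors, tolerance and VIF follow directly from the correlation between the predictors: tolerance is 1 minus the squared correlation, and VIF is 1 divided by tolerance. The sketch below applies this to the 0.310 correlation from the matrix above. With more than two predictors, the squared correlation would be replaced by the R² from regressing each predictor on all the others.

```python
# Correlation between years of education and years in the workforce
r_12 = 0.310

tolerance = 1 - r_12 ** 2   # share of a predictor's variance not shared with the other
vif = 1 / tolerance         # variance inflation factor

print(f"Tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

A commonly cited rule of thumb treats tolerance below about 0.1 (VIF above about 10) as a warning sign. By that standard, the two predictors in this example are far from problematic.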

[Figure: diagram of the first independent variable (X1), the second independent variable (X2), and the dependent variable (Y), with regions labeled "Unique variance" and "Shared variance"]

Variance Explained (Model Summary)

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| 0.946 | 0.896 | 0.866 | 1.1405 |

ANOVA Results (ANOVA)

| | Sum of Squares | df | Mean Square | F Value | P Value |
|---|---|---|---|---|---|
| Regression | 78.295 | 2 | 39.147 | 30.095 | 0.000 |
| Residual | 9.105 | 7 | 1.301 | | |
| Total | 87.400 | 9 | | | |

Regression Coefficients (Coefficients)

| | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t Value | p Value |
|---|---|---|---|---|---|
| Intercept | -5.504 | 1.298 | | -4.241 | 0.004 |
| Years Edu. | 0.495 | 0.094 | 0.676 | 5.270 | 0.001 |
| Years Work | 0.210 | 0.056 | 0.485 | 3.783 | 0.007 |

We get an “R” value of 0.946. This is the multiple correlation coefficient (R).

• It provides a measure of the correlation between the two predictors combined and the dependent variable. It is also the correlation between the observed value of Y and the predicted value of Y.

We get an “R Square” value (symbolized R²) of 0.896.

• This is the coefficient of determination for my combined predictor variables and the dependent variable, and it provides us with a percentage of variance explained. The R² statistic is the measure of effect size used in multiple regression. Combined, these two predictor variables explain about 90% of the variance in the income variable.

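The “Adjusted R Square” is 0.866.

• It accounts for some of the error associated with multiple predictor variables by taking the number of predictor variables and the sample size into account, and thereby adjusts the R² value down a little bit

The “Std. Error of the Estimate” is 1.1405.

• This is the typical size of a prediction error (residual) in the units of the dependent variable, and it equals the square root of the residual mean square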
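The model summary statistics can be recomputed from the ten cases in the data table. The sketch below solves the two-predictor least-squares equations by hand using only the standard library. The variable and function names are illustrative, and a statistics package would normally do this step for you.

```python
from math import sqrt

education = [6, 8, 11, 12, 12, 13, 14, 16, 16, 21]   # X1, years
work = [10, 14, 8, 7, 20, 15, 17, 22, 30, 10]        # X2, years
income = [1, 1.5, 1, 2, 4, 2.5, 5, 6, 10, 8]         # Y, thousands per month

n = len(income)   # number of cases
k = 2             # number of predictor variables


def cross_product(x, y):
    """Sum of cross-products of deviations from the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


s11 = cross_product(education, education)
s22 = cross_product(work, work)
s12 = cross_product(education, work)
s1y = cross_product(education, income)
s2y = cross_product(work, income)
ss_total = cross_product(income, income)

# Least-squares slopes for the two-predictor case
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det

ss_regression = b1 * s1y + b2 * s2y
ss_residual = ss_total - ss_regression

r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
std_error_estimate = sqrt(ss_residual / (n - k - 1))

print(f"R = {multiple_r:.3f}, R Square = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.4f}")
```

The adjustment formula shows why Adjusted R Square is smaller than R Square: the unexplained share (1 − R²) is scaled up by (n − 1) / (n − k − 1), which is 9/7 for ten cases and two predictors.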

F value of 30.095, with a corresponding p value of .000 (that is, p < .001)

• Reveals that our regression model overall is statistically significant

• Total variance of the dependent variable is divided into 2 components

• Explained variance (“regression”) is compared to unexplained variance (“residual”)

• Mean square = sum of squares / df
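The arithmetic behind the ANOVA table can be sketched from the sums of squares and degrees of freedom it reports. Because the tabled sums of squares are rounded, the F value from this sketch differs from the reported 30.095 in the third decimal place.

```python
# Values copied from the ANOVA table
ss_regression, df_regression = 78.295, 2
ss_residual, df_residual = 9.105, 7

ss_total = ss_regression + ss_residual          # total variance to be explained
ms_regression = ss_regression / df_regression   # mean square = sum of squares / df
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual           # explained vs. unexplained variance

print(f"Total SS = {ss_total:.3f}")
print(f"MS regression = {ms_regression:.3f}, MS residual = {ms_residual:.3f}")
print(f"F = {f_value:.2f}")
```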

The unstandardized regression coefficients can be found in the column labeled “B”

• It is difficult to compare the size of the unstandardized regression coefficients, because predictor variables are usually measured on different scales

In the column labeled “Beta”, the standardized regression coefficients are presented

• These regression coefficients have been standardized, thereby converting the unstandardized coefficients into coefficients with the same scale of measurement. In this example, the beta for years of education (0.676) is a bit larger than the beta for years in the workforce (0.485).

In the columns labeled “t value” and “p value” we get measures that allow us to determine whether each predictor variable is statistically significantly related to the dependent variable. In this example, both predictor variables are significant predictors of income.
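The link between the “B”, “Beta”, and “t value” columns can be sketched as follows. A standardized coefficient is the unstandardized one rescaled by the ratio of the predictor's standard deviation to the dependent variable's standard deviation, and each t value is a coefficient divided by its standard error. The B and standard error values are copied from the regression coefficients table; since they are rounded, the t values come out slightly different from the tabled ones.

```python
from statistics import stdev

education = [6, 8, 11, 12, 12, 13, 14, 16, 16, 21]   # X1, years
work = [10, 14, 8, 7, 20, 15, 17, 22, 30, 10]        # X2, years
income = [1, 1.5, 1, 2, 4, 2.5, 5, 6, 10, 8]         # Y, thousands per month

# B and Std. Error as reported (rounded) in the regression coefficients table
coefficients = {
    "Years Edu.": {"B": 0.495, "se": 0.094, "x": education},
    "Years Work": {"B": 0.210, "se": 0.056, "x": work},
}

sd_income = stdev(income)
results = {}
for name, c in coefficients.items():
    beta = c["B"] * stdev(c["x"]) / sd_income   # standardized coefficient
    t_value = c["B"] / c["se"]                  # coefficient over its standard error
    results[name] = (beta, t_value)
    print(f"{name}: Beta = {beta:.3f}, t = {t_value:.2f}")
```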

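How do you decide which variables to include in the final equation?

• Include only predictor variables that are significantly correlated with the dependent variable

• Use a limited number of variables

• 10:1 rule: as a rule of thumb, have at least 10 cases for each predictor variable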

Methods for entering variables:

• Enter: all variables are entered together; usually the default method

• Forward: variables are added one by one if they meet a certain significance value, starting with the variable that has the most significant correlation with the dependent variable

• Backward: starts with all variables in the model; variables are removed one by one if they meet a certain significance value for removal, starting with the variable that has the smallest partial correlation with the dependent variable

• Stepwise: a hybrid of the previous two. Variables are added one at a time; however, at each step a backward procedure is used to see if any variables should be removed
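To make the forward method concrete, here is a minimal sketch restricted to this deck's two-predictor example. It works from the correlation matrix rather than the raw data and uses an F-to-enter threshold of 4.0, which is a hypothetical cutoff chosen for illustration. Statistical packages typically use a p-value criterion instead, and all names in the sketch are assumptions.

```python
R_WITH_INCOME = {"education": 0.826, "workforce": 0.695}  # from the correlation matrix
R_BETWEEN_PREDICTORS = 0.310
N_CASES = 10
F_TO_ENTER = 4.0  # hypothetical entry threshold


def r_squared(predictors):
    """R squared for a model with zero, one, or both predictors."""
    if len(predictors) == 0:
        return 0.0
    if len(predictors) == 1:
        return R_WITH_INCOME[predictors[0]] ** 2
    r1, r2 = (R_WITH_INCOME[p] for p in predictors)
    r12 = R_BETWEEN_PREDICTORS
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def f_to_enter(current, candidate):
    """F statistic for the gain in R squared from adding one candidate."""
    new_model = current + [candidate]
    r2_old, r2_new = r_squared(current), r_squared(new_model)
    df_residual = N_CASES - len(new_model) - 1
    return (r2_new - r2_old) / ((1 - r2_new) / df_residual)


def forward_selection(candidates):
    """Add the best remaining candidate until none clears the threshold."""
    selected, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda c: f_to_enter(selected, c))
        if f_to_enter(selected, best) < F_TO_ENTER:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


print(forward_selection(["workforce", "education"]))
```

Education enters first because it has the stronger correlation with income. Years in the workforce then enters because it still adds enough explained variance once education is in the model.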

Issues to Consider

• Include variables that make sense: predictor variables chosen by selection criteria need to have logical relationships with the dependent variable

• May force important or logical predictors into the model even if they are not chosen by selection criteria

• Be wary of variable selection results
