s doc1


Upload: ejoneel

Post on 26-Dec-2015


TRANSCRIPT

Page 1: S DOC1

MULTIPLE LINEAR REGRESSION

Q1. To decide whether a company is discriminating against women, the following data were collected from the company’s records: Salary is the annual salary in thousands of dollars, Qualification is an index of employee qualification, and Gender is an indicator variable (1 if the employee is a man, 0 if the employee is a woman). Two linear models were fit to the data, and the regression outputs are shown in the tables below. Suppose that the usual regression assumptions hold.

(a) Are men paid more than equally qualified women?

(b) Are men less qualified than equally paid women?

(c) Do you detect any inconsistency in the above results? Explain.

(d) Which model would you advocate if you were the defense lawyer? Explain.

Model 1: Dependent Variable is Salary

Variable        Coefficient   s.e.     t-Test   p-Value
Constant        20009.5       0.8244   24271    <0.0001
Qualification   0.935253      0.0500   18.7     <0.0001
Gender          0.224337      0.4681   0.479    0.6329

Model 2: Dependent Variable is Qualification

Variable        Coefficient   s.e.     t-Test   p-Value
Constant        -16744.4      896.4    -18.7    <0.0001
Gender          0.850979      0.4349   1.96     0.0532
Salary          0.836991      0.0448   18.7     <0.0001

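Ans 1.

(a) Using Model 1, the fitted regression equation with Salary as the response variable and Qualification (Q) and Gender (G) as the predictor variables is:

\[ \text{Salary} = 20009.5 + 0.935253\,Q + 0.224337\,G + \varepsilon \qquad (1) \]

This has the form \( Y = \beta_0 + \beta_1 Q + \beta_2 G + \varepsilon \), with estimates \( \hat\beta_0 = 20009.5 \), \( \hat\beta_1 = 0.935253 \) and \( \hat\beta_2 = 0.224337 \).

Gender is categorical:

0 ----------- Women

1 ----------- Men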

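Keeping the qualification constant, the expected salary for women and men can be determined as:

Salary for women: E[Salary | Qualification, Gender = 0]

= \( \beta_0 + \beta_1 Q + \beta_2 G \) ……………………..(From (1))

= 20009.5 + 0.935253 Q ……………………….. (Substituting G=0 in (1))

Salary for men: E[Salary | Qualification, Gender = 1]

= \( \beta_0 + \beta_1 Q + \beta_2 G \) ……………………..(From (1))

= \( (\beta_0 + \beta_2) + \beta_1 Q \) ……………………...(Substituting G=1 in (1))

= (20009.5 + 0.224337) + 0.935253 Q

= 20009.724 + 0.935253 Q

When qualifications are the same, the difference in expected salary between men and women = 20009.724 - 20009.5

= 0.224

This is simply the Gender coefficient. Since Salary is measured in thousands of dollars, men are estimated to be paid about $224 more, on average, than equally qualified women. However, the Gender coefficient is not statistically significant (t = 0.479, p-value = 0.6329), so Model 1 provides no evidence that men are paid more than equally qualified women.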
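As a quick check of the arithmetic in (a), the following minimal sketch recomputes the estimated gender gap and its t-ratio from the Model 1 output. The variable names are illustrative, and the factor of 1000 assumes that Salary is recorded in thousands of dollars, as stated in the question.

```python
# Model 1 estimates for Gender, copied from the regression output.
b_gender = 0.224337   # Gender coefficient (thousands of dollars)
se_gender = 0.4681    # standard error of the Gender coefficient

# Salary is in thousands of dollars, so convert the gap to dollars.
gap_dollars = b_gender * 1000

# t-ratio for H0: the Gender coefficient is zero.
t_ratio = b_gender / se_gender

print(f"Estimated gap: about ${gap_dollars:.0f}")
print(f"t-ratio: {t_ratio:.3f}")
```

The t-ratio reproduces the value in the output table, which is far below the usual two-sided 5% cutoff of roughly 2.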

(b) Using Model 2, the fitted regression equation with Qualification as the response variable and Gender (G) and Salary (S) as the predictor variables is:

\[ \text{Qualification} = -16744.4 + 0.850979\,G + 0.836991\,S + \varepsilon \qquad (2) \]

This has the form \( Y = \beta_0 + \beta_1 G + \beta_2 S + \varepsilon \), with estimates \( \hat\beta_0 = -16744.4 \), \( \hat\beta_1 = 0.850979 \) and \( \hat\beta_2 = 0.836991 \).

Gender is categorical:

0 ----------- Women

1 ----------- Men

Keeping the salary constant, the expected qualification for women and men can be determined as:

Qualification for women: E[Q | Salary, Gender = 0]

= \( \beta_0 + \beta_1 G + \beta_2 S \) ……………………..(From (2))

= -16744.4 + 0.836991 S ……………………….. (Substituting G=0 in (2))

Qualification for men: E[Q | Salary, Gender = 1]

= \( \beta_0 + \beta_1 G + \beta_2 S \) ……………………..(From (2))

= -16744.4 + 0.850979 + 0.836991 S …………….. (Substituting G=1 in (2))

= -16743.55 + 0.836991 S

When salaries are the same, the difference in expected qualification between men and women

= -16743.55 - (-16744.4)

= 0.85

Thus, on average, men score about 0.85 index points higher on Qualification than equally paid women, so men are not less qualified than equally paid women. The Gender coefficient is only marginally significant (t = 1.96, p-value = 0.0532), which is just above the 5% level.
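The direction of the comparison in (b) is easy to get backwards, so a small sketch may help: it evaluates the fitted Model 2 equation for a man and a woman at the same salary and shows that the difference equals the Gender coefficient whatever the salary. The function name and the salary values are hypothetical illustrations, not data from the source.

```python
def expected_qualification(salary, gender):
    """Fitted Model 2; gender is 1 for a man and 0 for a woman."""
    return -16744.4 + 0.850979 * gender + 0.836991 * salary

# Arbitrary illustrative salaries (thousands of dollars), not source data.
diffs = []
for salary in (20010.0, 20020.0, 20030.0):
    men = expected_qualification(salary, 1)
    women = expected_qualification(salary, 0)
    diffs.append(men - women)
    print(f"salary={salary}: men minus women = {men - women:.3f}")
```

A positive difference means that, at equal pay, the fitted qualification index is higher for men than for women.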

Q2. The table below shows the regression output of a multiple regression model relating the beginning salaries in dollars of employees in a given company to the following predictor variables:

Gender       An indicator variable (1 = Man and 0 = Woman)
Education    Years of schooling at the time of hire
Experience   Number of months of previous work experience
Months       Number of months with the company

In (a) to (b) below, specify the null and the alternative hypotheses, the test used, and your conclusion using a 5% level of significance.

(a) Conduct the F-test for the overall fit of the regression.

(b) Is there a positive linear relationship between Salary and Experience, after accounting for the effect of the variables Gender, Education and Months?

(c) What salary would you forecast for a man with 12 years of education, 10 months of experience, and 15 months with the company?

(d) What salary would you forecast, on average, for men with 12 years of education, 10 months of experience and 15 months with the company?

(e) What salary would you forecast, on average, for women with 12 years of education, 10 months of experience and 15 months with the company?

Regression output when salary is related to four predictor variables

ANOVA Table

Source       Sum of Squares   df   Mean Square   F-Test
Regression   23665352         4    5916338       22.98
Residuals    22657938         88   257477

Ans 2.

(a) To test the overall fit of the model, let \( X_1, X_2, X_3, X_4 \) denote Gender, Education, Experience and Months. The reduced model (RM) and the full model (FM) are:

RM: \( H_0: Y = \beta_0 + \varepsilon \)

FM: \( H_1: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon \)

(Alternatively, we can have the following hypotheses:

\( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \)

\( H_1 \): at least one of \( \beta_1, \beta_2, \beta_3, \beta_4 \) is not zero)

The test used is the F-test, and the ANOVA table gives F = 22.98.

The critical F-value with 4 and 88 degrees of freedom at the 5% level of significance is 2.475.

Since the observed F-value is greater than the critical value, the null hypothesis is rejected. Thus, at least one of the β's is not zero.

The goodness of fit can also be seen from R² = 0.515, which means that the predictor variables in the model explain about 51.5% of the variation in the response variable.
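The F statistic can be reproduced directly from the ANOVA table, as in the minimal sketch below. The critical value 2.475 is taken from the answer above rather than computed, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_reg, df_reg = 23665352, 4
ss_res, df_res = 22657938, 88

ms_reg = ss_reg / df_reg   # mean square for regression
ms_res = ss_res / df_res   # mean square error (MSE)
f_stat = ms_reg / ms_res   # overall F statistic

f_crit = 2.475             # critical F(4, 88) at the 5% level, as quoted in the answer
reject_h0 = f_stat > f_crit

# R-squared implied by the two sums of squares.
r2_from_anova = ss_reg / (ss_reg + ss_res)

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
print(f"R^2 from ANOVA sums of squares = {r2_from_anova:.3f}")
```

The R² implied by the sums of squares comes out near 0.511, slightly below the 0.515 printed in the regression output; the conclusion of the F-test is unaffected either way.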

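(b) To test for a positive linear relationship between Salary and Experience, we use the coefficients table of the regression output:

Coefficients Table

Variable     Coefficient   s.e.     t-Test   p-value
Constant     3526.4        327.7    10.76    0.000
Gender       722.5         117.8    6.13     0.000
Education    90.02         24.69    3.65     0.000
Experience   1.2690        0.5877   2.16     0.034
Months       23.406        5.201    4.50     0.000

n = 93   R² = 0.515   adjusted R² = 0.489   σ̂ = 507.4   df = 88

With \( X_1, X_2, X_3, X_4 \) denoting Gender, Education, Experience and Months, the null and alternative hypotheses are:

RM: \( H_0: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_4 X_4 + \varepsilon \)

FM: \( H_1: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon \)

(Alternatively, we can have the following hypotheses:

\( H_0: \beta_3 = 0 \)

\( H_1: \beta_3 > 0 \), since the question asks about a positive relationship)

The test used is the t-test for the Experience coefficient. The t-value for Experience is 2.16 with a two-sided p-value of 0.034. Because the estimated coefficient is positive (+1.2690), the one-sided p-value is 0.034/2 = 0.017, which is less than 0.05, so the null hypothesis is rejected.

Hence, we can say that there is a positive linear relationship between Salary and Experience, after accounting for Gender, Education and Months.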
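Because the question asks specifically about a positive relationship, a one-sided reading of the t-test is natural. The sketch below recomputes the t statistic from the coefficient and its standard error and halves the reported two-sided p-value; this halving is valid only when the estimate has the sign stated in the alternative. Variable names are illustrative.

```python
# Experience row of the coefficients table.
coef = 1.2690
se = 0.5877
p_two_sided = 0.034        # p-value reported in the output (two-sided)

t_stat = coef / se         # t statistic for H0: beta_3 = 0

# One-sided p-value for H1: beta_3 > 0.
if t_stat > 0:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2

alpha = 0.05
reject_h0 = p_one_sided < alpha

print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.3f}, reject H0: {reject_h0}")
```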

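(c) The fitted model is \( \hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4 \), that is:

\( \hat{Y} \) = 3526.4 + 722.5 G + 90.02 Ed + 1.2690 Exp + 23.406 M ………………… (1)

Substituting G = 1, Ed = 12, Exp = 10 and M = 15 in the above equation:

\( \hat{Y} \) = 3526.4 + 722.5 (1) + 90.02 (12) + 1.2690 (10) + 23.406 (15)

\( \hat{Y} \) = 5692.92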

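(d) The point forecast of the average salary is the same as the forecast for an individual. To attach a margin of error, we find a confidence interval for the mean response \( \mu_0 \), which is given by:

\[ \left[\hat\mu_0 - t_{n-p-1,\,\alpha/2}\,\text{s.e.}(\hat\mu_0),\ \ \hat\mu_0 + t_{n-p-1,\,\alpha/2}\,\text{s.e.}(\hat\mu_0)\right] \]

\( \hat\mu_0 \) is found by substituting the given values in equation (1):

\( \hat\mu_0 \) = 3526.4 + 722.5 (1) + 90.02 (12) + 1.2690 (10) + 23.406 (15)

\( \hat\mu_0 \) = 5692.92

The t-value (from the table) with n-p-1 = 93-4-1 = 88 degrees of freedom and α/2 = 0.05/2 = 0.025 is:

t-value = 1.987

The exact standard error is \( \text{s.e.}(\hat\mu_0) = \hat\sigma\sqrt{\mathbf{x}_0^{T}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{x}_0} \), which cannot be computed here because the output does not report \( (\mathbf{X}^{T}\mathbf{X})^{-1} \). As a rough stand-in, we use the residual standard error:

\[ \hat\sigma = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-p-1}} = \sqrt{\text{MSE}} = 507.42 \]

Approximate interval: [5692.92 - (1.987 * 507.42), 5692.92 + (1.987 * 507.42)]

[4684.68, 6701.16]

Thus, the average salary for men with these characteristics is estimated at $5692.92, with an approximate interval from $4684.68 to $6701.16. For a combination of predictor values that is not unusual for the data, the factor under the square root is well below 1, so this interval is wider than the exact 95% confidence interval for the mean.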

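(e) As for men, we report the point forecast of the mean response \( \mu_0 \) together with an interval of the form:

\[ \left[\hat\mu_0 - t_{n-p-1,\,\alpha/2}\,\text{s.e.}(\hat\mu_0),\ \ \hat\mu_0 + t_{n-p-1,\,\alpha/2}\,\text{s.e.}(\hat\mu_0)\right] \]

\( \hat\mu_0 \) is found by substituting the given values, with Gender = 0, in equation (1):

\( \hat\mu_0 \) = 3526.4 + 722.5 (0) + 90.02 (12) + 1.2690 (10) + 23.406 (15)

\( \hat\mu_0 \) = 4970.42

The t-value (from the table) with n-p-1 = 93-4-1 = 88 degrees of freedom and α/2 = 0.05/2 = 0.025 is:

t-value = 1.987

The exact standard error \( \text{s.e.}(\hat\mu_0) = \hat\sigma\sqrt{\mathbf{x}_0^{T}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{x}_0} \) cannot be computed from the output given, so the residual standard error is used as a rough stand-in:

\[ \hat\sigma = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-p-1}} = \sqrt{\text{MSE}} = 507.42 \]

Approximate interval: [4970.42 - (1.987 * 507.42), 4970.42 + (1.987 * 507.42)]

[3962.18, 5978.66]

Thus, the average salary for women with these characteristics is estimated at $4970.42, with an approximate interval from $3962.18 to $5978.66. The point forecast is $722.50 lower than the corresponding forecast for men, which is the Gender coefficient.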
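The forecasts in (c) to (e) all come from the same fitted equation, so they can be checked with one small function. The sketch below is a minimal illustration with hypothetical names (`forecast`, `approx_interval`); it reuses √MSE = 507.42 as the standard error exactly as the worked answer does, whereas an exact interval for a mean response would also need a leverage term that the regression output does not report.

```python
# Fitted coefficients from the coefficients table (salary in dollars).
COEF = {
    "const": 3526.4,
    "gender": 722.5,        # 1 = man, 0 = woman
    "education": 90.02,     # years of schooling
    "experience": 1.2690,   # months of previous experience
    "months": 23.406,       # months with the company
}

T_VALUE = 1.987     # t(88) at alpha/2 = 0.025, as quoted in the answer
SIGMA_HAT = 507.42  # sqrt(MSE); used here as a stand-in standard error


def forecast(gender, education, experience, months):
    """Point forecast of beginning salary from the fitted equation."""
    return (COEF["const"]
            + COEF["gender"] * gender
            + COEF["education"] * education
            + COEF["experience"] * experience
            + COEF["months"] * months)


def approx_interval(point):
    """Approximate interval: point forecast +/- t * sqrt(MSE)."""
    half_width = T_VALUE * SIGMA_HAT
    return point - half_width, point + half_width


results = {}
for label, gender in (("men", 1), ("women", 0)):
    point = forecast(gender, education=12, experience=10, months=15)
    low, high = approx_interval(point)
    results[label] = (round(point, 2), round(low, 2), round(high, 2))
    print(f"{label}: {point:.2f}  [{low:.2f}, {high:.2f}]")
```

Keeping the coefficients in a single dictionary makes it easy to rerun the forecast for other combinations of education, experience and months.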