dummy variable regression model

ECONOMETRICS

DUMMY VARIABLE REGRESSION MODEL

SUBMITTED TO:

MA’AM SABAHAT SUBHAN

SUBMITTED BY:

ARSHAD AHMED SAEED

Department:

Economic & Finance

NUML UNIVERSITY ISLAMABAD

Dummy variables regression models

Introduction:

We generally use four types of variables in analysis:

Ratio scale

Interval scale

Ordinal scale

Nominal scale

In most of the previous chapters the models used ratio scale variables. In this chapter we consider models that include nominal scale variables as well as ratio scale variables. Nominal scale variables are also known as categorical variables, indicator variables, qualitative variables, or dummy variables.

The nature of the dummy variables:

The regressand (dependent variable) in regression analysis is influenced not only by ratio scale variables but also by nominal scale or qualitative variables such as color, sex, race, and religion.

These variables indicate the presence or absence of a quality or attribute, such as male or female, or black or white. We quantify these attributes with artificial variables that take the value 0 or 1.

A value of 0 indicates the absence of the attribute and 1 indicates its presence. For example, if a player is a winner we assign the value 1, otherwise 0. Variables that take the values 0 and 1 are called dummy variables, and they classify the data into mutually exclusive categories such as winner or loser.
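The 0/1 coding described above can be sketched in a few lines of Python. The outcome labels and the variable names are hypothetical and serve only to illustrate the coding rule; they are not data from this chapter.

```python
# Hypothetical match outcomes (illustrative labels, not data from the text).
outcomes = ["winner", "loser", "winner", "loser", "loser"]

# Dummy variable: 1 if the attribute "winner" is present, 0 if it is absent.
winner_dummy = [1 if outcome == "winner" else 0 for outcome in outcomes]

print(winner_dummy)
```

Each player falls into exactly one of the two categories, so a single 0/1 variable is enough to separate winners from losers.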

These variables can be used in regression models just as easily as quantitative variables. The independent variables in a regression model may be dummy (qualitative) in nature, and a model in which all the regressors are dummy variables is called an analysis of variance (ANOVA) model.

ANOVA models

We can illustrate the ANOVA model with an example.

The table gives the average salaries of private school teachers in twelve different cities. These 12 cities are grouped into three regions:

1. Punjab

2. KPK

3. Baluchistan & other

Suppose we want to know whether the average annual salary of private school teachers differs across these three regions. Regression analysis lets us answer this question.

Consider the model:

Yi = β1 + β2 S2i + β3 S3i + ui -------- (1)

Yi = average salary of private school teachers in city i

S2i = 1 if the region is Punjab, 0 otherwise

S3i = 1 if the region is KPK, 0 otherwise

The model is the same as the previous models, except that the regressors are qualitative (dummy) rather than quantitative.

Assuming that the error term satisfies the usual OLS assumptions, we take expectations on both sides of equation (1) and obtain the following.

The mean salary in Punjab is:

E (Yi | S2i = 1, S3i = 0) = β1 + β2 ------------ (2)

The mean salary in KPK is:

E (Yi | S2i = 0, S3i = 1) = β1 + β3 ------------- (3)

What is the mean salary in the remaining region? It is:

E (Yi | S2i = 0, S3i = 0) = β1

Here the intercept β1 is the mean salary of private school teachers in the cities of Baluchistan and other regions. The "slope" coefficients β2 and β3 tell us how much the average salaries in Punjab and KPK differ from the average salary in Baluchistan and other regions.
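The interpretation of the intercept and the dummy coefficients can be checked numerically. The sketch below is only an illustration: the salary figures are made up (the chapter's data table is not reproduced here), the group sizes of four cities per region are an assumption, and the fit uses NumPy's least squares routine rather than any particular econometrics package.

```python
import numpy as np

# Hypothetical average salaries for 12 cities (made-up numbers, NOT the table's data).
salary = np.array([30000.0, 32000.0, 31000.0, 33000.0,   # Punjab
                   28000.0, 29000.0, 27000.0, 30000.0,   # KPK
                   25000.0, 26000.0, 24000.0, 27000.0])  # Baluchistan & other
region = np.array(["Punjab"] * 4 + ["KPK"] * 4 + ["Other"] * 4)

# Dummy regressors as in equation (1): the base group gets 0 on both.
S2 = (region == "Punjab").astype(float)
S3 = (region == "KPK").astype(float)
X = np.column_stack([np.ones_like(salary), S2, S3])

# OLS estimates of beta1, beta2, beta3.
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
b1, b2, b3 = beta

# Group means computed directly, for comparison.
mean_punjab = salary[region == "Punjab"].mean()
mean_kpk = salary[region == "KPK"].mean()
mean_other = salary[region == "Other"].mean()

print("beta1 =", b1, "base group mean =", mean_other)
print("beta1 + beta2 =", b1 + b2, "Punjab mean =", mean_punjab)
print("beta1 + beta3 =", b1 + b3, "KPK mean =", mean_kpk)
```

With these assumed numbers the intercept reproduces the mean of the base group, and adding each dummy coefficient to the intercept reproduces the mean of the corresponding region, which is what equations (2) and (3) state.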

Using the data in the table, we get the following results:

Yi = 32563.7 + 27213 S2i - 26732.8 S3i

se = (1345.20) (2165.67) (1943.23)

t = (18.825) (0.265) (-0.234)

p = (0.00001)* (0.5632)* (0.5965)*

The values marked with * are p-values.

The mean salaries in Punjab and KPK can be calculated from equations (2) and (3), that is, by adding the estimated β2 and β3 to the intercept: 32563.7 + 27213 = 59776.7 for Punjab and 32563.7 - 26732.8 = 5830.9 for KPK.

Next we ask whether these values differ from the mean salary in Baluchistan. We do this by looking at the dummy coefficients and checking their statistical significance.

From the results above, the estimated coefficient for Punjab is not significant, since its p-value is 0.5632 (about 56 percent), and the coefficient for KPK is also insignificant, with a p-value of 0.5965 (about 60 percent). We therefore cannot reject the hypothesis that the mean salaries in Punjab, KPK, and Baluchistan are the same.

Caution in the use of dummy variables:

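If the regression includes a constant (intercept) term, the number of dummy variables must be one less than the number of categories of each qualitative variable. Otherwise the dummies are perfectly collinear with the constant, a situation known as the dummy variable trap.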

The coefficient attached to a dummy variable must always be interpreted relative to the base (reference) group, that is, the group assigned the value 0.

In a model with a large number of dummy variables that have many classes, introducing the dummy variables consumes a large number of degrees of freedom (d.f.). We should therefore weigh the number of qualitative variables to be introduced against the total number of observations available for analysis.
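The first and third cautions can be illustrated with a small NumPy sketch. The grouping of 12 cities into three regions of four cities each is an assumption made for illustration, and the variable names are hypothetical.

```python
import numpy as np

# Assumed layout: 12 cities, four in each of three regions (illustrative only).
region = np.array(["Punjab"] * 4 + ["KPK"] * 4 + ["Other"] * 4)
n = region.size

const = np.ones(n)
d_punjab = (region == "Punjab").astype(float)
d_kpk = (region == "KPK").astype(float)
d_other = (region == "Other").astype(float)

# Constant plus (categories - 1) dummies: columns are linearly independent.
X_ok = np.column_stack([const, d_punjab, d_kpk])

# Constant plus a dummy for EVERY category: the dummies sum to the constant column.
X_trap = np.column_stack([const, d_punjab, d_kpk, d_other])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_trap = np.linalg.matrix_rank(X_trap)

# Residual degrees of freedom: observations minus estimated parameters.
df_resid = n - X_ok.shape[1]

print("columns:", X_ok.shape[1], "rank:", rank_ok)
print("columns:", X_trap.shape[1], "rank:", rank_trap)
print("residual d.f.:", df_resid)
```

In the second design matrix the rank is lower than the number of columns, so the OLS coefficients are not uniquely determined. The last line shows how each added dummy reduces the degrees of freedom left for estimation.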
