dummy variable regression model
ECONOMETRICS
DUMMY VARIABLE REGRESSION MODEL
SUBMITTED TO:
MA’AM SABAHAT SUBHAN
SUBMITTED BY:
ARSHAD AHMED SAEED
Department:
Economic & Finance
NUML UNIVERSITY ISLAMABAD
Dummy variables regression models
Introduction:
Four types of variables are generally used in analysis:
Ratio scale
Interval scale
Ordinal scale
Nominal scale
In most of the previous chapters the models used ratio scale variables. In this
chapter we consider models that include nominal scale variables as well as ratio
scale variables. Nominal scale variables are also known as categorical variables,
indicator variables, qualitative variables or dummy variables.
The nature of the dummy variables:
The regressand (dependent variable) in a regression analysis is influenced not only
by ratio scale variables but also by nominal scale, or qualitative, variables such
as color, sex, race and religion.
These variables indicate the presence or absence of a quality or attribute, such as
male or female, black or white. We quantify these attributes by constructing
artificial variables that take the value 0 or 1.
0 indicates the absence of the attribute and 1 indicates its presence. For example,
if a player is a winner we assign the value 1, otherwise 0. Variables that take the
values 0 and 1 are called dummy variables. They classify the data into mutually
exclusive categories such as winner or loser.
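The 0/1 coding described above can be sketched in a few lines of Python. The list
of match results is hypothetical and serves only to illustrate the coding; it is not
data from this document.

```python
# Hypothetical match results, used only to illustrate 0/1 coding.
results = ["winner", "loser", "winner", "loser", "loser"]

# 1 marks the presence of the attribute "winner", 0 marks its absence.
winner_dummy = [1 if r == "winner" else 0 for r in results]

print(winner_dummy)  # [1, 0, 1, 0, 0]
```

Because every observation is either a winner or not, the two categories are mutually
exclusive and a single dummy variable is enough to separate them.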
These variables can be used in regression models just as easily as quantitative
variables. The independent variables in a regression model may be quantitative or
dummy (qualitative) in nature. If all the regressors in a model are dummy variables,
the model is called an analysis of variance (ANOVA) model.
ANOVA models
We can illustrate the ANOVA model with an example.
The table gives the average salaries of private school teachers in twelve cities.
These 12 cities are grouped into three regions:
1. Punjab
2. KPK
3. Baluchistan & other
Suppose we want to know whether the average annual salaries of private school
teachers differ among these three regions. Regression analysis can answer this
question. Consider the model:
Yi = β1 + β2 S2i + β3 S3i + ui -------- (1)
Yi = average salary of private school teachers in city i
S2i = 1 if the region is Punjab, otherwise 0
S3i = 1 if the region is KPK, otherwise 0
The model is like the previous models, except that the regressors are qualitative
(dummy) rather than quantitative.
Assuming that the error term satisfies the usual OLS assumptions, so that
E (ui) = 0, we take the expectation on both sides of equation (1) and get the
following.
The mean salary in Punjab is:
E (Yi | S2i = 1, S3i = 0) = β1 + β2 ------------ (2)
The mean salary in KPK is:
E (Yi | S2i = 0, S3i = 1) = β1 + β3 ------------ (3)
What is the mean salary in the remaining region? It is:
E (Yi | S2i = 0, S3i = 0) = β1
Here the intercept β1 gives the mean salary of private school teachers in the cities
of Baluchistan and others. The 'slope' coefficients β2 and β3 tell by how much the
average salaries in Punjab and KPK differ from the average salary in Baluchistan
and others.
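The interpretation of the intercept and the differential coefficients can be checked
with a small sketch. The salaries below are hypothetical (two cities per region) and
are not the data table of this document. The helper solve() is an illustrative
Gauss-Jordan routine written for this sketch, not a library function.

```python
from fractions import Fraction

# Hypothetical salaries (NOT the document's data table), two cities per region.
salaries = [50, 54, 40, 44, 30, 34]
regions = ["Punjab", "Punjab", "KPK", "KPK", "Other", "Other"]

# Design matrix rows: [constant, S2 (Punjab), S3 (KPK)]; "Other" is the base group.
X = [[1, int(r == "Punjab"), int(r == "KPK")] for r in regions]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

# OLS normal equations: (X'X) beta = X'y
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, salaries)) for i in range(k)]
b1, b2, b3 = solve(xtx, xty)

print(float(b1), float(b2), float(b3))  # 32.0 20.0 10.0
```

In this hypothetical sample the group means are 52 (Punjab), 42 (KPK) and 32 (Other).
The estimated intercept equals the base group mean, and b1 + b2 and b1 + b3 reproduce
the other two means, as equations (2) and (3) state.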
Using the data in the table we get the following results:
Yi = 32563.7 + 27213 S2i - 26732.8 S3i
se = (1345.20) (2165.67) (1943.23)
t = (18.825) (0.265) (-0.234)
p = (0.00001) (0.5632) (0.5965)
The estimated mean salaries for the first two regions, Punjab and KPK, are easily
calculated by adding the estimated differential coefficients to the intercept, as in
equations (2) and (3). The values are 32563.7 + 27213 = 59776.7 and
32563.7 - 26732.8 = 5830.9 respectively.
Next we check whether these means differ from the mean salary of Baluchistan and
others. We do this by testing the statistical significance of the estimated slope
coefficients.
From the results above, the estimated slope coefficient for Punjab is not
significant, as its p value is about 56 percent, and the coefficient for KPK is also
insignificant, with a p value of about 60 percent. We therefore cannot reject the
hypothesis that the mean salaries in Punjab, KPK and Baluchistan are the same.
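The significance check can be written as a simple decision rule. This is a minimal
sketch: the p values are those reported in the results above, while the 5 percent
significance level (ALPHA) is an assumption, since the text does not state which
level is used.

```python
# p values reported in the regression results above.
p_values = {"intercept": 0.00001, "Punjab (S2)": 0.5632, "KPK (S3)": 0.5965}

ALPHA = 0.05  # assumed significance level; not stated in the text

# A coefficient is statistically significant when its p value is below ALPHA.
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```

Under this assumed level only the intercept is significant, which matches the
conclusion drawn in the text for the two differential coefficients.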
Caution in the use of dummy variables:
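If the regression includes a constant (intercept) term, the number of dummy variables
must be one less than the number of categories of each qualitative variable.
Otherwise the dummy variables are perfectly collinear with the constant, a situation
known as the dummy variable trap.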
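The reason for this rule can be seen in a short sketch. The region labels below are
hypothetical observations. The check tests only the one linear dependence that causes
the problem (the dummy columns adding up to the constant column); it is not a general
rank test.

```python
# Hypothetical observations: the region of each of six cities.
regions = ["Punjab", "KPK", "Other", "Punjab", "KPK", "Other"]
categories = ["Punjab", "KPK", "Other"]

# Constant plus one dummy for EVERY category (one dummy too many).
full = [[1] + [int(r == c) for c in categories] for r in regions]

# In every row the three dummies sum to the constant, so the columns are
# perfectly collinear and OLS has no unique solution.
trapped = all(row[0] == sum(row[1:]) for row in full)

# Dropping one category ("Other" becomes the base group) breaks this dependence.
reduced = [[1] + [int(r == c) for c in categories[:-1]] for r in regions]
still_dependent = all(row[0] == sum(row[1:]) for row in reduced)

print(trapped, still_dependent)  # True False
```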
The coefficient attached to a dummy variable must always be interpreted in relation
to the base, or reference, group, that is, the category that is assigned no dummy
variable.
If a model has a large number of dummy variables with many classes, introducing the
dummy variables will consume a large number of degrees of freedom (d.f.). We should
therefore weigh the number of qualitative variables to be introduced against the
total number of observations available for analysis.
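The cost in degrees of freedom can be counted directly. The function below is a
hypothetical helper written for this sketch. It assumes a model with an intercept and
only dummy regressors, in which each qualitative variable with m classes contributes
m - 1 dummies.

```python
def residual_df(n_obs, classes_per_variable):
    """Residual d.f. for a model with an intercept and only dummy regressors.

    Each qualitative variable with m classes contributes m - 1 dummies.
    """
    n_params = 1 + sum(m - 1 for m in classes_per_variable)
    return n_obs - n_params

# The example in this document: 12 cities, one qualitative variable, 3 regions.
print(residual_df(12, [3]))        # 9

# Hypothetical extension: two more qualitative variables with 4 and 5 classes.
print(residual_df(12, [3, 4, 5]))  # 2
```

With only 12 observations, adding the two hypothetical qualitative variables leaves
just 2 degrees of freedom, which shows why the number of dummies must be weighed
against the sample size.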