ppol solution

PPOL 502: Problem Set #2

Vivek Agarwal

February 11, 2015

Time Taken: 2.5 hours

Solution 1

1. A 1% increase in the distance of a house from a recently built garbage incinerator is associated with a 0.312% increase in the price of the house sold. One would expect the price of a house to go up as its distance from a garbage incinerator increases. Harmful gases released by an incinerator are more likely to reach houses closer to it and are therefore likely to reduce the demand for such houses. Hence, the positive sign of the coefficient on log(dist) is expected.

2. No. Several other factors have not been controlled for. It is likely that incinerators are sited by the local authorities based on other criteria, such as availability of land and distance from water sources (to prevent contamination, etc.). These factors also determine house prices; therefore, ignoring them biases the estimate.

3. As discussed above, proximity to a clean water body (lake, sea, etc.) can increase the price of a house dramatically. Further, incinerators are usually not installed close to water bodies, to prevent contamination, etc. Therefore, proximity to a water body increases the price directly and is also associated with a greater distance from the incinerator, so part of its effect is attributed to distance when it is omitted.
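The direction of the bias described above can be sketched with the standard omitted-variable expression for a single omitted regressor. The variable name water (proximity to a water body) and the symbols \beta_2 and \tilde{\delta}_1 are assumptions introduced only for illustration; they do not appear in the problem set.

```latex
% Assumed "true" model, with water omitted from the estimated regression:
\log(price) = \beta_0 + \beta_1 \log(dist) + \beta_2\, water + u

% Expected value of the slope from the short regression of log(price) on log(dist),
% where \tilde{\delta}_1 is the slope from regressing water on log(dist):
\operatorname{E}\big[\tilde{\beta}_1\big] = \beta_1 + \beta_2\, \tilde{\delta}_1
```

If water raises price (\beta_2 > 0) and is positively related to distance from the incinerator (\tilde{\delta}_1 > 0), the bias term is positive, so the simple regression would tend to overstate the effect of distance.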

Solution 2

1. The average salary is $957.9455, while the average IQ in the sample is 101.2824. The standard deviation of IQ in the sample is 15.05264. See Figure 1.

2. The simple regression model is presented in Figure 2. The estimated equation is:

\[ \widehat{wage} = 116.9916 + 8.303064\, IQ \tag{1} \]


Figure 1: Summary Results

Figure 2: Regression Results

Further, an increase of 15 points in IQ results in a predicted increase in wage of 15 times the coefficient on IQ, i.e. $124.54596 (= 15 × 8.303064).

The R² value tells us how much of the variation in the dependent variable is explained by the independent variable. Here, the R² value is 0.0955. Hence, only 9.55%, a very small part, of the variation in wage is explained by IQ.

3. The regression model is presented in Figure 3. The estimated equation is:

\[ \widehat{\ln(wage)} = 5.886994 + 0.0088072\, IQ \tag{2} \]

This model predicts that a one-unit increase in IQ results in an approximately 0.88072% increase in wage. Further, an increase of 15 points in IQ results in an increase of 15 times that amount, i.e. 13.2108%. However, note that this is an approximation that becomes less accurate for larger changes in IQ.
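The arithmetic in parts 2 and 3 can be checked with a short sketch. The coefficients are copied from Equations (1) and (2); the function names are hypothetical, and the "exact" figure assumes the usual log-level conversion 100 × (exp(β·Δ) − 1), which the original solution does not compute.

```python
import math

# Coefficients copied from Equations (1) and (2).
B_IQ_LEVEL = 8.303064   # wage regressed on IQ (level-level model)
B_IQ_LOG = 0.0088072    # ln(wage) regressed on IQ (log-level model)


def level_change(delta_iq):
    """Predicted change in wage (dollars) for a change of delta_iq points."""
    return B_IQ_LEVEL * delta_iq


def log_approx_pct(delta_iq):
    """Approximate percentage change in wage: 100 * beta * delta."""
    return 100 * B_IQ_LOG * delta_iq


def log_exact_pct(delta_iq):
    """Exact percentage change implied by the log-level model."""
    return 100 * (math.exp(B_IQ_LOG * delta_iq) - 1)


print(level_change(15))    # dollar change for +15 IQ points
print(log_approx_pct(15))  # approximate % change
print(log_exact_pct(15))   # exact % change, slightly larger
```

The gap between the last two numbers is the approximation error the solution warns about; it grows as the change in IQ gets larger.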

Solution 3

1. A lower value of rank reflects a more highly regarded school. A better school is expected to produce students with higher abilities, a valued asset for employers, leading to higher starting salaries. Therefore, rank is inversely related to salary, which leads to the expectation that β_5 ≤ 0.

2. A higher LSAT score reflects greater ability and hence better expected job performance, which should result in higher salaries. Therefore, one would expect β_1 ≥ 0. Similarly, a higher GPA would result in higher salaries. Therefore, one would expect β_2 ≥ 0.

The greater the number of volumes in a law school library, libvol, the greater the opportunities for students to gain expertise. This expertise is of value to employers and is expected to result in higher starting salaries. Therefore, one would expect β_3 ≥ 0.

Students typically perform a cost-benefit analysis when choosing a college. A college with a higher cost would be favored only if the future payoffs, including starting salary, are high. It could also be argued that schools that charge more have a greater capacity to provide better education, resulting in better academic outcomes for their students. This is again of value to employers and leads to the same conclusion. Therefore, one would expect β_4 ≥ 0.

3. A one-unit increase in GPA results in an approximately 100 × β_2 % change in salary, i.e. a 24.8% increase in salary.

4. A 1% increase in libvol results in a β_3 % change in salary. The estimated equation suggests that salary increases by 0.095% for every 1% increase in libvol.

5. For every one-place improvement in rank, one can expect an average increase of 0.33% (= −β_5 × 100) in salary. Therefore, an improvement in ranking of 20 places is expected to result in a 6.6% (= 20 × 0.33%) increase in salary.

Whether the 0.33% increase in salary per one-place improvement in rank is substantively important has to be judged against the trade-offs involved in raising salary through the other variables. Overall, however, choosing a ‘better’ ranked (lower value) college is expected to result in higher salaries.
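The percentage effects in parts 3 to 5 follow two rules: in a log-level relationship the effect is roughly 100 × β × Δx percent, and in a log-log relationship β is an elasticity. A minimal sketch, assuming the coefficient values implied by the percentages quoted above (0.248, 0.095 and −0.0033); the helper names are hypothetical.

```python
# Coefficient values implied by the percentages quoted in the solution
# (assumed here, not re-estimated from data).
B_GPA = 0.248      # log(salary) on GPA
B_LIBVOL = 0.095   # log(salary) on log(libvol), an elasticity
B_RANK = -0.0033   # log(salary) on rank


def pct_change_log_level(beta, delta_x):
    """Approximate % change in y when x changes by delta_x units."""
    return 100 * beta * delta_x


def pct_change_log_log(beta, pct_delta_x):
    """Approximate % change in y when x changes by pct_delta_x percent."""
    return beta * pct_delta_x


print(pct_change_log_level(B_GPA, 1))     # one more GPA point
print(pct_change_log_log(B_LIBVOL, 1))    # 1% more library volumes
print(pct_change_log_level(B_RANK, -20))  # rank improves by 20 places
```

A better rank is a lower number, which is why the 20-place improvement enters as Δx = −20.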

Solution 4

1. The estimated equation is presented below. Also, see Figure??.

\[ \widehat{price} = -19.315 + 0.1284362\, sqrft + 15.19819\, bdrms \tag{3} \]

2. The point estimate implies that one more bedroom, holding square footage constant, raises the price of a house by $15,198 (= 15.19819 × 1000). However, the coefficient on bdrms is not statistically significant even at the 10% significance level, so this increase is statistically indistinguishable from $0.

3. The estimated increase in the price of a house with an additional bedroom that is 140 square feet in size is $33,179 (= (β_1 Δsqrft + β_2 Δbdrms) × 1000 = (0.1284362 × 140 + 15.19819 × 1) × 1000).

Figure 5: Summary Results

4. The R-squared value gives the share of the variation in the dependent variable that is explained by the independent variables. The R-squared value for this estimated model is 0.6319, or 63.19%. Therefore, 63.19% of the variation in house price is explained by the variation in square footage and number of bedrooms.

5. Substituting sqrft = 2438 and bdrms = 4 into Equation 3 gives a predicted price of 354.60522 (in thousands of dollars). Hence, the predicted price is $354,605.

6. The residual is the difference between the actual value (price) and the predicted value (\widehat{price}). Therefore, the residual here is −$54,605. Hence, the buyer underpaid by $54,605 according to the model's prediction.
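The numbers in this solution can be reproduced from the fitted equation alone. The sketch below uses the coefficients from Equation 3, with price in thousands of dollars. The actual selling price of 300 (thousand) is an assumption inferred from the stated residual and prediction; it is not given in the text, and the function names are hypothetical.

```python
# Coefficients from Equation 3; price is measured in thousands of dollars.
B0 = -19.315
B_SQRFT = 0.1284362
B_BDRMS = 15.19819


def predicted_price(sqrft, bdrms):
    """Fitted price (thousands of dollars) for a given house."""
    return B0 + B_SQRFT * sqrft + B_BDRMS * bdrms


def price_change(d_sqrft, d_bdrms):
    """Predicted change in price for changes in both regressors."""
    return B_SQRFT * d_sqrft + B_BDRMS * d_bdrms


def residual(actual_price, sqrft, bdrms):
    """Actual minus fitted price; negative means the buyer paid less than predicted."""
    return actual_price - predicted_price(sqrft, bdrms)


ASSUMED_ACTUAL_PRICE = 300  # thousands of dollars; inferred, not from the source

print(price_change(0, 1))       # one more bedroom, square footage held fixed
print(price_change(140, 1))     # one more bedroom that adds 140 square feet
print(predicted_price(2438, 4))
print(residual(ASSUMED_ACTUAL_PRICE, 2438, 4))
```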

Solution 5

1. The average value of prpblck is 0.1134864, and the average value of income is $47,053.78. See Figure 5. prpblck is a unitless proportion measured on a ratio scale, while income is measured in dollars per year, also on a ratio scale.

2. The estimated equation is presented below (see Equation 4). Also, see Figure 6. The R-squared is 0.0642 and the sample size is 401.

\[ \widehat{psoda} = 0.9563196 + 0.1149882\, prpblck + 0.0000016\, income \tag{4} \]

A one-unit increase in prpblck (that is, going from a 0% to a 100% black population) increases the price of a medium soda by $0.1149882; a one-percentage-point increase therefore raises it by about $0.00115. The one-unit effect is about 11% of the mean of psoda (1.044876) and greater than its standard deviation (0.0886873). Therefore, this change can be considered substantively significant.

3. The estimated equation (without controlling for income) is presented below (see Equation 5). Also, see Figure 7.

\[ \widehat{psoda} = 1.037399 + 0.0649269\, prpblck \tag{5} \]

The previous model estimates a larger effect, a $0.1149882 increase in the price of soda for a one-unit increase in prpblck, compared with the $0.0649269 predicted by the modified model. The discrimination effect is larger when income is controlled for in the model.

4. For a one-unit increase in prpblck, the price of soda increases by approximately 100 × β_1 %, i.e. 12.15803%. See Figure 8. Therefore, an increase of 0.20 (20 percentage points) in prpblck will increase the price of soda by about 2.431606% (= 12.15803% × 0.20).

5. On adding the variable prppov to the regression model, we notice that the value of β̂_prpblck decreases from 0.1215803 to 0.0728072. See Figures 8 & 9.

Figure 10: Correlation Results

6. The correlation between log(income) and prppov is found to be −0.8385. See Figure 10. A negative correlation was expected because median family income, income, is 'roughly' expected to be low in poor areas. However, it must be noted that areas with a low poverty proportion might still have low median family incomes.

7. The statement “because log(income) and prppov are so highly correlated, they have no business being in the same regression” is partly unjustified. log(income) and prppov are not perfectly collinear; both capture poverty, but through different measures. Therefore, although including both may lead to multicollinearity, it does not lead to perfect collinearity, and they can be included in the same regression.
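Because prpblck has a mean of about 0.113, it is read here as a proportion between 0 and 1; under that assumption the coefficient interpretations in parts 2 and 4 can be sketched as follows. The coefficients are copied from Equation 4 and from the value quoted from Figure 8; the function names are hypothetical.

```python
# Assumes prpblck is a proportion in [0, 1], so a change of 0.01 is one
# percentage point and a change of 0.20 is twenty percentage points.
B_PRPBLCK_LEVEL = 0.1149882  # Equation 4: psoda on prpblck, controlling for income
B_PRPBLCK_LOG = 0.1215803    # log(psoda) on prpblck, as quoted from Figure 8


def dollar_change(delta_prpblck):
    """Predicted change in the soda price (dollars) in the level model."""
    return B_PRPBLCK_LEVEL * delta_prpblck


def pct_change(delta_prpblck):
    """Approximate % change in the soda price in the log model."""
    return 100 * B_PRPBLCK_LOG * delta_prpblck


print(dollar_change(1.0))   # moving from 0% to 100% black population
print(dollar_change(0.01))  # one percentage point
print(pct_change(0.20))     # twenty percentage points
```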

Solution 6

1. MLR1: Although the few points corresponding to higher values of ln(income) and prpblck tend to skew the best-fit line, a generally linear trend can be observed. See Figure 11.

Figure 11: Matrix Plot

2. MLR2: No information on random sampling has been provided. However, for the purposes of this exercise it has been assumed that random sampling was in fact adhered to.

3. MLR3: From Figure 11 it is clear that none of the independent variables (ln(income) and prpblck) is constant. Also, the number of observations (401) is far greater than the number of independent variables (2). In addition, Stata does not report a problem with perfect collinearity. Hence, it can be comfortably concluded that MLR3 is satisfied.

4. MLR4: This condition essentially requires the rvp (residual-versus-predictor) plot for each independent variable to be a mirror image across the residual = 0 line (assuming one dot represents one point).

Studying the rvp plot for prpblck, we can conclude that across almost all values of prpblck the residuals add up to zero. See Figure 12. Similarly, on analyzing the rvp plot for ln(income), we notice that although the residuals are net negative for a few low values of ln(income), they generally cancel each other out for most other values of ln(income). See Figure 13.


Figure 12: Residual versus Predictor Plot for prpblck

This can also be seen from the rvf (residual-versus-fitted) plot. Except at the lowest and highest fitted values, the residuals approximately cancel each other out. This deviation at the lowest and highest values is not surprising and was expected based on the rvp plots. See Figure 14.

5. Unbiasedness: Since MLR1-4 are satisfied, it can be concluded that the estimated equation is unbiased. Further, we can now explore the efficiency of the estimated equation by evaluating MLR5.

6. MLR5: This condition essentially requires that the envelope of points be rectangular in shape and symmetric around the residual = 0 line for all independent variables.

Studying the rvp plot for prpblck, we see that the residuals are boxed in between -0.2 and 0.2 for values of prpblck between 0.1 and 0.7. For other values of prpblck, although some residuals lie above or below this rectangle, they are very few in number. Therefore, we can conclude that MLR5 is almost satisfied for prpblck. See Figure 12. Similarly, on analyzing the rvp plot for ln(income), we notice that for values of ln(income) between 10.25 and 11.25 the residuals are boxed between -0.2 and 0.2. However, for the other values they are much lower than 0.2. Therefore, we can conclude that MLR5 is not satisfied for ln(income). See Figure 13.


Figure 13: Residual versus Predictor Plot for lnincome


Figure 14: Residual versus Fitted Value Plot


Figure 15: Kernel Density Plot

This can also be seen from the rvf plot. Although the residuals are boxed between -0.2 and 0.2 for fitted values between 0 and 0.07, they are much lower for higher and lower fitted values. This deviation at the lowest and highest values is not surprising and was expected based on the rvp plots. See Figure 14.

7. Efficiency: Since MLR5 is violated overall, we can conclude that the estimated equation is not an efficient estimator.

8. MLR6: This condition requires the population error u to be independent of the independent variables and normally distributed with zero mean and constant variance. This can be assessed from the kernel density plot in Figure 15. Clearly, the residuals do not follow a normal distribution. Therefore, it can be concluded that MLR6 is not satisfied.

9. Reliability of Standard Errors: Since MLR6 is not satisfied, we can conclude that the standard errors are not reliable.
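The visual checks used for MLR4 and MLR5 (residuals cancelling out, and a rectangular envelope) could also be approximated numerically by binning. The following is only a sketch under stated assumptions: the helper binned_residual_stats is hypothetical, it is not how the plots in this solution were produced, and the predictor values and residuals would have to be exported from the fitted model first.

```python
from statistics import mean, pvariance


def binned_residual_stats(x, resid, n_bins=4):
    """Sort observations by x, split them into n_bins groups of (nearly)
    equal size, and return a (mean, variance) pair of residuals per group.

    Means near zero in every bin are consistent with MLR4; similar
    variances across bins are consistent with MLR5.
    """
    if len(x) != len(resid):
        raise ValueError("x and resid must have the same length")
    if len(x) < n_bins:
        raise ValueError("need at least one observation per bin")
    pairs = sorted(zip(x, resid))
    size = len(pairs) // n_bins
    stats = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover observations.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = [r for _, r in pairs[start:end]]
        stats.append((mean(chunk), pvariance(chunk)))
    return stats


# Made-up illustration: residuals centred on zero in both halves,
# but far more spread out in the second half.
x_demo = list(range(8))
resid_demo = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
for bin_mean, bin_var in binned_residual_stats(x_demo, resid_demo, n_bins=2):
    print(bin_mean, bin_var)
```

In the made-up data both bin means are zero while the variances differ sharply, which is the pattern described above as satisfying MLR4 but not MLR5.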
