frank q1q3

7/29/2019 Frank Q1Q3

Exercise 1:

a) We used the command describe to get information about the dataset.

The total sample size is 74. Each observation is one make and model of car. There are 12 variables in the dataset:

Variable       Label
make           Make and Model
price          Price
mpg            Mileage (mpg)
rep78          Repair Record 1978
headroom       Headroom (in.)
trunk          Trunk space (cu. ft.)
weight         Weight (lbs.)
length         Length (in.)
turn           Turn Circle (ft.)
displacement   Displacement (cu. in.)
gear_ratio     Gear Ratio
foreign        Car type (value label: origin)

b) To get descriptive statistics for the variables, we use the command summarize.

To get statistics for foreign models only, we add if foreign==1 to the command. To get statistics for domestic models only, we add if foreign==0 to the command.

In the result, we found 52 domestic models and 22 foreign models.

For price we have statistics below:

For mileage we have statistics below:

For length we have statistics below:

For weight we have statistics below:

c) We create new variables using the command gen:

mpgDos Miles per gallon of gas (domestic models)

mpgFor Miles per gallon of gas (foreign models)

priceDos Price (domestic models)

priceFor Price (foreign models)

To test the claim that domestic cars are cheaper than foreign cars, we conducted a two-sample mean comparison test.

H0 : mean(priceDos) - mean(priceFor) ≤ 0

Ha : mean(priceDos) - mean(priceFor) > 0

We used the command ttest and got the result below.

Right-tail test: our p-value equals 0.6701, which is higher than alpha = 0.05, so we fail to reject the null hypothesis. The data give no evidence that domestic cars are more expensive than foreign ones, which is consistent with the claim. Failing to reject H0 does not by itself prove that domestic cars are cheaper.

To test the claim that domestic cars have better mileage (they can go more miles per gallon of gas consumed), we conducted a two-sample mean comparison test.

H0 : mean(mpgDos) - mean(mpgFor) >= 0

Ha : mean(mpgDos) - mean(mpgFor) < 0

Left-tail test: our p-value equals 0.0017, which is less than alpha = 0.05, so we reject the null hypothesis. We conclude that domestic cars go fewer miles per gallon of gas consumed than foreign ones (that is, foreign cars have better mileage).
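The two mean comparisons above can be cross-checked outside Stata. The following is a minimal Python sketch of an unequal-variance (Welch) two-sample t test computed from group summaries. Whether the original ttest call used unequal variances is an assumption, and the group means and standard deviations below are assumed values standing in for the summarize output, which is not reproduced in this report; only the group sizes 52 and 22 come from the text. The helper names t_cdf and welch_t are ours.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom.

    Integrates the density over [0, |t|] with Simpson's rule (steps must be even).
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and Welch-Satterthwaite df for the difference mean1 - mean2."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# ASSUMED group summaries (mean, sd, n); replace with your own summarize output.
price_dos = (6072.4, 3097.1, 52)
price_for = (6384.7, 2621.9, 22)
mpg_dos = (19.83, 4.743, 52)
mpg_for = (24.77, 6.611, 22)

t_price, df_price = welch_t(*price_dos, *price_for)
p_price_right = 1 - t_cdf(t_price, df_price)  # Ha: difference > 0

t_mpg, df_mpg = welch_t(*mpg_dos, *mpg_for)
p_mpg_left = t_cdf(t_mpg, df_mpg)             # Ha: difference < 0

print(f"price: t = {t_price:.3f}, df = {df_price:.1f}, right-tail p = {p_price_right:.4f}")
print(f"mpg:   t = {t_mpg:.3f}, df = {df_mpg:.1f}, left-tail p = {p_mpg_left:.4f}")
```

The left-tail and right-tail p-values of the same test always sum to 1, so a right-tail p-value above 0.5 simply means the sample difference is negative.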

d) D: Event that the car has domestic origin

F: Event that the car has foreign origin

E: Event that the car is expensive (price > 5500)

To find the number of cars that are expensive and have domestic origin, we used the command below. The result is 18.

To find the number of cars that are expensive and have foreign origin, we used the command below. The result is 12.

The number of domestic cars is 52 and the number of foreign cars is 22, for a total sample size of 74. So we can construct this table.

                Domestic   Foreign   Totals
Expensive             18        12       30
Not Expensive         34        10       44
Totals                52        22       74

From the table we have:

p(E|D) = 18/52 = 0.3462, nD = 52

p(E|F) = 12/22 = 0.5455, nF = 22

We now test the hypothesis that the proportion of expensive cars is higher for foreign models, using a two-sample proportion test.

Ho: p(E|D) - p(E|F) ≤ 0

Ha: p(E|D) - p(E|F) > 0

We used the command below to get the result.

Right-tail test. The p-value equals 0.9347, which is greater than alpha = 0.05. Consequently, we fail to reject Ho. The data give no evidence that the proportion of expensive cars is higher for domestic models, which is consistent with the claim that it is higher for foreign models.
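The proportion test can also be worked directly from the counts in the table (18 of 52 domestic and 12 of 22 foreign cars are expensive). The following is a minimal Python sketch of a pooled two-sample z test for proportions; the function name is ours, and pooling the standard error under Ho is an assumption about how the test was run. Because it starts from the raw counts rather than from rounded proportions, its p-value may differ slightly from the figure quoted above, while the decision at alpha = 0.05 stays the same.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled z test for p1 - p2; returns (p1, p2, z, left-tail p, right-tail p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under Ho
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    left = NormalDist().cdf(z)
    return p1, p2, z, left, 1 - left

# Counts taken from the table: expensive domestic, domestic, expensive foreign, foreign.
p_ed, p_ef, z, p_left, p_right = two_proportion_ztest(18, 52, 12, 22)

print(f"p(E|D) = {p_ed:.4f}, p(E|F) = {p_ef:.4f}")
print(f"z = {z:.3f}, left-tail p = {p_left:.4f}, right-tail p = {p_right:.4f}")
```

The left-tail p-value printed here corresponds to the alternative p(E|D) - p(E|F) < 0, which is the direction of the claim that foreign models are more often expensive.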

e) We estimate the relationship between price and mileage by constructing the simple linear regression equation:

price = b0 + b1(mpg)

We used the command below to get the results of the regression:

Ho : b0 = 0 Ha : b0 ≠ 0

The coefficient for _cons is 11253.06 and its p-value P>|t| is 0.000. The coefficient 11253.06 is significantly different from 0 because its p-value is less than 0.05, so we reject Ho. We have b0 = 11253.06.

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for mpg is -238.89 and its p-value P>|t| is 0.000. The coefficient -238.89 is significantly different from 0 because its p-value is less than 0.05, so we reject Ho. We have b1 = -238.89.

We can conclude the simple regression equation:

price = 11253.06 - 238.89 (mpg)

For each one-unit increase in mileage (one more mile per gallon), we predict a decrease of 238.89 units in the price of the car. That is, the more miles the car can go per gallon of gas, the cheaper the car is (a negative relationship).
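As a worked illustration of the fitted line, the sketch below plugs mileage values into the estimated equation, using the coefficients reported above (11253.06 and -238.89). The function name predicted_price and the example mileage of 20 mpg are ours, chosen only for illustration.

```python
B0 = 11253.06   # intercept reported for _cons
B1 = -238.89    # slope reported for mpg

def predicted_price(mpg):
    """Predicted price from the fitted line price = b0 + b1 * mpg."""
    return B0 + B1 * mpg

price_at_20 = predicted_price(20)
price_at_21 = predicted_price(21)

print(f"predicted price at 20 mpg: {price_at_20:.2f}")
print(f"change for one more mpg:   {price_at_21 - price_at_20:.2f}")
```

A prediction like this is only meaningful within the range of mileage values observed in the sample.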

We have R-squared = 0.2196.

It means that the portion of the total variation in the dependent variable (price) that is explained by the variation of the independent variable (mpg) in our regression is 21.96%. Because the R-squared is small, the simple regression does not have sufficient quality to claim a strong relationship between price and mileage.

Question 3:

We used the Data Editor to enter the given dataset into Stata. To get a description of the dataset, we used the command sum. There are 5 variables, namely grocery, housing, utilities, transportation, and healthcare. There are 25 observations in total.

a. We need to plot the grocery index against each of the other indexes. The command we used is scatter.

For Grocery vs Housing

In the scatter plot given below, the observations for Grocery and Housing require extra attention, since the points tend to stay on one side of the plot. However, the points do not clearly follow any linear direction. We keep this in mind when we do the regression. At this point we could not observe any significant pattern.

For Grocery vs Utilities

The scatter plot given below suggests a pattern: the cost of grocery tends to increase when the cost of utilities increases. However, the points are widely scattered, so we may see a large variation around the predicted pattern. We can check this after doing the regression.

For Grocery vs Transportation:

In the scatter plot given below, the points tend to follow a linear pattern: the cost of Grocery tends to increase with the cost of transportation. However, the regression should be read cautiously before claiming a relationship between these variables, since some points lie well off the pattern.

For Grocery vs Healthcare:

The scatter plot for Grocery vs Healthcare given below does not suggest any significant pattern, since the points are widely scattered. We can check this after doing the regression.

b.

Grocery vs Housing

To run the regression of the grocery index on the housing index, we construct the simple linear regression equation given below:

Grocery = b0 + b1(housing)

We test hypotheses about the coefficients.

We used the command

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for housing is 0.0517 and its p-value P>|t| is 0.186. The coefficient 0.0517 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho.

This means that, from the observations given, we cannot conclude any significant linear relationship between Grocery and Housing.

Ho : b0 = 0 Ha : b0 ≠ 0

The coefficient for _cons is 92.94 and its p-value P>|t| is 0.00. The coefficient 92.94 is significantly different from 0 because its p-value is lower than 0.05, so we reject Ho.

Treating the insignificant slope as 0, we can conclude the equation: Grocery = 92.94


It means that the portion of the total variation in the dependent variable that is explained by variation in the independent variable is only 18.6%; 81.4% of the total variation cannot be explained by the model. The regression does not have sufficient quality to claim a relationship between the cost of Grocery and the cost of housing.

Grocery vs Utilities

To run the regression of the grocery index on the utilities index, we construct the simple linear regression equation given below:

Grocery = b0 + b1(utilities)

Command :

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for utilities is 0.1411 and its p-value P>|t| is 0.029. The coefficient 0.1411 is significantly different from 0 because its p-value is smaller than 0.05, so we reject Ho. This means that, from the observations given, we can infer a linear relationship between the cost of Grocery and the cost of utilities.

Ho : b0 = 0 Ha : b0 ≠ 0

The coefficient 83.99 is significantly different from 0 because its p-value P>|t| is 0.00, which is smaller than 0.05. So we reject the Ho.

We can conclude the equation: Grocery = 83.99 + 0.144*(utilities)

For each one-unit increase in the cost of utilities, we predict an increase of 0.144 units in the cost of grocery.

We have R-squared = 0.1911. It means that the portion of the total variation in the dependent variable that is explained by variation in the independent variable is only 19.11%. The regression does not have sufficient quality to claim a strong relationship between the cost of Grocery and the cost of utilities.
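In a simple regression the slope's t statistic and the R-squared are tied together by t^2 = R^2 (n - 2) / (1 - R^2), so a reported p-value and R-squared can be checked against each other. The sketch below does this in Python for the utilities regression (R-squared = 0.1911, n = 25). The helper names are ours, and the t distribution is integrated numerically rather than taken from a statistics package. If the reported numbers are mutually consistent, the two-sided p-value printed here should land near the reported 0.029.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, via Simpson's rule on the density (steps even)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def slope_abs_t_from_r2(r2, n):
    """|t| of the slope in a simple regression with n observations.

    R-squared carries no sign, so only the absolute value is recovered.
    """
    return math.sqrt(r2 * (n - 2) / (1 - r2))

n_obs = 25          # observations in the grocery dataset
r_squared = 0.1911  # reported for Grocery vs utilities

t_abs = slope_abs_t_from_r2(r_squared, n_obs)
p_two_sided = 2 * (1 - t_cdf(t_abs, n_obs - 2))

print(f"|t| = {t_abs:.3f}, two-sided p = {p_two_sided:.4f}")
```

The same check can be repeated for the other simple regressions by changing r_squared.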

Grocery vs Transportation:

To run the regression of the grocery index on the transportation index, we construct the simple linear regression equation given below:

Grocery = b0 + b1(transportation)

Command :

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient 0.1372 is not significantly different from 0 because its p-value P>|t| is 0.45, which is higher than 0.05. So we fail to reject the Ho. This means that, from the observations given, we cannot conclude any linear relationship between the cost of Grocery and the cost of transportation.

Ho : b0 = 0 Ha : b0 ≠ 0

The coefficient 84.25 is significantly different from 0 because its p-value P>|t| is 0.00, which is smaller than 0.05. So we reject the Ho.

We can conclude the estimated equation: Grocery = 84.25

We have R-squared = 0.0251. It means that the portion of the total variation in the dependent variable that is explained by variation in the independent variable is only 2.51%. The regression does not have sufficient quality to claim a relationship between the cost of Grocery and the cost of transportation.

Grocery vs Healthcare

To run the regression of the grocery index on the healthcare index, we construct the simple linear regression equation given below:

Grocery = b0 + b1(healthcare)

Command :

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient 0.0869 is not significantly different from 0 because its p-value P>|t| is 0.258, which is greater than 0.05. So we fail to reject the Ho. This means that, from the observations given, we cannot conclude any linear relationship between the cost of Grocery and the cost of healthcare.

Ho : b0 = 0 Ha : b0 ≠ 0

The coefficient 89.44 is significantly different from 0 because its p-value P>|t| is 0.00, which is smaller than 0.05. So we reject the Ho.

We can conclude the estimated equation: Grocery = 89.44

We have R-squared = 0.0552. The portion of the total variation in the dependent variable that can be explained by our model is only 5.52%. The regression does not have sufficient quality to claim a relationship between the cost of Grocery and the cost of healthcare.

c. Log Grocery vs Log Housing

In order to estimate the elasticity of the grocery index with respect to the housing index, we construct the simple linear regression equation given below:

ln(Grocery) = b0 + b1 ln(housing)

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for ln_housing is 0.066 and its p-value P>|t| is 0.199. The coefficient 0.066 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho.

This means that, from the observations given, we cannot conclude any significant linear relationship between Log Grocery and Log Housing.

We conclude that the housing elasticity of the grocery index is not significantly different from 0.

We have R-squared = 0.0708. It means that the portion of the total variation in the dependent variable that is explained by our regression is only 7.08%. The regression does not have sufficient quality to claim an elasticity between the housing index and the grocery index.

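Log Grocery vs Log Utilities

In order to estimate the elasticity of the grocery index with respect to the utilities index, we construct the simple linear regression equation given below:

ln(Grocery) = b0 + b1 ln(Utilities)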

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for ln_utilities is 0.131 and its p-value P>|t| is 0.047. The coefficient 0.131 is significantly different from 0 because its p-value is lower than 0.05, so we reject Ho.

This means that, from the observations given, we can infer a linear relationship between Log Grocery and Log utilities. We predict that when the utilities index increases by 1%, the grocery index increases by 0.131%. The estimated elasticity is 0.131.
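To make the elasticity reading concrete, the sketch below converts a log-log slope into the implied percentage change in the grocery index. It uses the estimate 0.131 reported above; the function name and the 10% scenario are illustrative assumptions of ours. For small changes the result is close to the elasticity times the percentage change, which is the approximation used in the text.

```python
def implied_pct_change(elasticity, pct_change_in_x):
    """Percentage change in y implied by ln(y) = b0 + elasticity * ln(x)."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100

ELASTICITY = 0.131  # reported coefficient for ln_utilities

change_for_1pct = implied_pct_change(ELASTICITY, 1.0)
change_for_10pct = implied_pct_change(ELASTICITY, 10.0)  # hypothetical scenario

print(f"utilities +1%  -> grocery {change_for_1pct:+.4f}%")
print(f"utilities +10% -> grocery {change_for_10pct:+.4f}%")
```

The gap between the exact figure and elasticity times the percentage change grows as the change in utilities gets larger.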




quality to claim the elasticity of grocery index and utilities index.

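Log Grocery vs Log Transportation

In order to estimate the elasticity of the grocery index with respect to the transportation index, we construct the simple linear regression equation given below:

ln(Grocery) = b0 + b1 ln(Transportation)

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for ln_transportation is 0.1297 and its p-value P>|t| is 0.481. The coefficient 0.1297 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho.

This means that, from the observations given, we cannot conclude any significant linear relationship between Log Grocery and Log transportation.

We conclude that the transportation elasticity of the grocery index is not significantly different from 0.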




quality to claim the elasticity of transportation index and grocery index.

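Log Grocery vs Log Healthcare

In order to estimate the elasticity of the grocery index with respect to the healthcare index, we construct the simple linear regression equation given below:

ln(Grocery) = b0 + b1 ln(Healthcare)

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for ln_Healthcare is 0.092 and its p-value P>|t| is 0.265. The coefficient 0.092 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho.

This means that, from the observations given, we cannot conclude any significant linear relationship between Log Grocery and Log Healthcare.

We conclude that the healthcare elasticity of the grocery index is not significantly different from 0.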




quality to claim the elasticity of healthcare index and grocery index.

d. Multiple regression

To estimate the multiple linear model, we construct the multiple regression equation below:

Grocery = b0 + b1(housing) + b2(utilities) + b3(transportation) + b4(healthcare)

We used the command below to do the regression:

Ho : b0 = 0 Ha : b0 ≠ 0

The coefficient for _cons is 76.31 and its p-value P>|t| is 0.000. The coefficient 76.31 is significantly different from 0 because its p-value is less than 0.05, so we reject Ho. We have b0 = 76.31.

Ho : b1 = 0 Ha : b1 ≠ 0

The coefficient for housing is 0.0859 and its p-value P>|t| is 0.109. The coefficient 0.0859 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho. We treat b1 as 0.

Ho : b2 = 0 Ha : b2 ≠ 0

The coefficient for utilities is 0.1677 and its p-value P>|t| is 0.018. The coefficient 0.1677 is significantly different from 0 because its p-value is less than 0.05, so we reject Ho. We have b2 = 0.1677.

Ho : b3 = 0 Ha : b3 ≠ 0

The coefficient for transportation is 0.0284 and its p-value P>|t| is 0.87. The coefficient 0.0284 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho. We treat b3 as 0.

Ho : b4 = 0 Ha : b4 ≠ 0

The coefficient for healthcare is -0.0659 and its p-value P>|t| is 0.53. The coefficient -0.0659 is not significantly different from 0 because its p-value is higher than 0.05, so we fail to reject Ho. We treat b4 as 0.

We can conclude the multiple regression equation:

Grocery = 76.31 + 0(housing) + 0.1677(utilities) + 0(transportation) + 0(healthcare)

Grocery = 76.31 + 0.1677(utilities)

We have R-squared = 0.3145. It means that the portion of the total variation in the dependent variable (grocery) that is explained by the variation of the independent variables (housing, utilities, transportation, healthcare) in our regression is 31.45%. Although this R-squared is higher than the R-squared in the individual regressions, the multiple regression still does not have sufficient quality to claim a strong relationship between the Grocery index and the other indexes.

In the multiple regression we increase the number of independent variables. The more independent variables we have, the more of the variation of the dependent variable can be explained and the smaller the error variation is, so the R-squared cannot decrease. A higher R-squared therefore partly reflects the larger number of regressors and does not by itself show that the model predicts the relationship between the variables better.

R-squared = SSR/SST
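The definition R-squared = SSR/SST can be traced by hand on a small example. The sketch below fits a simple least-squares line in plain Python and computes the sums of squares. The five data points are made up for illustration and are not the grocery data, and the function name simple_ols is ours.

```python
def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, SSR, SSE, SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    return b0, b1, ssr, sse, sst

# Made-up illustrative data, NOT taken from the report.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1, ssr, sse, sst = simple_ols(x, y)
r_squared = ssr / sst

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, R-squared = {r_squared:.2f}")
```

Since SST = SSR + SSE for a least-squares fit with an intercept, R-squared can equivalently be written as 1 - SSE/SST.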
