assignment 3.1
Assignment 3: Multiple Regression
This data set consists of a sample of over eight hundred used cars in this country. The retail price of these cars was calculated from the tables provided by the association of car manufacturers. You are provided with a data set containing the following variables:
· Price: suggested retail price of the used car in excellent condition. The condition of a car can greatly affect price. All cars in this data set were less than one year old when priced and considered to be in excellent condition.
· Mileage: number of miles the car has been driven.
· Make: manufacturer of the car.
· Model: specific models for each car manufacturer.
· Trim (of car): specific type of car model such as SE Sedan 4D, Quad Coupe 2D.
· Type: body type such as sedan, coupe, etc.
· Cylinder: number of cylinders in the engine.
· Liter: a more specific measure of engine size.
· Doors: number of doors.
· Cruise: indicator variable representing whether the car has cruise control (1 = cruise).
· Sound: indicator variable representing whether the car has upgraded speakers (1 = upgraded).
· Leather: indicator variable representing whether the car has leather seats (1 = leather).
Perform the following tasks on this data set:
1. Use simple linear regression to explore the intuitive relationship between miles traveled and retail price. From the simple regression results, answer the following questions:
a. In general, what happens to price when there is one more mile on the car?
b. Does mileage help you predict price? What does the p-value tell you?
c. Does mileage help you predict price? What does the R-Sq value tell you?
Answers
Variables Entered/Removed(a)
Model   Variables Entered   Variables Removed   Method
1       Mileage(b)          .                   Enter
a. Dependent Variable: Price
b. All requested variables entered.
Coefficients(a)
                   Unstandardized Coefficients   Standardized Coefficients
Model              B            Std. Error       Beta                        t         Sig.
1  (Constant)      24764.559    904.363                                      27.383    .000
   Mileage         -.173        .042             -.143                       -4.093    .000
a. Dependent Variable: Price
a. The price falls by about 17.3 cents with each added mile on the car: the unstandardized coefficient for Mileage is -.173, with price measured in whole currency units.
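As an illustration, the fitted simple regression line can be written as a small Python function. This is only a sketch: the intercept and slope are copied from the Coefficients table above, and the function name `predict_price` and the example mileage are hypothetical.

```python
# Coefficients copied from the SPSS Coefficients table (simple regression).
INTERCEPT = 24764.559   # (Constant)
SLOPE = -0.173          # Mileage


def predict_price(mileage: float) -> float:
    """Predicted retail price for a car with the given mileage."""
    return INTERCEPT + SLOPE * mileage


# Effect of one additional mile: the difference equals the slope.
one_more_mile = predict_price(20001) - predict_price(20000)
print(round(one_more_mile, 3))
```

The difference between the two predictions does not depend on the starting mileage, which is what "price change per additional mile" means in a linear model.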
b. Yes. The p-value for Mileage (Sig. = .000, i.e. p < .001) is below .05, so the relationship between mileage and price is statistically significant. The coefficient is negative, so price decreases as mileage increases.
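The link between the t statistic and the p-value can be checked with a short Python sketch. It assumes a normal approximation to the t distribution, which is reasonable with roughly 800 cases but is not the exact calculation SPSS performs; the function name `two_sided_p_from_t` is hypothetical.

```python
import math


def two_sided_p_from_t(t: float) -> float:
    """Two-sided p-value for a t statistic, using the normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


t_mileage = -4.093          # t for Mileage in the Coefficients table
p = two_sided_p_from_t(t_mileage)
print(f"{p:.6f}")
print(p < 0.05)             # significant at the .05 level?
```

SPSS rounds the p-value to three decimals, which is why it is displayed as .000.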
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .143(a)   .020       .019                9789.288
a. Predictors: (Constant), Mileage
c. Yes, but only weakly. R-Sq (R²) is the correlation coefficient squared ((-.143)² ≈ .020), referred to as the coefficient of determination. This value indicates the percentage of the total variation in Y (Price) that is explained by the regression model containing Mileage. Only 2% of the variation in price is explained by mileage; the remaining 98% is due to other factors.
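The R Square and Adjusted R Square values in the Model Summary can be reproduced from the correlation with a few lines of Python. This is a sketch under two assumptions: the sample size n = 804 is taken from the N column of the Descriptive Statistics table later in this report, and the adjusted value uses the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1).

```python
r = -0.143   # correlation between Mileage and Price (Beta in the simple regression)
n = 804      # number of cars (assumed from the Descriptive Statistics table)
k = 1        # number of predictors in the model

r_squared = r ** 2
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 3))
print(round(adjusted_r_squared, 3))
```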
2. Taking price as the dependent variable, perform stepwise multiple regression on this data set. What is your final model? How many variables were dropped from the model? Explain why.
Variables Entered/Removed(a)
Model   Variables Entered   Variables Removed   Method
1       Cylinder            .                   Stepwise
2       Cruise              .                   Stepwise
3       Leather             .                   Stepwise
4       Mileage             .                   Stepwise
5       Doors               .                   Stepwise
6       Sound               .                   Stepwise
Method criteria at every step: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100.
a. Dependent Variable: Price
Coefficients(a)
                   Unstandardized Coefficients   Standardized Coefficients
Model              B            Std. Error       Beta                        t         Sig.
1  (Constant)      -17.057      1126.944                                     -.015     .988
   Cylinder        4054.203     206.852          .569                        19.600    .000
2  (Constant)      -1046.431    1082.655                                     -.967     .334
   Cylinder        3392.587     211.273          .476                        16.058    .000
   Cruise          6000.366     678.841          .262                        8.839     .000
3  (Constant)      -2978.398    1129.554                                     -2.637    .009
   Cylinder        3276.233     209.189          .460                        15.662    .000
   Cruise          6362.343     671.901          .278                        9.469     .000
   Leather         3139.484     608.259          .142                        5.161     .000
4  (Constant)      412.562      1296.815                                     .318      .750
   Cylinder        3232.656     206.188          .454                        15.678    .000
   Cruise          6492.035     662.181          .284                        9.804     .000
   Leather         3161.569     599.032          .143                        5.278     .000
   Mileage         -.165        .032             -.137                       -5.087    .000
5  (Constant)      5530.335     1709.446                                     3.235     .001
   Cylinder        3257.643     203.798          .457                        15.985    .000
   Cruise          6319.636     655.373          .276                        9.643     .000
   Leather         2978.887     593.246          .135                        5.021     .000
   Mileage         -.167        .032             -.139                       -5.214    .000
   Doors           -1402.112    310.015          -.121                       -4.523    .000
6  (Constant)      7323.164     1770.837                                     4.135     .000
   Cylinder        3200.125     202.983          .449                        15.765    .000
   Cruise          6205.511     651.463          .271                        9.525     .000
   Leather         3327.143     597.114          .151                        5.572     .000
   Mileage         -.171        .032             -.141                       -5.352    .000
   Doors           -1463.399    308.274          -.126                       -4.747    .000
   Sound           -2024.401    570.718          -.096                       -3.547    .000
a. Dependent Variable: Price
Model Summary(g)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .569(a)   .324       .323                8133.162
2       .620(b)   .384       .382                7768.193
3       .635(c)   .404       .402                7646.769
4       .650(d)   .423       .420                7530.569
5       .661(e)   .437       .433                7440.529
6       .668(f)   .446       .442                7387.114                     .304
a. Predictors: (Constant), Cylinder
b. Predictors: (Constant), Cylinder, Cruise
c. Predictors: (Constant), Cylinder, Cruise, Leather
d. Predictors: (Constant), Cylinder, Cruise, Leather, Mileage
e. Predictors: (Constant), Cylinder, Cruise, Leather, Mileage, Doors
f. Predictors: (Constant), Cylinder, Cruise, Leather, Mileage, Doors, Sound
g. Dependent Variable: Price
In the Model Summary, we can see that Liter does not appear among the predictors of any of the six models.
Excluded Variables(a)
Model   Variable   Beta In    t         Sig.   Partial Correlation   Collinearity Statistics: Tolerance
1       Mileage    -.126(b)   -4.401    .000   -.154                 .999
        Liter      .158(b)    1.563     .118   .055                  .082
        Doors      -.140(b)   -4.890    .000   -.170                 1.000
        Cruise     .262(b)    8.839     .000   .298                  .874
        Sound      -.074(b)   -2.543    .011   -.090                 .992
        Leather    .115(b)    3.981     .000   .139                  .994
2       Mileage    -.136(c)   -4.966    .000   -.173                 .998
        Liter      .037(c)    .383      .702   .014                  .081
        Doors      -.128(c)   -4.655    .000   -.162                 .997
        Sound      -.058(c)   -2.094    .037   -.074                 .988
        Leather    .142(c)    5.161     .000   .180                  .983
3       Mileage    -.137(d)   -5.087    .000   -.177                 .998
        Liter      .004(d)    .037      .970   .001                  .080
        Doors      -.119(d)   -4.377    .000   -.153                 .993
        Sound      -.084(d)   -3.048    .002   -.107                 .960
4       Liter      .017(e)    .180      .858   .006                  .080
        Doors      -.121(e)   -4.523    .000   -.158                 .992
        Sound      -.088(e)   -3.243    .001   -.114                 .959
5       Liter      -.108(f)   -1.108    .268   -.039                 .074
        Sound      -.096(f)   -3.547    .000   -.125                 .956
6       Liter      -.088(g)   -.908     .364   -.032                 .074
a. Dependent Variable: Price
b. Predictors in the Model: (Constant), Cylinder
c. Predictors in the Model: (Constant), Cylinder, Cruise
d. Predictors in the Model: (Constant), Cylinder, Cruise, Leather
e. Predictors in the Model: (Constant), Cylinder, Cruise, Leather, Mileage
f. Predictors in the Model: (Constant), Cylinder, Cruise, Leather, Mileage, Doors
g. Predictors in the Model: (Constant), Cylinder, Cruise, Leather, Mileage, Doors, Sound
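The final model is Model 6, which contains Cylinder, Cruise, Leather, Mileage, Doors and Sound. Liter is excluded because its p-value is above .05 at every step (Models 1 to 6: .118, .702, .970, .858, .268 and .364), so it never meets the entry criterion (probability of F to enter <= .050). Its tolerance is also very low (.074 to .082), which indicates that it overlaps heavily with the predictors already in the model.
Only one variable, Liter, is dropped, because its p-value never falls below the .05 significance level.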
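To make the final stepwise model concrete, the Model 6 coefficients from the Coefficients table above can be used to compute a predicted price. The sketch below is illustrative only: the function name `predict_price` and the example car are hypothetical, and the coefficients are simply copied from the SPSS output.

```python
# Model 6 unstandardized coefficients (dependent variable: Price).
CONSTANT = 7323.164
MODEL_6 = {
    "Cylinder": 3200.125,
    "Cruise": 6205.511,
    "Leather": 3327.143,
    "Mileage": -0.171,
    "Doors": -1463.399,
    "Sound": -2024.401,
}


def predict_price(car: dict) -> float:
    """Predicted price = constant + sum of coefficient * predictor value."""
    return CONSTANT + sum(coef * car[name] for name, coef in MODEL_6.items())


# Hypothetical car, not taken from the data set.
example_car = {
    "Cylinder": 4, "Cruise": 1, "Leather": 1,
    "Mileage": 20000, "Doors": 4, "Sound": 1,
}
print(round(predict_price(example_car), 2))
```

Liter does not appear in the dictionary because it never entered the model.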
3. Transform price to log price and take this new variable as your dependent variable. Perform multiple regression by including variables in (2) as independent variables. Discuss the results.
Variables Entered/Removed(a)
Model   Variables Entered                                     Variables Removed   Method
1       Leather, Mileage, Doors, Cylinder, Sound, Cruise(b)   .                   Enter
a. Dependent Variable: LgPrice
b. All requested variables entered.
Model Summary(b)
                                                                         Change Statistics
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .695(a)   .484       .480                .12847                       .484              124.410    6     797   .000            .376
a. Predictors: (Constant), Leather, Mileage, Doors, Cylinder, Sound, Cruise
b. Dependent Variable: LgPrice
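With log price (LgPrice) as the dependent variable, the six predictors retained in (2) are entered together; Liter is not included, while Cylinder is. The model is statistically significant (F = 124.410, p < .001) and explains 48.4% of the variation in LgPrice (R Square = .484).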
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients                        95.0% Confidence Interval for B   Correlations                     Collinearity Statistics
Model             B             Std. Error      Beta                        t          Sig.      Lower Bound    Upper Bound        Zero-order   Partial   Part      Tolerance   VIF
1  (Constant)     3.996         .031                                        129.744    .000      3.935          4.056
   Mileage        -3.206E-6     .000            -.148                       -5.786     .000      .000           .000               -.148        -.201     -.147     .997        1.003
   Cylinder       .057          .004            .440                        16.018     .000      .050           .063               .583         .493      .408      .857        1.167
   Doors          -.016         .005            -.077                       -3.007     .003      -.027          -.006              -.092        -.106     -.077     .989        1.011
   Cruise         .139          .011            .338                        12.298     .000      .117           .162               .494         .399      .313      .859        1.165
   Sound          -.038         .010            -.099                       -3.816     .000      -.057          -.018              -.139        -.134     -.097     .956        1.046
   Leather        .053          .010            .132                        5.078      .000      .032           .073               .130         .177      .129      .952        1.050
a. Dependent Variable: LgPrice
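Because the dependent variable is now a logarithm, each coefficient describes a proportional rather than an absolute change in price. The Python sketch below converts the coefficients from the table above into percentage changes. It assumes that LgPrice is the base-10 logarithm of Price; the report does not state the base, and this is inferred only from the magnitude of the constant (3.996). The function name `percent_change` is hypothetical.

```python
def percent_change(b: float) -> float:
    """Percentage change in price for a one-unit increase in a predictor,
    assuming the dependent variable is log10(price)."""
    return (10 ** b - 1) * 100


# Unstandardized coefficients (B) from the LgPrice Coefficients table.
coefficients = {
    "Cylinder": 0.057,
    "Doors": -0.016,
    "Cruise": 0.139,
    "Sound": -0.038,
    "Leather": 0.053,
}

for name, b in coefficients.items():
    print(f"{name}: {percent_change(b):+.1f}%")
```

If the transformation were a natural logarithm instead, `10 ** b` would be replaced by `math.exp(b)`.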
Excluded Variables(a)
Model   Variable   Beta In    t         Sig.   Partial Correlation   Collinearity Statistics: Tolerance
1       Mileage    -.137(b)   -4.876    .000   -.170                 1.000
        Cylinder   .215(b)    2.170     .030   .076                  .082
        Doors      -.045(b)   -1.578    .115   -.056                 .994
        Cruise     .316(b)    10.999    .000   .362                  .857
        Sound      -.101(b)   -3.561    .000   -.125                 .996
        Leather    .079(b)    2.777     .006   .098                  .992
2       Mileage    -.147(c)   -5.646    .000   -.196                 .998
        Cylinder   .243(c)    2.636     .009   .093                  .082
        Doors      -.039(c)   -1.480    .139   -.052                 .993
        Sound      -.080(c)   -3.017    .003   -.106                 .990
        Leather    .113(c)    4.271     .000   .149                  .980
3       Cylinder   .223(d)    2.463     .014   .087                  .082
        Doors      -.042(d)   -1.610    .108   -.057                 .993
        Sound      -.084(d)   -3.220    .001   -.113                 .990
        Leather    .114(d)    4.393     .000   .154                  .980
4       Cylinder   .236(e)    2.632     .009   .093                  .082
        Doors      -.036(e)   -1.375    .169   -.049                 .990
        Sound      -.106(e)   -4.060    .000   -.142                 .963
5       Cylinder   .204(f)    2.284     .023   .081                  .081
        Doors      -.042(f)   -1.642    .101   -.058                 .986
6       Doors      -.062(g)   -2.346    .019   -.083                 .916
a. Dependent Variable: TrPrice
b. Predictors in the Model: (Constant), Liter
c. Predictors in the Model: (Constant), Liter, Cruise
d. Predictors in the Model: (Constant), Liter, Cruise, Mileage
e. Predictors in the Model: (Constant), Liter, Cruise, Mileage, Leather
f. Predictors in the Model: (Constant), Liter, Cruise, Mileage, Leather, Sound
g. Predictors in the Model: (Constant), Liter, Cruise, Mileage, Leather, Sound, Cylinder
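In this stepwise run on the transformed price (TrPrice), Doors is the only variable still excluded after six steps. Its p-value is above .05 in Models 1 to 5 (.115, .139, .108, .169 and .101), so it does not meet the entry criterion at those steps; in Model 6 its p-value is .019.
Only one variable, Doors, is dropped, because its p-value exceeds the .05 significance level in Models 1 to 5.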
4. Since Type (Sedan, Hatchback, Convertible or Coupe) and Make (A, B, C, D, E or F) are also criteria considered by many car buyers, perform another regression by considering these two variables. Discuss the results.
A dummy variable (or indicator variable) is an artificial variable created to represent an attribute with two or more distinct categories or levels. Regression analysis treats all independent (X) variables as numerical, that is, as interval or ratio scale variables whose values are directly comparable. For multiple regression analysis, all but one of the dummy variables for each original categorical variable are entered as independent variables. With dummy variables, the regression coefficients indicate the difference in the dependent variable between the category specified by the dummy variable and the category omitted from the analysis.
After Type and Make are converted into dummy variables, the data are analysed.
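The dummy coding described above can be sketched in a few lines of plain Python. This is an illustration rather than the procedure actually used in SPSS: the function name `make_dummies` and the sample values are hypothetical, and Make F is assumed to be the omitted (reference) category because only dummies A to E appear in the output for this task.

```python
def make_dummies(values, categories, reference):
    """Return one 0/1 column per category, leaving out the reference category."""
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical sample of car makes, not taken from the data set.
makes = ["A", "F", "C", "D", "F", "B"]
dummies = make_dummies(makes, categories=list("ABCDEF"), reference="F")

for name, column in dummies.items():
    print(name, column)
```

A car of the reference make has 0 in every dummy column, so each dummy coefficient is read as the difference from that make.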
Descriptive Statistics
              Mean        Std. Deviation   N
LgPrice       4.2904      .17811           804
Mileage       19831.93    8196.320         804
Cylinder      5.27        1.388            804
Doors         3.53        .850             804
Cruise        .75         .432             804
Sound         .68         .467             804
Leather       .72         .447             804
A             .10         .300             804
B             .10         .300             804
C             .40         .490             804
D             .14         .349             804
E             .07         .263             804
Sedan         .42         .494             804
Convertible   .06         .242             804
Hatchback     .05         .218             804
Coupe         .17         .379             804
Variables Entered/Removed(a)
Model   Variables Entered                                                                                       Variables Removed   Method
1       Coupe, Mileage, Cruise, Leather, Convertible, Sound, Hatchback, E, A, B, D, C, Cylinder, Sedan(b)       .                   Enter
a. Dependent Variable: LgPrice
b. Tolerance = .000 limit reached.
Model Summary(b)
                                                                         Change Statistics
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .960(a)   .922       .921                .05008                       .922              669.112    14    789   .000            .274
a. Predictors: (Constant), Coupe, Mileage, Cruise, Leather, Convertible, Sound, Hatchback, E, A, B, D, C, Cylinder, Sedan
b. Dependent Variable: LgPrice
Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients                        95.0% Confidence Interval for B   Correlations                     Collinearity Statistics
Model             B             Std. Error      Beta                        t          Sig.      Lower Bound    Upper Bound        Zero-order   Partial   Part      Tolerance   VIF
1  (Constant)     3.903         .013                                        306.064    .000      3.877          3.928
   Cylinder       .072          .002            .560                        36.209     .000      .068           .076               .583         .790      .359      .412        2.429
   Cruise         .010          .005            .024                        1.963      .050      .000           .020               .494         .070      .019      .663        1.507
   Sound          .002          .004            .004                        .425       .671      -.006          .010               -.139        .015      .004      .884        1.131
   Leather        .017          .004            .042                        3.927      .000      .008           .025               .130         .138      .039      .845        1.183
   A              .067          .009            .113                        7.316      .000      .049           .085               .044         .252      .073      .414        2.416
   B              .213          .010            .359                        21.339     .000      .194           .233               .580         .605      .212      .348        2.870
   C              -.007         .006            -.019                       -1.167     .243      -.018          .005               -.467        -.042     -.012     .377        2.654
   D              .310          .008            .608                        38.839     .000      .294           .326               .402         .810      .385      .402        2.486
   E              .008          .009            .012                        .896       .371      -.010          .026               -.237        .032      .009      .561        1.781
   Sedan          -.032         .006            -.089                       -5.275     .000      -.044          -.020              .029         -.185     -.052     .349        2.862
   Convertible    .113          .009            .154                        13.072     .000      .096           .130               .440         .422      .130      .712        1.404
   Hatchback      -.083         .010            -.102                       -8.737     .000      -.102          -.065              -.263        -.297     -.087     .727        1.375
   Coupe          .002          .006            .005                        .364       .716      -.010          .014               -.178        .013      .004      .613        1.630
a. Dependent Variable: LgPrice
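The two collinearity columns in the table above carry the same information, since VIF is the reciprocal of Tolerance. A short Python check, using a few rows copied from the table, illustrates this; small differences are expected because SPSS displays both columns rounded to three decimals. The dictionary names are hypothetical.

```python
# Tolerance and VIF values copied from the Coefficients table (LgPrice model).
tolerance = {
    "Cylinder": 0.412, "Cruise": 0.663, "Sound": 0.884,
    "Leather": 0.845, "Convertible": 0.712,
}
reported_vif = {
    "Cylinder": 2.429, "Cruise": 1.507, "Sound": 1.131,
    "Leather": 1.183, "Convertible": 1.404,
}

# VIF = 1 / Tolerance for each predictor.
computed_vif = {name: 1 / tol for name, tol in tolerance.items()}

for name in tolerance:
    print(f"{name}: computed {computed_vif[name]:.3f}, reported {reported_vif[name]:.3f}")
```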
Coefficient Correlations(a)
Model 1: Correlations
              Coupe  Mileage  Cruise  Leather  Convertible  Sound  Hatchback      E      A      B      D      C  Cylinder  Sedan
Coupe         1.000     .021   -.073    -.071         .235  -.066       .337  -.306  -.215  -.207   .009  -.308      .128   .513
Mileage        .021    1.000   -.012    -.021         .007   .023       .038  -.047  -.060  -.017  -.047  -.039      .017   .051
Cruise        -.073    -.012   1.000     .102         .028   .005       .106   .073  -.133  -.047  -.322   .022     -.334   .041
Leather       -.071    -.021    .102    1.000         .030  -.142      -.015   .079   .095  -.147  -.105  -.087     -.068  -.020
Convertible    .235     .007    .028     .030        1.000  -.033       .167  -.204  -.211  -.261  -.384  -.241     -.070   .374
Sound         -.066     .023    .005    -.142        -.033  1.000       .042   .115  -.014   .044   .065  -.098      .076  -.062
Hatchback      .337     .038    .106    -.015         .167   .042      1.000  -.145  -.250  -.259  -.056  -.337      .161   .398
E             -.306    -.047    .073     .079        -.204   .115      -.145  1.000   .431   .341   .387   .534      .098  -.424
A             -.215    -.060   -.133     .095        -.211  -.014      -.250   .431  1.000   .586   .361   .541     -.201  -.622
B             -.207    -.017   -.047    -.147        -.261   .044      -.259   .341   .586  1.000   .257   .498     -.435  -.608
D              .009    -.047   -.322    -.105        -.384   .065      -.056   .387   .361   .257  1.000   .504      .425  -.204
C             -.308    -.039    .022    -.087        -.241  -.098      -.337   .534   .541   .498   .504  1.000      .025  -.426
Cylinder       .128     .017   -.334    -.068        -.070   .076       .161   .098  -.201  -.435   .425   .025     1.000   .253
Sedan          .513     .051    .041    -.020         .374  -.062       .398  -.424  -.622  -.608  -.204  -.426      .253  1.000
Model 1: Covariances
                    Coupe      Mileage       Cruise      Leather  Convertible        Sound    Hatchback            E            A            B            D            C     Cylinder        Sedan
Coupe            3.536E-5    2.766E-11    -2.184E-6    -1.814E-6     1.210E-5    -1.576E-6     1.906E-5    -1.632E-5    -1.175E-5    -1.233E-5     4.446E-7    -1.076E-5     1.511E-6     1.844E-5
Mileage         2.766E-11    4.697E-14   -1.345E-11   -1.979E-11    1.223E-11    2.007E-11    7.819E-11   -9.100E-11   -1.183E-10   -3.756E-11   -8.124E-11   -5.007E-11    7.236E-12    6.715E-11
Cruise          -2.184E-6   -1.345E-11     2.525E-5     2.198E-6     1.233E-6     1.105E-7     5.078E-6     3.282E-6    -6.119E-6    -2.368E-6    -1.292E-5     6.384E-7    -3.330E-6     1.239E-6
Leather         -1.814E-6   -1.979E-11     2.198E-6     1.847E-5     1.122E-6    -2.458E-6    -6.332E-7     3.052E-6     3.741E-6    -6.296E-6    -3.603E-6    -2.210E-6    -5.759E-7    -5.164E-7
Convertible      1.210E-5    1.223E-11     1.233E-6     1.122E-6     7.507E-5    -1.168E-6     1.379E-5    -1.586E-5    -1.675E-5    -2.263E-5    -2.659E-5    -1.226E-5    -1.197E-6     1.962E-5
Sound           -1.576E-6    2.007E-11     1.105E-7    -2.458E-6    -1.168E-6     1.620E-5     1.599E-6     4.149E-6    -5.018E-7     1.781E-6     2.097E-6    -2.313E-6     6.065E-7    -1.507E-6
Hatchback        1.906E-5    7.819E-11     5.078E-6    -6.332E-7     1.379E-5     1.599E-6     9.075E-5    -1.236E-5    -2.188E-5    -2.464E-5    -4.278E-6    -1.886E-5     3.039E-6     2.292E-5
E               -1.632E-5   -9.100E-11     3.282E-6     3.052E-6    -1.586E-5     4.149E-6    -1.236E-5     8.047E-5     3.544E-5     3.053E-5     2.769E-5     2.815E-5     1.750E-6    -2.301E-5
A               -1.175E-5   -1.183E-10    -6.119E-6     3.741E-6    -1.675E-5    -5.018E-7    -2.188E-5     3.544E-5     8.413E-5     5.369E-5     2.641E-5     2.917E-5    -3.653E-6    -3.452E-5
B               -1.233E-5   -3.756E-11    -2.368E-6    -6.296E-6    -2.263E-5     1.781E-6    -2.464E-5     3.053E-5     5.369E-5     9.991E-5     2.054E-5     2.927E-5    -8.632E-6    -3.674E-5
D                4.446E-7   -8.124E-11    -1.292E-5    -3.603E-6    -2.659E-5     2.097E-6    -4.278E-6     2.769E-5     2.641E-5     2.054E-5     6.374E-5     2.363E-5     6.733E-6    -9.858E-6
C               -1.076E-5   -5.007E-11     6.384E-7    -2.210E-6    -1.226E-5    -2.313E-6    -1.886E-5     2.815E-5     2.917E-5     2.927E-5     2.363E-5     3.456E-5     2.937E-7    -1.514E-5
Cylinder         1.511E-6    7.236E-12    -3.330E-6    -5.759E-7    -1.197E-6     6.065E-7     3.039E-6     1.750E-6    -3.653E-6    -8.632E-6     6.733E-6     2.937E-7     3.941E-6     3.042E-6
Sedan            1.844E-5    6.715E-11     1.239E-6    -5.164E-7     1.962E-5    -1.507E-6     2.292E-5    -2.301E-5    -3.452E-5    -3.674E-5    -9.858E-6    -1.514E-5     3.042E-6     3.659E-5
a. Dependent Variable: LgPrice
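After Type and Make are included as dummy variables, Doors no longer appears among the entered variables, and note b under Variables Entered/Removed reports that the tolerance limit was reached. Several of the retained predictors are not statistically significant at the .05 level: Sound (p = .671), Make C (p = .243), Make E (p = .371) and Coupe (p = .716). This suggests that keeping them in the model has no significant influence on its ability to predict the retail price of the car.
The prediction model contains 14 predictors in total, 9 of which are dummy variables (five for Make and four for Type), all entered in a single step with the Enter method. The model is statistically significant (F = 669.112, p < .001) and accounts for 92.2% of the variance in the log retail price (R Square = .922). Make D (Beta = .608) and Cylinder (Beta = .560) have the largest standardized coefficients and therefore the highest influence on the retail price of the car.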
GRADUATE SCHOOL OF BUSINESS (UKM – GSB)
ZCZA6043
MULTIVARIATE ANALYSIS
ASSIGNMENT 3
MULTIPLE REGRESSION
PREPARED FOR :
PROF. MADYA DR. RASIDAH MOHAMAD SAID
PREPARED BY:
AL AZMI BIN ABDUL RAHMAN
(ZP 02311)