multivar 2 - simple and multiple regression.pdf
TRANSCRIPT
-
7/26/2019 Multivar 2 - Simple and Multiple Regression.pdf
Faculty of Engineering
Gadjah Mada University
Andi Sudiarso
Mechanical & Industrial Engineering
-
Type of relationship : dependence
Number of predicted variables : one
Type of relationship : single
Measurement scale of the dependent variable : metric
Purpose:
To predict the changes in the dependent variable as a result of changes in the independent variables.
-
Previous road resurfacing project
y : cost (in thousands of USD, 1 dollar = Rp 10,000.00)
x : road length (in miles, 1 mile = 1.6093 km)
x    y
1    6
3    8
4    10
5    14
7    20
Now there is a new project to resurface a 6-mile road, and no 6-mile project has been done before. The question is: how much will it cost?
-
What is regression?
The problem of fitting a line to the data, i.e. pairs of numbers (x, y).
The problem of predicting one variable (y) from values of another variable (x).
To use data on a quantitative independent variable to predict or explain variation in a quantitative dependent variable (Ott, 2001). Prediction refers to future values, explanation refers to current or past values; both require a unit of association.
-
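Type of regression analysis
Linear regression
$y = \beta_0 + \beta_1 x + \varepsilon$
Polynomial regression
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (quadratic)
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ (cubic)
etc.
Non-linear regression, for example:
$y = \beta_0 + \beta_1 \sin(\beta_2 x) + \varepsilon$
$y = \beta_0 + \beta_1 e^{\beta_2 x} + \varepsilon$
Multiple regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \varepsilon$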
-
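Simple linear regression
$y = \beta_0 + \beta_1 x$
where
$\beta_0$ : the intercept, the value of y when x = 0
$\beta_1$ : the slope, the change in y when there is a one-unit change in x
[Figure: the line $y = \beta_0 + \beta_1 x$ on x-y axes, with the intercept $\beta_0$ marked on the y axis and the slope $\beta_1$ marked as the rise per one-unit step in x.]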
-
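Linear regression (complete form)
$y = \beta_0 + \beta_1 x + \varepsilon$
where
$\varepsilon$ : random error, the deviation of actual y values from their predicted values (unpredictable and ignored factors)
[Figure: the line $y = \beta_0 + \beta_1 x$ on x-y axes, with the intercept $\beta_0$ marked on the y axis.]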
-
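To estimate the value of parameters
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
The mean squared error (MSE)
$\mathrm{MSE} = \left[ (n-1) s_y^2 - \hat{\beta}_1^2 (n-1) s_x^2 \right] / (n-2)$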
-
The variance is the measure of dispersion about the mean
$s_x^2 = MS_x = SS_x / v = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$
$s_y^2 = MS_y = SS_y / v = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n-1)$
$s_{xy} = MS_{xy} = SS_{xy} / v = \left\{ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \right\} / (n-1)$
(sample covariance between x and y)
where
MS : the mean square
SS : the sum of squares
v : the number of degrees of freedom = n-1 (sampling)
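As a minimal sketch of the definitions above, the sample variance and covariance can be computed directly from the sums of squares. The helper names and the four data points are hypothetical and only illustrate the arithmetic; `statistics.variance` from the Python standard library serves as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sample variance s^2 = SS / v, with v = n - 1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)  # sum of squares about the mean
    return ss / (n - 1)


def sample_covariance(xs, ys):
    """Sample covariance s_xy = SS_xy / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return ss_xy / (n - 1)


# Hypothetical data, not taken from the slides.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

s_x2 = sample_variance(x)
s_y2 = sample_variance(y)
s_xy = sample_covariance(x, y)
print(s_x2, s_y2, s_xy)

# Cross-check the hand-written variance against the standard library.
assert abs(s_x2 - statistics.variance(x)) < 1e-12
```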
-
The Coleman Report (USA, 1977)
y : the mean verbal test score for 6th graders
x : a composite measure of socio-economic status
School  y      x         School  y      x
1       37.01    7.20    11      23.30  -12.86
2       26.51  -11.71    12      35.20    0.92
3       36.51   12.32    13      34.90    4.77
4       40.70   14.28    14      33.10   -0.96
5       37.10    6.31    15      22.70  -16.04
6       33.90    6.16    16      39.70   10.62
7       41.80   12.70    17      31.80    2.66
8       33.40   -0.17    18      31.70  -10.99
9       41.01    9.85    19      43.10   15.03
10      37.20   -0.05    20      41.01   12.77
-
Plot of y versus x (scatter plot)
The Coleman Report (USA, 1977)
[Figure: scatter plot of y (vertical axis, 20 to 45) against x (horizontal axis, -20 to 20).]
-
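Estimate the parameters
The Coleman Report (USA, 1977)
$n = 20$, $\bar{x} = 3.14$, $s_x^2 = 92.65$, $\bar{y} = 35.08$, $s_y^2 = 33.84$, $\sum_{i=1}^{n} x_i y_i = 3189.88$
Since $\sum_{i=1}^{n} (x_i - \bar{x})\, y_i = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}$ and $\sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1) s_x^2$:
$\hat{\beta}_1 = [3189.88 - 20(3.14)(35.08)] / [(20-1)(92.65)] = 0.56$
$\hat{\beta}_0 = 35.08 - 0.56(3.14) = 33.32$
$\mathrm{MSE} = [(20-1)(33.84) - (0.56)^2 (20-1)(92.65)] / (20-2) \approx 5.01$ (using unrounded intermediate values; the rounded figures shown here give about 5.05)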
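The calculation above can be reproduced from the raw Coleman Report table. The following is a sketch only: the two lists are transcribed from the table of 20 schools, the variable names are illustrative, and the estimates use the same formulas as the slides rather than a statistics library.

```python
# Coleman Report data transcribed from the table (schools 1 to 20).
y = [37.01, 26.51, 36.51, 40.70, 37.10, 33.90, 41.80, 33.40, 41.01, 37.20,
     23.30, 35.20, 34.90, 33.10, 22.70, 39.70, 31.80, 31.70, 43.10, 41.01]
x = [7.20, -11.71, 12.32, 14.28, 6.31, 6.16, 12.70, -0.17, 9.85, -0.05,
     -12.86, 0.92, 4.77, -0.96, -16.04, 10.62, 2.66, -10.99, 15.03, 12.77]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares about the means, i.e. (n - 1) * s_x^2 and (n - 1) * s_y^2.
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

# Numerator of the slope: sum(x_i * y_i) - n * x_bar * y_bar.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
ss_xy = sum_xy - n * x_bar * y_bar

b1 = ss_xy / ss_x                         # estimated slope
b0 = y_bar - b1 * x_bar                   # estimated intercept
mse = (ss_y - b1 ** 2 * ss_x) / (n - 2)   # mean squared error

print(f"slope = {b1:.2f}, intercept = {b0:.2f}, MSE = {mse:.2f}")
```

Because the script keeps full precision throughout, its results should agree with the slide values (0.56, 33.32, 5.01) up to rounding.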
-
Linear regression
The Coleman Report (USA, 1977)
[Figure: scatter plot of y (vertical axis, 20 to 45) against x (horizontal axis, -20 to 20) with the fitted regression line.]
-
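Calculate the constants c and b!
Original equation:
$T = c V^b$
Taking the natural logarithm of both sides:
$\ln T = \ln c + b \ln V$
Let's define:
$y = \ln T$, $x = \ln V$, $\beta_0 = \ln c$, $\beta_1 = b$
so that
$y = \beta_0 + \beta_1 x$
To calculate c and b, we can calculate $\beta_0$ and $\beta_1$ first, then recover $c = e^{\beta_0}$ and $b = \beta_1$.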
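The log transformation can be sketched in a few lines. Everything numeric here is hypothetical: the (V, T) pairs are generated from assumed constants c = 500 and b = -2 so that the fit can be checked, and the function name is illustrative.

```python
import math


def fit_power_law(v_values, t_values):
    """Fit T = c * V**b by simple linear regression of ln T on ln V."""
    x = [math.log(v) for v in v_values]
    y = [math.log(t) for t in t_values]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * yi for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    # Undo the transformation: intercept = ln c, slope = b.
    return math.exp(intercept), slope


# Hypothetical measurements generated from assumed constants (not from the slides).
c_true, b_true = 500.0, -2.0
V = [10.0, 20.0, 40.0, 80.0]
T = [c_true * v ** b_true for v in V]

c_hat, b_hat = fit_power_law(V, T)
print(c_hat, b_hat)
```

With noise-free data the fit recovers the assumed constants; with real measurements the estimates minimize squared error in ln T, not in T itself.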
-
What is correlation?
A measure of the linear relationship between two variables.
Measurement of the strength of the linear relation between x and y.
The sample correlation coefficient r, $-1 \le r \le 1$, is related to the estimated slope:
$r = s_{xy} / (s_x s_y) = s_{xy} / \sqrt{s_x^2 s_y^2} = \hat{\beta}_1 s_x / s_y$
The correlation coefficient r is positive if y tends to increase as x increases; r is negative if y tends to decrease as x increases; r is zero if there is either no relation between changes in x and changes in y or a nonlinear relation with no overall linear trend.
-
Calculate the correlation coefficient r!
Consider the following data
No.  y   x
1    25  10
2    41  20
3    47  20
4    59  30
5    54  30
6    56  30
7    49  40
8    43  40
9    30  50
-
$\bar{x} = 30.00$, $\bar{y} = 44.89$
$s_x^2 = [(10-30.00)^2 + \dots] / 8 = 1200/8$
$s_y^2 = [(25-44.89)^2 + \dots] / 8 = 1062.89/8$
$s_{xy} = [(10-30.00)(25-44.89) + \dots] / 8 = 140/8$
$r = 140 / [(1200)^{0.5} (1062.89)^{0.5}] = 0.1240$
The correlation coefficient r is a small positive number.
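The same calculation as a short sketch, using the nine (x, y) pairs from the data table. Variable names are illustrative; the common factor 1/(n-1) cancels in r, so the code works with sums of squares directly.

```python
import math

# Data from the nine-point example table.
x = [10, 20, 20, 30, 30, 30, 40, 40, 50]
y = [25, 41, 47, 59, 54, 56, 49, 43, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# r = s_xy / (s_x * s_y); the (n - 1) divisors cancel.
r = ss_xy / math.sqrt(ss_x * ss_y)

# Equivalent form through the estimated slope: r = b1 * s_x / s_y.
b1 = ss_xy / ss_x
r_from_slope = b1 * math.sqrt(ss_x / ss_y)

print(f"r = {r:.4f}")
```

In this data set y rises and then falls as x increases, so the small r reflects a weak linear trend, not the absence of a relation.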
-
What is multiple regression?
The problem of fitting more than one independent variable to a dependent variable.
The problem of predicting the dependent variable (y) from values of the independent variables ($x_1, x_2, \dots$).
An example is the effect of factory production, consumption level, and stocks in storage on the price of a product.
The surfaces obtained are used not only to make predictions but also, often, for optimization, i.e. to determine the values of the independent variables at which the dependent variable is a maximum or minimum.
-
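Type of regression analysis
Linear regression
$y = \beta_0 + \beta_1 x + \varepsilon$
Polynomial regression
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (quadratic)
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ (cubic)
etc.
Non-linear regression, for example:
$y = \beta_0 + \beta_1 \sin(\beta_2 x) + \varepsilon$
$y = \beta_0 + \beta_1 e^{\beta_2 x} + \varepsilon$
Multiple (linear) regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \varepsilon$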
-
For any given set of values $x_1, x_2, x_3, \dots, x_r$ and the corresponding values of y, a linear relationship between the variables is given by
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_r x_r$
For two independent variables, this is a problem of fitting a plane to a set of n points with coordinates $(x_{1i}, x_{2i}, y_i)$, for i = 1 to n. The equation is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
-
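Applying the least squares method to obtain estimates of the coefficients $\beta_0$, $\beta_1$, and $\beta_2$ by minimizing the sum of the squares of the vertical distances from the points to the plane, we minimize
$\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})]^2$
Solving the minimization, the results are the normal equations as follows (prove it!):
$\sum y = n \hat{\beta}_0 + \hat{\beta}_1 \sum x_1 + \hat{\beta}_2 \sum x_2$
$\sum x_1 y = \hat{\beta}_0 \sum x_1 + \hat{\beta}_1 \sum x_1^2 + \hat{\beta}_2 \sum x_1 x_2$
$\sum x_2 y = \hat{\beta}_0 \sum x_2 + \hat{\beta}_1 \sum x_1 x_2 + \hat{\beta}_2 \sum x_2^2$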
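One way to approach the "prove it!" exercise is sketched below. The symbol $S$ for the sum of squares is introduced here for convenience and does not appear on the slides; all sums run over $i = 1, \dots, n$.

```latex
% Sum of squared vertical distances to be minimized
S(\beta_0, \beta_1, \beta_2)
  = \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) \bigr]^2

% Set each partial derivative to zero at the minimizer
\frac{\partial S}{\partial \beta_0}
  = -2 \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) \bigr] = 0

\frac{\partial S}{\partial \beta_1}
  = -2 \sum_{i=1}^{n} x_{1i} \bigl[ y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) \bigr] = 0

\frac{\partial S}{\partial \beta_2}
  = -2 \sum_{i=1}^{n} x_{2i} \bigl[ y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) \bigr] = 0

% Dividing by -2 and moving the coefficient terms to the right-hand side
\sum y     = n \beta_0 + \beta_1 \sum x_1 + \beta_2 \sum x_2
\sum x_1 y = \beta_0 \sum x_1 + \beta_1 \sum x_1^2 + \beta_2 \sum x_1 x_2
\sum x_2 y = \beta_0 \sum x_2 + \beta_1 \sum x_1 x_2 + \beta_2 \sum x_2^2
```

The values of $\beta_0$, $\beta_1$, $\beta_2$ that satisfy these three equations are the least squares estimates.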
-
Twisting a forged alloy bar
y : the number of twists required to break the alloy
x1 : the percentage of element A
x2 : the percentage of element B
y   x1  x2      y   x1  x2
38  1   5       31  1   15
40  2   5       35  2   15
85  3   5       42  3   15
59  4   5       59  4   15
40  1   10      18  1   20
60  2   10      34  2   20
68  3   10      29  3   20
53  4   10      42  4   20
-
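Twisting a forged alloy bar
Calculating the following (n = 16)
$\sum x_1 = 40$, $\sum x_2 = 200$, $\sum x_1^2 = 120$, $\sum x_2^2 = 3000$
$\sum x_1 x_2 = 500$, $\sum x_1 y = 1989$, $\sum x_2 y = 8285$, $\sum y = 733$
Substituting into the normal equations gives
$733 = 16 \hat{\beta}_0 + 40 \hat{\beta}_1 + 200 \hat{\beta}_2$
$1989 = 40 \hat{\beta}_0 + 120 \hat{\beta}_1 + 500 \hat{\beta}_2$
$8285 = 200 \hat{\beta}_0 + 500 \hat{\beta}_1 + 3000 \hat{\beta}_2$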
-
Twisting a forged alloy bar
Solving the simultaneous linear equations gives
$\hat{\beta}_0 = 48.2$
$\hat{\beta}_1 = 7.83$
$\hat{\beta}_2 = -1.76$
Hence, the multiple regression equation is
$y = 48.2 + 7.83 x_1 - 1.76 x_2$
Using the equation, we can predict the number of twists required to break the forged alloy bar for any given pair of values of x1 and x2.
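The three normal equations for the alloy example can be solved numerically. The sketch below uses Cramer's rule, chosen here only because the system is 3x3; the slides do not say which solution method was used, and the function names are illustrative.

```python
# Normal equations for the forged-alloy example, written as A * b = c.
# The sums are the ones quoted in the example (n = 16).
A = [
    [16.0, 40.0, 200.0],
    [40.0, 120.0, 500.0],
    [200.0, 500.0, 3000.0],
]
c = [733.0, 1989.0, 8285.0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def solve_cramer(a, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        # Replace one column of the matrix with the right-hand side.
        replaced = [row[:] for row in a]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


b0, b1, b2 = solve_cramer(A, c)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")


def predict(x1, x2):
    """Predicted number of twists for given percentages of elements A and B."""
    return b0 + b1 * x1 + b2 * x2
```

For example, `predict(2.5, 12.5)` evaluates the fitted plane at the mean values of x1 and x2, which returns the mean of y (733/16).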
-
Equation models that include interaction may also be analyzed by the multiple linear regression method. An interaction between two variables can be represented by a cross-product term such as
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$
If we let $x_3 = x_1 x_2$ and $\beta_3 = \beta_{12}$, then we get
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
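The substitution can be sketched in code: the cross product is added as an extra regressor column and the model is fitted as an ordinary multiple linear regression through its normal equations. The data are hypothetical, generated from assumed coefficients (1, 2, 3, 0.5) so the result can be checked, and the function names are illustrative.

```python
def solve_linear_system(a, rhs):
    """Solve a * b = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * b[k] for k in range(i + 1, n))
        b[i] = (m[i][n] - tail) / m[i][i]
    return b


def fit_with_interaction(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 + b3*x3 with x3 = x1*x2."""
    rows = [[1.0, u, v, u * v] for u, v in zip(x1, x2)]
    p = 4
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data on a 4 x 4 grid, generated exactly from assumed coefficients.
x1 = [float(u) for v in (5, 10, 15, 20) for u in (1, 2, 3, 4)]
x2 = [float(v) for v in (5, 10, 15, 20) for u in (1, 2, 3, 4)]
y = [1.0 + 2.0 * u + 3.0 * v + 0.5 * u * v for u, v in zip(x1, x2)]

coeffs = fit_with_interaction(x1, x2, y)
print(coeffs)
```

The last coefficient returned plays the role of $\beta_{12}$; the model stays linear in its coefficients, which is why the linear method still applies.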