multivar 2 - simple and multiple regression.pdf

Post on 02-Mar-2018


  • 7/26/2019 Multivar 2 - Simple and Multiple Regression.pdf

    1/26

    Faculty of Engineering

    Gadjah Mada University

    Andi Sudiarso

    Mechanical & Industrial Engineering

    2/26

    Type of relationship: dependence
    Number of predicted variables: one
    Type of relationship: single

    Measurement scale of the dependent variable: metric

    Purpose:

    To predict the changes in the dependent variable as a result of changes in the independent variables.

    3/26

    y: cost (in thousand USD; 1 dollar = Rp 10,000.00)
    x: road length (in miles; 1 mile = 1.6093 km)

    Previous road resurfacing projects

    x (miles)   y (thousand USD)
    1           6
    3           8
    4           10
    5           14
    7           20

    Now a new project is available: resurfacing a 6-mile road, and no 6-mile project has been done before. The question is: how much will it cost?
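The question can be answered with the least-squares line from the parameter-estimation slide. The Python sketch below assumes the five pairs read from the garbled table are (1, 6), (3, 8), (4, 10), (5, 14) and (7, 20); `fit_line` is a hypothetical helper name. With these pairs the fit is $\hat y = 2.0 + 2.4x$, so a 6-mile project is predicted to cost about 16.4 thousand USD.

```python
# Road resurfacing pairs as read from the table (assumed reading of the slide).
x = [1, 3, 4, 5, 7]     # road length, miles
y = [6, 8, 10, 14, 20]  # cost, thousand USD


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(x, y)
cost_6 = b0 + b1 * 6  # predicted cost of a 6-mile project
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, predicted cost for 6 miles = {cost_6:.1f}")
```

Because 6 miles lies inside the observed range of 1 to 7 miles, this is an interpolation rather than an extrapolation.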

    4/26

    What is regression?

    The problem of fitting a line to the data, i.e. to pairs of numbers (x, y).

    The problem of predicting one variable (y) from values of another variable (x).

    The use of data on a quantitative independent variable to predict or explain variation in a quantitative dependent variable (Ott, 2001). Prediction refers to future values, explanation refers to current or past values; both require a unit of association (an entity on which both x and y are measured).

    5/26

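    Type of regression analysis

    Linear regression
    $y = \beta_0 + \beta_1 x + \varepsilon$

    Polynomial regression
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (quadratic)
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ (cubic), etc.

    Non-linear regression, for example:
    $y = \beta_0 + \beta_1 \sin(\beta_2 x) + \varepsilon$
    $y = \beta_0 + \beta_1 e^{\beta_2 x} + \varepsilon$

    Multiple regression
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \varepsilon$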

    6/26

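    Simple linear regression

    $y = \beta_0 + \beta_1 x$

    where

    $\beta_0$: the intercept, the value of y when x = 0
    $\beta_1$: the slope, the change in y for a one-unit change in x

    [Figure: the line $y = \beta_0 + \beta_1 x$ in the x-y plane, showing the intercept $\beta_0$ and a rise of $\beta_1$ per unit step in x]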

    7/26

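    Linear regression (complete form)

    $y = \beta_0 + \beta_1 x + \varepsilon$

    where

    $\varepsilon$: random error, the deviation of actual y values from their predicted values (unpredictable and ignored factors)

    [Figure: the line $y = \beta_0 + \beta_1 x$ in the x-y plane, with intercept $\beta_0$]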

    8/26

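    To estimate the values of the parameters

    $\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)\, y_i}{\sum_{i=1}^{n} (x_i - \bar x)^2}$

    $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$

    The mean squared error (MSE)

    $\mathrm{MSE} = \dfrac{(n-1)s_y^2 - \hat\beta_1^2 (n-1) s_x^2}{n-2}$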
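The MSE shortcut works because, for the least-squares slope, the residual sum of squares equals $\sum(y_i-\bar y)^2 - \hat\beta_1^2\sum(x_i-\bar x)^2$. A small Python check, assuming the road-resurfacing pairs (1, 6), (3, 8), (4, 10), (5, 14), (7, 20) read from the earlier table, computes the MSE both ways; for these pairs both give 8/3, about 2.67.

```python
# Road-resurfacing pairs as read from the earlier table (assumed reading of the slide).
x = [1, 3, 4, 5, 7]
y = [6, 8, 10, 14, 20]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)  # sample variance of x
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)  # sample variance of y

# Least-squares estimates from the formulas above.
b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / ((n - 1) * s2_x)
b0 = y_bar - b1 * x_bar

# MSE via the shortcut formula on the slide ...
mse_shortcut = ((n - 1) * s2_y - b1 ** 2 * (n - 1) * s2_x) / (n - 2)
# ... and directly: residual sum of squares divided by n - 2.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse_direct = sse / (n - 2)
print(mse_shortcut, mse_direct)
```

The divisor $n - 2$ reflects the two estimated parameters, $\hat\beta_0$ and $\hat\beta_1$.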

    9/26

    The variance is a measure of dispersion about the mean

    $s_x^2 = MS_x = SS_x/\nu = \sum_{i=1}^{n}(x_i - \bar x)^2/(n-1)$

    $s_y^2 = MS_y = SS_y/\nu = \sum_{i=1}^{n}(y_i - \bar y)^2/(n-1)$

    $s_{xy} = MS_{xy} = SS_{xy}/\nu = \sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)/(n-1)$ (sample covariance between x and y)

    where
    MS: the mean square
    SS: the sum of squares
    $\nu$: the number of degrees of freedom, $n - 1$ (sampling)
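These definitions translate directly into code. A minimal Python sketch (the function name `sample_moments` is illustrative, and the data are the road-resurfacing pairs read from the earlier table) divides each sum of squares by $\nu = n - 1$ and cross-checks the variances against `statistics.variance` from the standard library, which uses the same $n - 1$ divisor. For these pairs $s_x^2 = 5$, $s_y^2 = 30.8$ and $s_{xy} = 12$.

```python
import statistics


def sample_moments(x, y):
    """Return (s_x^2, s_y^2, s_xy): each sum of squares divided by nu = n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    nu = n - 1  # degrees of freedom
    return ss_x / nu, ss_y / nu, ss_xy / nu


x = [1, 3, 4, 5, 7]
y = [6, 8, 10, 14, 20]
s2_x, s2_y, s_xy = sample_moments(x, y)
print(s2_x, s2_y, s_xy)

# Cross-check the variances against the standard library (also n - 1 in the divisor).
assert abs(s2_x - statistics.variance(x)) < 1e-9
assert abs(s2_y - statistics.variance(y)) < 1e-9
```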

    10/26

    The Coleman Report (USA, 1977)

    y: the mean verbal test score for 6th graders
    x: a composite measure of socio-economic status

    School   y       x         School   y       x
    1        37.01     7.20    11       23.30   -12.86
    2        26.51   -11.71    12       35.20     0.92
    3        36.51    12.32    13       34.90     4.77
    4        40.70    14.28    14       33.10    -0.96
    5        37.10     6.31    15       22.70   -16.04
    6        33.90     6.16    16       39.70    10.62
    7        41.80    12.70    17       31.80     2.66
    8        33.40    -0.17    18       31.70   -10.99
    9        41.01     9.85    19       43.10    15.03
    10       37.20    -0.05    20       41.01    12.77

    11/26

    Plot of y versus x (scatter plot)

    The Coleman Report (USA, 1977)

    [Figure: scatter plot of y (vertical axis, 20 to 45) against x (horizontal axis, -20 to 20)]

    12/26

    Estimate the parameters

    The Coleman Report (USA, 1977)

    $n = 20$, $\bar x = 3.14$, $s_x^2 = 92.65$, $\bar y = 35.08$, $s_y^2 = 33.84$, $\sum_{i=1}^{n} x_i y_i = 3189.88$

    $\hat\beta_1 = \dfrac{3189.88 - 20(3.14)(35.08)}{(20-1)(92.65)} = 0.56$

    $\hat\beta_0 = 35.08 - 0.56(3.14) = 33.32$

    $\mathrm{MSE} = \dfrac{(20-1)(33.84) - (0.56)^2(20-1)(92.65)}{20-2} = 5.01$
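As a check, the Python sketch below recomputes these estimates from the 20 (x, y) pairs in the Coleman table instead of from the rounded summary values, using the same expressions as the slide. Carrying full precision gives $\hat\beta_1 \approx 0.560$, $\hat\beta_0 \approx 33.32$ and MSE about 5.01. Plugging the rounded values shown (0.56, 33.84, 92.65) into the MSE expression gives about 5.05 instead, so the reported 5.01 reflects unrounded intermediate values.

```python
# Coleman Report data (20 schools) as listed in the table.
y = [37.01, 26.51, 36.51, 40.70, 37.10, 33.90, 41.80, 33.40, 41.01, 37.20,
     23.30, 35.20, 34.90, 33.10, 22.70, 39.70, 31.80, 31.70, 43.10, 41.01]
x = [7.20, -11.71, 12.32, 14.28, 6.31, 6.16, 12.70, -0.17, 9.85, -0.05,
     -12.86, 0.92, 4.77, -0.96, -16.04, 10.62, 2.66, -10.99, 15.03, 12.77]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Same expressions as on the slide, without rounding intermediate values.
b1 = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s2_x)
b0 = y_bar - b1 * x_bar
mse = ((n - 1) * s2_y - b1 ** 2 * (n - 1) * s2_x) / (n - 2)

print(f"n={n}, x_bar={x_bar:.2f}, s2_x={s2_x:.2f}, y_bar={y_bar:.2f}, s2_y={s2_y:.2f}")
print(f"sum_xy={sum_xy:.2f}, b1={b1:.3f}, b0={b0:.2f}, MSE={mse:.2f}")
```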

    13/26

    Linear regression

    The Coleman Report (USA, 1977)

    [Figure: linear regression of y on x, on the same axes as the scatter plot (x from -20 to 20, y from 20 to 45)]

    14/26

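    Original equation:

    $T = c\,V^b$

    Calculate the constants c and b!

    Taking the natural logarithm of both sides:

    $\ln T = \ln c + b \ln V$

    Let's define:

    $y = \ln T$, $x = \ln V$, $\beta_0 = \ln c$, $\beta_1 = b$, so that $y = \beta_0 + \beta_1 x$

    To calculate c and b, we can calculate $\beta_0$ and $\beta_1$ first, then recover $c = e^{\beta_0}$ and $b = \beta_1$.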
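The slide gives no data for T and V, so the Python sketch below generates hypothetical, noise-free points from $T = 2V^{-0.5}$ (c = 2 and b = -0.5 are arbitrary choices, not from the source) and checks that the log transform followed by simple linear regression recovers them.

```python
import math

# Hypothetical data: T = c * V**b with c = 2.0 and b = -0.5 (illustrative values only).
c_true, b_true = 2.0, -0.5
V = [1.0, 2.0, 4.0, 8.0, 16.0]
T = [c_true * v ** b_true for v in V]

# Transform: y = ln T, x = ln V, so that y = beta0 + beta1 * x.
x = [math.log(v) for v in V]
y = [math.log(t) for t in T]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
beta1 = (sum((xi - x_bar) * yi for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
beta0 = y_bar - beta1 * x_bar

# Back-transform: b = beta1 and c = exp(beta0).
b = beta1
c = math.exp(beta0)
print(f"c = {c:.4f}, b = {b:.4f}")
```

Fitting on the log scale minimizes squared errors in $\ln T$, not in T, so with noisy data the back-transformed c and b generally differ from a least-squares fit of $T = cV^b$ on the original scale.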

    15/26

    What is correlation?

    A measure of the linear relationship between two variables: it measures the strength of the linear relation between x and y.

    The correlation coefficient r is positive if y tends to increase as x increases, and negative if y tends to decrease as x increases. r is zero when there is no linear relation between changes in x and changes in y; this happens when the variables are unrelated, but it can also happen when they are related nonlinearly.

    The sample correlation coefficient r, with $-1 \le r \le 1$, is related to the estimated slope:

    $r = \dfrac{s_{xy}}{s_x s_y} = \dfrac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \hat\beta_1 \dfrac{s_x}{s_y}$
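A short Python sketch of these properties, on made-up numbers: points exactly on a rising or falling line give r = 1 and r = -1, points with $y = x^2$ on a range symmetric about zero give r = 0 even though y is completely determined by x, and r equals $\hat\beta_1 s_x/s_y$. The helpers `corr` and `slope` are illustrative names.

```python
import math
import statistics


def corr(x, y):
    """Sample correlation r = s_xy / (s_x * s_y); the n - 1 divisors cancel."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / math.sqrt(ss_x * ss_y)


def slope(x, y):
    """Least-squares slope estimate, beta1_hat."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


x = [-2, -1, 0, 1, 2]
r_up = corr(x, [3 * xi + 1 for xi in x])     # points exactly on a rising line
r_down = corr(x, [-2 * xi + 5 for xi in x])  # points exactly on a falling line
r_parabola = corr(x, [xi ** 2 for xi in x])  # y determined by x, but not linearly

# Check r = beta1_hat * s_x / s_y on a small made-up sample.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 3.5, 6.0]
r_direct = corr(xs, ys)
r_from_slope = slope(xs, ys) * statistics.stdev(xs) / statistics.stdev(ys)
print(r_up, r_down, r_parabola, r_direct, r_from_slope)
```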

    16/26

    Consider the following data

    No.   y    x
    1     25   10
    2     41   20
    3     47   20
    4     59   30
    5     54   30
    6     56   30
    7     49   40
    8     43   40
    9     30   50

    Calculate the correlation coefficient r!

    17/26

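    $\bar x = 30.00$, $\bar y = 44.89$

    $s_x^2 = [(10 - 30.00)^2 + \cdots]/8 = 1{,}200/8$

    $s_y^2 = [(25 - 44.89)^2 + \cdots]/8 = 1{,}062.89/8$

    $s_{xy} = [(10 - 30.00)(25 - 44.89) + \cdots]/8 = 140/8$

    $r = 140/[(1{,}200)^{0.5}(1{,}062.89)^{0.5}] = 0.1240$

    The correlation coefficient r is a small positive number: the linear relation between x and y is weak.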
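The same arithmetic in Python, using the nine (y, x) pairs from the table; the common divisor 8 cancels in r, so only the sums of squares and cross-products are needed.

```python
import math

# Data from the nine-observation table.
y = [25, 41, 47, 59, 54, 56, 49, 43, 30]
x = [10, 20, 20, 30, 30, 30, 40, 40, 50]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)                          # numerator of s_x^2
ss_y = sum((yi - y_bar) ** 2 for yi in y)                          # numerator of s_y^2
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # numerator of s_xy

# The common divisor n - 1 = 8 cancels in r.
r = ss_xy / math.sqrt(ss_x * ss_y)
print(f"x_bar={x_bar:.2f}, y_bar={y_bar:.2f}, SSx={ss_x:.2f}, "
      f"SSy={ss_y:.2f}, SSxy={ss_xy:.2f}, r={r:.4f}")
```

The small r does not mean y is unrelated to x here: y rises from x = 10 to x = 30 and falls after that, a pattern a straight line cannot capture.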

    18/26

    What is multiple regression?

    The problem of relating more than one independent variable to a dependent variable.

    The problem of predicting the dependent variable (y) from values of the independent variables ($x_1, x_2, \ldots$).

    An example is the effect of factory production, consumption level, and stocks in storage on the price of a product.

    The surfaces obtained are used not only to make predictions but often also for optimization, i.e. to determine the values of the independent variables at which the dependent variable is maximum or minimum.

    19/26

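    Type of regression analysis

    Linear regression
    $y = \beta_0 + \beta_1 x + \varepsilon$

    Polynomial regression
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (quadratic)
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ (cubic), etc.

    Non-linear regression, for example:
    $y = \beta_0 + \beta_1 \sin(\beta_2 x) + \varepsilon$
    $y = \beta_0 + \beta_1 e^{\beta_2 x} + \varepsilon$

    Multiple (linear) regression
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \varepsilon$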

    20/26

    For two independent variables, this is the problem of fitting a plane to a set of n points with coordinates $(x_{1i}, x_{2i}, y_i)$, $i = 1, \ldots, n$. The equation is

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

    For any given set of values $x_1, x_2, x_3, \ldots, x_r$ and the corresponding values of y, a linear relationship between the variables is given by

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_r x_r$

    21/26

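    Applying the least squares method to obtain estimates of the coefficients $\beta_0$, $\beta_1$, and $\beta_2$ by minimizing the sum of the squared vertical distances (in the y direction) from the points to the plane, we minimize

    $\sum_{i=1}^{n} \left[y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\right]^2$

    Solving the minimization gives the following normal equations (prove it!):

    $\sum y = n\hat\beta_0 + \hat\beta_1 \sum x_1 + \hat\beta_2 \sum x_2$
    $\sum x_1 y = \hat\beta_0 \sum x_1 + \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$
    $\sum x_2 y = \hat\beta_0 \sum x_2 + \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$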
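One way to carry out the "prove it!" step, sketched here: call the sum of squares $S$, set each partial derivative to zero, divide by $-2$ and distribute the sums. The values that solve the resulting equations are the estimates, written with hats. Because $S$ is a convex quadratic in the $\beta$'s, this stationary point is a minimum.

```latex
\begin{aligned}
S(\beta_0,\beta_1,\beta_2) &= \sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\right]^2 \\
\frac{\partial S}{\partial \beta_0} &= -2\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\right] = 0
  &&\Rightarrow\; \sum y_i = n\hat\beta_0 + \hat\beta_1\sum x_{1i} + \hat\beta_2\sum x_{2i} \\
\frac{\partial S}{\partial \beta_1} &= -2\sum_{i=1}^{n} x_{1i}\left[y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\right] = 0
  &&\Rightarrow\; \sum x_{1i} y_i = \hat\beta_0\sum x_{1i} + \hat\beta_1\sum x_{1i}^2 + \hat\beta_2\sum x_{1i}x_{2i} \\
\frac{\partial S}{\partial \beta_2} &= -2\sum_{i=1}^{n} x_{2i}\left[y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\right] = 0
  &&\Rightarrow\; \sum x_{2i} y_i = \hat\beta_0\sum x_{2i} + \hat\beta_1\sum x_{1i}x_{2i} + \hat\beta_2\sum x_{2i}^2
\end{aligned}
```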

    22/26

    Twisting a forged alloy bar

    y: the number of twists required to break the alloy
    x1: the percentage of element A
    x2: the percentage of element B

    y    x1   x2        y    x1   x2
    38   1    5         31   1    15
    40   2    5         35   2    15
    85   3    5         42   3    15
    59   4    5         59   4    15
    40   1    10        18   1    20
    60   2    10        34   2    20
    68   3    10        29   3    20
    53   4    10        42   4    20
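Computing these sums from the raw table is mechanical. A Python sketch, with the 16 rows typed in from the table ($x_1$ cycles through 1 to 4 and $x_2$ steps through 5, 10, 15, 20 in blocks of four), reproduces the sums used in the normal equations: n = 16, $\sum x_1 = 40$, $\sum x_2 = 200$, $\sum y = 733$, $\sum x_1^2 = 120$, $\sum x_2^2 = 3000$, $\sum x_1x_2 = 500$, $\sum x_1y = 1989$, $\sum x_2y = 8285$.

```python
# Forged alloy bar data from the table:
# y = twists to break, x1 = % of element A, x2 = % of element B.
y = [38, 40, 85, 59, 40, 60, 68, 53, 31, 35, 42, 59, 18, 34, 29, 42]
x1 = [1, 2, 3, 4] * 4
x2 = [5] * 4 + [10] * 4 + [15] * 4 + [20] * 4

sums = {
    "n": len(y),
    "sum_x1": sum(x1),
    "sum_x2": sum(x2),
    "sum_y": sum(y),
    "sum_x1_sq": sum(a * a for a in x1),
    "sum_x2_sq": sum(b * b for b in x2),
    "sum_x1x2": sum(a * b for a, b in zip(x1, x2)),
    "sum_x1y": sum(a * c for a, c in zip(x1, y)),
    "sum_x2y": sum(b * c for b, c in zip(x2, y)),
}
for name, value in sums.items():
    print(name, value)
```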

    23/26

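    Twisting a forged alloy bar

    Calculating the following (n = 16):

    $\sum x_1 = 40$, $\sum x_2 = 200$, $\sum y = 733$
    $\sum x_1^2 = 120$, $\sum x_2^2 = 3000$, $\sum x_1 x_2 = 500$
    $\sum x_1 y = 1989$, $\sum x_2 y = 8285$

    Substituting into the normal equations gives

    $733 = 16\hat\beta_0 + 40\hat\beta_1 + 200\hat\beta_2$
    $1989 = 40\hat\beta_0 + 120\hat\beta_1 + 500\hat\beta_2$
    $8285 = 200\hat\beta_0 + 500\hat\beta_1 + 3000\hat\beta_2$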
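Any method for linear systems will do here. The sketch below uses Gauss-Jordan elimination with partial pivoting in plain Python (the `solve` helper is illustrative, not from the source). For these equations the exact solution is $\hat\beta_0 = 48.1875$, $\hat\beta_1 = 7.825$, $\hat\beta_2 = -1.755$, which round to the reported 48.2, 7.83 and -1.76.

```python
def solve(a, b):
    """Solve a small square system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix of floats, so the inputs are left unchanged.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# Coefficients and right-hand sides of the three normal equations.
A = [[16, 40, 200],
     [40, 120, 500],
     [200, 500, 3000]]
rhs = [733, 1989, 8285]

b0, b1, b2 = solve(A, rhs)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

In practice a library least-squares routine would usually be used instead of forming and solving the normal equations by hand; the explicit version mirrors the slides.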

    24/26

    Twisting a forged alloy bar

    Solving the simultaneous linear equations gives

    $\hat\beta_0 = 48.2$, $\hat\beta_1 = 7.83$, $\hat\beta_2 = -1.76$

    Hence, the multiple regression equation is

    $\hat y = 48.2 + 7.83x_1 - 1.76x_2$

    Using the equation, we can predict the number of twists required to break the forged alloy bar for any given pair of values of $x_1$ and $x_2$ within the range of the data ($x_1$ from 1 to 4, $x_2$ from 5 to 20).

    25/26

    26/26

    Equation models that include interaction may also be analyzed by the multiple linear regression method. An interaction between two variables can be represented by a cross-product term such as

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$

    If we let $x_3 = x_1 x_2$ and $\beta_3 = \beta_{12}$, then we get

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
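Because the model stays linear in the $\beta$'s, the interaction fit is ordinary multiple regression with one extra column. The Python sketch below uses hypothetical, noise-free data generated from $y = 1 + 2x_1 + 3x_2 + 0.5x_1x_2$ on a small grid (all values are made up for illustration). It builds $x_3 = x_1x_2$, forms the normal equations $X^\top X\beta = X^\top y$, and solves them with a small Gauss-Jordan routine (`solve` is an illustrative helper).

```python
def solve(a, b):
    """Solve a small square system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[p] = m[p], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2 on a 4 x 3 grid.
x1 = [float(a) for a in range(4) for _ in range(3)]
x2 = [float(b) for _ in range(4) for b in range(3)]
y = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in zip(x1, x2)]

# The cross-product term becomes an ordinary third regressor.
x3 = [a * b for a, b in zip(x1, x2)]
columns = [[1.0] * len(y), x1, x2, x3]  # intercept, x1, x2, x3 = x1*x2

# Normal equations (X^T X) beta = X^T y.
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]

beta0, beta1, beta2, beta3 = solve(xtx, xty)
print(f"beta0={beta0:.4f}, beta1={beta1:.4f}, beta2={beta2:.4f}, beta3 (= beta12)={beta3:.4f}")
```

With the interaction term, the change in y per unit of $x_1$ is $\beta_1 + \beta_{12}x_2$, so it depends on the level of $x_2$.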