10 - Regression 1


  • 7/29/2019 10 - Regression 1


    Simple Linear Regression

    Simple Linear Regression Model

    Least Squares Method

    Coefficient of Determination

    Model Assumptions

    Testing for Significance

    Using the Estimated Regression Equation for Estimation and Prediction

    Residual Analysis: Validating Model Assumptions

    Outliers and Influential Observations


    Simple Linear Regression

    Managerial decisions often are based on the relationship between two or more variables.

    Regression analysis can be used to develop an equation showing how the variables are related.

    The variable being predicted is called the dependent variable and is denoted by y.

    The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.


    Simple Linear Regression

    Simple linear regression involves one independent variable and one dependent variable.

    The relationship between the two variables is approximated by a straight line.

    Regression analysis involving two or more independent variables is called multiple regression.


    Simple Linear Regression Model

    The equation that describes how y is related to x and an error term is called the regression model.

    The simple linear regression model is:

    y = β0 + β1x + ε

    where:

    β0 and β1 are called parameters of the model,

    ε is a random variable called the error term.


    Simple Linear Regression Equation

    The simple linear regression equation is:

    E(y) = β0 + β1x

    E(y) is the expected value of y for a given x value.

    β1 is the slope of the regression line.

    β0 is the y intercept of the regression line.

    The graph of the regression equation is a straight line.


    Simple Linear Regression Equation

    Positive Linear Relationship

    [Figure: E(y) vs. x; regression line with positive slope β1 and intercept β0.]


    Simple Linear Regression Equation

    Negative Linear Relationship

    [Figure: E(y) vs. x; regression line with negative slope β1 and intercept β0.]


    Simple Linear Regression Equation

    No Relationship

    [Figure: E(y) vs. x; horizontal regression line with slope β1 = 0 and intercept β0.]


    Estimated Simple Linear Regression Equation

    The estimated simple linear regression equation is:

    ŷ = b0 + b1x

    ŷ is the estimated value of y for a given x value. b1 is the slope of the line. b0 is the y intercept of the line. The graph is called the estimated regression line.


    Estimation Process

    Regression Model: y = β0 + β1x + ε

    Regression Equation: E(y) = β0 + β1x

    Unknown Parameters: β0, β1

    Sample Data: (x1, y1), (x2, y2), ..., (xn, yn)

    Sample Statistics b0 and b1 provide estimates of β0 and β1.

    Estimated Regression Equation: ŷ = b0 + b1x


    Least Squares Method

    Least Squares Criterion

    min Σ(yi - ŷi)²

    where:

    yi = observed value of the dependent variable for the ith observation

    ŷi = estimated value of the dependent variable for the ith observation


    Least Squares Method

    Slope for the Estimated Regression Equation

    b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²

    where:

    xi = value of independent variable for ith observation

    yi = value of dependent variable for ith observation

    x̄ = mean value for independent variable

    ȳ = mean value for dependent variable


    Least Squares Method

    y-Intercept for the Estimated Regression Equation

    b0 = ȳ - b1x̄


    Simple Linear Regression

    Example: Reed Auto Sales

    Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.


    Simple Linear Regression

    Example: Reed Auto Sales

    Number of TV Ads (x)   Number of Cars Sold (y)
    1                      14
    3                      24
    2                      18
    1                      17
    3                      27

    Σx = 10   Σy = 100
    x̄ = 2     ȳ = 20


    Estimated Regression Equation

    Slope for the Estimated Regression Equation

    b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 20/4 = 5

    y-Intercept for the Estimated Regression Equation

    b0 = ȳ - b1x̄ = 20 - 5(2) = 10

    Estimated Regression Equation

    ŷ = 10 + 5x
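    The slope and intercept computations above can be checked with a short Python sketch (plain least-squares formulas applied to the Reed Auto data; variable names are illustrative):

```python
# Least squares estimates for the Reed Auto data.
x = [1, 3, 2, 1, 3]       # number of TV ads
y = [14, 24, 18, 17, 27]  # number of cars sold

n = len(x)
x_bar = sum(x) / n  # 2.0
y_bar = sum(y) / n  # 20.0

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 20.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # 4.0
b1 = sxy / sxx           # 5.0
b0 = y_bar - b1 * x_bar  # 10.0

print(b0, b1)  # 10.0 5.0
```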


    Scatter Diagram and Trend Line

    [Figure: scatter diagram of cars sold (0 to 30) vs. TV ads (0 to 4) with trend line y = 5x + 10.]


    Coefficient of Determination

    Relationship Among SST, SSR, SSE

    SST = SSR + SSE

    Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

    where:

    SST = total sum of squares

    SSR = sum of squares due to regression

    SSE = sum of squares due to error


    Coefficient of Determination

    The coefficient of determination is:

    r² = SSR/SST

    where:

    SSR = sum of squares due to regression

    SST = total sum of squares


    Coefficient of Determination

    r² = SSR/SST = 100/114 = .8772

    The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
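    The sums of squares behind r² can be verified numerically; a minimal sketch on the Reed Auto data (fitted values come from ŷ = 10 + 5x):

```python
# Verify SST = SSR + SSE and r^2 for the Reed Auto data.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)            # 20.0
y_hat = [10 + 5 * xi for xi in x]  # fitted values from y-hat = 10 + 5x

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # due to error

r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))  # 114.0 100.0 14 0.8772
```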


    Sample Correlation Coefficient

    rxy = (sign of b1)√(Coefficient of Determination)

    rxy = (sign of b1)√r²

    where:

    b1 = the slope of the estimated regression equation ŷ = b0 + b1x


    Sample Correlation Coefficient

    rxy = (sign of b1)√r²

    The sign of b1 in the equation ŷ = 10 + 5x is +.

    rxy = +√.8772 = +.9366


    Assumptions About the Error Term ε

    1. The error ε is a random variable with mean of zero.

    2. The variance of ε, denoted by σ², is the same for all values of the independent variable.

    3. The values of ε are independent.

    4. The error ε is a normally distributed random variable.


    Testing for Significance

    To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

    Two tests are commonly used: the t test and the F test.

    Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.


    Testing for Significance

    An Estimate of σ²

    The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.

    s² = MSE = SSE/(n - 2)

    where:

    SSE = Σ(yi - ŷi)² = Σ(yi - b0 - b1xi)²


    Testing for Significance

    An Estimate of σ

    To estimate σ we take the square root of s².

    s = √MSE = √(SSE/(n - 2))

    The resulting s is called the standard error of the estimate.


    Testing for Significance: t Test

    Hypotheses

    H0: β1 = 0

    Ha: β1 ≠ 0

    Test Statistic

    t = b1 / s_b1

    where s_b1 = s / √Σ(xi - x̄)²


    Testing for Significance: t Test

    Rejection Rule

    Reject H0 if p-value < α or t < -t_α/2 or t > t_α/2

    where:

    t_α/2 is based on a t distribution with n - 2 degrees of freedom


    Testing for Significance: t Test

    1. Determine the hypotheses.  H0: β1 = 0, Ha: β1 ≠ 0

    2. Specify the level of significance.  α = .05

    3. Select the test statistic.  t = b1 / s_b1

    4. State the rejection rule.  Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)


    Testing for Significance: t Test

    5. Compute the value of the test statistic.

    t = b1 / s_b1 = 5 / 1.08 = 4.63

    6. Determine whether to reject H0.

    t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.
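    The quantities in steps 5 and 6 can be reproduced directly; a sketch (s is the standard error of the estimate from the earlier slides):

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar = sum(x) / n
y_hat = [10 + 5 * xi for xi in x]  # fitted values from y-hat = 10 + 5x

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # 14
s = math.sqrt(sse / (n - 2))               # standard error of the estimate
sxx = sum((xi - x_bar) ** 2 for xi in x)   # 4.0
s_b1 = s / math.sqrt(sxx)                  # estimated std. dev. of b1
t = 5 / s_b1                               # b1 / s_b1

print(round(s, 4), round(s_b1, 4), round(t, 2))  # 2.1602 1.0801 4.63
```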


    Confidence Interval for β1

    We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.

    H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.


    Confidence Interval for β1

    The form of a confidence interval for β1 is:

    b1 ± t_α/2 · s_b1

    where b1 is the point estimator and t_α/2 · s_b1 is the margin of error; t_α/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n - 2 degrees of freedom.


    Confidence Interval for β1

    Rejection Rule

    Reject H0 if 0 is not included in the confidence interval for β1.

    95% Confidence Interval for β1

    b1 ± t_α/2 · s_b1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

    Conclusion

    0 is not included in the confidence interval. Reject H0.
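    The interval arithmetic can be sketched as follows (b1 and s_b1 are the Reed Auto estimates; 3.182 is the t table value for 3 degrees of freedom):

```python
# 95% confidence interval for beta_1 in the Reed Auto example.
b1 = 5.0
s_b1 = 1.0801   # estimated standard deviation of b1 (rounded)
t_crit = 3.182  # t_{.025} with n - 2 = 3 degrees of freedom

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 2), round(upper, 2))  # 1.56 8.44
```

    Because 0 falls outside the interval, the confidence-interval approach reaches the same conclusion as the t test.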


    Testing for Significance: F Test

    Hypotheses

    H0: β1 = 0

    Ha: β1 ≠ 0

    Test Statistic

    F = MSR/MSE


    Testing for Significance: F Test

    Rejection Rule

    Reject H0 if p-value < α or F > F_α

    where:

    F_α is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator


    Testing for Significance: F Test

    1. Determine the hypotheses.  H0: β1 = 0, Ha: β1 ≠ 0

    2. Specify the level of significance.  α = .05

    3. Select the test statistic.  F = MSR/MSE

    4. State the rejection rule.  Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)


    Testing for Significance: F Test

    5. Compute the value of the test statistic.

    F = MSR/MSE = 100/4.667 = 21.43

    6. Determine whether to reject H0.

    F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

    The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
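    A quick check of the F statistic (SSR = 100 and SSE = 14 from the earlier slides; with one independent variable, MSR = SSR/1):

```python
# F test for the Reed Auto regression.
ssr, sse, n = 100.0, 14.0, 5
msr = ssr / 1        # 1 degree of freedom in the numerator
mse = sse / (n - 2)  # 3 degrees of freedom in the denominator
f = msr / mse
print(round(mse, 3), round(f, 2))  # 4.667 21.43
```

    For simple linear regression the two tests agree: F equals t squared (4.63² ≈ 21.43).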


    Some Cautions about the Interpretation of Significance Tests

    Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

    Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.


    Using the Estimated Regression Equation for Estimation and Prediction

    Confidence Interval Estimate of E(yp)

    ŷp ± t_α/2 · s_ŷp

    Prediction Interval Estimate of yp

    ŷp ± t_α/2 · s_ind

    where:

    the confidence coefficient is 1 - α and t_α/2 is based on a t distribution with n - 2 degrees of freedom


    Point Estimation

    If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

    ŷ = 10 + 5(3) = 25 cars


    Confidence Interval for E(yp)

    Estimate of the Standard Deviation of ŷp

    s_ŷp = s √(1/n + (xp - x̄)² / Σ(xi - x̄)²)

    s_ŷp = 2.16025 √(1/5 + (3 - 2)² / ((1-2)² + (3-2)² + (2-2)² + (1-2)² + (3-2)²))

    s_ŷp = 2.16025 √(1/5 + 1/4) = 1.4491


    Confidence Interval for E(yp)

    The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

    ŷp ± t_α/2 · s_ŷp = 25 ± 3.1824(1.4491) = 25 ± 4.61

    20.39 to 29.61 cars
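    The interval above can be reproduced with the formula for s_ŷp; a sketch (3.1824 is the t value used on the slide):

```python
import math

# 95% confidence interval for the mean cars sold when xp = 3 ads are run.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)  # 4.0
sse = sum((yi - (10 + 5 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))              # ~2.16025

xp = 3
y_p = 10 + 5 * xp                         # point estimate: 25
s_yp = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)
margin = 3.1824 * s_yp
print(round(s_yp, 4), round(y_p - margin, 2), round(y_p + margin, 2))
# 1.4491 20.39 29.61
```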


    Prediction Interval for yp

    Estimate of the Standard Deviation of an Individual Value of yp

    s_ind = s √(1 + 1/n + (xp - x̄)² / Σ(xi - x̄)²)

    s_ind = 2.16025 √(1 + 1/5 + 1/4)

    s_ind = 2.16025 (1.20416) = 2.6013


    Prediction Interval for yp

    The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

    ŷp ± t_α/2 · s_ind = 25 ± 3.1824(2.6013) = 25 ± 8.28

    16.72 to 33.28 cars
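    The same computation with the extra "1" under the square root gives the wider prediction interval; a sketch:

```python
import math

# 95% prediction interval for cars sold in one particular week with xp = 3 ads.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - (10 + 5 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

xp = 3
y_p = 10 + 5 * xp
# The leading 1 accounts for the variability of an individual observation
# around its mean, so this interval is wider than the one for E(yp).
s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)
margin = 3.1824 * s_ind
print(round(s_ind, 4), round(y_p - margin, 2), round(y_p + margin, 2))
# 2.6013 16.72 33.28
```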


    Residual Analysis

    Residual for Observation i

    yi - ŷi

    The residuals provide the best information about ε.

    Much of residual analysis is based on an examination of graphical plots.

    If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.


    Residual Plot Against x

    If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.


    Residual Plot Against x

    Good Pattern

    [Figure: residuals (y - ŷ) vs. x scattered in a horizontal band around 0.]


    Residual Plot Against x

    Nonconstant Variance

    [Figure: residuals (y - ŷ) vs. x with spread that changes across x, indicating nonconstant variance.]


    Residual Plot Against x

    Model Form Not Adequate

    [Figure: residuals (y - ŷ) vs. x showing a curved pattern, indicating the model form is not adequate.]


    Residual Plot Against x

    Residuals

    Observation   Predicted Cars Sold   Residuals
    1             15                    -1
    2             25                    -1
    3             20                    -2
    4             15                     2
    5             25                     2


    Residual Plot Against x

    [Figure: TV Ads residual plot; residuals (-3 to 3) plotted against TV ads (0 to 4), forming a horizontal band around 0.]


    Standardized Residuals

    Standardized Residual for Observation i

    (yi - ŷi) / s_(yi - ŷi)

    where:

    s_(yi - ŷi) = s √(1 - hi)

    hi = 1/n + (xi - x̄)² / Σ(xi - x̄)²


    Standardized Residual Plot

    The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution.

    If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.


    Standardized Residual Plot

    Standardized Residuals

    Observation   Predicted Y   Residuals   Standard Residuals
    1             15            -1          -0.535
    2             25            -1          -0.535
    3             20            -2          -1.069
    4             15             2           1.069
    5             25             2           1.069
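    The "Standard Residuals" in this table are consistent with dividing each residual by √(SSE/(n - 1)), the convention spreadsheet regression output typically uses, which differs slightly from the leverage-adjusted s√(1 - hi) definition; a sketch reproducing the values:

```python
import math

# Reproduce the table's standard residuals: e_i / sqrt(SSE / (n - 1)).
residuals = [-1, -1, -2, 2, 2]
n = len(residuals)
sse = sum(e ** 2 for e in residuals)  # 14
scale = math.sqrt(sse / (n - 1))      # ~1.8708
std_resid = [round(e / scale, 3) for e in residuals]
print(std_resid)  # [-0.535, -0.535, -1.069, 1.069, 1.069]
```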


    Standardized Residual Plot

    RESIDUAL OUTPUT

    Observation   Predicted Y   Residuals   Standard Residuals
    1             15            -1          -0.534522
    2             25            -1          -0.534522
    3             20            -2          -1.069045
    4             15             2           1.069045
    5             25             2           1.069045

    [Figure: standard residuals (-1.5 to 1.5) plotted against cars sold (0 to 30).]


    Standardized Residual Plot

    All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that ε has a normal distribution.


    Outliers and Influential Observations

    Detecting Outliers

    An outlier is an observation that is unusual in comparison with the other data.

    Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.

    This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.

    This rule's shortcoming can be circumvented by using studentized deleted residuals.

    The |ith studentized deleted residual| will be larger than the |ith standardized residual|.
