Chapter 12:
Linear Regression
Introduction
• Regression analysis and Analysis of variance are the two most widely used statistical procedures.
• Regression analysis:
  – Description
  – Prediction
  – Estimation
12.1 Simple Linear Regression
• In (univariate) regression, there is always a single “dependent” variable and one or more “independent” variables.
  – Example: the number of non-conforming units depends on the amount of time devoted to maintaining control charts.
• Simple denotes the fact that a single independent variable is being used.
• Linear refers to the parameters, not to the independent variables.

$$Y = \beta_0 + \beta_1 X + \varepsilon \qquad (12.1)$$
(12.2)
• $Y = \beta_0 + \beta_1 X$ is the general form of the equation for a straight line.
• $\varepsilon$ indicates that there is not an exact relationship between X and Y.
• Regression analysis is not used for variables that have an exact linear relationship.
• $\beta_0$ and $\beta_1$ are generally unknown and must be estimated.
• $\varepsilon$ is generally thought of as an error term.
• Let Y denote the number of non-conforming units produced each month, and let X represent the amount of time devoted to using QC charts each month.
Table 12.1 Quality Improvement Data
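| Month | Time devoted to quality improvement (hours), X | Number of non-conforming units, Y |
|---|---|---|
| January | 56 | 20 |
| February | 58 | 19 |
| March | 55 | 20 |
| April | 62 | 16 |
| May | 63 | 15 |
| June | 68 | 14 |
| July | 66 | 15 |
| August | 68 | 13 |
| September | 70 | 10 |
| October | 67 | 13 |
| November | 72 | 9 |
| December | 74 | 8 |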
Figure 12.1 Scatter Plot
Figure 12.1a Scatter Plot
• Regression equation: a line through the center of the points minimizing the sum of the squares of the deviations from each point to the line. (Method of least squares)
• $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ is to be minimized, where $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
• Round-off error
• Prediction equation
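The least squares criterion above has a closed-form solution, which the following minimal sketch computes for the quality improvement data. The function name `least_squares` is illustrative, and December's X is taken as 74, the value that appears in the fitted-values table of Section 12.2.

```python
# Quality improvement data (Table 12.1); December's X is taken as 74,
# the value used in the fitted-values table of Section 12.2.
x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]  # hours
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]     # non-conforming units


def least_squares(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"Y-hat = {b0:.3f} + ({b1:.5f}) X")
```

Working with deviations from the means, rather than raw sums of squares, is one common way to limit the round-off error mentioned above.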
The regression equation is
Y = 55.9 - 0.641 X

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 55.923 | 2.824 | 19.80 | 0.000 |
| X | -0.64067 | 0.04332 | -14.79 | 0.000 |

S = 0.888854   R-Sq = 95.6%   R-Sq(adj) = 95.2%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 172.77 | 172.77 | 218.67 | 0.000 |
| Residual Error | 10 | 7.90 | 0.79 | | |
| Total | 11 | 180.67 | | | |
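The analysis of variance entries in this output can be reproduced by hand. The sketch below is one way to do so, assuming the twelve (X, Y) pairs of the example with X = 74 for the last observation; the variable names are illustrative.

```python
import math

x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Partition the total variation in Y into regression and residual parts
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_regression = ss_total - ss_error

df_regression, df_error = 1, n - 2
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error
s = math.sqrt(ms_error)  # reported as "S" in the output

print(f"Regression     {df_regression:>2} {ss_regression:8.2f} {ms_regression:8.2f} {f_stat:8.2f}")
print(f"Residual Error {df_error:>2} {ss_error:8.2f} {ms_error:8.2f}")
print(f"Total          {n - 1:>2} {ss_total:8.2f}")
print(f"S = {s:.6f}")
```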
• Prediction equation: should only be used for values of X within the range of the data, or slightly outside it.
• Descriptive:
  – A decrease of 0.64 non-conforming units for every additional hour devoted to quality improvement
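As a rough illustration of the range restriction, the sketch below wraps the reported prediction equation in a function that refuses to extrapolate far beyond the observed X values. The function name `predict`, the `margin` parameter, and its default of 2 hours are hypothetical choices for illustration; the slides do not quantify "slightly outside". The observed range 55 to 74 assumes X = 74 for the last observation.

```python
def predict(x_new, b0=55.923, b1=-0.64067, x_min=55, x_max=74, margin=2.0):
    """Predict Y from the fitted line, refusing to extrapolate far.

    b0 and b1 are the coefficients reported in the regression output.
    margin (hypothetical) is how far outside the observed X range a
    prediction is still allowed.
    """
    if not (x_min - margin <= x_new <= x_max + margin):
        raise ValueError(
            f"x = {x_new} is too far outside the data range [{x_min}, {x_max}]"
        )
    return b0 + b1 * x_new


# 65 hours lies inside the observed range, so a prediction is returned
print(round(predict(65), 2))

# Each additional hour changes the prediction by the slope, about -0.64
print(round(predict(66) - predict(65), 2))
```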
12.2 Worth of the Prediction Equation
| Obs | X | Y | Fit | SE Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 1 | 56.0 | 20.000 | 20.046 | 0.464 | -0.046 | -0.06 |
| 2 | 58.0 | 19.000 | 18.765 | 0.395 | 0.235 | 0.30 |
| 3 | 55.0 | 20.000 | 20.687 | 0.500 | -0.687 | -0.93 |
| 4 | 62.0 | 16.000 | 16.202 | 0.286 | -0.202 | -0.24 |
| 5 | 63.0 | 15.000 | 15.561 | 0.270 | -0.561 | -0.66 |
| 6 | 68.0 | 14.000 | 12.358 | 0.289 | 1.642 | 1.95 |
| 7 | 66.0 | 15.000 | 13.639 | 0.261 | 1.361 | 1.60 |
| 8 | 68.0 | 13.000 | 12.358 | 0.289 | 0.642 | 0.76 |
| 9 | 70.0 | 10.000 | 11.077 | 0.338 | -1.077 | -1.31 |
| 10 | 67.0 | 13.000 | 12.999 | 0.272 | 0.001 | 0.00 |
| 11 | 72.0 | 9.000 | 9.795 | 0.400 | -0.795 | -1.00 |
| 12 | 74.0 | 8.000 | 8.514 | 0.470 | -0.514 | -0.68 |
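The columns of this table can be recomputed from the data. The sketch below assumes the usual simple-regression formulas, which reproduce the tabled values: the leverage $h_i = 1/n + (x_i - \bar{X})^2 / S_{xx}$, SE Fit $= s\sqrt{h_i}$, and St Resid $= e_i / (s\sqrt{1 - h_i})$. The variable names are illustrative.

```python
import math

x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

fits = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fits)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation in simple linear regression
leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
se_fit = [s * math.sqrt(h) for h in leverage]
st_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

print("Obs     X       Y     Fit SE Fit   Resid StRes")
for i in range(n):
    print(f"{i + 1:>3} {x[i]:>5.1f} {y[i]:>7.3f} {fits[i]:>7.3f} "
          f"{se_fit[i]:>6.3f} {residuals[i]:>7.3f} {st_resid[i]:>5.2f}")
```

Observations far from $\bar{X}$ have larger leverage, which is why SE Fit is largest at X = 55 and X = 74.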
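• Pure error: data points with the same X but different Y values constitute pure error, since the regression line cannot be vertical.
• Measure of the worth of the prediction equation:

$$R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} \qquad (12.4)$$

• Since $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, $(\hat{Y} - \bar{Y}) = \hat{\beta}_1 (X - \bar{X})$.
• If $\hat{\beta}_1 = 0$ (no relationship between X and Y), $R^2 = 0$.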
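A short numerical check of $R^2$ for the quality improvement example follows. It is a sketch that takes $R^2$ as the regression sum of squares divided by the total sum of squares, which agrees with the R-Sq value in the regression output, and it also computes the adjusted value reported there. X = 74 is assumed for the last observation.

```python
x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = ss_total - ss_regression

r2 = ss_regression / ss_total
# Adjusted R^2 penalizes for the number of estimated parameters (2 here)
r2_adj = 1 - (ss_error / (n - 2)) / (ss_total / (n - 1))

# (Y-hat - Y-bar) = b1 * (X - X-bar), so the numerator equals b1^2 * Sxx
# and is zero whenever the estimated slope is zero.
print(f"R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * r2_adj:.1f}%")
print(f"b1^2 * Sxx = {b1 ** 2 * s_xx:.2f}")
```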
12.3 Assumptions
• The true relationship between X and Y can be adequately represented by the model

$$Y = \beta_0 + \beta_1 X + \varepsilon \qquad (12.1)$$

• The errors should be independent.
• The errors are approximately normally distributed.
12.4 Checking Assumptions through Residual Plots
• The residuals should be plotted against
  – X or $\hat{Y}$
  – Time
  – Any other variable
• Residual plots
  – All points close to the midline
  – Form a tight cluster that can be enclosed in a rectangle
• If there are residual outliers, investigate them.
• If the error variance increases or decreases, this problem can be remedied by a transformation of X.
• If the residuals form a parabola, an $X^2$ term would probably be needed.
12.5 Confidence Intervals
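• Assumption: Normality of the error terms
  – Robust regression
  – Non-parametric regression
• Confidence interval for $\beta_1$:

$$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\; s_{\hat{\beta}_1}$$

• Confidence interval for $\beta_0$:

$$\hat{\beta}_0 \pm t_{\alpha/2,\,n-2}\; s_{\hat{\beta}_0}$$

where $s_{\hat{\beta}_1} = s / \sqrt{S_{xx}}$, $s_{\hat{\beta}_0} = s \sqrt{\sum X^2 / (n S_{xx})}$, $S_{xx} = \sum (X - \bar{X})^2$, and $s$ is the square root of the residual mean square.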
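For the quality improvement example, 95% intervals for the two parameters can be sketched as follows. The standard errors use the usual simple-regression formulas, which reproduce the SE Coef column of the regression output. The critical value 2.228 is $t_{0.025,10}$ taken from a t table and hard-coded here, since the sketch uses only the standard library; X = 74 is assumed for the last observation.

```python
import math

x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

T_CRIT = 2.228  # t(0.025, 10) from a t table; assumes a 95% interval, n = 12

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

# Standard errors of the slope and intercept estimates
s_b1 = s / math.sqrt(s_xx)
s_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx))

ci_b1 = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)
ci_b0 = (b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0)

print(f"SE(b1) = {s_b1:.5f}, 95% CI for beta1: ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
print(f"SE(b0) = {s_b0:.3f}, 95% CI for beta0: ({ci_b0[0]:.2f}, {ci_b0[1]:.2f})")
```

The interval for the slope excludes zero, which is consistent with the very small P value in the regression output.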
12.5 Hypothesis Test
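• Hypothesis test for $\beta_1$: $H_0\colon \beta_1 = 0$ against $H_1\colon \beta_1 \neq 0$

$$t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$$

where $s_{\hat{\beta}_1} = s / \sqrt{S_{xx}}$ and $S_{xx} = \sum (X - \bar{X})^2$.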
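The T value for X in the regression output is the estimated slope divided by its standard error. The sketch below recomputes it and compares it with a two-sided 5% critical value; 2.228 is $t_{0.025,10}$ from a t table and is hard-coded as an assumption, and X = 74 is assumed for the last observation.

```python
import math

x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

T_CRIT = 2.228  # t(0.025, 10) from a t table; two-sided test at the 5% level

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

s_b1 = s / math.sqrt(s_xx)   # standard error of the slope
t_stat = b1 / s_b1           # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
# In simple linear regression the ANOVA F statistic equals t squared
print(f"t^2 = {t_stat ** 2:.2f}")
```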
12.6 Prediction Interval for Y
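For a new observation at $X = x_0$, the prediction interval is

$$\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{S_{xx}}}$$

where $\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$ and $S_{xx} = \sum (X - \bar{X})^2$; $s$ is the square root of the residual mean square.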
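A prediction interval for a single new Y at a chosen $x_0$ can be sketched with the standard simple-regression formula $\hat{Y}_0 \pm t\, s \sqrt{1 + 1/n + (x_0 - \bar{X})^2 / S_{xx}}$. The function name `prediction_interval` and the choice $x_0 = 65$ are illustrative; 2.228 is $t_{0.025,10}$ from a t table, and X = 74 is assumed for the last observation of the quality improvement data.

```python
import math

x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

T_CRIT = 2.228  # t(0.025, 10) from a t table; assumes a 95% interval

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))


def prediction_interval(x0):
    """Return (lower, upper) limits for one new Y observed at x0."""
    y_hat = b0 + b1 * x0
    # The leading 1 accounts for the variability of the new observation itself
    half_width = T_CRIT * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


lower, upper = prediction_interval(65)
print(f"95% prediction interval at X = 65: ({lower:.2f}, {upper:.2f})")
```

The interval widens as $x_0$ moves away from $\bar{X}$, which is one reason predictions far outside the data range are unreliable.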
12.7 Regression Control Chart
• To monitor the dependent variable using a control chart approach
• The center line is the fitted regression line, $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
• Control Limits for
Where and
(12.5)
(12.6)
12.8 Cause-Selecting Control Chart
• The general idea is to distinguish quality problems that occur at one stage of a process from problems that occur at a previous processing step.
• Let Y be the output from the second step and let X denote the output from the first step. The relationship between X and Y is then modeled.
12.9 Linear, Nonlinear, and Nonparametric Profiles
• Profile refers to the quality of a process or product being characterized by a (linear, nonlinear, or nonparametric) relationship between a response variable and one or more explanatory variables.
• A possible way is to monitor each parameter in the model with a Shewhart chart.
  – The independent variables must be fixed
  – Control chart for $R^2$
12.10 Inverse Regression
• An important application of simple linear regression for quality improvement is in the area of calibration.
• Assume two measuring tools are available: one is quite accurate but expensive to use, and the other is less expensive but also less accurate. If the measurements obtained from the two devices are highly correlated, then the measurement that would have been made using the expensive device can be predicted fairly well from the measurement made using the less expensive device.
• Let
  – Y = measurement from the less expensive device
  – X = measurement from the accurate device
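Classical estimation approach
• First, regress Y on X to obtain $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
• Solve for X: $X = (\hat{Y} - \hat{\beta}_0) / \hat{\beta}_1$
• For a known value of Y, $Y_c$, the equation is $\hat{X}_c = (Y_c - \hat{\beta}_0) / \hat{\beta}_1$

Inverse regression (X is regressed on Y)

$$\hat{X}_c^{*} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} Y_c$$

• $\hat{X}_c = \hat{X}_c^{*}$ if X and Y were perfectly correlated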
12.10 Inverse Regression: Example
Classical estimation approach
• First, regress Y on X to obtain $\hat{Y} = -0.14375 + 1.0208X$

Inverse regression (X is regressed on Y)

$$\hat{X}^{*} = 0.1759 + 0.9655Y$$
• At Y X
| Y | X |
|---|---|
| 2.3 | 2.4 |
| 2.5 | 2.6 |
| 2.4 | 2.5 |
| 2.8 | 2.9 |
| 2.9 | 3.0 |
| 2.6 | 2.7 |
| 2.4 | 2.5 |
| 2.2 | 2.3 |
| 2.1 | 2.2 |
| 2.7 | 2.7 |
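The two calibration approaches can be compared on the ten (Y, X) pairs above. In this sketch the helper `fit_line` and the known measurement `y_known = 2.5` are hypothetical choices for illustration; the value evaluated on the original slide is not recoverable from the transcript.

```python
# (Y, X) pairs: Y from the less expensive device, X from the accurate device
y = [2.3, 2.5, 2.4, 2.8, 2.9, 2.6, 2.4, 2.2, 2.1, 2.7]
x = [2.4, 2.6, 2.5, 2.9, 3.0, 2.7, 2.5, 2.3, 2.2, 2.7]


def fit_line(u, v):
    """Least squares fit of v on u; returns (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


# Classical approach: regress Y on X, then solve the fitted line for X
b0, b1 = fit_line(x, y)
# Inverse regression: regress X on Y directly
b0_inv, b1_inv = fit_line(y, x)

y_known = 2.5  # hypothetical reading from the less expensive device
x_classical = (y_known - b0) / b1
x_inverse = b0_inv + b1_inv * y_known

print(f"Y on X: Y-hat = {b0:.5f} + {b1:.4f} X")
print(f"X on Y: X-hat = {b0_inv:.4f} + {b1_inv:.4f} Y")
print(f"classical estimate: {x_classical:.4f}, inverse estimate: {x_inverse:.4f}")
```

Because the two devices are highly correlated in these data, the two estimates are nearly identical; they would coincide exactly only under perfect correlation.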
12.11 Multiple Linear Regression
• In multiple regression, there is more than one “independent” variable.
12.12 Issues in Multiple Regression
12.12.1 Variable Selection

• $R^2$ will virtually always increase when additional variables are added to a prediction equation.
• $\mathrm{Var}(\hat{Y})$ increases when new regressors are added.
• A commonly used statistic for determining the number of parameters is $C_p$:

$$C_p = \frac{SSE_p}{\hat{\sigma}^2_{\text{full}}} - n + 2p$$

where $p$ is the number of parameters in the model, $SSE_p$ is the residual sum of squares for that model, $\hat{\sigma}^2_{\text{full}}$ is the estimate of the error variance using all the available regressors, and $n$ is the number of observations.
• The idea is to look hard at those prediction equations for which $C_p$ is small and close to $p$.
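The usual form of Mallows' statistic is $C_p = SSE_p / \hat{\sigma}^2_{\text{full}} - n + 2p$, where $n$ is the number of observations. The sketch below implements it; the function name `mallows_cp` and all numerical inputs are hypothetical, chosen only to show how candidate models would be compared.

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """Mallows' Cp for a candidate model with p parameters.

    sse_p:       residual sum of squares of the candidate model
    sigma2_full: error variance estimated from the model with all regressors
    n:           number of observations
    """
    return sse_p / sigma2_full - n + 2 * p


# Hypothetical inputs: 20 observations, full model with 4 parameters
n = 20
sse_full, p_full = 32.0, 4
sigma2_full = sse_full / (n - p_full)  # residual mean square of the full model

# Hypothetical candidate models: (number of parameters, SSE)
candidates = [(2, 60.0), (3, 35.0), (4, 32.0)]
for p, sse in candidates:
    print(f"p = {p}: Cp = {mallows_cp(sse, sigma2_full, n, p):.1f}")
```

By construction the full model always has $C_p = p$, so the comparison is informative mainly for the smaller candidate models.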
12.12.3 Multicollinear Data
• Problems occur when at least two of the regressors are related in some manner.
• Solutions:
  – Discard one or more variables causing the multicollinearity
  – Use ridge regression
12.12.4 Residual Plots
• Residual plots are used extensively in multiple regression for checking on the model assumptions
• The residuals should generally be plotted against $\hat{Y}$, each of the regressors, time, and any potential regressor.
12.12.6 Transformations
• A regression model can often be improved by transforming one or more of the regressors, and possibly the dependent variable as well.
• Transformation can also often be used to transform a nonlinear regression model into a linear one.
• For example, can be transformed into a linear model