A COMPARISON OF SOME ROBUST REGRESSION TECHNIQUES
Ezgi AVCI
TSE, Personnel and System Certification Center, TURKEY
Gülser KÖKSAL
METU, Industrial Engineering Department, TURKEY
54th EOQ Congress
Izmir
26-27 October 2010
Outline
• Definition and Purpose of Regression
• Output of the Regression Process
• Regression Process Flow Diagram
• Why Alternative Regression Methods?
• Robustness
• Outliers
• Robust Regression Methods
• A Simulation Study
• A Case Study
• Conclusions
Regression
• Investigates and models the relationship between variables
• Application areas:
o Engineering
o Physical sciences
o Life and Biological Sciences
o Social Sciences
Purpose of Regression
• To create an “equation” or “transfer function” from measurements of the system’s inputs and outputs acquired during a passive or active experiment.
• The transfer function is then used for
- sensitivity analysis
- optimization for system performance
- tolerancing the system’s components
Regression
• Industrial applications:
◦ Quality Control and Improvement
e.g. ISO 9001:2008 Standard, Clause 8: Measurement, Analysis and Improvement
◦ Data Mining
Output of Regression
• An estimate of the relative strength of the effect of each factor on the response
• An equation that analytically relates the critical parameters to the critical responses
• An estimate of how much of the total variation seen in the data is explained by the equation
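As a minimal sketch of these three outputs, the Python snippet below fits an ordinary least squares model with NumPy and reports the fitted equation, standardized coefficients as one possible measure of relative effect strength, and R² as the share of variation explained. The function name `ols_summary` and the small data set are hypothetical and are not taken from the study.

```python
import numpy as np

def ols_summary(X, y):
    """Fit y = b0 + X b by least squares; return coefficients,
    standardized coefficients, and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # fitted equation
    resid = y - A @ beta
    ss_res = float(resid @ resid)                  # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    r2 = 1.0 - ss_res / ss_tot
    # Standardized slopes put factors on a comparable scale (an assumption
    # about how "relative strength" might be measured).
    std_beta = beta[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return beta, std_beta, r2

# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
beta, std_beta, r2 = ols_summary(X, y)
print("coefficients:", beta)
print("standardized slopes:", std_beta)
print("R^2:", r2)
```

Because the example data contain no noise, the fit recovers the generating coefficients and R² equals 1; real data would give smaller values.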
Regression Process Flow Diagram
INPUT: The system to be assessed.
1. Select the environment in which the data will be collected.
2. Select the inputs and outputs to be measured in a passive or active experiment.
3. Select and qualify the measurement systems used to acquire the data.
4. Run the system in the prescribed environment and acquire the data as the inputs vary.
5. Inspect the data for outliers and remove them if root cause justifies their removal.
6. Postulate and build a functional relationship between the inputs and the output.
7. Test the statistical adequacy of the functional relationship.
8. Test the predictive ability of the functional relationship on the physical system.
OUTPUT: The transfer function that analytically relates the inputs to the outputs.
Why Alternative Regression Methods?
• It is not easy to satisfy the assumptions
• Normality Assumption Violation
• Outliers!
• Robust Regression
Ignoring Outliers
• The Challenger Accident:
Thiokol engineers argued that if the O-rings were colder than 53 °F (12 °C), they did not have enough data to determine whether the joint would seal properly.
The shuttle and external tank did not actually “explode”. Instead they rapidly disintegrated under tremendous aerodynamic forces, since the shuttle was slightly past “Max Q”, or maximum aerodynamic pressure.
Outliers
• Definition: An observation that appears to deviate markedly from the other members of the sample in which it occurs.
Two common ways to detect outliers
1. Regression Diagnostics:
It is hard to detect multiple outliers, because they can mask one another
2. Robust Regression:
Outliers are easy to detect by their large residuals from the robust fit
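The second approach can be sketched in Python as follows. The snippet approximates a Least Median of Squares line for one predictor by trying the line through every pair of points, then flags observations whose residual exceeds 2.5 robust scale units. The pairwise search, the scale estimate 1.4826·sqrt(median squared residual), the 2.5 cutoff, and the data are illustrative assumptions, not the procedure used in the study.

```python
import itertools
import numpy as np

def lms_line(x, y):
    """Approximate LMS line: best line through any two data points,
    judged by the median of the squared residuals."""
    best = (np.inf, 0.0, 0.0)
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # vertical line, skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = np.median((y - intercept - slope * x) ** 2)
        if crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2], best[0]

def flag_outliers(x, y, cutoff=2.5):
    """Indices whose residual from the robust line is large
    relative to a robust scale estimate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    intercept, slope, crit = lms_line(x, y)
    resid = y - (intercept + slope * x)
    scale = 1.4826 * np.sqrt(crit)        # rough robust scale (assumed form)
    return np.flatnonzero(np.abs(resid) > cutoff * scale), (intercept, slope)

# Hypothetical data: seven points near y = 1 + 2x, three outliers at the end
x = np.arange(10)
y = np.array([1.1, 2.9, 5.05, 6.95, 9.1, 10.9, 13.05, 3.0, 3.0, 3.0])
flagged, (intercept, slope) = flag_outliers(x, y)
print("robust line:", intercept, slope)
print("flagged indices:", flagged)
```

The three grouped outliers do not pull the robust line toward themselves, so their residuals stay large and they are flagged together.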
What to Do with Outliers?
• Delete them?
• Ignore them?
• Give less weight to them?
• Robust regression methods are a “smooth transition between full acceptance and full rejection of an observation”
• The best rejection procedures are not competitive against the best robust procedures.
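One way to picture the difference between rejecting and downweighting is to compare weight functions. The Python sketch below contrasts a hard rejection rule with the Huber weight, where `r` is assumed to be a residual already divided by a robust scale estimate and `c = 1.345` is the conventional Huber tuning constant. The function names are hypothetical.

```python
def rejection_weight(r, c=1.345):
    """Hard rule: keep the observation fully or drop it entirely."""
    return 1.0 if abs(r) <= c else 0.0

def huber_weight(r, c=1.345):
    """Huber rule: full weight for small residuals, then a weight
    that shrinks gradually as the residual grows."""
    a = abs(r)
    return 1.0 if a <= c else c / a

for r in (0.5, 1.345, 2.69, 10.0):
    print(r, rejection_weight(r), round(huber_weight(r), 3))
```

Under the hard rule an observation just past the cutoff loses all influence, while under the Huber rule its weight declines continuously with the size of its residual.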
Robust Regression Methods
• Least Absolute Value (LAV)
• Huber M method
• MM method
• Least Median of Squares (LMS)
• Least Trimmed Squares (LTS)
• Multivariate Adaptive Regression Splines (MARS)
• Locally Weighted Scatterplot Smoothing (LOESS)
A Simulation Study
• Simulation is a commonly used tool for comparing robust regression techniques.
• The seven robust regression methods are compared on several performance measures under several scenarios.
• The results are discussed and the most promising robust methods are determined.
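A simulation of this kind might be sketched in Python as below. It compares OLS with a Huber M fit computed by iteratively reweighted least squares, using the mean squared error of the estimated coefficients as the performance measure and one contamination scenario. The scenario (20% of responses shifted by +50), the sample size, the number of replications, and all function names are hypothetical choices and do not reproduce the study's design.

```python
import numpy as np

def huber_fit(A, y, c=1.345, n_iter=30):
    """Huber M estimate via iteratively reweighted least squares,
    with the scale re-estimated from the MAD at each step."""
    beta = np.linalg.lstsq(A, y, rcond=None)[0]       # OLS start
    for _ in range(n_iter):
        r = y - A @ beta
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))
        if scale == 0:
            break
        u = np.abs(r) / scale
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)                               # weighted LS via sqrt weights
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta

def simulate(n=50, frac=0.2, reps=100, seed=0):
    """Average squared coefficient error of each method over replications."""
    rng = np.random.default_rng(seed)
    true = np.array([1.0, 2.0])                       # intercept, slope
    err = {"OLS": 0.0, "Huber M": 0.0}
    for _ in range(reps):
        x = rng.uniform(0, 10, n)
        A = np.column_stack([np.ones(n), x])
        y = A @ true + rng.normal(0, 1, n)
        y[: int(frac * n)] += 50                      # contaminate some responses
        ols = np.linalg.lstsq(A, y, rcond=None)[0]
        err["OLS"] += np.sum((ols - true) ** 2)
        err["Huber M"] += np.sum((huber_fit(A, y) - true) ** 2)
    return {name: total / reps for name, total in err.items()}

results = simulate()
print(results)
```

Extending the sketch to more methods, performance measures, or scenarios means adding entries to the `err` dictionary and varying the contamination step.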
The Results of the Simulation Study
• The most promising methods:
• OLS
• HUBER-M
• LAV
• LTS
• These methods are compared on an industrial data set.
Description of the Data Set
• Our data are taken from a real-life manufacturing process that includes the sub-processes core, molding, melting, casting, fettling and painting.
• The dependent variable is the percentage of defectives on a cylinder head.
• Missing values are eliminated by appropriate methods.
• The basic data set includes 36 independent variables and 92 observations.
CONCLUSIONS
• For our real-life data, there is no significant difference between the robust methods and the classical OLS method.
• We attribute this to the complexity of the data and the presence of irrelevant variables.
• Moreover, even though OLS and the robust regression methods give the same results, the model fitted by OLS is not valid because the normality assumption is not satisfied.
• As a result, robust methods are the safest way to deal with outliers, even when they perform the same as the classical methods, since they do not rely on such strict assumptions.
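The normality assumption mentioned above is typically checked on the residuals of the fitted model. The slides do not say which check was used; as one possible sketch, the Python snippet below computes the Jarque-Bera statistic from residual skewness and kurtosis and compares it with 5.991, the 5% critical value of a chi-square distribution with 2 degrees of freedom. The statistic is asymptotic, and the residual vectors are hypothetical.

```python
import numpy as np

CHI2_2DF_95 = 5.991  # 5% critical value, chi-square with 2 degrees of freedom

def jarque_bera(resid):
    """Jarque-Bera statistic: n/6 * (skewness^2 + (kurtosis - 3)^2 / 4)."""
    r = np.asarray(resid, dtype=float)
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return r.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Hypothetical residuals: one evenly spread set, one with a single large outlier
resid_clean = np.linspace(-1, 1, 21)
resid_outlier = np.array([0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.0, 0.1, -0.1, 8.0])

jb_clean = jarque_bera(resid_clean)
jb_outlier = jarque_bera(resid_outlier)
print("clean:", jb_clean, "reject normality:", jb_clean > CHI2_2DF_95)
print("outlier:", jb_outlier, "reject normality:", jb_outlier > CHI2_2DF_95)
```

A single large residual is enough to push the statistic past the critical value, which is the situation in which OLS inference becomes questionable.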