TRANSCRIPT
Statistical Tools for Binary Qualitative Data
How to Analyze Binary Data for Quality Engineering and Validation
Raul Soto, MSc, CQE
IVT Stats Conference - June 2016
Philadelphia, PA
The contents of this presentation represent the opinions of the speaker, and not necessarily those of his present or past employers.
(c) 2016 / Raul Soto
About the Author
• 20+ years of experience in the medical devices, pharmaceutical, biotechnology, and consumer electronics industries
• MS Biotechnology, emphasis in Biomedical Engineering
• BS Mechanical Engineering
• ASQ Certified Quality Engineer (CQE)
• I have led validation / qualification efforts in multiple scenarios:
  • High-speed, high-volume automated manufacturing and packaging equipment; machine vision systems
  • Laboratory information systems and instruments
  • Enterprise resource planning applications (e.g. SAP)
  • IT network infrastructure, Cognos & Business Objects reports
  • Manufacturing Execution Systems (MES)
  • Mobile apps
  • Product improvements, material changes, vendor changes
• Contact information: Raul Soto, [email protected]
What this talk is about
• Introduce and describe the main statistical tools used for binary qualitative (attributes) data
• Understand basic concepts, underlying assumptions, and limitations
• Understand why we can’t just plug numbers into Minitab without knowing the fundamental assumptions (“the fine print”)
• We can’t teach two semesters of statistics in 90 minutes …
Binary Data
• Attributes / qualitative data with only two possible outcomes
• Pass / Fail
• Go / No Go
• Good / Bad
• Presence / Absence
• Heads / Tails in a coin flip
• Can be modeled using the binomial distribution
Binomial Distribution
• There are 4 conditions / assumptions that MUST hold to use these tools
• Only two possible outcomes
• Fixed number of trials, n
  • If you add samples later, the binomial distribution no longer models your process
• Independent trials (one trial has no effect on another)
• Probability of each outcome stays the same from trial to trial
  • Sample size < 1/10 of the population size
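When these four conditions hold, the count of defectives in n trials can be modeled with a binomial distribution. A minimal sketch using scipy (the 600-unit sample and 1% defect rate here are hypothetical illustration values):

```python
from scipy.stats import binom

# If all four conditions hold, the number of defectives in n trials
# follows a binomial distribution with parameters n and p.
n, p = 600, 0.01   # hypothetical: 600 independent trials, assumed 1% defect rate

print(binom.pmf(0, n, p))   # probability of exactly 0 defectives
print(binom.cdf(1, n, p))   # probability of at most 1 defective
```

If any condition fails (e.g. trials are not independent, or p drifts between trials), these probabilities no longer describe the process.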
Tools that will be presented
• Process Capability : Binary Intervals
• Comparisons: Fisher’s Exact Test
• Paired Comparisons: McNemar’s Test
• Modeling: Binary Logistic Regression
Process Capability: Binary Intervals
Process Capability: Binary Intervals
• Process capability: quantify how well we are meeting our specifications
• For quantitative data, we typically use Cpk / Ppk.
• We can express process capability for a binary process (pass/fail, good/bad):
  • In terms of % defective ≤ specification (e.g. 5% or less)
  • In terms of probability of success ≥ specification (e.g. 99% or better)
Process Capability: Binary Intervals
• Binary intervals allow us to make claims such as:
• We are 95% confident that the lot percent defective is ≤ x%.
• We are 99% confident that the probability of success is ≥ x%.
• Confidence: 99% for validation, 95% for regular production (recommended)
Process Capability: Binary Intervals
Example:
• We want to determine if the laser drilling process for our medical device is capable.
• We drill 600 units, inspect them, and find 1 unit with a Class 1 visual attributes defect.
• Our specification for Class 1 defects is 1% or less (99% success or better)
• Can we claim with 95% confidence that our process is capable?
Process Capability: Binary Intervals
Process Capability: Binary Intervals
• We are 95% confident that our probability of success is ≥ 99.992% (1 – 0.000085).
• We are 95% confident that our percent defective is ≤ 1% (0.0085%) for Class 1 visual attributes defects.
• Process is capable
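One common way to compute such one-sided binomial bounds is the exact (Clopper-Pearson) interval, sketched below with scipy. This is a sketch under that assumption; the exact figures Minitab reports can differ depending on the interval type and settings chosen.

```python
from scipy.stats import beta

def upper_bound_defective(defects, n, confidence=0.95):
    """One-sided exact (Clopper-Pearson) upper confidence bound
    on the proportion defective, via the beta-distribution identity."""
    if defects == n:
        return 1.0
    return beta.ppf(confidence, defects + 1, n - defects)

ub = upper_bound_defective(1, 600)   # 1 defective found in 600 units
print(f"95% upper bound on proportion defective: {ub:.4%}")
print(f"95% lower bound on probability of success: {1 - ub:.4%}")
```

Compare the upper bound against the 1% specification: if the bound is below the spec, the capability claim holds at that confidence level.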
Process Capability: Binary Intervals
• What if we found 15 Class 1 visual attributes defects in those 600 units?
• We are 95% confident that our probability of success is ≥ 98.5% (1 − 0.015472)
• We are 95% confident that our percent defective is ≤ 1.5% for Class 1 visual attributes defects.
• We can’t claim the process is capable.
Comparing Two Groups: Fisher’s Exact Test
Comparing Two Groups: Fisher’s Exact Test
• For quantitative data, to compare the means of two data sets we use hypothesis tests (i.e. 2-sample t-test)
• For binary data, we compare proportions (or probability of success), not means
• Is the proportion for group A equal to the proportion for group B?
• To answer this question we can use Fisher’s Exact Test
Comparing Two Groups: Fisher’s Exact Test
• Compare binary attribute defects for two groups (two data sets)
• raw material vendor A vs B• Line / shift / plant A vs B• Method A vs B• Machine tooling A vs B• Before vs after a process or equipment change
Comparing Two Groups: Fisher’s Exact Test
• One-sided vs Two-sided comparisons
• Two-sided: are the % defective (or probability of success) values equal for both groups?
• One-sided: is the % defective (or probability of success) for group A smaller / greater than for group B?
Comparing Two Groups: Fisher’s Exact Test
Example:
• The manufacturing of product Green was transferred from our P site to our Q site. We would like to compare the % defective for visual attributes in both sites.
• In site P, a total of 2699 units were inspected; 25 were found defective
• In site Q, a total of 3574 units were inspected; 38 were found defective
• Can we claim, with 95% confidence, that the % defective for site Q is equal to or lower than that for site P?
Comparing Two Groups: Fisher’s Exact Test
One-sided or two-sided?
Comparing Two Groups: Fisher’s Exact Test
• Use the Fisher’s Exact Test result
  • H0: p(Q site) = p(P site)
  • Ha: p(Q site) < p(P site)
• Fisher’s p-value = the probability of observing a difference at least this extreme if H0 is true
  • If Fisher’s p-value > 0.05, do not reject H0
• With a p-value of 0.746, we do not reject H0: there is no statistically significant difference between the visual attributes % defectives for plants P and Q.
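The site comparison above can be reproduced with scipy's Fisher's exact test; a sketch (the table layout and "less" alternative encode the one-sided question of whether site Q's defect rate is below site P's):

```python
from scipy.stats import fisher_exact

# 2x2 table: rows = sites, columns = (defective, not defective)
site_q = [38, 3574 - 38]   # 38 defectives out of 3574 inspected
site_p = [25, 2699 - 25]   # 25 defectives out of 2699 inspected

# alternative="less": Ha is that row 1 (site Q) has the lower defect rate
odds_ratio, p_value = fisher_exact([site_q, site_p], alternative="less")
print(f"one-sided p-value: {p_value:.3f}")   # large p-value: do not reject H0
```

Swapping the rows, or using alternative="two-sided", answers the other versions of the question from the previous slide.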
The shaded area is a normal-approximation (z) test for the difference between two binomial proportions. This was used frequently back when there was no software that could perform Fisher’s exact test, and approximations had to be used. It is only accurate for large sample sizes and probabilities of success between 20% and 80%.
Comparing Two Groups: Fisher’s Exact Test
Example:
• The vendor for raw material G, used to make Product Green, goes out of business, and we decide to qualify a new vendor.
• We inspect 1000 units made using the new vendor’s raw material, and find 3 defective units for visual attributes.
• Can we claim, with 95% confidence, that the % defective for the new vendor is equal to or lower than that for the old vendor?
Comparing Two Groups: Fisher’s Exact Test
• Old vendor: N = 3574, x = 38
• New vendor: N = 1000, x = 3
• H0: p(new vendor) = p(old vendor)
• Ha: p(new vendor) < p(old vendor)
• With a p-value of 0.012, we are 95% confident that the visual attributes % defective for Product Green using raw material G from the new vendor is lower than with material from the old vendor.
Paired Comparison: McNemar’s Test
Paired Comparison: McNemar’s Test
• Paired data: one set of units measured by two methods / people / instruments / algorithms, etc.
• For quantitative data, to perform a paired test for means we use a paired t-test
• For binary data, we use McNemar’s test
• For pairs, is the proportion for group A equal to the proportion for group B?
Paired Comparison: McNemar’s Test
• Non-paired tests (Fisher’s, t-test)
  • Compare two sets / groups of units
• Paired tests (McNemar’s, paired t-test)
  • Compare the same set of units, measured by two different means
Paired Comparison: McNemar’s Test
Uses:
• Measurement System Capability Analysis for Attributes — compare:
  • Inspection systems
  • Instruments
  • Vision inspection algorithms
  • Measurement system vs a standard
• Evaluate an inspector’s ability, before vs after training
• Compare two inspectors
• Evaluate an attributes inspection system before vs after a change
Paired Comparison: McNemar’s Test
• Measure the set of units with both (A and B) instruments / methods / people / algorithms
• Results fall into one of four categories:
  • Both A and B PASS the unit
  • Both A and B FAIL the unit
  • A PASSES, B FAILS the unit
  • A FAILS, B PASSES the unit

           B Pass          B Fail
  A Pass   Both Pass       A Pass, B Fail
  A Fail   A Fail, B Pass  Both Fail
Paired Comparison: McNemar’s Test
• H0: probability B rejects a unit = probability A rejects a unit
  • No statistically significant difference
• Ha: probability B rejects a unit ≠ probability A rejects a unit
  • There is a statistically significant difference
Paired Comparison: McNemar’s Test
Example:
• Our non-destructive automated attributes inspection system runs vision Algorithm A.
• A prototype vision algorithm B is developed, which is much faster but is not expected to have a statistically significant impact on inspection accuracy.
• We wish to compare Algorithms A and B, and be able to claim with a 95% confidence that Algorithm B is equivalent to Algorithm A in terms of accuracy.
Paired Comparison: McNemar’s Test
• Take 400 units and inspect the same units with both algorithms
• Create the table:

           B Pass   B Fail
  A Pass   219      21
  A Fail   15       145
Paired Comparison: McNemar’s Test
Paired Comparison: McNemar’s Test
Enter Data
• Summarized
• Select columns
Paired Comparison: McNemar’s Test
• Proportions:
  • pA = 240 / 400 = 0.600
  • pB = 234 / 400 = 0.585
  • |difference| = 0.015
• McNemar’s p-value = 0.405
• Insufficient evidence to reject the null hypothesis
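The same test is available in statsmodels; a sketch using the exact (binomial) form of McNemar's test, which uses only the 21 + 15 = 36 discordant pairs:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Paired 2x2 table from the 400-unit study:
#               B Pass   B Fail
table = [[219,     21],   # A Pass
         [ 15,    145]]   # A Fail

# exact=True: exact binomial version of McNemar's test,
# based on the 36 discordant pairs (A Pass/B Fail and A Fail/B Pass)
result = mcnemar(table, exact=True)
print(f"McNemar p-value: {result.pvalue:.3f}")   # ~0.405
```

The 219 and 145 concordant counts do not affect the p-value; McNemar's test asks only whether the two kinds of disagreement are balanced.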
Paired Comparison: McNemar’s Test
Claim:
• With a p-value of 0.405, we do not reject H0: there is no statistically significant difference between the probability that Algorithm B rejects a unit and the probability that Algorithm A rejects the same unit.
• Therefore we can consider Algorithm B as equivalent to Algorithm A in terms of accuracy.
Paired Comparison: McNemar’s Test
Example:
• We wish to determine if a new QA inspector requires training for pass/fail visual attributes inspection.
• The person inspects a standard set of 12 units, some good, and some with defects.
• If there is a statistically significant difference between the inspector’s results and the standard, then the inspector requires retraining.
Paired Comparison: McNemar’s Test

           Standard Pass   Standard Fail
  QA Pass  1               8
  QA Fail  1               2
• McNemar’s p-value = 0.039; there is sufficient evidence to reject the null hypothesis
• Claim: with a p-value < 0.05, we are at least 95% confident that there is a statistically significant difference between this QA Inspector’s results and the standard.
• The person should be retrained.
Logistic Regression for Binary Data
What is “Logistic Regression” ?
• Regression model where the dependent variable is categorical, not quantitative
Types of Logistic Regression
• Binary:
  • Response variable takes one of two values
  • Pass/Fail, Yes/No, 0/1, etc.
• Ordinal:
  • Response variable takes three or more possible outcomes, ranked in order
  • Low / Medium / High
  • Class 1, 2, 3 attribute defects
• Nominal:
  • Response variable takes three or more possible outcomes, with no order
  • Scratches, Dents, Tears
Binary Logistic Regression
• Regression: determine if one or more factors (X) can be used to predict a response (Y) using a linear relationship
• For quantitative data, we use Simple Linear Regression (SLR)
• For binary responses: we can use binary logistic regression to determine if one or more factors (X) can be used to predict a binary response (Y) using a logistic relationship
Binary Logistic Regression
• Response variable is binary (Pass / Fail, Presence / Absence, etc.)
• BLR models the probability (between 0 and 1) of success (or pass, or presence, etc.)
• The logistic regression equation is: logit(pi) = α + β·x1 + γ·x2 + …
• Where:
  • α = intercept
  • β = coefficient for factor x1
  • γ = coefficient for factor x2
Binary Logistic Regression
Sample scenarios:
• Determine if sealing temperature and pressure can predict the probability of a pass/fail foil packaging defect.
• Model the probability that a vision system will catch a defect as a function of the size of the defect.
• Understand if the presence of heart disease can be predicted from the amount of physical activity, cholesterol concentration, glucose concentration, and body composition.
• Determine if the probability of failure of a system can be predicted from the total hours of use, the number of times it is switched on/off, and the ambient temperature of the area where it operates.
Binary Logistic Regression
Real-life example:
Cancer research
NAME: Prostate Cancer Study (PCS.DATA)
SOURCE: Hosmer and Lemeshow (2000), Applied Logistic Regression, 2nd Edition. Copyright John Wiley & Sons Inc.
Raw data available here: https://www.umass.edu/statdata/statdata/stat-logistic.html
Binary Logistic Regression
• Response:
  • CAPSULE: whether the tumor penetrated the prostatic capsule (the model estimates this probability)
• Quantitative Factors:
  • AGE: age of the patient
  • PSA: prostate-specific antigen value (mg/ml)
  • VOL: tumor volume obtained from ultrasound (cm³)
• Study also included 4 qualitative factors
• For simplicity, we will model the response using only the quantitative factors
Binary Logistic Regression
Binary Logistic Regression
• p-values show that PSA and VOL have a statistically significant effect on the response, while AGE does not.
• VIF (Variance inflation factors) lower than 3 show that there is virtually no confounding / collinearity among the factors
• The model equation is displayed
• Goodness of Fit: The Hosmer-Lemeshow (p = 0.469) and Pearson (p = 0.454) tests are not statistically significant, which indicates that the model fits the data well. However the Deviance test (p = 0.003) comes out significant. This may be due to the factors removed to simplify the analysis for this example.
Binary Logistic Regression
• Binary regression model after removing AGE
Binary Logistic Regression
• This model can be used to predict what the probability of the response will be for specific values of the factors
Binary Logistic Regression
Prediction
• For a patient with:
  • Prostate-specific antigen value of 10 mg/ml
  • Tumor volume (from ultrasound) of 56 cm³
• The probability that the tumor has penetrated the prostatic capsule is 22%
• With a 95% confidence interval of (13.5%, 33.6%)
• Translate this to manufacturing: predict the probability of observing pass/fail attributes defects for given settings of pressure, temperature, dwell time, etc.
Binary Logistic Regression
• Contour / surface plot shows that the probability increases as VOL decreases, and as PSA increases
[Contour Plot of CAPSULE vs VOL, PSA: shaded bands show predicted probability from < 0.1 up to > 0.9, highest at low VOL and high PSA]
Binary Logistic Regression
Optimization
• For this real-life example it doesn’t necessarily make a lot of sense to talk about “optimization”, since the factors here can’t be manipulated to change the response.
  • Correlation ≠ Causation
• But in a manufacturing scenario, we can use the Response Optimizer in Minitab to find the settings for the factors that will maximize / minimize a response, or set the response to a target value.
  • For % defects => minimize
  • For % probability of success => maximize
• Can optimize multiple responses simultaneously and assign priorities among them
Binary Logistic Regression
• The response can be minimized to a probability of p = 0.0807 with the following factor values:
• PSA = 0.30• VOL = 97.60
Binary Logistic Regression
• The response can be maximized to a probability of p = 0.998 with the following factor values:
• PSA = 139.7• VOL = 0
References
• Logistic Regression
  • http://www.biostathandbook.com/simplelogistic.html
  • https://statistics.laerd.com/minitab-tutorials/binomial-logistic-regression-using-minitab.php
  • https://www.minitab.com/en-us/Published-Articles/Wine-Tasting-by-Numbers--Using-Binary-Logistic-Regression-to-Reveal-the-Preferences-of-Experts/
  • https://www.umass.edu/statdata/statdata/stat-logistic.html
• Fisher’s Exact Test
  • http://www.biostathandbook.com/fishers.html
  • http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/other-statistics-and-tests/what-is-fisher-s-exact-test/
• McNemar’s Test
  • http://www.vassarstats.net/propcorr.html
  • http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/other-statistics-and-tests/why-should-i-use-mcnemar-s-test/
• Binary Intervals
  • http://www.sigmazone.com/binomial_confidence_interval.htm
References
• Fisher, Lloyd, and Gerald Van Belle. Biostatistics: A Methodology for the Health Sciences. New York: Wiley, 1993. Print. (1st edition)
  • p. 157 Binomial Confidence Intervals
  • p. 157 Fisher’s Exact Test
  • p. 179 McNemar’s Paired Test
  • p. 552 Logistic Regression
• A 2nd edition is now available
Questions