TRANSCRIPT
Statistical Tools for Binary Qualitative Data
How to Analyze Binary Data for Quality Engineering and Validation
Raul Soto, MSc, CQE
IVT Stats Conference - June 2016
Philadelphia, PA
The contents of this presentation represent the opinions of the speaker, and not necessarily those of his present or past employers.
(c) 2016 / Raul Soto
About the Author
• 20+ years of experience in the medical devices, pharmaceutical, biotechnology, and consumer electronics industries
• MS Biotechnology, emphasis in Biomedical Engineering
• BS Mechanical Engineering
• ASQ Certified Quality Engineer (CQE)
• I have led validation / qualification efforts in multiple scenarios:
  • High-speed, high-volume automated manufacturing and packaging equipment; machine vision systems
  • Laboratory information systems and instruments
  • Enterprise resource planning applications (e.g. SAP)
  • IT network infrastructure, Cognos & Business Objects reports
  • Manufacturing Execution Systems (MES)
  • Mobile apps
  • Product improvements, material changes, vendor changes
• Contact information: Raul Soto, [email protected]
What this talk is about
• Introduce and describe the main statistical tools used for binary qualitative (attributes) data
• Understand basic concepts, underlying assumptions, and limitations
• Understand why we can’t just plug numbers into Minitab without knowing the fundamental assumptions (“the fine print”)
• We can’t teach two semesters of statistics in 90 minutes …
Binary Data
• Attributes / qualitative data with only two possible outcomes
• Pass / Fail
• Go / No Go
• Good / Bad
• Presence / Absence
• Heads / Tails in a coin flip
• Can be modeled using the binomial distribution
Binomial Distribution
• There are 4 conditions / assumptions that MUST hold to use these tools
• Only two possible outcomes
• Fixed number of trials, n
  • If you add samples later, the binomial distribution no longer models your process
• Independent trials (one trial has no effect on another)
• Probability of each outcome stays the same from trial to trial
  • Sample size < 1/10 of the population size
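When these four conditions hold, the count of defectives in n trials can be modeled with a binomial distribution. A minimal sketch using scipy (the 600-unit sample and 1% defect rate here are hypothetical illustration values):

```python
from scipy.stats import binom

# If all four conditions hold, the number of defectives in n trials
# follows a binomial distribution with parameters n and p.
n, p = 600, 0.01   # hypothetical: 600 independent trials, assumed 1% defect rate

print(binom.pmf(0, n, p))   # probability of exactly 0 defectives
print(binom.cdf(1, n, p))   # probability of at most 1 defective
```

If any condition fails (e.g. trials are not independent, or p drifts between trials), these probabilities no longer describe the process.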
Tools that will be presented
• Process Capability : Binary Intervals
• Comparisons: Fisher’s Exact Test
• Paired Comparisons: McNemar’s Test
• Modeling: Binary Logistic Regression
Process Capability: Binary Intervals
Process Capability: Binary Intervals
• Process capability: quantify how well we are meeting our specifications
• For quantitative data, we typically use Cpk / Ppk.
• We can express process capability for a binary process (pass/fail, good/bad):
  • In terms of % defective ≤ specification (e.g. 5% or less)
  • In terms of probability of success ≥ specification (e.g. 99% or better)
Process Capability: Binary Intervals
• Binary intervals allow us to make claims such as:
• We are 95% confident that the lot percent defective is ≤ x%.
• We are 99% confident that the probability of success is ≥ x%.
• Confidence: 99% for validation, 95% for regular production (recommended)
Process Capability: Binary Intervals
Example:
• We want to determine if the laser drilling process for our medical device is capable.
• We drill 600 units, inspect them, and find 1 unit with a Class 1 visual attributes defect.
• Our specification for Class 1 defects is 1% or less (99% success or better)
• Can we claim with 95% confidence that our process is capable?
Process Capability: Binary Intervals
Process Capability: Binary Intervals
• We are 95% confident that our probability of success is ≥ 99.992% (1 – 0.000085).
• We are 95% confident that our percent defective is ≤ 1% (0.0085%) for Class 1 visual attributes defects.
• Process is capable
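One common way to compute such one-sided binomial bounds is the exact (Clopper-Pearson) interval, sketched below with scipy. This is a sketch under that assumption; the exact figures Minitab reports can differ depending on the interval type and settings chosen.

```python
from scipy.stats import beta

def upper_bound_defective(defects, n, confidence=0.95):
    """One-sided exact (Clopper-Pearson) upper confidence bound
    on the proportion defective, via the beta-distribution identity."""
    if defects == n:
        return 1.0
    return beta.ppf(confidence, defects + 1, n - defects)

ub = upper_bound_defective(1, 600)   # 1 defective found in 600 units
print(f"95% upper bound on proportion defective: {ub:.4%}")
print(f"95% lower bound on probability of success: {1 - ub:.4%}")
```

Compare the upper bound against the 1% specification: if the bound is below the spec, the capability claim holds at that confidence level.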
Process Capability: Binary Intervals
• What if we found 15 Class 1 visual attributes defects in those 600 units?
• We are 95% confident that our probability of success is ≥ 98.5% (1 − 0.015472)
• We are 95% confident that our percent defective is ≤ 1.5% for Class 1 visual attributes defects.
• We can’t claim the process is capable.
Comparing Two Groups: Fisher’s Exact Test
Comparing Two Groups: Fisher’s Exact Test
• For quantitative data, to compare the means of two data sets we use hypothesis tests (i.e. 2-sample t-test)
• For binary data, we compare proportions (or probability of success), not means
• Is the proportion for group A equal to the proportion for group B?
• To answer this question we can use Fisher’s Exact Test
Comparing Two Groups: Fisher’s Exact Test
• Compare binary attribute defects for two groups (two data sets)
• raw material vendor A vs B• Line / shift / plant A vs B• Method A vs B• Machine tooling A vs B• Before vs after a process or equipment change
Comparing Two Groups: Fisher’s Exact Test
• One-sided vs Two-sided comparisons
• Two-sided: are the % defective (or probability of success) values equal for both groups?
• One-sided: is the % defective (or probability of success) for group A smaller / greater than for group B?
Comparing Two Groups: Fisher’s Exact Test
Example:
• The manufacturing of product Green was transferred from our P site to our Q site. We would like to compare the % defective for visual attributes in both sites.
• In site P, a total of 2699 units were inspected; 25 were found defective
• In site Q, a total of 3574 units were inspected; 38 were found defective
• Can we claim, with 95% confidence, that the % defective for site Q is equal to or lower than that for site P?
Comparing Two Groups: Fisher’s Exact Test
One-sided or two-sided?
Comparing Two Groups: Fisher’s Exact Test
• Use the Fisher’s Exact Test result
  • H0: p(Q site) = p(P site)
  • Ha: p(Q site) < p(P site)
• Fisher’s p-value = the probability of observing a difference at least this extreme if H0 is true
  • If Fisher’s p-value > 0.05, do not reject H0
• With a p-value of 0.746, we do not reject H0: there is no statistically significant difference between the visual attributes % defectives for plants P and Q.
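The site comparison above can be reproduced with scipy's Fisher's exact test; a sketch (the table layout and "less" alternative encode the one-sided question of whether site Q's defect rate is below site P's):

```python
from scipy.stats import fisher_exact

# 2x2 table: rows = sites, columns = (defective, not defective)
site_q = [38, 3574 - 38]   # 38 defectives out of 3574 inspected
site_p = [25, 2699 - 25]   # 25 defectives out of 2699 inspected

# alternative="less": Ha is that row 1 (site Q) has the lower defect rate
odds_ratio, p_value = fisher_exact([site_q, site_p], alternative="less")
print(f"one-sided p-value: {p_value:.3f}")   # large p-value: do not reject H0
```

Swapping the rows, or using alternative="two-sided", answers the other versions of the question from the previous slide.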
The shaded area is a normal-approximation (z) test for the difference between two binomial proportions. This was used frequently back when there was no software that could perform Fisher’s exact test, and approximations had to be used. It is only accurate for large sample sizes and probabilities of success between 20% and 80%.
Comparing Two Groups: Fisher’s Exact Test
Example:
• The vendor for raw material G, used to make Product Green, goes out of business, and we decide to qualify a new vendor.
• We inspect 1000 units made using the new vendor’s raw material, and find 3 defective units for visual attributes.
• Can we claim, with 95% confidence, that the % defective for the new vendor is equal to or lower than that for the old vendor?
Comparing Two Groups: Fisher’s Exact Test
• Old vendor: N = 3574, x = 38
• New vendor: N = 1000, x = 3
• H0: p(new vendor) = p(old vendor)
• Ha: p(new vendor) < p(old vendor)
• With a p-value of 0.012, we are 95% confident that the visual attributes % defective for Product Green using raw material G from the new vendor is lower than with material from the old vendor.
Paired Comparison: McNemar’s Test
Paired Comparison: McNemar’s Test
• Paired data: one set of units measured by two methods / people / instruments / algorithms, etc.
• For quantitative data, to perform a paired test for means we use a paired t-test
• For binary data, we use McNemar’s test
• For pairs, is the proportion for group A equal to the proportion for group B?
Paired Comparison: McNemar’s Test
• Non-paired tests (Fisher’s, t-test)
  • Compare two sets / groups of units
• Paired tests (McNemar’s, paired t-test)
  • Compare the same set of units, measured by two different means
Paired Comparison: McNemar’s Test
Uses:
• Measurement System Capability Analysis for Attributes — compare:
  • Inspection systems
  • Instruments
  • Vision inspection algorithms
  • Measurement system vs a standard
• Evaluate an inspector’s ability, before vs after training
• Compare two inspectors
• Evaluate an attributes inspection system before vs after a change
Paired Comparison: McNemar’s Test
• Measure the set of units with both (A and B) instruments / methods / people / algorithms
• Results fall into one of four categories:
  • Both A and B PASS the unit
  • Both A and B FAIL the unit
  • A PASSES, B FAILS the unit
  • A FAILS, B PASSES the unit

           B Pass          B Fail
  A Pass   Both Pass       A Pass, B Fail
  A Fail   A Fail, B Pass  Both Fail
Paired Comparison: McNemar’s Test
• H0: probability B rejects a unit = probability A rejects a unit
  • No statistically significant difference
• Ha: probability B rejects a unit ≠ probability A rejects a unit
  • There is a statistically significant difference
Paired Comparison: McNemar’s Test
Example:
• Our non-destructive automated attributes inspection system runs vision Algorithm A.
• A prototype vision algorithm B is developed, which is much faster but is not expected to have a statistically significant impact on inspection accuracy.
• We wish to compare Algorithms A and B, and be able to claim with a 95% confidence that Algorithm B is equivalent to Algorithm A in terms of accuracy.
Paired Comparison: McNemar’s Test
• Take 400 units and inspect the same units with both algorithms
• Create the table:

           B Pass   B Fail
  A Pass   219      21
  A Fail   15       145
Paired Comparison: McNemar’s Test
Paired Comparison: McNemar’s Test
Enter Data
• Summarized
• Select columns
Paired Comparison: McNemar’s Test
• Proportions:
  • pA = 240 / 400 = 0.600
  • pB = 234 / 400 = 0.585
  • |difference| = 0.015
• McNemar’s p-value = 0.405
• Insufficient evidence to reject the null hypothesis
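The same test is available in statsmodels; a sketch using the exact (binomial) form of McNemar's test, which uses only the 21 + 15 = 36 discordant pairs:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Paired 2x2 table from the 400-unit study:
#               B Pass   B Fail
table = [[219,     21],   # A Pass
         [ 15,    145]]   # A Fail

# exact=True: exact binomial version of McNemar's test,
# based on the 36 discordant pairs (A Pass/B Fail and A Fail/B Pass)
result = mcnemar(table, exact=True)
print(f"McNemar p-value: {result.pvalue:.3f}")   # ~0.405
```

The 219 and 145 concordant counts do not affect the p-value; McNemar's test asks only whether the two kinds of disagreement are balanced.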
Paired Comparison: McNemar’s Test
Claim:
• With a p-value of 0.405, we do not reject H0: there is no statistically significant difference between the probability that Algorithm B rejects a unit and the probability that Algorithm A rejects the same unit.
• Therefore we can consider Algorithm B as equivalent to Algorithm A in terms of accuracy.
Paired Comparison: McNemar’s Test
Example:
• We wish to determine if a new QA inspector requires training for pass/fail visual attributes inspection.
• The person inspects a standard set of 12 units, some good, and some with defects.
• If there is a statistically significant difference between the inspector’s results and the standard, then the inspector requires retraining.
Paired Comparison: McNemar’s Test

           Standard Pass   Standard Fail
  QA Pass  1               8
  QA Fail  1               2
• McNemar’s p-value = 0.039; there is sufficient evidence to reject the null hypothesis
• Claim: with a p-value < 0.05, we are at least 95% confident that there is a statistically significant difference between this QA Inspector’s results and the standard.
• The person should be retrained.
Logistic Regression for Binary Data
What is “Logistic Regression” ?
• Regression model where the dependent variable is categorical, not quantitative
Types of Logistic Regression
• Binary:
  • Response variable takes one of two values
  • Pass/Fail, Yes/No, 0/1, etc.
• Ordinal:
  • Response variable takes three or more possible outcomes, ranked in order
  • Low / Medium / High
  • Class 1, 2, 3 attribute defects
• Nominal:
  • Response variable takes three or more possible outcomes, with no order
  • Scratches, Dents, Tears
Binary Logistic Regression
• Regression: determine if one or more factors (X) can be used to predict a response (Y) using a linear relationship
• For quantitative data, we use Simple Linear Regression (SLR)
• For binary responses: we can use binary logistic regression to determine if one or more factors (X) can be used to predict a binary response (Y) using a logistic relationship
Binary Logistic Regression
• Response variable is binary (Pass / Fail, Presence / Absence, etc.)
• BLR models the probability (between 0 and 1) of success (or pass, or presence, etc.)
• The logistic regression equation is: logit(pi) = α + β·x1 + γ·x2 + …
• Where:
  • α = intercept
  • β = coefficient for factor x1
  • γ = coefficient for factor x2
Binary Logistic Regression
Sample scenarios:
• Determine if sealing temperature and pressure can predict the probability of a pass/fail foil packaging defect.
• Model the probability that a vision system will catch a defect as a function of the size of the defect.
• Understand if the presence of heart disease can be predicted from the amount of physical activity, cholesterol concentration, glucose concentration, and body composition.
• Determine if the probability of failure of a system can be predicted from the total hours of use, the number of times it is switched on/off, and the ambient temperature of the area where it operates.
Binary Logistic Regression
Real-life example:
Cancer research
NAME: Prostate Cancer Study (PCS.DATA)
SOURCE: Hosmer and Lemeshow (2000), Applied Logistic Regression, 2nd Edition. Copyright John Wiley & Sons Inc.
Raw data available here: https://www.umass.edu/statdata/statdata/stat-logistic.html
Binary Logistic Regression
• Response:
  • CAPSULE: whether the tumor penetrated the prostatic capsule (the model estimates this probability)
• Quantitative Factors:
  • AGE: age of the patient
  • PSA: prostate-specific antigen value (mg/ml)
  • VOL: tumor volume obtained from ultrasound (cm³)
• Study also included 4 qualitative factors
• For simplicity, we will model the response using only the quantitative factors
Binary Logistic Regression
Binary Logistic Regression
• p-values show that PSA and VOL have a statistically significant effect on the response, while AGE does not.
• VIF (Variance inflation factors) lower than 3 show that there is virtually no confounding / collinearity among the factors
• The model equation is displayed
• Goodness of Fit: The Hosmer-Lemeshow (p = 0.469) and Pearson (p = 0.454) tests are not statistically significant, which indicates that the model fits the data well. However the Deviance test (p = 0.003) comes out significant. This may be due to the factors removed to simplify the analysis for this example.
Binary Logistic Regression
• Binary regression model after removing AGE
Binary Logistic Regression
• This model can be used to predict what the probability of the response will be for specific values of the factors
Binary Logistic Regression
Prediction
• For a patient with:
  • Prostate-specific antigen value of 10 mg/ml
  • Tumor volume (from ultrasound) of 56 cm³
• The probability that the tumor has penetrated the prostatic capsule is 22%
• With a 95% confidence interval of (13.5%, 33.6%)
• Translate this to manufacturing: predict the probability of observing pass/fail attributes defects for given settings of pressure, temperature, dwell time, etc.
Binary Logistic Regression
• Contour / surface plot shows that the probability increases as VOL decreases, and as PSA increases
[Contour Plot of CAPSULE vs VOL, PSA: shaded bands show predicted probability from < 0.1 up to > 0.9, highest at low VOL and high PSA]
Binary Logistic Regression
Optimization
• For this real-life example it doesn’t necessarily make a lot of sense to talk about “optimization”, since the factors here can’t be manipulated to change the response.
  • Correlation ≠ Causation
• But in a manufacturing scenario, we can use the Response Optimizer in Minitab to find the settings for the factors that will maximize / minimize a response, or set the response to a target value.
  • For % defects => minimize
  • For % probability of success => maximize
• Can optimize multiple responses simultaneously and assign priorities among them
Binary Logistic Regression
• The response can be minimized to a probability of p = 0.0807 with the following factor values:
• PSA = 0.30• VOL = 97.60
Binary Logistic Regression
• The response can be maximized to a probability of p = 0.998 with the following factor values:
• PSA = 139.7• VOL = 0
References
• Logistic Regression
  • http://www.biostathandbook.com/simplelogistic.html
  • https://statistics.laerd.com/minitab-tutorials/binomial-logistic-regression-using-minitab.php
  • https://www.minitab.com/en-us/Published-Articles/Wine-Tasting-by-Numbers--Using-Binary-Logistic-Regression-to-Reveal-the-Preferences-of-Experts/
  • https://www.umass.edu/statdata/statdata/stat-logistic.html
• Fisher’s Exact Test
  • http://www.biostathandbook.com/fishers.html
  • http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/other-statistics-and-tests/what-is-fisher-s-exact-test/
• McNemar’s Test
  • http://www.vassarstats.net/propcorr.html
  • http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/other-statistics-and-tests/why-should-i-use-mcnemar-s-test/
• Binary Intervals
  • http://www.sigmazone.com/binomial_confidence_interval.htm
References
• Fisher, Lloyd, and Gerald Van Belle. Biostatistics: A Methodology for the Health Sciences. New York: Wiley, 1993. Print. (1st edition)
  • p. 157 Binomial Confidence Intervals
  • p. 157 Fisher’s Exact Test
  • p. 179 McNemar’s Paired Test
  • p. 552 Logistic Regression
• A 2nd edition is now available
Questions