03 - measure current state

Session 03 - Measuring Current State

Pat Hammett, University of Michigan


Six Sigma Measure Phase

Measuring the Current State of a Process


Case Study: Scanner Mfg

Problem: Weld defects between Mylar motor and attachment bracket (ultrasonic weld operation)

Key Output Variables (Ys):
- Weld Shear Force (from destructive test); Specification: Shear Force >= 13 lbs
- Visual Weld Inspection (binary: pass/fail)

Process Variables (Xs): Material (melt flow index), Surface condition, Press force, Clamping force, Temperature




Topics

I. Review Types of Data

II. Review of Exploring Data Patterns and Descriptive Statistics

III. Six Sigma Metric Calculations*
- Yield
- Defects Per Million (DPM)
- Defects Per Million Opportunities (DPMO)
- DPM based on Variable (Numerical) Data

* Note: Other metrics will be discussed in future lectures


I. Types of Data Variables

Selection of an analysis method/tool depends on the type of data.

Discrete/Continuous Variables (Numerical/Quantitative Data)
- Discrete variables vary by whole units (e.g., # of customers).
- Continuous variables can vary to any degree, limited only by the precision of the measurement system. Examples: time to complete a task; a manufactured hole diameter measurement may be recorded as 10 mm, 10.0 mm, 10.01 mm, or 10.008 mm.

Qualitative (Categorical) Variables (Attribute Data)
- Binary (pass/fail; defective/not defective)
- Ordinal (ordered classification system, such as survey rating scales)
- Nominal (non-ordered groups or classifications)




Qualitative (Categorical) Data

To analyze qualitative data, we typically assign discrete numerical values to the categories and/or use the categories to stratify (group) other numerical data.

Some examples:

Binary Variables: assign a discrete binary outcome (0/1)
- Examples: On-Time Delivery, Service Quality
- Binary Attribute: On Time (0) / Late (1); OK (0) / NOK (1)

Ordinal Variables: assign a discrete ordinal scale to classify responses
- Ordinal Attribute: a natural order is implied between categories, but the magnitude of the difference is unknown
- Example 1: Variable = Size; Small, Medium, and Large
- Example 2: Variable = Survey Response to Question (with ordinal attribute scale); Strongly Disagree (1), Disagree (2), Neutral (3), ..., Strongly Agree (5)

Nominal (Categorical or Grouping) Variables: used to stratify or group data
- Variable Example: Distribution Center; Nominal Attributes: Northeast, Southwest, Central
- Other Examples: Shift (e.g., Day or Night); Plant; Department; Model Type
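A minimal Python sketch of these encodings, assuming the raw responses arrive as text labels. The code dictionaries and the helper functions `encode` and `group_by` are hypothetical, chosen only for illustration; "Agree" = 4 is assumed to complete the five-point survey scale.

```python
from collections import defaultdict

# Hypothetical codings for illustration (defect/late coded as 1)
ON_TIME_CODES = {"On Time": 0, "Late": 1}
# Five-point survey scale; "Agree" = 4 is assumed to complete the scale
SURVEY_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def encode(labels, codes):
    """Map each categorical label to its numeric code."""
    return [codes[label] for label in labels]

def group_by(groups, values):
    """Group numeric values by a nominal variable (e.g., distribution center)."""
    out = defaultdict(list)
    for group, value in zip(groups, values):
        out[group].append(value)
    return dict(out)

deliveries = ["On Time", "Late", "On Time", "On Time"]
centers = ["Northeast", "Central", "Northeast", "Central"]
late_flags = encode(deliveries, ON_TIME_CODES)
print(late_flags)                         # binary 0/1 outcomes
print(group_by(centers, late_flags))      # late flags grouped by center
print(sum(late_flags) / len(late_flags))  # fraction late
```

Averaging the binary flags gives the fraction late (0.25 here). Averages of ordinal codes deserve more caution, because the spacing between categories is unknown.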


II. Review of Exploring Data Patterns and Descriptive Statistics

To characterize a variable, we typically observe a sample from a population and analyze it statistically (e.g., compute sample statistics).

Common statistical analyses/tools to characterize a variable include:

A. Data patterns regardless of time order
- Common tools (sample size N > 30): Histogram, Box Plot
- For small samples (e.g., N < 30): Dot Plot

B. Data patterns in time order (i.e., to evaluate process stability over time)
- Run Chart (also known as trend chart or time series plot)
- Statistical Process Control (SPC) Chart (refer to SPC lecture)

C. Descriptive Statistics Summary Table. Common statistics to report include:
- Sample Size, N
- Location Statistics: Mean and Median
- Dispersion (Variation) Statistics: Standard Deviation, Variance, Range (with Min and Max)
- Distribution Shape (Symmetry and Peakedness): Skewness and Kurtosis
- Additional Statistics: Trimmed Mean, Quartiles, or Percentiles




A. Histogram Example

Typical Y-axis: frequency or relative frequency. Relative frequency (%) may be used if the sample size is large. Histograms may be created using Excel or Minitab.

Minitab Commands: >> Graph >> Histogram >> Select Variable: ShearForce

[Figure: Histogram of ShearForce; x-axis: ShearForce (0 to 24); y-axis: Frequency (0 to 16)]

Note: Requirement is Shear Force >= 13 (Lower Specification Limit (LSL) = 13)
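The frequency and relative frequency that a histogram plots can be computed directly, as in this Python sketch. The readings, the 6 lb bin width, and the bin edges are hypothetical; Minitab and Excel choose their own default bins, so their histograms may group values differently.

```python
def histogram(values, start, width, n_bins):
    """Counts and relative frequencies (%) for equal-width, half-open bins.

    Bin k covers [start + k*width, start + (k+1)*width); values outside
    the overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts)
    return [
        (start + k * width, start + (k + 1) * width, c,
         100.0 * c / total if total else 0.0)
        for k, c in enumerate(counts)
    ]

# Hypothetical shear force readings (lb), not the case-study data
shear = [2.6, 11.4, 13.0, 14.2, 18.1, 20.5, 21.0, 22.4, 23.9, 26.3]
bins = histogram(shear, start=0, width=6, n_bins=5)
for lo, hi, count, pct in bins:
    print(f"[{lo:>2}, {hi:>2})  frequency={count}  relative={pct:.0f}%")
```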


Normal Vs. Skewed Data

Do the shear force data appear normally distributed, or do they follow another shape (e.g., skewed right, skewed left, or bimodal)? Is this likely a natural phenomenon?

[Figure: Histogram of ShearForce compared with example distribution shapes: Normal, Skewed Right, Bi-Modal, Skewed Left]




Statistical Test: Normality

We may use Minitab to test for normality.

Null Hypothesis (Ho): data are normal. Alternative Hypothesis (Ha): data are not normal.
Test Conclusion: the p-value is ~0.000; since the p-value < alpha, reject Ho (the shear force data are not normal).

[Figure: Normal probability plot of ShearForce; x-axis: ShearForce (0 to 40); y-axis: Percent]

Minitab Commands: >> Stat >> Basic Statistics >> Normality Test >> Select Variable: ShearForce

Note: Anderson-Darling test selected.

Default: alpha (Type I error) = 0.05
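To see what the test does, the Python sketch below computes the Anderson-Darling statistic for normality with the mean and standard deviation estimated from the sample, applies the usual small-sample adjustment A2* = A2 x (1 + 0.75/n + 2.25/n^2), and converts A2* to a p-value with a standard piecewise approximation. We assume, without having verified it, that Minitab computes its p-value in a similar way, so treat this as an illustration rather than a replacement. Both samples are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Anderson-Darling test of Ho: data are normal (mean, sd estimated).

    Returns (A2*, p_value), where A2* is the small-sample adjusted statistic.
    """
    x = sorted(data)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    eps = 1e-12  # keeps log() finite for extreme tail values
    F = [min(max(dist.cdf(v), eps), 1 - eps) for v in x]
    s = sum((2 * i + 1) * (math.log(F[i]) + math.log(1 - F[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    a2s = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    # Standard piecewise approximation of the p-value for the adjusted statistic
    if a2s >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2s + 0.0186 * a2s ** 2)
    elif a2s >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2s - 1.38 * a2s ** 2)
    elif a2s >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2s - 59.938 * a2s ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2s - 223.73 * a2s ** 2)
    return a2s, min(max(p, 0.0), 1.0)

# Hypothetical samples: two separated clusters vs. values at normal quantiles
bimodal = [5 + 0.1 * k for k in range(30)] + [22 + 0.1 * k for k in range(30)]
normal_like = [NormalDist().inv_cdf((i - 0.5) / 60) for i in range(1, 61)]

alpha = 0.05
for name, sample in [("bimodal", bimodal), ("normal_like", normal_like)]:
    a2s, p = anderson_darling_normal(sample)
    verdict = "reject Ho (not normal)" if p < alpha else "fail to reject Ho"
    print(f"{name}: A2* = {a2s:.3f}, p = {p:.4f} -> {verdict}")
```

The decision rule is the same as on the slide: reject Ho when the p-value falls below alpha.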


Box Plot Calculations

[Figure: Annotated box plot showing the median, first and third quartiles, whiskers, and mild (*) and extreme outliers]

Q1 = first quartile (25th percentile); Median = 50th percentile; Q3 = third quartile (75th percentile)
fs = Q3 - Q1 (interquartile range)
Upper Limit: Q3 + 1.5 fs; Lower Limit: Q1 - 1.5 fs
Upper Whisker: highest value within the upper limit
Lower Whisker: lowest value within the lower limit
Mild Outlier(s) (*): Q3 + 1.5 fs < value < Q3 + 3.0 fs, or Q1 - 3.0 fs < value < Q1 - 1.5 fs
Extreme Outlier(s): value > Q3 + 3.0 fs, or value < Q1 - 3.0 fs

Excel Command (e.g., Q3): =PERCENTILE(data array, 0.75)
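The box plot calculations above can be sketched in Python. Assumptions: quartiles use `statistics.quantiles(..., method="inclusive")`, which follows the same interpolation as Excel's PERCENTILE; Minitab uses a different quartile convention, so its fences can differ slightly for small samples. A value lying exactly on a fence is treated here as inside it. The function name and data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Quartiles, fences, whiskers, and outlier classes for a box plot."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    fs = q3 - q1  # interquartile range
    lower_inner, upper_inner = q1 - 1.5 * fs, q3 + 1.5 * fs
    lower_outer, upper_outer = q1 - 3.0 * fs, q3 + 3.0 * fs
    inside = [v for v in data if lower_inner <= v <= upper_inner]
    mild = [v for v in data
            if lower_outer <= v < lower_inner or upper_inner < v <= upper_outer]
    extreme = [v for v in data if v < lower_outer or v > upper_outer]
    return {
        "q1": q1, "median": median, "q3": q3, "fs": fs,
        "whiskers": (min(inside), max(inside)),
        "mild": sorted(mild), "extreme": sorted(extreme),
    }

# Hypothetical data with one mild and one extreme high value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 19, 30]
summary = box_plot_summary(sample)
print(summary)
```

Here Q1 = 3.75 and Q3 = 9.25, so fs = 5.5: 19 falls between the 1.5 fs and 3.0 fs upper limits (mild), while 30 lies beyond Q3 + 3.0 fs (extreme).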




Box Plot: Shear Force

What does this box plot suggest?

Minitab Commands: >> Graph >> Boxplot >> Select Variable: Y = ShearForce

[Figure: Boxplot of ShearForce; y-axis: ShearForce (0 to 30)]


Histogram Vs. Box Plot

Box plots provide a representation of the distribution similar to a histogram for normal, skewed-right, and skewed-left data. Exception: a multi-modal distribution must be shown with a histogram, because a box plot cannot reveal multiple peaks.

[Figure: Histogram of ShearForce and Boxplot of ShearForce shown side by side]




Outlier Analysis (Extreme Values)

Box plots provide an effective tool to identify possible outliers.

Outliers are non-representative values in a data set. They generally result from:
- measurement or data entry error (e.g., recording in the wrong units)
- an observation obtained under a different set of circumstances (e.g., a special cause, or data recorded during peak volume versus typical conditions)

Outliers may significantly affect descriptive statistics such as the mean and standard deviation, as well as other statistics (e.g., the correlation between two variables).


Outliers: Good Or Bad?

A common data analysis trap is to automatically exclude outliers.

Outliers may suggest that a better set of operating conditions is available.

Unfortunately, deciding whether to include or exclude outliers is a skill developed through experience. Try to understand the source of outliers before discarding them.

If you decide to remove an outlier, some typical strategies are:
- With a large sample size, remove the entire observation.
- For smaller samples (N < 100) where you collect data on several variables, you may want to keep the observation. Here, we typically replace the outlier value with the median value for that variable. Why the median?
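A small Python sketch of the median-replacement strategy, assuming the outlier positions have already been flagged (for example, with the box plot limits). The helper name and the readings are hypothetical. Printing the mean and median before and after shows how much the outlier pulls the mean while leaving the median essentially unchanged.

```python
from statistics import mean, median

def replace_with_median(values, outlier_indices):
    """Return a copy of values with flagged positions replaced by the median."""
    m = median(values)  # median of the full column, including the outlier
    flagged = set(outlier_indices)
    return [m if i in flagged else v for i, v in enumerate(values)]

# Hypothetical press force readings; index 5 was flagged as an outlier
press_force = [10, 12, 11, 13, 12, 95]
cleaned = replace_with_median(press_force, [5])
print(mean(press_force), median(press_force))  # mean pulled up by 95
print(mean(cleaned), median(cleaned))
```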




Multiple Box Plots

Minitab Commands: >> Graph >> Boxplot >> Select Graph Variable: Y = ShearForce; X = Batch

During the Analyze phase, we often stratify box plot results for the Y output by grouping variables (e.g., nominal variables).

Is shear force consistent across all batches of incoming material?

[Figure: Boxplot of ShearForce vs Production Batch* (P1, P2, P3); y-axis: ShearForce (0 to 30)]


B. Run Chart (Time Series Plot)

If the time sequence is available, we often examine the data in time order to look for time trends.

[Figure: Time Series Plot of Shear Force (lb); x-axis: Index (1 to 60); y-axis: Shear Force (lb) (0 to 30)]

Minitab Commands: >> Graph >> Time Series Plot >> Select Graph Variable: Y = ShearForce




C. Minitab - Descriptive Statistics

Another common analysis to perform during the Measure phase is to compute descriptive statistics for Y (if Y may be evaluated as a continuous variable).

Minitab Command: >> Stat >> Basic Statistics

Descriptive Statistics: Shear Force (lb)

Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum | Skewness | Kurtosis
Shear Force (lb) | 60 | 0 | 17.670 | 0.883 | 6.841 | 1.400 | 11.350 | 20.200 | 23.275 | 26.900 | -0.75 | -0.53

Or, use Excel to create a table with N, Mean, StDev, Min, Max, Range, and Skewness.

Questions: What does a skewness of -0.75 suggest? Why does the median differ from the mean for these data?
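The core entries of this table can be reproduced with Python's `statistics` module, as sketched below. Assumptions: quartiles use the default "exclusive" method of `statistics.quantiles`, which places Q1 at position (n + 1)/4 as Minitab does; skewness uses the adjusted sample formula n/((n - 1)(n - 2)) x sum(((x - mean)/s)^3), which we believe matches Minitab and Excel's SKEW; kurtosis is omitted. The readings are hypothetical.

```python
import math
from statistics import mean, median, quantiles, stdev

def describe(data):
    """Descriptive statistics for one continuous variable (Minitab-style)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                    # sample standard deviation (n - 1)
    q1, _, q3 = quantiles(data, n=4)   # default "exclusive" method
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in data)
    return {
        "N": n, "Mean": xbar, "SE Mean": s / math.sqrt(n), "StDev": s,
        "Minimum": min(data), "Q1": q1, "Median": median(data), "Q3": q3,
        "Maximum": max(data), "Range": max(data) - min(data),
        "Skewness": skew,
    }

# Hypothetical readings with a long tail of low values
sample = [2.6, 9.8, 15.1, 18.9, 20.2, 21.4, 22.0, 23.3, 24.1, 25.5]
for name, value in describe(sample).items():
    print(f"{name:>9}: {value:.3f}")
```

SE Mean is StDev / sqrt(N); for the shear force table, 6.841 / sqrt(60) gives the reported 0.883.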


Stratification Analysis of Descriptive Statistics

We may wish to stratify an output by an X variable.

Minitab Command: >> Stat >> Basic Statistics (By Variable: Batch)

What do these data suggest?

Descriptive Statistics: ShearForce

Production Batch* | N | Mean | TrMean | StDev | Minimum | Median | Maximum
P1 | 20 | 22.170 | 22.272 | 2.859 | 16.200 | 22.450 | 26.300
P2 | 20 | 16.30 | 16.47 | 7.07 | 2.60 | 18.05 | 26.90
P3 | 20 | 14.55 | 14.71 | 7.32 | 1.40 | 12.30 | 24.70
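A stratified summary like this one can be sketched in Python by grouping the Y values on the X variable before computing statistics. The batch labels follow the table, but the readings are hypothetical (three per batch for brevity); the trimmed mean is omitted because its trimming rule is software-specific.

```python
from collections import defaultdict
from statistics import mean, median, stdev

def stratify(ys, xs):
    """Descriptive statistics of ys for each level of the grouping variable xs."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {
        level: {"N": len(v), "Mean": mean(v), "StDev": stdev(v),
                "Minimum": min(v), "Median": median(v), "Maximum": max(v)}
        for level, v in sorted(groups.items())
    }

# Hypothetical readings; three per batch for brevity
batch = ["P1", "P1", "P1", "P2", "P2", "P2", "P3", "P3", "P3"]
shear = [21.0, 22.5, 23.1, 18.0, 6.2, 24.4, 12.3, 3.1, 20.0]
summary = stratify(shear, batch)
for level, row in summary.items():
    print(level, {k: round(v, 2) for k, v in row.items()})
```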




III. Six Sigma Metric Calculations

1. Yield (e.g., Simple Quality Yield)

2. Defects Per Million (DPM) (Attribute Data). Note: DPM is also known as PPM (parts per million defective).

3. Defects Per Million Opportunities (DPMO)

4. Defects per Million (Observed DPM)

5. Defects per Million (Expected DPM)

Note: Other Six Sigma metrics (Process Capability, Reliability, Rolled Throughput Yield) are covered later in the course.


Specifications

To calculate Yield (or % defective, DPM, DPMO), we need standards or specification limits:
- LSL: Lower Specification Limit
- USL: Upper Specification Limit

Specification limits identify acceptance levels.

Unilateral Specification Limit Examples:
- Process time (upper limit only)
- Shear Force >= 13 lbs (lower limit only)

Bilateral Specification Limit Examples:
- 30 +/- 5 days (Nominal = 30; LSL = 25; USL = 35)
- Width: 1000 +/- 0.5 mm




1. Quality Yield (% Acceptable)

Quality Yield = (# Good Units) / (Total # Units) x 100%
Unit: part, service, customer, document, procedure, etc.

Or, Yield = (1 - Fraction Defective) x 100%
Where Fraction Defective = # Defective / Total # Units
# Defective is based on a binary assessment (e.g., 0 = not late; 1 = late); the typical convention for binary data is to let defect = 1.

Example: Suppose 232 of 1,034 bills are late (802 are on time). Calculate the Quality Yield.
Quality Yield = 802 / 1,034 = 77.6%
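Both yield formulas can be checked against the late-bills example with a few lines of Python; the function names are hypothetical.

```python
def quality_yield(good_units, total_units):
    """Quality Yield (%) = (# good units / total # units) x 100."""
    return 100.0 * good_units / total_units

def yield_from_fraction_defective(defective_units, total_units):
    """Equivalent form: (1 - fraction defective) x 100."""
    return 100.0 * (1 - defective_units / total_units)

late, total = 232, 1034  # late bills count as defective (1)
print(f"{quality_yield(total - late, total):.1f}%")
print(f"{yield_from_fraction_defective(late, total):.1f}%")
```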


DPM and DPMO Methods

Depending on the type of data, we often convert Yield to defects per million (DPM) or defects per million opportunities (DPMO). The method used varies based on the type of data and the assumptions.




2. Defective Method for DPM

Suppose you have a process where each unit is classified as defective or not defective.

DPM = Fraction Defective x 1,000,000
Note: Yield = 1 - Fraction Defective

Fraction Defective = Total # Defective / Total # Units Inspected

Example: Suppose you fabricate 4,000 welds and find that 35 are defective. What is the DPM?
Fraction Defective = 35 / 4,000 = 0.008750
DPM = 0.008750 x 1,000,000 = 8,750
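A one-function Python sketch of the defective method, reproducing the weld example; the function name is hypothetical. Multiplying before dividing keeps this integer example exact in floating point.

```python
def dpm_from_defectives(defective_units, total_units):
    """DPM (PPM defective) = (# defective / # inspected) x 1,000,000."""
    return defective_units * 1_000_000 / total_units

dpm = dpm_from_defectives(35, 4000)
yield_pct = 100 - dpm / 10_000  # Yield = 1 - fraction defective, in %
print(f"DPM = {dpm:,.0f}; Yield = {yield_pct:.3f}%")
```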


3. # Defects per Unit Opportunity Method (DPMO)

Use if a particular inspection unit or part can have 1 or more defects (multiple opportunities).

Example: Suppose we visually inspect a weld manufacturing process for various conditions:
- A: Excess part deflection after welding
- B: Poor weld penetration
- C: Poor weld appearance (e.g., excess flash)

Note: each weld (unit) could have 0 - 3 defects




Defects per Million Opportunity (DPMO)

Here, we use opportunities to summarize the total number of possible chances for error (i.e., defects) in the system.

DPMO = (Total # Defects / Total Opportunities (TOP)) x 1 Million

Where:
- Total # Defects = total # of defects across all units
- TOP = sum, over every defect category i, of the # of opportunities for category i


DPMO Example

Given the following data set with three features per unit, suppose you have 1,000 welds (TOP = 3 x 1,000 = 3,000).

Part Feature | Defects
A | 22
B | 19
C | 18
Total | 59

Fraction nonconforming = 59 / 3,000 = 1.967%
DPMO = (59 / 3,000) x 1,000,000 = 19,667
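When every unit has the same number of opportunities, TOP is simply units x opportunities per unit. A minimal Python sketch using the weld feature counts; the function name is hypothetical.

```python
def dpmo(total_defects, units, opportunities_per_unit):
    """DPMO when every unit has the same number of defect opportunities."""
    top = units * opportunities_per_unit  # total opportunities (TOP)
    return total_defects * 1_000_000 / top

weld_defects = {"A": 22, "B": 19, "C": 18}  # defects by part feature
result = dpmo(sum(weld_defects.values()), units=1000,
              opportunities_per_unit=len(weld_defects))
print(round(result))
```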




DPMO Hotel Survey Example: Varying Opportunities per Unit

The number of opportunities may vary by unit (customer). In the hotel example below, not all guests used the hotel meal service. Here, the total opportunities (TOP) is obtained by summing the opportunities for each category.

Given the following data set, what is the DPMO?

Concern | Guests | Defects (Not Satisfied) | Opportunities
Poor Meal Service* | 447 | 111 | 447
Poor House Keeping | 1000 | 82 | 1000
Problems with Reservations | 1000 | 34 | 1000
Long Check In | 1000 | 96 | 1000
Long Check Out | 1000 | 58 | 1000
Total | | 381 (# defects) | 4447 (TOP)

* Note: not all guests used a hotel meal service
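In Python, the varying-opportunity case sums opportunities category by category before dividing. The counts are those in the survey table; the tuple layout is an arbitrary choice for illustration.

```python
# (concern, defects = guests not satisfied, opportunities)
survey = [
    ("Poor Meal Service", 111, 447),  # only 447 guests used meal service
    ("Poor House Keeping", 82, 1000),
    ("Problems with Reservations", 34, 1000),
    ("Long Check In", 96, 1000),
    ("Long Check Out", 58, 1000),
]

total_defects = sum(d for _, d, _ in survey)
top = sum(o for _, _, o in survey)  # opportunities summed by category
dpmo = total_defects * 1_000_000 / top
print(total_defects, top, round(dpmo))
```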


Overall DPMO For Multiple Groups (Facilities)

DPMO also may be used to summarize multiple groups (e.g., departments, facilities).
Note: The number of opportunities per group also provides a measure of complexity. For example, perhaps one of the hotels does not offer any meal services (Hotel C below).

Hotel | Poor Meal Service* | Poor House Keeping | Problems with Reservations | Long Check In | Long Check Out | Total Defects | TOP
A | 111 | 82 | 34 | 96 | 58 | 381 | 4447
B | 120 | 89 | 37 | 102 | 62 | 410 | 5114
C | n/a | 75 | 28 | 90 | 70 | 263 | 4225
TOTAL | | | | | | 1054 | 13786

OVERALL DPMO = 1054 / 13786 x 1M = 76,454
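The pooled calculation can be sketched in Python from each hotel's total defects and TOP in the table, with per-hotel DPMOs printed for comparison. The dictionary layout is illustrative.

```python
# Per-hotel totals from the table: (total defects, TOP)
hotels = {"A": (381, 4447), "B": (410, 5114), "C": (263, 4225)}

for name, (defects, top) in hotels.items():
    print(name, round(defects * 1_000_000 / top))

# Overall DPMO pools defects and opportunities before dividing
all_defects = sum(d for d, _ in hotels.values())
all_top = sum(t for _, t in hotels.values())
overall = all_defects * 1_000_000 / all_top
print("Overall", all_defects, all_top, round(overall))
```

Pooling first weights each hotel by its number of opportunities, which is why the overall figure differs from a simple average of the three hotel DPMOs.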




DPMO: The Denominator Game

Suppose we measure 200,000 units with 1 feature per unit. What happens to the DPMO as the # of features (concerns) with NO defects increases?
NOTE: Features MUST BE customer related and should not just be added to improve DPMO.

Feature | # Defects | # Opportunities | DPMO
A | 3 | 200,000 | 15
B | 0 | 200,000 | 0
C | 0 | 200,000 | 0
D | 0 | 200,000 | 0
E | 0 | 200,000 | 0

Total Defects: 3; Total Opportunities: 1,000,000
Combined DPMO = (3 / 1,000,000) x 1,000,000 = 3
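The effect is easy to see numerically. This short Python sketch holds the three defects fixed and adds defect-free features one at a time; the function name is hypothetical.

```python
DEFECTS = 3         # all three defects are on feature A
UNITS = 200_000     # each feature adds one opportunity per unit

def dpmo_with_features(n_features):
    """DPMO after adding defect-free features (the denominator game)."""
    top = UNITS * n_features
    return DEFECTS * 1_000_000 / top

for k in range(1, 6):
    print(k, "feature(s):", dpmo_with_features(k))
```

The process itself is unchanged in every row; only the denominator grows, which is why added features must reflect real customer concerns.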


Denominator Game Example

Suppose you have a hole specification with both an upper and a lower limit. Could you have one defect opportunity for oversized holes and another for undersized holes?

What if we added the category "missing weld" to our weld example? How might we include that in determining total opportunities?




4. Variable Data Method for Observed DPM

If we collect numerical measurements for a characteristic (dimension) of each unit, we may convert each observation to a binary result based on the specification limits of the characteristic and then compute DPM. Each observation is either In-Specification or Out-of-Specification (Defect).

Here, Fraction Defective = # units observed out of specification / total # units

DPM = Fraction Defective x 1 Million (also known as parts per million (PPM) defective)


DPM Example: Shear Force (based on Observed Out-of-Specification Units)

Specification: OK if shear force >= 13

To compute DPM, we need to convert each observed measurement to a binary output (0 = within specification; 1 = outside specification, i.e., a defect).

Note: Observed DPM also may be obtained using Minitab with Process Capability Summary Analysis Tool
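A Python sketch of the observed method: each reading becomes a 0/1 defect flag against the specification limits, and DPM is the mean flag times one million. The function name and the six readings are hypothetical; a limit left as `None` is simply not checked.

```python
def observed_dpm(values, lsl=None, usl=None):
    """Observed DPM: share of measurements outside the spec limits x 1,000,000.

    Each value is converted to a binary result: 1 = defect, 0 = within spec.
    """
    flags = [
        1 if (lsl is not None and v < lsl) or (usl is not None and v > usl) else 0
        for v in values
    ]
    return sum(flags) * 1_000_000 / len(flags)

# Hypothetical shear force readings (lb); spec: OK if >= 13
readings = [17.2, 12.9, 22.4, 13.0, 8.1, 25.6]
print(observed_dpm(readings, lsl=13))
```

For the case study, Minitab's Observed PPM < LSL of 316,666.67 corresponds to 19 of the 60 readings falling below 13 (19/60 x 1,000,000).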




5. Variable Data Method for Expected DPM

Used when we collect variable data and the data may reasonably be assumed to follow a known distribution (e.g., normal).

Use software to fit the data to a statistical distribution (e.g., Normal distribution) and estimate the probability (Pr) of a defect based on the distribution and its properties.

Expected (Predicted) DPM = Pr (Defect) x 1 Million

[Figure: Normal example with a bilateral tolerance; the DEFECT regions are the tails below the Lower Specification and above the Upper Specification]
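With Python's `statistics.NormalDist`, the expected DPM under a normality assumption takes only a few lines. The function name is hypothetical. The example inputs are the Sample Mean (17.67) and StDev(Overall) (6.86963) from the Minitab capability output for ShearForce, so the result should land within a few DPM of Minitab's Exp. Overall PPM < LSL of 248,314.53 (small differences come from the rounded inputs).

```python
from statistics import NormalDist

def expected_dpm_normal(mean, sd, lsl=None, usl=None):
    """Expected DPM assuming a normal distribution.

    Pr(defect) = Pr(X < LSL) + Pr(X > USL); a missing limit contributes 0.
    """
    dist = NormalDist(mean, sd)
    p_low = dist.cdf(lsl) if lsl is not None else 0.0
    p_high = 1.0 - dist.cdf(usl) if usl is not None else 0.0
    return (p_low + p_high) * 1_000_000

# Unilateral case: shear force, LSL = 13 only
print(round(expected_dpm_normal(17.67, 6.86963, lsl=13)))
# Bilateral case: a centered process with limits at +/- 3 standard deviations
print(round(expected_dpm_normal(0.0, 1.0, lsl=-3, usl=3), 1))
```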


Expected DPM Using Minitab Capability Analysis

Minitab will compute the expected DPM based on an assumed distribution. (Note: we will examine non-normal distributions in a later module; also see the appendix.)

Note: Menus vary based on the Minitab version used.

Suppose we assume normality.

Minitab Version 14




Minitab Process Capability Analysis (excellent all-in-one analysis tool)

Minitab (Version 14) Command: Stat >> Quality Tools >> Capability Analysis (Normal)
Variable: ShearForce; Subgroup Size: 1; LSL = 13
(Minitab assumptions: unbiasing constants; average moving range method with length = 2)

Does the normality assumption matter in this example?

[Figure: Process Capability of ShearForce; histogram (0 to 30) with LSL marked and Within and Overall fitted normal curves]

Process Data: LSL = 13; Target = *; USL = *; Sample Mean = 17.67; Sample N = 60; StDev(Within) = 4.56185; StDev(Overall) = 6.86963

Potential (Within) Capability: Cp = *; CPL = 0.34; CPU = *; Cpk = 0.34
Overall Capability: Pp = *; PPL = 0.23; PPU = *; Ppk = 0.23; Cpm = *

Observed Performance: PPM < LSL = 316666.67; PPM > USL = *; PPM Total = 316666.67
Exp. Within Performance: PPM < LSL = 152986.54; PPM > USL = *; PPM Total = 152986.54
Exp. Overall Performance: PPM < LSL = 248314.53; PPM > USL = *; PPM Total = 248314.53

Observed DPM: 316,667
Expected (Predicted) DPM: 248,314


Observed Vs. Expected DPM

If we collect variable data (e.g., continuous) and have specifications, we can always convert to a binary outcome and compute the Observed DPM.

Or, we can predict the DPM (Expected DPM) by fitting the sample data to a distribution, determining the probability of a defect, and multiplying by 1 million.

Of note: neither is wrong; ultimately, you want to use the most representative estimate. Rule of thumb:
- If the data reasonably fit a distribution shape (e.g., Normal or Weibull), report the Expected (Predicted) DPM, particularly if the data come from a smaller sample (e.g., 30 to 100).
- If the data do not reasonably fit a distribution and a large sample (> 200) is available, use the Observed DPM.
- If not sure, report both for the current state. Note: data often are not normal when assessing the current state during the Measure phase, because some problems create non-normality.




Summary

In the Measure phase, we typically include:
- Histogram and/or Box Plot of raw data (if continuous data); may include a Normality Test or Distribution ID Probability Plot Analysis (see appendix)
- Run Chart (or SPC Chart) to show any time series trends
- Summary Statistics (if continuous data): N, mean, median, standard deviation, variance, min, max, range, skewness
- Estimate of Current State in terms of Yield, DPM, or DPMO
  - Calculations vary depending on the type of data, best-fit distribution, defect opportunity classification, # opportunities for defect per unit, etc.
  - For numerical variables, use Expected DPM for smaller sample sizes (< 100), particularly if the data reasonably fit a known distribution. For larger sample sizes, you may use Observed DPM and/or Expected DPM (if there is a good distribution fit).
  - When identifying opportunities for DPMO, they should be important to the customer and independent of other categories (avoid the denominator game).


Appendix: Distribution ID Plot

Minitab has a tool to help determine the best distribution fit.

Stat >> Reliability/Survival >> Distribution Analysis (Right Censoring) >> Distribution ID Plot

Choose the distribution with the highest correlation coefficient / lowest AD score.

Common Distribution Options: Weibull, Exponential, Lognormal, Normal (others available)
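The correlation coefficient in a Distribution ID Plot measures how straight the data fall on each distribution's probability paper. A rough, stdlib-only Python analogue for complete (uncensored) data correlates the ordered data with each distribution's plotting positions. Assumptions: Benard's median ranks (i - 0.3)/(n + 0.4) as plotting positions; only Normal, Lognormal, and two-parameter Weibull are covered; the results approximate, rather than replicate, Minitab's LSXY values. The helper names and test data are hypothetical.

```python
import math
from statistics import NormalDist, mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def plot_correlations(data):
    """Probability-plot correlation for Normal, Lognormal, and Weibull fits.

    Plotting positions use Benard's median ranks, (i - 0.3) / (n + 0.4).
    Lognormal and Weibull need strictly positive data.
    """
    x = sorted(data)
    n = len(x)
    f = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in f]          # normal scores
    log_x = [math.log(v) for v in x]
    weibull_y = [math.log(-math.log(1 - p)) for p in f]
    return {
        "Normal": pearson(x, z),
        "Lognormal": pearson(log_x, z),
        "Weibull": pearson(log_x, weibull_y),
    }

# Hypothetical check: values placed exactly at normal quantiles
ranks = [(i - 0.3) / 30.4 for i in range(1, 31)]
normal_like = [20 + 3 * NormalDist().inv_cdf(p) for p in ranks]
results = plot_correlations(normal_like)
for name, r in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>9}: {r:.4f}")
```

Data placed at normal quantiles rank Normal first with r close to 1. For the bimodal shear force data, even the highest coefficient does not indicate a good fit.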




Shear Force Results

For the best result, look for:
- the lowest AD score (based on maximum likelihood estimation)
- the highest correlation coefficient (based on least squares estimation)

Here, we do not have a good distribution fit for any of the options (recall that the data are bimodal)!

[Figure: Probability Plot for ShearForce, LSXY Estimates - Complete Data; panels: Weibull, Lognormal, Exponential, Normal; x-axis: ShearForce; y-axis: Percent]

Correlation Coefficient: Weibull 0.948; Lognormal 0.865; Exponential *; Normal 0.954


Use Best Fit Distribution to Estimate DPM

Note: This topic is covered in the process capability analysis module.

Select Desired Distribution
