mod 5 05 basic statistics march 02

© Visteon Corporation BB Mod #5 Basic Stats Rev 1.0 3/02

Basic Statistics for Process Improvement


The Breakthrough Strategy

Define: BB Works with Management

1 Select Output Characteristic and identify key process input and output variables

2 Define Performance Standards

3 Validate Measurement System

4 Establish Product Capability

5 Define Performance Objectives

6 Identify Variation Sources

7 Screen Potential Causes

8 Discover Variable Relationships

9 Establish Operating Tolerances

10 Validate Measurement System

11 Determine Process Capability

12 Implement Process Controls

Measure / Analyze / Improve / Control

Characterize / Optimize


Measurement Phase

• Project Definition:
– Problem Description
– Project Metrics
• Process Exploration:
– Process Flow Diagram
– C&E Matrix, PFMEA, Fishbones
– Data collection system
• Measurement System(s) Analysis (MSA):
– Attribute / Variable Gage Studies
• Capability Assessment (on each Y)
– Capability (Cpk, Ppk, σ Level, DPU, RTY)
• Graphical & Statistical Tools
• Project Summary
– Conclusion(s)
– Issues and barriers
– Next steps
• Completed “Local Project Review”


[Figure: X-Bar Chart for Process A. Sample Mean vs. Sample Number; X = 70.91 (center line), UCL = 77.20, LCL = 64.62]

[Figure: X-Bar Chart for Process B. Sample Mean vs. Sample Number; X = 70.98 (center line), UCL = 77.27, LCL = 64.70]

Basic Statistics: Fundamentals of Improvement

• Variability
– Is the process on target with minimum variability?
– We use the mean to determine if the process is on target. We use the standard deviation to determine spread.
• Stability
– How does the process perform over time?
– Stability is represented by a constant mean and predictable variability over time.


[Figure: X-bar Chart for Machine A. Sample Mean vs. Sample Number; X = 100.7 (center line), control limits 62.93 and 138.4]

[Figure: X-bar Chart for Machine B. Sample Mean vs. Sample Number; X = 101.0 (center line), control limits 93.42 and 108.5; two points are flagged "1"]

[Figure: X-bar Chart for Machine C. Sample Mean vs. Sample Number; X = 115.0 (center line), control limits 110.4 and 119.7]

Warm-Up Exercise

• Assume machines A, B, and C make identical products (with range charts in control)
• Assume that the target value for each product output variable is 100 mm
• Answer the following questions:
– Which machine(s) exhibit variation?
– Where is each machine centered?
– Which machines are predictable over time?
– Which machines have special cause variation?
– Which machine would you want making your product?
– Which machine would probably be easiest to fix?


[Figure: cost curves plotted over LSL, Nom, and USL. Traditional View: values between LSL and USL are labelled Acceptable. Taguchi Loss Function (New View).]

Can We Tolerate Variability?

• There will always be variability present in any process
• We can tolerate variability if:
– the process is on target
– the total variability is relatively small compared to the process specifications
– the process is stable over time


Data Analysis Tasks for Improvement

• Determine if process is stable
– If process is not stable, identify and remove causes (X’s) of instability (obvious non-random variation)
• Estimate the magnitude of the total variability. Is it acceptable with respect to the customer requirements (spec limits)?
– If not, identify the sources of the variability and eliminate or reduce their influence on the process
• Determine the location of the process mean. Is it on target?
– If not, identify the variables (X’s) which affect the mean and determine optimal settings to achieve target value
• We will now review statistics that help this process


Types of Outputs (Data)

• Attribute Data (Qualitative)
– Categories
– Yes, No
– Go, No go
– Machine 1, Machine 2, Machine 3
– Pass/Fail
• Variable Data (Quantitative)
– Discrete (Count) Data
  • Maintenance equipment failures, fiber breakouts, number of clogs
– Continuous Data
  • Decimal subdivisions are meaningful
  • Dimension, chemical yield, cycle time


Selecting Statistical Techniques

• There are statistical techniques available to analyze all combinations of input / output data.

                         Outputs:
Inputs:                  Discrete (Attribute)                        Continuous (Variable)
Discrete (Attribute)     Chi-square                                  Analysis of Variance
Continuous (Variable)    Discriminant Analysis, Logistic Regression  Correlation, Multiple Regression
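The grid of techniques can be read as a simple lookup keyed on the data type of the input and of the output. The sketch below assumes the usual reading of the grid (inputs as rows, outputs as columns); the dictionary and the function name `suggest_techniques` are hypothetical and only illustrate the selection step.

```python
# Hypothetical lookup table: (input type, output type) -> candidate techniques.
# ASSUMPTION: rows of the slide's grid are inputs and columns are outputs.
TECHNIQUES = {
    ("discrete", "discrete"): ["Chi-square"],
    ("discrete", "continuous"): ["Analysis of Variance"],
    ("continuous", "discrete"): ["Discriminant Analysis", "Logistic Regression"],
    ("continuous", "continuous"): ["Correlation", "Multiple Regression"],
}


def suggest_techniques(input_type, output_type):
    """Return the candidate techniques for one input/output data-type pair."""
    key = (input_type.lower(), output_type.lower())
    if key not in TECHNIQUES:
        raise ValueError("data types must be 'discrete' or 'continuous'")
    return TECHNIQUES[key]


# Example: a machine number (discrete X) against a measured dimension (continuous Y).
print(suggest_techniques("discrete", "continuous"))
```

The lookup only narrows the choice; whether a technique's assumptions hold for a given data set still has to be checked.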


Statistical Distributions

• We can describe the behavior of any process or system by plotting multiple data points for the same variable
– over time
– across products
– on different machines, etc.
• The accumulation of this data can be viewed as a distribution of values
• Represented by:
– dot plots
– histograms
– normal curve or other “smoothed” distribution


Dot plot distribution

[Figure: dot plot of pump rate in GPM, axis from 49.00 to 51.00]

• Imagine a metering pump, geared to pump material at 50 gallons/minute
• The actual pump rate is measured at 100 separate instances in time.
• Each dot is plotted and represents one “event” of output at a given value (pump speed). As the dots accumulate, the nature of the pump’s actual performance can be seen as a “distribution” of pump speed values.


Histogram Distribution

[Figure: histogram of pump rate. X axis: GPM (48.8 to 51.3); Y axis: Frequency (0 to 40)]

• Now imagine the same data, grouped into “intervals” with the number of times that a pump speed data point falls within a given interval determining the height of the interval bar.
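Grouping readings into intervals can be sketched in a few lines. The pump readings below are simulated (the 100 real readings are not part of this module), and the interval width and starting edge are illustrative choices, not values taken from the slide.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: simulated pump rates around 50 GPM; not the slide's actual data.
rates = [random.gauss(50.0, 0.4) for _ in range(100)]

BIN_WIDTH = 0.5   # interval width in GPM (illustrative)
BIN_START = 48.0  # left edge of the first interval (illustrative)


def bin_index(value):
    """Index k of the interval [BIN_START + k*w, BIN_START + (k+1)*w) holding value."""
    return int((value - BIN_START) // BIN_WIDTH)


# The bar height for each interval is the number of readings that fall in it.
counts = Counter(bin_index(r) for r in rates)

for k in sorted(counts):
    left = BIN_START + k * BIN_WIDTH
    print(f"{left:5.2f} to {left + BIN_WIDTH:5.2f} | {'*' * counts[k]}")
```

Changing `BIN_WIDTH` changes the apparent shape of the histogram, so it is worth trying more than one width before drawing conclusions.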


Smoothed (Normal) distribution

[Figure: histogram of pump rate with a smoothed curve overlaid. X axis: GPM (48.0 to 52.0)]

• Finally, we can view the data as a smoothed distribution (red line).
• In this example using the “normal distribution” assumption (we’ll discuss this later) provides an approximation of how the data might look if we were to collect an infinite number of data points.


Population Parameters Vs Sample Statistics

“Population Parameters”
μ = Population mean
σ = Population standard deviation

“Sample Statistics”
x̄ = Sample mean
s = Sample standard deviation

• Population:
– an entire group of objects that have been made or will be made containing a characteristic of interest
– is it likely that we can ever know the true population parameters?
• Sample:
– the group of objects actually measured in a statistical study
– a sample is usually a subset of the population of interest


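Computational Equations

Population Mean

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]

Sample Mean

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Population Standard Deviation

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

Sample Standard Deviation

\[ \hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]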
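The only difference between the two standard deviation formulas is the divisor: N for a population, n - 1 for a sample. A minimal sketch follows, using five made-up readings (an assumption for illustration only) and cross-checking against Python's `statistics` module.

```python
import math
import statistics

# ASSUMPTION: five invented readings, used only to exercise the formulas.
x = [49.6, 50.1, 50.4, 49.9, 50.0]
n = len(x)

mean = sum(x) / n
sum_sq_dev = sum((xi - mean) ** 2 for xi in x)

sigma_if_population = math.sqrt(sum_sq_dev / n)   # divide by N
s_if_sample = math.sqrt(sum_sq_dev / (n - 1))     # divide by n - 1

# Cross-check the hand calculation against the standard library.
assert math.isclose(sigma_if_population, statistics.pstdev(x))
assert math.isclose(s_if_sample, statistics.stdev(x))

print(f"mean = {mean:.3f}")
print(f"population formula: {sigma_if_population:.4f}")
print(f"sample formula:     {s_if_sample:.4f}")
```

The sample formula always gives the slightly larger value, and the gap shrinks as n grows.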


Measures of Central Tendency

• Mean: Arithmetic average of a set of values
– Reflects the influence of all values
– Strongly influenced by extreme values

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

• Median: Reflects the 50% rank, the center number after a set of numbers has been sorted
– Does not necessarily include all values in calculation
– Is “robust” to extreme scores
• Mode:
– Most frequently occurring value in a data set
• Why would we mainly use the mean, instead of the median, in process improvement efforts?
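How each measure reacts to an extreme value can be seen with a small example. The cycle times below are invented for illustration; the last value added is a deliberate outlier.

```python
import statistics

# ASSUMPTION: invented cycle times in seconds, not data from this module.
typical = [12, 13, 13, 14, 15, 13, 12]
with_outlier = typical + [60]  # one extreme reading added

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(
        f"{label:13s} mean={statistics.mean(data):6.2f} "
        f"median={statistics.median(data):5.1f} "
        f"mode={statistics.mode(data)}"
    )
```

The single extreme reading moves the mean from about 13.1 to 19, while the median and mode stay at 13. A statistic that responds to every value, including unusual ones, is what makes a shift in the process visible.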


Measures of Variability:

• Range:
– Numerical distance between the highest and the lowest values in a data set.

\[ \text{Range} = x_{\max} - x_{\min} \]

• Variance (σ² ; s²):
– The average squared deviation of each individual data point from the mean.

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

• Standard Deviation (σ ; s):
– The square root of the variance.
– Most commonly used measurement to quantify variability

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]


The Quadratic Deviation

[Figure: squared deviation (Sq-Dev, 0 to 100) plotted against deviation (Deviates)]

• Squaring the deviation weights extreme deviations from the natural mean very heavily

\[ (x - \bar{x})^2 \]


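Principle of Six Sigma

• Variances add, standard deviations do not
• Variances of the inputs add to calculate the total variance in the output

If

\[ \sigma^2_{Total} = \text{variance of the process output;} \]
\[ \sigma^2_{X_1} = \text{variance due to Input Variable } X_1; \]
\[ \sigma^2_{X_2} = \text{variance due to Input Variable } X_2; \]

then,

\[ \sigma^2_{Total} = \sigma^2_{X_1} + \sigma^2_{X_2} \]

So,

\[ \sigma_{Total} = \sqrt{\sigma^2_{X_1} + \sigma^2_{X_2}} \]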
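A quick numeric check makes the rule concrete. The two standard deviations below are illustrative values, and the sketch assumes the two inputs vary independently, which is the condition under which their variances simply add.

```python
import math

# ASSUMPTION: illustrative standard deviations for two independent inputs.
sigma_x1 = 3.0
sigma_x2 = 4.0

var_total = sigma_x1 ** 2 + sigma_x2 ** 2   # variances add: 9 + 16 = 25
sigma_total = math.sqrt(var_total)          # back to standard deviation units

naive_sum = sigma_x1 + sigma_x2             # adding sigmas directly is wrong

print(f"total variance      = {var_total}")
print(f"total std deviation = {sigma_total}")
print(f"sum of the sigmas   = {naive_sum} (overstates the spread)")
```

With these values the total standard deviation is 5, not 7. Removing the smaller source entirely would only bring the total down from 5 to 4, which is why effort usually goes to the largest variance contributor first.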


The Normal Distribution

• The “Normal” Distribution is a distribution of data which has certain consistent properties

• These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained

• Many natural phenomena and man-made processes are distributed approximately normally, or can be represented as normally distributed


• Property 1: A normal distribution can be described completely by knowing only the:
– mean, and
– standard deviation

[Figure: three normal curves labelled Distribution One, Distribution Two, and Distribution Three]

What is the difference among these three normal distributions?


The Normal Curve and Its Probabilities

• Property 2: The area under sections of the curve can be used to estimate the cumulative probability of a certain “event” occurring

[Figure: normal curve. X axis: number of standard deviations from the mean (-4 to 4); Y axis: probability of sample value (0% to 40%). Cumulative probability of obtaining a value between two values: 68% within +/- 1, 95% within +/- 2, and 99.73% within +/- 3 standard deviations.]
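The 68%, 95%, and 99.73% areas can be reproduced from the normal curve itself. The sketch below uses the error function from Python's standard library; the helper name `prob_within` is illustrative.

```python
import math


def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} standard deviations: {prob_within(k):.2%}")
```

This prints 68.27%, 95.45%, and 99.73%, which round to the figures quoted for the normal curve.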


Empirical Rules for the Standard Deviation

• The previous rules of cumulative probability closely apply even when a set of data is not perfectly normally distributed.
• Let’s compare the values for a theoretical (perfect) normal distribution to empirical (real-world) distributions.

Number of Standard Deviations   Theoretical Normal   Empirical Normal
+/- 1                           68%                  60-75%
+/- 2                           95%                  90-98%
+/- 3                           99.7%                99-100%


Normal Probability Plots

• We can test whether a given data set can be described as “normal” with a test called a Normal Probability Plot

• If a distribution is close to normal, the normal probability plot will be a straight line.

• Minitab makes the normal probability plot easy.
– Open Distskew.Mtw
– Choose: Stat > Basic Stats > Normality Test
• Produce a normal plot of each of the first 3 columns. Which appear to be normal?
• Now, graph a histogram of each.
• What does this reveal?
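Outside Minitab, the idea behind the plot can be sketched by pairing each sorted value with the normal quantile of its rank and measuring how close the pairs are to a straight line. This is a simplified stand-in, not the Anderson-Darling test that Minitab reports; the plotting position (i - 0.5) / n is one common convention, and the data are simulated rather than taken from Distskew.Mtw.

```python
import math
import random
from statistics import NormalDist


def probability_plot_points(data):
    """Pair each sorted value with the standard normal quantile of its rank."""
    ordered = sorted(data)
    n = len(ordered)
    # ASSUMPTION: plotting position (i - 0.5) / n; other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return quantiles, ordered


def straightness(data):
    """Pearson correlation of the plot points; values near 1.0 mean a straight line."""
    q, y = probability_plot_points(data)
    n = len(q)
    mq, my = sum(q) / n, sum(y) / n
    sqy = sum((a - mq) * (b - my) for a, b in zip(q, y))
    sqq = sum((a - mq) ** 2 for a in q)
    syy = sum((b - my) ** 2 for b in y)
    return sqy / math.sqrt(sqq * syy)


random.seed(1)
# ASSUMPTION: simulated columns standing in for the worksheet data.
normal_like = [random.gauss(70, 10) for _ in range(500)]
skewed = [random.expovariate(1 / 10) for _ in range(500)]

print(f"normal-like data: {straightness(normal_like):.4f}")
print(f"skewed data:      {straightness(skewed):.4f}")
```

The skewed column gives a visibly lower value because its points bend away from the line in the tails, which is the same pattern the probability plot shows graphically.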



[Figure: histograms (Frequency vs. value) of columns C1, C2, and C3]


[Figure: Normal Probability Plot, Normal Distribution. Variable: Normal. Average: 70, Std Dev: 10, N of data: 500. Anderson-Darling Normality Test: A-Squared: 0.418, p-value: 0.328]

[Figure: Normal Probability Plot, Positive Skewed Distribution. Variable: Pos Skew]

[Figure: Normal Probability Plot, Negative Skewed Distribution. Variable: Neg Skew]


Mystery Distribution

• Generate a Normal Probability Plot for the Mystery variable in C5.

• What is your conclusion? Is this a normal distribution?

[Figure: Normal Probability Plot, Mystery Distribution. Variable: Mystery. Average: 100, Std Dev: 32.3849, N of data: 500]


Exercise

• Open file DISTSKEW.MTW
• Stat > Basic Statistics > Display Descriptive Statistics

Variable     N    Mean     Median   Tr Mean  StDev    SE Mean
Normal       500  70.000   69.977   70.014   10.000   0.447
Pos Skew     500  70.000   65.695   68.554   10.000   0.447
Neg Skew     500  70.000   73.783   71.368   10.000   0.447
Mystery      500  100.00   104.20   99.94    32.38    1.45

Variable     Min      Max      Q1       Q3
Normal       29.824   103.301  63.412   76.653
Pos Skew     62.921   130.366  63.647   72.821
Neg Skew     1.866    77.106   67.891   76.290
Mystery      41.77    162.82   68.69    130.81
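One column of the output can be checked by hand: SE Mean is the standard deviation divided by the square root of N. The sketch below recomputes it from the N and StDev values printed in the Minitab output above; the helper name `se_mean` is illustrative.

```python
import math

# (N, StDev, SE Mean) as printed in the Minitab descriptive statistics output.
rows = {
    "Normal": (500, 10.000, 0.447),
    "Pos Skew": (500, 10.000, 0.447),
    "Neg Skew": (500, 10.000, 0.447),
    "Mystery": (500, 32.38, 1.45),
}


def se_mean(stdev, n):
    """Standard error of the mean: StDev / sqrt(N)."""
    return stdev / math.sqrt(n)


for name, (n, stdev, printed) in rows.items():
    print(f"{name:8s} computed {se_mean(stdev, n):.3f}   printed {printed}")
```

Note that the Normal, Pos Skew, and Neg Skew columns share the same mean, standard deviation, and standard error, so those three numbers alone cannot tell the distributions apart; the median and quartiles can.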


Descriptive Statistics

Stat > Basic Statistics > Display Descriptive Statistics > Graphs > Graphical Summary

Graphical Summary

Variable: Mystery

A-Squared:      27.11
p-value:        0.00

Mean            100.00
Std Dev         32.38
Variance        1048.78
Skewness        0.01
Kurtosis        -1.63
n of data       500.00

Minimum         41.77
1st Quartile    68.69
Median          104.20
3rd Quartile    130.81
Maximum         162.82

95% Confidence Interval for Mu:      97.15 to 102.85
95% Confidence Interval for Sigma:   30.49 to 34.53
95% Confidence Interval for Median:  82.78 to 117.66


Exercise in “Data Mining”

• Remember the basic premise of Six Sigma, that sources of variation can be:
– Identified
– Quantified
– Eliminated or Controlled
• The following example investigates potential sources of variation in breaking strength in a spin draw process.
– Output: Breaking Strength
– Inputs Tracked: Day, Doff, Spinneret and Draw ratio
• Objective: Which X’s affect variation in Y?
• Filename: Bhhmult.mtw


Data Set

Column  Count  Missing  Name
C1      36     0        Day
C2      36     0        Doff
C3      36     0        Spinnert
C4      36     0        DrwRatio
C5      36     0        BrkStren

The Info window of Minitab shows that the data set contains information about Day, Doff, Spinneret, Draw Ratio and Breaking Strength. There are 36 observations. The challenge is to determine what inputs are causing variation in the output.


Total Variation of Breaking Strength

[Figure: histogram of BrkStren. X axis: BrkStren (15 to 29); Y axis: Frequency (0 to 10)]

Using the Graph > Histogram function we see the distribution of Breaking Strength. Values range from about 15 to about 30.

Variable   N    Mean     Median   Tr Mean  StDev   SE Mean
BrkStren   36   21.865   22.380   21.819   3.428   0.571

Variable   Min      Max      Q1       Q3
BrkStren   15.330   29.720   19.242   24.138


Mining the Data

Let’s look at Draw Ratio and its effects on the variability of Breaking Strength. We can go to Stat > Basic Stats > Display Descriptive Statistics. Use the “By” statement.


Breakdown by Draw Ratio

Variable   DrwRatio  N    Mean     Median   Tr Mean  StDev   SE Mean
BrkStren   1         12   18.774   18.990   18.625   2.560   0.739
           5         12   22.282   22.815   22.377   1.821   0.526
           10        12   24.538   24.565   24.621   3.017   0.871

Variable   DrwRatio  Min      Max      Q1       Q3
BrkStren   1         15.330   23.710   16.373   20.317
           5         18.960   24.650   20.888   23.220
           10        18.530   29.720   22.715   26.898

These results show that, as Draw Ratio varies from 1% to 10%, the average Breaking Strength varies from 18.8 to 24.5. If we could center Draw Ratio on 5%, the standard deviation of Breaking Strength would drop from about 3.4 overall (3.0 at a Draw Ratio of 10%) to about 1.8.
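The by-group summary is enough to split the total variation in Breaking Strength into a part within each Draw Ratio level and a part between levels. The sketch below uses only the N, Mean, and StDev values printed in the breakdown and the standard sum-of-squares identity; it is an illustration, not part of the Minitab output.

```python
import math

# (DrwRatio, N, Mean, StDev) copied from the Minitab breakdown by Draw Ratio.
groups = [
    (1, 12, 18.774, 2.560),
    (5, 12, 22.282, 1.821),
    (10, 12, 24.538, 3.017),
]

n_total = sum(n for _, n, _, _ in groups)
grand_mean = sum(n * mean for _, n, mean, _ in groups) / n_total

# Sum of squares inside each level, and between the level means.
ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)
ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
ss_total = ss_within + ss_between

total_sd = math.sqrt(ss_total / (n_total - 1))
between_share = ss_between / ss_total

print(f"grand mean        = {grand_mean:.3f}")
print(f"overall StDev     = {total_sd:.3f}")
print(f"share due to Draw Ratio = {between_share:.0%}")
```

The recomputed grand mean and overall standard deviation agree with the 21.865 and 3.428 reported for all 36 observations to within rounding, and roughly half of the total sum of squares sits between the Draw Ratio levels.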


Data Mining Graphically

Go to Graph > Character Graph > Dotplot and display Breaking Strength BY Draw Ratio.

Dotplots

[Figure: character dotplots of BrkStren for DrwRatio 1, DrwRatio 5, and DrwRatio 10 on a common axis from 15.0 to 30.0]

Exercise: Investigate Day, Doff and Spinneret in the same way and be ready to report conclusions. Which is the strongest input in explaining variation in Breaking Strength?
