mod 5 05 basic statistics march 02
1© Visteon Corporation BB Mod #5 Basic Stats Rev 1.0 3/02
Basic Statistics for Process Improvement
The Breakthrough Strategy
Define: BB Works with Management
1 Select Output Characteristic and identify key process input and output variables
2 Define Performance Standards
3 Validate Measurement System
4 Establish Product Capability
5 Define Performance Objectives
6 Identify Variation Sources
7 Screen Potential Causes
8 Discover Variable Relationships
9 Establish Operating Tolerances
10 Validate Measurement System
11 Determine Process Capability
12 Implement Process Controls
(Figure: the twelve steps are grouped under the phase labels Measure, Analyze, Improve and Control, with the broader labels Characterize and Optimize.)
Measurement Phase
• Project Definition:
  – Problem Description
  – Project Metrics
• Process Exploration:
  – Process Flow Diagram
  – C&E Matrix, PFMEA, Fishbones
  – Data collection system
• Measurement System(s) Analysis (MSA):
  – Attribute / Variable Gage Studies
• Capability Assessment (on each Y):
  – Capability (Cpk, Ppk, σ Level, DPU, RTY)
• Graphical & Statistical Tools
• Project Summary:
  – Conclusion(s)
  – Issues and barriers
  – Next steps
• Completed “Local Project Review”
Basic Statistics: Fundamentals of Improvement
(Figure: X-bar charts of sample mean versus sample number for Process A, with center line 70.91, UCL = 77.20 and LCL = 64.62, and for Process B, with center line 70.98, UCL = 77.27 and LCL = 64.70.)
• Variability
  – Is the process on target with minimum variability?
  – We use the mean to determine whether the process is on target, and the standard deviation to determine its spread.
• Stability
  – How does the process perform over time?
  – Stability is represented by a constant mean and predictable variability over time.
Warm-Up Exercise
(Figure: X-bar charts of sample mean versus sample number for three machines. Machine A: center line 100.7, control limits 62.93 and 138.4. Machine B: center line 101.0, control limits 93.42 and 108.5, with two points flagged "1". Machine C: center line 115.0, control limits 110.4 and 119.7.)
• Assume machines A, B, and C make identical products (with range charts in control).
• Assume that the target value for each product output variable is 100 mm.
• Answer the following questions:
  – Which machine(s) exhibit variation?
  – Where is each machine centered?
  – Which machines are predictable over time?
  – Which machines have special cause variation?
  – Which machine would you want making your product?
  – Which machine would probably be easiest to fix?
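As a minimal sketch of how points such as the two flagged on Machine B's chart are identified, the code below computes X-bar chart limits with the classic X-bar/R formula, center line ± A2 × (average range). It assumes equal subgroups of 2 to 5 parts and uses the standard tabulated A2 constants; the deck does not state the subgroup size or method behind its charts, and the subgroup data are hypothetical.

```python
from statistics import mean

# Standard X-bar/R chart constants A2 for subgroup sizes 2 to 5.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}

def xbar_limits(subgroups):
    """Return (center line, LCL, UCL, subgroup means) for an X-bar chart."""
    n = len(subgroups[0])                              # assumes equal subgroup sizes
    xbars = [mean(s) for s in subgroups]               # one mean per subgroup
    rbar = mean(max(s) - min(s) for s in subgroups)    # average subgroup range
    center = mean(xbars)                               # grand mean
    return center, center - A2[n] * rbar, center + A2[n] * rbar, xbars

def out_of_limits(subgroups):
    """Indices of subgroups whose mean falls outside the control limits."""
    _, lcl, ucl, xbars = xbar_limits(subgroups)
    return [i for i, x in enumerate(xbars) if x < lcl or x > ucl]

# Hypothetical data: 20 stable subgroups of 5 parts, then one shifted subgroup.
stable = [[98, 99, 100, 101, 102]] * 20
shifted = stable + [[110, 111, 112, 113, 114]]
print(out_of_limits(stable))    # → []
print(out_of_limits(shifted))   # → [20]
```

Staying inside the limits answers only the stability question; whether a machine is on target still means comparing its center line with the 100 mm target.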
Can We Tolerate Variability?
(Figure: cost versus output value, with LSL, USL and the nominal value (Nom) marked. Traditional View: every value between LSL and USL is "Acceptable". Taguchi Loss Function (New View): cost rises as the value moves away from nominal.)
• There will always be variability present in any process.
• We can tolerate variability if:
  – the process is on target,
  – the total variability is small relative to the process specifications, and
  – the process is stable over time.
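The Taguchi view can be written as a quadratic loss, L(y) = k(y − T)², where T is the target and k is a cost constant. Averaged over the process output, that loss splits into k·σ² (spread) plus k·(mean − T)² (off-target), which is why both of the first two conditions matter. The sketch below assumes this quadratic form; the value of k and the data are hypothetical.

```python
from statistics import fmean, pvariance

def average_loss(values, target, k):
    """Average quadratic (Taguchi) loss k * (y - target)**2 over observed values."""
    return fmean(k * (y - target) ** 2 for y in values)

def loss_components(values, target, k):
    """Split the average loss into a spread part and an off-target part."""
    spread = k * pvariance(values)                    # k * sigma^2
    off_target = k * (fmean(values) - target) ** 2    # k * (mean - target)^2
    return spread, off_target

# Hypothetical parts measured against a 100 mm target, k = 1.0 cost unit per mm^2.
on_target = [98, 99, 100, 101, 102]
shifted = [103, 104, 105, 106, 107]
print(average_loss(on_target, 100, 1.0))     # → 2.0
print(loss_components(shifted, 100, 1.0))    # → (2.0, 25.0)
```

The shifted process has the same spread as the centered one, yet its average loss is 27.0 instead of 2.0, entirely because it is off target.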
Data Analysis Tasks for Improvement
• Determine whether the process is stable.
  – If it is not, identify and remove the causes (X’s) of instability (obvious non-random variation).
• Estimate the magnitude of the total variability. Is it acceptable with respect to the customer requirements (spec limits)?
  – If not, identify the sources of the variability and eliminate or reduce their influence on the process.
• Determine the location of the process mean. Is it on target?
  – If not, identify the variables (X’s) that affect the mean and determine the optimal settings to achieve the target value.
• We will now review statistics that help with this process.
Types of Outputs (Data)
• Attribute Data (Qualitative)
  – Categories
  – Yes, No
  – Go, No go
  – Machine 1, Machine 2, Machine 3
  – Pass/Fail
• Variable Data (Quantitative)
  – Discrete (Count) Data
    • Maintenance equipment failures, fiber breakouts, number of clogs
  – Continuous Data
    • Decimal subdivisions are meaningful
    • Dimension, chemical yield, cycle time
Selecting Statistical Techniques
• There are statistical techniques available to analyze all combinations of input/output data:

Inputs \ Outputs | Discrete (Attribute) | Continuous (Variable)
Discrete (Attribute) | Chi-square | Analysis of Variance
Continuous (Variable) | Discriminant Analysis, Logistic Regression | Correlation, Multiple Regression
Statistical Distributions
• We can describe the behavior of any process or system by plotting multiple data points for the same variable:
  – over time
  – across products
  – on different machines, etc.
• The accumulation of this data can be viewed as a distribution of values.
• Represented by:
  – dot plots
  – histograms
  – normal curve or other “smoothed” distribution
Dot Plot Distribution
(Figure: character dot plot of the measured pump rate, GPM axis from 49.00 to 51.00.)
• Imagine a metering pump, geared to pump material at 50 gallons/minute.
• The actual pump rate is measured at 100 separate instants in time.
• Each dot is plotted and represents one “event” of output at a given value (pump speed). As the dots accumulate, the pump’s actual performance can be seen as a “distribution” of pump-speed values.
Histogram Distribution
(Figure: histogram of the same pump-rate data, Frequency versus GPM, with the GPM axis running from 48.8 to 51.3.)
• Now imagine the same data grouped into “intervals,” with the number of pump-speed data points that fall within each interval determining the height of that interval’s bar.
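Grouping readings into intervals is simple counting. This sketch assumes fixed-width intervals that include their lower edge; the readings, the 49.0 GPM starting edge and the 0.5 GPM width are hypothetical, not the 100 measurements behind the figure.

```python
import math
from collections import Counter

def histogram(values, start, width):
    """Count values per interval [start + k*width, start + (k+1)*width)."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    # Key each bar by the lower edge of its interval (rounded to hide float noise).
    return {round(start + k * width, 6): counts[k] for k in sorted(counts)}

def text_bars(hist):
    """Render one row of '#' characters per interval."""
    return [f"{edge:6.2f} | {'#' * n}" for edge, n in hist.items()]

# Hypothetical pump-rate readings in GPM.
readings = [49.1, 49.2, 49.6, 50.0, 50.1, 50.4]
hist = histogram(readings, start=49.0, width=0.5)
print(hist)  # → {49.0: 2, 49.5: 1, 50.0: 3}
for row in text_bars(hist):
    print(row)
```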
Smoothed (Normal) Distribution
(Figure: the same pump-rate data shown as a smoothed normal curve, GPM axis from 48.0 to 52.0.)
• Finally, we can view the data as a smoothed distribution (red line).
• In this example, using the “normal distribution” assumption (we’ll discuss this later) provides an approximation of how the data might look if we were to collect an infinite number of data points.
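One way to compare a smoothed normal curve with a histogram is to fit a normal distribution from the sample mean and standard deviation and ask how many points it predicts in each interval. This is a sketch using Python's `statistics.NormalDist` (Python 3.8+); the readings and interval edges are hypothetical.

```python
from statistics import NormalDist

def expected_counts(values, edges):
    """Fit a normal curve (sample mean and SD) and return how many points it
    predicts in each interval between consecutive edges."""
    fit = NormalDist.from_samples(values)
    n = len(values)
    return [n * (fit.cdf(hi) - fit.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

# Hypothetical readings and interval edges, in GPM.
readings = [49.5, 50.0, 50.5]
print(expected_counts(readings, [49.0, 50.0, 51.0]))   # two equal counts of about 1.43
```

Comparing these predicted counts with the observed bar heights shows where real data depart from the smooth normal approximation.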
Population Parameters vs. Sample Statistics
• “Population Parameters”: $\mu$ = population mean; $\sigma$ = population standard deviation
• “Sample Statistics”: $\bar{x}$ = sample mean; $s$ = sample standard deviation
• Population:
  – an entire group of objects that have been made or will be made containing a characteristic of interest
  – is it likely we can ever know the true population parameters?
• Sample:
  – the group of objects actually measured in a statistical study
  – a sample is usually a subset of the population of interest
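Computational Equations
• Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
• Sample Standard Deviation: $s = \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$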
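The four formulas translate directly into code. This sketch writes out the two standard deviations by hand and checks them against Python's `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by n − 1); the data are hypothetical.

```python
import math
from statistics import pstdev, stdev

def population_sd(xs):
    """sigma: squared deviations from mu, summed and divided by N."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def sample_sd(xs):
    """s: squared deviations from x-bar, summed and divided by n - 1."""
    xbar = sum(xs) / len(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
print(population_sd(data))   # → 2.0
print(sample_sd(data))       # about 2.138, larger because it divides by n - 1
# Python's standard library computes the same two quantities.
assert math.isclose(population_sd(data), pstdev(data))
assert math.isclose(sample_sd(data), stdev(data))
```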
Measures of Central Tendency
• Mean: arithmetic average of a set of values, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  – Reflects the influence of all values
  – Strongly influenced by extreme values
• Median: the 50% rank, i.e., the center number after a set of numbers has been sorted
  – Does not necessarily include all values in the calculation
  – Is “robust” to extreme values
• Mode:
  – Most frequently occurring value in a data set
• Why would we mainly use the mean, instead of the median, in process improvement efforts?
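A quick way to see the median's robustness is to add one extreme value to a small data set. The values below are hypothetical.

```python
from statistics import mean, median, mode

values = [10, 11, 12, 12, 13]      # hypothetical readings
with_outlier = values + [100]      # the same readings plus one extreme value

print(mean(values), median(values), mode(values))   # → 11.6 12 12
print(mean(with_outlier), median(with_outlier))     # mean jumps to about 26.33; median → 12.0
```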
Measures of Variability
• Range: numerical distance between the highest and the lowest values in a data set.
  $\text{Range} = \max - \min$
• Variance ($\sigma^2$; $s^2$): the average squared deviation of each individual data point from the mean. For a sample, the sum of squared deviations is divided by $n - 1$:
  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Standard Deviation ($\sigma$; $s$): the square root of the variance.
  $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
  – the most commonly used measure to quantify variability
The Quadratic Deviation
(Figure: Sq-Dev, the squared deviation $(x - \bar{x})^2$, plotted against Deviates from 0 to 10, with Sq-Dev rising from 0 to 100.)
• Squaring the deviation weights extreme deviations from the mean very heavily.
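Principle of Six Sigma
• Variances add; standard deviations do not.
• For independent inputs, the variances of the inputs add to give the total variance in the output.
If
  $\sigma^2_{\text{Total}}$ = variance of the process output,
  $\sigma^2_{X_1}$ = variance due to input variable $X_1$, and
  $\sigma^2_{X_2}$ = variance due to input variable $X_2$,
then
  $\sigma^2_{\text{Total}} = \sigma^2_{X_1} + \sigma^2_{X_2}$,
so
  $\sigma_{\text{Total}} = \sqrt{\sigma^2_{X_1} + \sigma^2_{X_2}}$.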
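A small simulation illustrates the principle. It assumes the output is simply the sum of two independent inputs (the condition under which variances add), with hypothetical standard deviations of 3 and 4. The output's standard deviation comes out near √(3² + 4²) = 5, not 3 + 4 = 7.

```python
import math
import random
from statistics import pstdev

def simulate_output_sd(sd_x1, sd_x2, n=50_000, seed=1):
    """Simulate Y = X1 + X2 with independent normal inputs; return the SD of Y."""
    rng = random.Random(seed)   # fixed seed so the result is repeatable
    ys = [rng.gauss(0, sd_x1) + rng.gauss(0, sd_x2) for _ in range(n)]
    return pstdev(ys)

# Hypothetical inputs with standard deviations 3 and 4.
print(round(simulate_output_sd(3, 4), 2))   # close to 5
print(math.sqrt(3 ** 2 + 4 ** 2))           # → 5.0 (variances add)
print(3 + 4)                                # → 7 (adding SDs would overstate the spread)
```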
The Normal Distribution
• The “Normal” Distribution is a distribution of data which has certain consistent properties
• These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained
• Many natural phenomena and man-made processes are approximately normally distributed, or can be represented as normally distributed
The Normal Distribution
• Property 1: A normal distribution can be described completely by knowing only its:
  – mean, and
  – standard deviation
(Figure: three normal curves, labeled Distribution One, Distribution Two and Distribution Three.)
What is the difference among these three normal distributions?
The Normal Curve and Its Probabilities
(Figure: normal curve of the probability of a sample value (0% to 40%) versus the number of standard deviations from the mean (−4 to +4). The cumulative probability of obtaining a value between two values is marked as 68%, 95% and 99.73% for ±1, ±2 and ±3 standard deviations.)
• Property 2: The area under sections of the curve can be used to estimate the cumulative probability of a certain “event” occurring.
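The 68%, 95% and 99.73% figures are areas under the standard normal curve, and Python's `statistics.NormalDist` can reproduce them. The last lines apply Property 2 to a single "event"; the 50 GPM mean echoes the earlier pump example, but the 0.4 GPM standard deviation and the 51 GPM threshold are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1
for k in (1, 2, 3):
    inside = z.cdf(k) - z.cdf(-k)  # area between -k and +k standard deviations
    print(f"+/-{k} sd: {inside:.2%}")
# → +/-1 sd: 68.27%
# → +/-2 sd: 95.45%
# → +/-3 sd: 99.73%

# Hypothetical pump with mean 50 GPM and SD 0.4 GPM: probability that a
# single reading exceeds 51 GPM (2.5 standard deviations above the mean).
p_high = 1 - NormalDist(50, 0.4).cdf(51)
print(f"{p_high:.4f}")  # → 0.0062
```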
Empirical Rules for the Standard Deviation
• The previous rules of cumulative probability closely apply even when a set of data is not perfectly normally distributed.
• Let’s compare the values for theoretical (perfect) normal distributions to empirical (real-world) distributions:

Number of Standard Deviations | Theoretical Normal | Empirical Normal
±1 | 68% | 60-75%
±2 | 95% | 90-98%
±3 | 99.7% | 99-100%
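The empirical column can be checked on any data set by counting how many observations fall within ±k sample standard deviations of the mean. This sketch uses a small hypothetical, evenly spread (non-normal) data set.

```python
from statistics import fmean, stdev

def fraction_within(data, k):
    """Share of observations within k sample standard deviations of the mean."""
    m, s = fmean(data), stdev(data)
    return sum(m - k * s <= x <= m + k * s for x in data) / len(data)

data = list(range(1, 11))  # hypothetical, evenly spread data
print(fraction_within(data, 1), fraction_within(data, 2))   # → 0.6 1.0
```

For this tiny flat data set the ±2 share is 100%, above the table's 90-98% range, a reminder that the empirical rules are rough guides.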
Normal Probability Plots
• We can check whether a given data set can be described as “normal” with a graphical tool called a Normal Probability Plot.
• If a distribution is close to normal, its normal probability plot will be approximately a straight line.
• Minitab makes the normal probability plot easy:
  – Open Distskew.Mtw
  – Choose: Stat > Basic Stats > Normality Test
• Produce a normal plot of each of the first 3 columns. Which appear to be normal?
• Now, graph a histogram of each.
• What does this reveal?
Normal Probability Plots
(Figure: histograms (Frequency versus value) of C1, spanning about 20 to 110, C2, about 60 to 130, and C3, about 0 to 80, followed by normal probability plots of the three columns. Each column has N of data = 500, Average = 70 and Std Dev = 10. Anderson-Darling Normality Test results:
  – Normal Distribution (Normal): A-Squared = 0.418, p-value = 0.328
  – Positive Skewed Distribution (Pos Skew): A-Squared = 46.447, p-value = 0.000
  – Negative Skewed Distribution (Neg Skew): A-Squared = 43.953, p-value = 0.000)
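The A-Squared and p-value in each panel come from the Anderson-Darling normality test. This sketch computes A-Squared with the mean and standard deviation estimated from the data, then converts it to a p-value with the widely published D'Agostino and Stephens approximation. We assume, without confirmation from the deck, that Minitab's calculation is equivalent; as a check, the p-value function reproduces the 0.328 reported for the Normal column from A-Squared = 0.418 and N = 500.

```python
import math
from statistics import NormalDist, fmean, stdev

def anderson_darling(data):
    """A-Squared for normality, with mean and SD estimated from the data."""
    xs = sorted(data)
    n = len(xs)
    fit = NormalDist(fmean(xs), stdev(xs))
    F = [fit.cdf(x) for x in xs]   # fitted cumulative probability of each point
    total = sum((2 * i - 1) * (math.log(F[i - 1]) + math.log(1 - F[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def ad_pvalue(a2, n):
    """Approximate p-value for A-Squared (D'Agostino and Stephens formulas
    for the case where mean and SD are estimated)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)   # small-sample adjustment
    if a < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    elif a < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    elif a < 0.6:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    else:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    return min(max(p, 0.0), 1.0)

# Check against the A-Squared values reported above (N = 500).
print(round(ad_pvalue(0.418, 500), 3))    # → 0.328 (Normal)
print(round(ad_pvalue(46.447, 500), 3))   # → 0.0 (Pos Skew)
```

A small p-value (commonly below 0.05) is evidence against normality, which is what the Pos Skew and Neg Skew panels show.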
Mystery Distribution
• Generate a Normal Probability Plot for the Mystery variable in C5.
• What is your conclusion? Is this a normal distribution?
(Figure: normal probability plot titled "Mystery Distribution". Anderson-Darling Normality Test: A-Squared = 27.108, p-value = 0.000; N of data = 500, Average = 100, Std Dev = 32.3849.)
Variable N Mean Median Tr Mean StDev SE Mean
Normal 500 70.000 69.977 70.014 10.000 0.447
Pos Skew 500 70.000 65.695 68.554 10.000 0.447
Neg Skew 500 70.000 73.783 71.368 10.000 0.447
Mystery 500 100.00 104.20 99.94 32.38 1.45
Variable Min Max Q1 Q3
Normal 29.824 103.301 63.412 76.653
Pos Skew 62.921 130.366 63.647 72.821
Neg Skew 1.866 77.106 67.891 76.290
Mystery 41.77 162.82 68.69 130.81
Exercise
• Open file DISTSKEW.MTW
• Stat > Basic Statistics > Display Descriptive Statistics
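Most columns of this output are easy to reproduce. In particular, SE Mean is StDev divided by √N, so for the DISTSKEW columns 10/√500 ≈ 0.447. The sketch below assumes that TrMean trims 5% of the values from each end and that Python's default 'exclusive' quantile method matches Minitab's quartiles; both are assumptions, and the example column is hypothetical.

```python
import math
from statistics import fmean, median, quantiles, stdev

def describe(data):
    """Minitab-style descriptive statistics for one column (a sketch)."""
    xs = sorted(data)
    n = len(xs)
    k = round(0.05 * n)                  # assumption: TrMean drops 5% from each end
    trimmed = xs[k:n - k]
    q1, _, q3 = quantiles(xs, n=4)       # default 'exclusive' method: (n + 1)p positions
    sd = stdev(xs)                       # sample standard deviation (n - 1)
    return {
        "N": n, "Mean": fmean(xs), "Median": median(xs), "TrMean": fmean(trimmed),
        "StDev": sd, "SE Mean": sd / math.sqrt(n),
        "Min": xs[0], "Max": xs[-1], "Q1": q1, "Q3": q3,
    }

print(describe(range(1, 21)))          # hypothetical column: the integers 1 to 20
print(round(10 / math.sqrt(500), 3))   # → 0.447, SE Mean for StDev 10 and N 500
```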
Descriptive Statistics: Graphical Summary
Stat > Basic Statistics > Display Descriptive Statistics > Graphs > Graphical Summary
(Figure: graphical summary for Variable: Mystery, with a histogram and plots of the 95% confidence intervals for Mu and for the median.)
Anderson-Darling Normality Test: A-Squared = 27.11, p-value = 0.00
Mean = 100.00, Std Dev = 32.38, Variance = 1048.78, Skewness = 0.01, Kurtosis = -1.63, n of data = 500
Minimum = 41.77, 1st Quartile = 68.69, Median = 104.20, 3rd Quartile = 130.81, Maximum = 162.82
95% Confidence Interval for Mu: 97.15 to 102.85
95% Confidence Interval for Sigma: 30.49 to 34.53
95% Confidence Interval for Median: 82.78 to 117.66
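The 95% confidence interval for Mu is roughly the mean ± 1.96 standard errors. This sketch uses a normal (z) multiplier; we believe Minitab uses Student's t, which for N = 500 differs only in the second decimal place. Plugging in the Mystery summary values gives an interval that agrees with the reported 97.15 to 102.85 to within about 0.01.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, confidence=0.95):
    """Approximate confidence interval for the population mean (mu),
    using a normal (z) multiplier instead of Student's t."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)               # z times the standard error
    return mean - half_width, mean + half_width

# Summary values reported for the Mystery column.
low, high = mean_ci(100.00, 32.38, 500)
print(f"{low:.2f} to {high:.2f}")   # → 97.16 to 102.84
```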
Exercise in “Data Mining”
• Remember the basic premise of Six Sigma: sources of variation can be
  – identified
  – quantified
  – eliminated or controlled
• The following example investigates potential sources of variation in breaking strength in a spin-draw process.
  – Output: Breaking Strength
  – Inputs tracked: Day, Doff, Spinneret and Draw Ratio
• Objective: determine which X’s affect variation in Y.
• Filename: Bhhmult.mtw
Data Set
Column Count Missing Name
C1 36 0 Day
C2 36 0 Doff
C3 36 0 Spinnert
C4 36 0 DrwRatio
C5 36 0 BrkStren
The Info window of Minitab shows that the data set contains information about Day, Doff, Spinneret, Draw Ratio and Breaking Strength. There are 36 observations. The challenge is to determine which inputs are causing variation in the output.
Total Variation of Breaking Strength
(Figure: histogram of BrkStren, Frequency versus breaking strength, axis from 15 to 29.)
Using the Graph > Histogram function, we see the distribution of Breaking Strength. Values range from about 15 to about 30.
Variable N Mean Median Tr Mean StDev SE Mean
BrkStren 36 21.865 22.380 21.819 3.428 0.571
Variable Min Max Q1 Q3
BrkStren 15.330 29.720 19.242 24.138
Mining the Data
Let’s look at Draw Ratio and its effect on the variability of Breaking Strength. Go to Stat > Basic Stats > Display Descriptive Statistics and use the “By” statement.
Breakdown by Draw Ratio
Variable DrwRatio N Mean Median Tr Mean StDev SE Mean
BrkStren 1 12 18.774 18.990 18.625 2.560 0.739
5 12 22.282 22.815 22.377 1.821 0.526
10 12 24.538 24.565 24.621 3.017 0.871
Variable DrwRatio Min Max Q1 Q3
BrkStren 1 15.330 23.710 16.373 20.317
5 18.960 24.650 20.888 23.220
10 18.530 29.720 22.715 26.898
These results show that, as Draw Ratio varies from 1% to 10%, the average Breaking Strength varies from 18.8 to 24.5. If we could center Draw Ratio on 5%, the sigma for Breaking Strength would be reduced from about 3.4 (the total StDev of 3.428) to about 1.8 (the StDev within the 5% group).
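The group summaries in this breakdown are enough to split the total variation of Breaking Strength into a between-Draw-Ratio part and a within-group part (the one-way analysis-of-variance sum-of-squares identity). This sketch uses only the N, Mean and StDev columns of the table. It recovers the overall mean of about 21.865 and overall StDev of about 3.428 reported for BrkStren, and attributes roughly 49% of the total sum of squares to Draw Ratio. The same function could be applied to Day, Doff and Spinneret breakdowns in the exercise that follows.

```python
import math

def decompose(groups):
    """Split total variation into between-group and within-group parts.
    groups: list of (n, mean, sd) summaries, one per level of the input."""
    N = sum(n for n, _, _ in groups)
    grand = sum(n * m for n, m, _ in groups) / N
    ss_between = sum(n * (m - grand) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_total = ss_between + ss_within
    return {
        "grand_mean": grand,
        "total_sd": math.sqrt(ss_total / (N - 1)),
        "pooled_within_sd": math.sqrt(ss_within / (N - len(groups))),
        "share_between": ss_between / ss_total,
    }

# (N, Mean, StDev) of BrkStren for DrwRatio 1, 5 and 10.
draw_ratio = [(12, 18.774, 2.560), (12, 22.282, 1.821), (12, 24.538, 3.017)]
print(decompose(draw_ratio))
```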
Data Mining Graphically
Go to Graph > Character Graph > Dotplot and display Breaking Strength BY Draw Ratio.
Dotplots
(Figure: character dot plots of BrkStren for DrwRatio 1, 5 and 10 on a common axis from 15.0 to 30.0.)
Exercise: Investigate Day, Doff and Spinneret in the same way and be ready to report conclusions. Which is the strongest input in explaining variation in Breaking Strength?