1
Design and Analysis of Experiments (2)
Basic Statistics
Kyung-Ho Park
2
Descriptive Statistics: deals with procedures used to summarize the information contained in a set of measurements.
Inferential Statistics: deals with procedures used to make inferences (predictions) about a population parameter from information contained in a sample.
3
• Population: the totality of the observations with which we are concerned
• Sample: a subset of observations selected from a population
4
                     Population    Sample
Mean                 $\mu$         $\bar{x}$
Variance             $\sigma^2$    $s^2$
Standard deviation   $\sigma$      $s$
5
Descriptive statistics
Numerical Methods
Graphical Methods
6
Numerical methods
Measures of Central Tendency (Location)
1) sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
2) sample median: the middle number when the measurements are arranged in ascending order
3) sample mode: the most frequently occurring value
i) the sample mean is sensitive to extreme values
ii) the median is insensitive to extreme values
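The three location measures, and the sensitivity of the mean to an extreme value, can be sketched with Python's standard library. The numbers below are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical measurements, chosen only for illustration (not from the slides).
data = [4, 7, 7, 9, 13]

sample_mean = mean(data)        # sum of the observations divided by n
sample_median = median(data)    # middle value of the sorted observations
sample_modes = multimode(data)  # list of the most frequently occurring value(s)

# Replace the largest observation with an extreme value.
with_outlier = [4, 7, 7, 9, 130]
mean_shift = mean(with_outlier) - sample_mean        # the mean moves a lot
median_shift = median(with_outlier) - sample_median  # the median does not move

print(sample_mean, sample_median, sample_modes, mean_shift, median_shift)
```

Only the largest value changes between the two lists, so any movement in a statistic is due to that single extreme observation.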
7
Numerical methods
Measures of Dispersion (Variability)
1) range: max - min
2) sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
3) sample standard deviation: $s = \sqrt{s^2}$
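As a minimal sketch, the dispersion measures can be computed directly from their definitions and compared with the standard library. The data are hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical measurements, chosen only for illustration (not from the slides).
data = [4, 7, 7, 9, 13]

n = len(data)
x_bar = mean(data)

data_range = max(data) - min(data)                  # range: max - min
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # sample variance, n - 1 divisor
s = sqrt(s2)                                        # sample standard deviation

# The library functions use the same n - 1 divisor.
print(data_range, s2, variance(data), s, stdev(data))
```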
8
Numerical methods
Measures of Central Tendency (Location)    Measures of Dispersion (Variability)
1. Sample mean                             1. Range
2. Sample median                           2. Mean Absolute Deviation (MAD)
3. Sample mode                             3. Sample Variance
                                           4. Sample Standard Deviation
9
Graphical Methods
105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149
Table 1: Compressive Strength (in psi) of 80 Aluminum-Lithium Alloy Specimens
10
Graphical Methods
[Figure: Histogram of c1 with fitted normal curve; x-axis: c1, y-axis: Frequency; Mean 162.7, StDev 33.77, N 80]
[Figure: Dot plot of c1]
Stem-and-Leaf Display: c1
Stem-and-leaf of c1  N = 80
Leaf Unit = 1.0
      LO  76, 87
   3   9  7
   5  10  15
   8  11  058
  11  12  013
  17  13  133455
  25  14  12356899
  37  15  001344678888
(10)  16  0003357789
  33  17  0112445668
  23  18  0011346
  16  19  034699
  10  20  0178
   6  21  8
   5  22  189
      HI  237, 245
11
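Box Plot
[Figure: Boxplot of c1]
• The box runs from the first quartile to the third quartile, with a line at the second quartile (median); its length is the interquartile range (IQR).
• The lower whisker extends to the smallest data point within 1.5 interquartile ranges of the first quartile.
• The upper whisker extends to the largest data point within 1.5 interquartile ranges of the third quartile.
• Points beyond the whiskers are outliers; points more than 3 interquartile ranges (two steps of 1.5 IQR) from the box are extreme outliers.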
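The box plot rules can be sketched as follows. The data are hypothetical, and the quartiles use Python's default "exclusive" convention, which may differ slightly from the convention used by Minitab or other packages. The 3 IQR threshold for extreme outliers is the usual convention and is assumed here.

```python
from statistics import quantiles

# Hypothetical measurements, chosen only for illustration (not from the slides).
values = [2, 3, 3, 4, 4, 5, 5, 6, 7, 20]

# Quartiles; the default "exclusive" method is one of several conventions.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Fences at 1.5 IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences.
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Points outside the fences are outliers; beyond 3 IQR they are extreme outliers.
outliers = [v for v in values if v < lower_fence or v > upper_fence]
extreme = [v for v in values if v < q1 - 3 * iqr or v > q3 + 3 * iqr]

print(q1, q2, q3, whisker_low, whisker_high, outliers, extreme)
```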
12
Probability Plots
Graphical method for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data
10 observations on the effective service life in minutes of batteries in a portable personal computer:
176, 191, 214, 220, 205, 192, 201, 190, 183, 185
 j   x(j)   (j-0.5)/10    z_j
 1   176    0.05         -1.64
 2   183    0.15         -1.04
 3   185    0.25         -0.67
 4   190    0.35         -0.39
 5   191    0.45         -0.13
 6   192    0.55          0.13
 7   201    0.65          0.39
 8   205    0.75          0.67
 9   214    0.85          1.04
10   220    0.95          1.64
$(j-0.5)/n = P(Z \le z_j)$
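The columns of the probability-plot table can be reproduced with a short sketch. It uses the ten ordered values from the table and the standard library's `NormalDist` to invert $P(Z \le z_j) = (j-0.5)/n$.

```python
from statistics import NormalDist

# Battery service life in minutes, using the ordered values from the table.
lives = [176, 183, 185, 190, 191, 192, 201, 205, 214, 220]

n = len(lives)
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

rows = []
for j, x in enumerate(sorted(lives), start=1):
    p = (j - 0.5) / n            # cumulative probability assigned to the j-th value
    z = std_normal.inv_cdf(p)    # z_j such that P(Z <= z_j) = p
    rows.append((j, x, p, z))

for j, x, p, z in rows:
    print(f"{j:2d} {x:4d} {p:5.2f} {z:6.2f}")
```

Plotting x(j) against z_j gives the normal probability plot; points lying close to a straight line suggest the normal model is reasonable.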
[Figure: Probability Plot of c1 (Normal); x-axis: c1, y-axis: Percent; Mean 195.7, StDev 14.03, N 10, AD 0.257, P-Value 0.636]
13
Probability Plots (table 1)
[Figure: Probability Plot of c1 (Normal); x-axis: c1, y-axis: Percent; Mean 162.7, StDev 33.77, N 80, AD 0.270, P-Value 0.668]
14
Estimation: sample statistics are used to estimate the population parameters
                     Population    Sample
Mean                 $\mu$         $\bar{x}$
Variance             $\sigma^2$    $s^2$
Standard deviation   $\sigma$      $s$
15
Normal Distribution
Distribution of a random variable (sampling): Normal distribution
y: a normal random variable
the probability distribution of y:
$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right]$
$y \sim N(\mu, \sigma^2)$
16
[Figure: Histogram of c1 with fitted normal curve; x-axis: c1, y-axis: Frequency; Mean 162.7, StDev 33.77, N 80]
17
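Standard Normal Distribution
$\mu = 0$, $\sigma^2 = 1$
z: the standard normal random variable
$z = \frac{y-\mu}{\sigma}$
$z \sim N(0, 1)$
$\mu \pm 1\sigma$: 68.26%, $\mu \pm 2\sigma$: 95.44%, $\mu \pm 3\sigma$: 99.73%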
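The standardization and the coverage percentages can be checked with a small sketch using the standard library. The helper `standardize` is a hypothetical name introduced here for illustration. The computed values are about 68.27%, 95.45% and 99.73%, which agree with the slide's figures up to rounding in the last digit.

```python
from statistics import NormalDist

def standardize(y, mu, sigma):
    """Convert a normal value y to the standard normal scale: z = (y - mu) / sigma."""
    return (y - mu) / sigma

std_normal = NormalDist()  # N(0, 1)

# Probability that a normal variable falls within k standard deviations of its mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} sigma: {100 * p:.2f}%")
```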
18
Ex.1 Suppose the current measurements in a strip of wire are assumed to follow a normal distribution with a mean of 10 milliamperes and a variance of 4 (milliamperes)^2. What is the probability that a measurement will exceed 13 milliamperes?
$P(X > 13) = P\left(\frac{X-10}{2} > \frac{13-10}{2}\right) = P(Z > 1.5) = 0.06681$
MiniTab
Calc -> Probability Distributions -> Normal
Mean=10.0, S.D=2, Input Constant=13.0
Cumulative Distribution Function
Normal with mean = 10 and standard deviation = 2
 x   P( X <= x )
13   0.933193
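The calculation in Ex.1 can be reproduced outside Minitab with a short sketch, assuming only the mean of 10 and standard deviation of 2 stated in the example.

```python
from statistics import NormalDist

# Current in the wire: normal with mean 10 mA and variance 4, so sigma = 2.
current = NormalDist(mu=10, sigma=2)

z = (13 - 10) / 2       # standardized value of 13 mA
p_le = current.cdf(13)  # P(X <= 13), the quantity a cumulative distribution function reports
p_gt = 1 - p_le         # P(X > 13), the probability asked for

print(z, round(p_le, 6), round(p_gt, 5))
```

The cumulative probability is P(X <= 13), so the answer to the question is its complement.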
19
Confidence Interval (CI)
20
Confidence Interval (CI)
Sampling variability: the sample mean $\bar{x}$ varies from sample to sample
Interval estimate for a population parameter: confidence interval
A CI is constructed so that we have high confidence that it contains the unknown population parameter
If $\bar{x}$ is the sample mean of a random sample of size n from a normal population with known variance $\sigma^2$, a 100(1-α)% CI on μ is given by
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution
21
Ex 2. Ten measurements of impact energy (J) on specimens of A238 steel cut at 60℃ are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2 and 64.3. Assume that impact energy is normally distributed with σ = 1 J. We want to find a 95% CI for μ, the mean impact energy.
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
$z_{\alpha/2} = z_{0.025} = 1.96$
n=10, σ=1
$\bar{x} = 64.46$
$64.46 - 1.96\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\frac{1}{\sqrt{10}}$
$63.84 \le \mu \le 65.08$
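As a minimal sketch, the known-variance interval of Ex 2 can be computed from the ten measurements with the standard library. The variable names are illustrative choices, not part of the original material.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Impact energy measurements (J) from Ex 2.
energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]

sigma = 1.0    # population standard deviation, assumed known in the example
alpha = 0.05   # 95% confidence level

n = len(energy)
x_bar = fmean(energy)

# Upper alpha/2 percentage point of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

half_width = z_crit * sigma / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

print(round(x_bar, 2), round(z_crit, 2), tuple(round(v, 4) for v in ci))
```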
22
64.1
64.7
64.5
64.6
64.5
64.3
64.6
64.8
64.2
64.3
(Example 2)
Stat -> Basic Stat -> 1 sample Z
One-Sample Z: C1
The assumed standard deviation = 1
Variable   N     Mean   StDev  SE Mean  95% CI
C1        10  64.4600  0.2271   0.3162  (63.8402, 65.0798)
23
Confidence Interval (CI)
If $\bar{x}$ and $s$ are the mean and standard deviation of a random sample from a normal population with unknown variance $\sigma^2$, a 100(1-α)% CI on μ is given by
$\bar{x} - t_{\alpha/2,n-1}\,s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\,s/\sqrt{n}$
where $t_{\alpha/2,n-1}$ is the upper 100α/2 percentage point of the t distribution with n-1 degrees of freedom
24
Ex. 3 An article describes the results of tensile adhesion tests on 22 U-700 alloy specimens. The load at specimen failure is as follows (in megapascals):
19.8 10.1 14.9 7.5 15.4 15.4 15.4 18.5 7.9 12.7 11.9 11.4 11.4 14.1 17.6 16.7 15.8 19.5 8.8 13.6 11.9 11.4
We want to find a 95% CI for μ
$\bar{x} = 13.71$, $s = 3.55$
n=22, n-1=21, $t_{0.025,21} = 2.080$
$\bar{x} - t_{\alpha/2,n-1}\,s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\,s/\sqrt{n}$
$13.71 - 2.080(3.55)/\sqrt{22} \le \mu \le 13.71 + 2.080(3.55)/\sqrt{22}$
$13.71 - 1.57 \le \mu \le 13.71 + 1.57$
$12.14 \le \mu \le 15.28$
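The t interval of Ex. 3 can be sketched in the same way. Python's standard library has no t distribution, so the critical value 2.080 is taken from the example rather than computed. Working from unrounded statistics, the upper limit comes out near 15.29 rather than the hand-rounded 15.28.

```python
from math import sqrt
from statistics import fmean, stdev

# Load at specimen failure (megapascals) from Ex. 3.
load = [19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7, 11.9,
        11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6, 11.9, 11.4]

# Critical value t_{0.025,21}, copied from the example (not computed here).
t_crit = 2.080

n = len(load)
x_bar = fmean(load)
s = stdev(load)   # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)  # standard error of the mean

ci = (x_bar - t_crit * se, x_bar + t_crit * se)

print(n, round(x_bar, 4), round(s, 4), round(se, 4), tuple(round(v, 4) for v in ci))
```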
25
Variable   N     Mean   StDev  SE Mean  95% CI
C1        22  13.7136  3.5536   0.7576  (12.1381, 15.2892)
[Figure: Boxplot of C1]
[Figure: Probability Plot of C1 (Normal); x-axis: C1, y-axis: Percent; Mean 13.71, StDev 3.554, N 22, AD 0.211, P-Value 0.838]
26
Hypothesis Test
27
Hypothesis Test
We illustrated how to construct a confidence interval estimate of a parameter from sample data
Many problems in engineering require that we decide whether to accept or reject a statement about some parameter : Hypothesis
Decision-making procedure about the hypothesis : hypothesis testing
Hypothesis testing and CI estimation of parameters : Data analysis stage of a comparative experiment
28
Tensile adhesion tests on 22 U-700 alloy specimens (Example 3)
We are interested in deciding whether or not the mean tensile adhesion is 14 megapascals
H0: μ = 14 megapascals    Null hypothesis
H1: μ ≠ 14 megapascals    Alternative hypothesis
H1: μ ≠ 14                Two-sided alternative hypothesis
H1: μ < 14 or H1: μ > 14  One-sided alternative hypothesis
29
Probability of making a type I error: significance level α (α-error)
α = 0.05, 0.01 (confidence level: 95.0%, 99.0%)
α = P(type I error) = P(reject H0 when H0 is true)
β = P(type II error) = P(fail to reject H0 when H0 is false)
30
Hypotheses Tests for a Single Sample
MiniTab
Stat -> Basic Statistics -> 1-Sample t
Test mean = 15
Options: Confidence level: 95.0, Alternative: not equal
One-Sample T: C1
Test of mu = 15 vs not = 15
Variable   N     Mean   StDev  SE Mean  95% CI                  T      P
C1        22  13.7136  3.5536   0.7576  (12.1381, 15.2892)  -1.70  0.104
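The test statistic in the one-sample output can be reproduced as a sketch. It uses the hypothesized mean of 15 from the Minitab run and the tabulated critical value 2.080 from Ex. 3. The p-value is not computed because the standard library has no t distribution.

```python
from math import sqrt
from statistics import fmean, stdev

# Load at specimen failure (megapascals) from Ex. 3.
load = [19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7, 11.9,
        11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6, 11.9, 11.4]

mu0 = 15.0      # hypothesized mean used in the Minitab run
t_crit = 2.080  # t_{0.025,21}, copied from Ex. 3 (two-sided test, alpha = 0.05)

n = len(load)
x_bar = fmean(load)
s = stdev(load)

# One-sample t statistic: distance of the sample mean from mu0 in standard errors.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Two-sided decision rule: reject H0 only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(round(t_stat, 2), reject_h0)
```

Failing to reject H0 here is consistent with the 95% CI containing 15.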
31
Hypotheses Tests for Two Samples
Table. Catalyst Yield Data
Number    Catalyst 1   Catalyst 2
1         91.50        89.19
2         94.18        90.95
3         92.18        90.46
4         95.39        93.21
5         91.79        97.19
6         89.07        97.04
7         94.72        91.07
8         89.21        92.75
Average   92.255       92.733
s         2.39         2.98
[Figure: Boxplot of Catalyst 1, Catalyst 2; y-axis: Data]
[Figure: Probability Plot of Catalyst 1, Catalyst 2 (Normal); x-axis: Data, y-axis: Percent]
Variable     Mean   StDev  N     AD      P
Catalyst 1  92.26   2.385  8  0.292  0.516
Catalyst 2  92.73   2.983  8  0.454  0.194
32
Hypotheses Tests for Two Samples
MiniTab
Stat -> Basic Statistics -> 2-Sample t
Samples in different columns
Options: Confidence level: 95.0, Alternative: not equal
Two-Sample T-Test and CI: Catalyst 1, Catalyst 2
Two-sample T for Catalyst 1 vs Catalyst 2
            N   Mean  StDev  SE Mean
Catalyst 1  8  92.26   2.39     0.84
Catalyst 2  8  92.73   2.98      1.1
Difference = mu (Catalyst 1) - mu (Catalyst 2)
Estimate for difference: -0.477500
95% CI for difference: (-3.394928, 2.439928)
T-Test of difference = 0 (vs not =): T-Value = -0.35  P-Value = 0.729  DF = 13
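The two-sample statistic can be sketched from the catalyst data. The reported DF = 13 (rather than 14) suggests the unpooled, Welch-type form of the test, so that form is assumed here. The function name `welch_t` is a hypothetical helper introduced for illustration.

```python
from math import sqrt
from statistics import fmean, variance

# Catalyst yield data from the table.
cat1 = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
cat2 = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]

def welch_t(a, b):
    """Return (difference of means, t statistic, approximate degrees of freedom)
    for the unpooled (unequal-variance) two-sample t test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    diff = fmean(a) - fmean(b)
    t = diff / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return diff, t, df

diff, t_stat, df = welch_t(cat1, cat2)

# The approximate df is truncated to an integer, assumed here to match the reported DF.
print(round(diff, 4), round(t_stat, 2), int(df))
```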
33
Hypotheses Tests for Two Paired Samples
Ex.5 Data for Hardness Testing Experiment
Specimen   Tip1   Tip2
1          7      6
2          3      3
3          3      5
4          4      3
5          8      8
6          3      2
7          2      4
8          9      9
9          5      4
10         4      5
[Figure: Boxplot of Tip1, Tip2; y-axis: Data]
[Figure: Probability Plot of Tip1, Tip2 (Normal); x-axis: Data, y-axis: Percent]
Variable  Mean  StDev   N     AD      P
Tip1       4.8  2.394  10  0.542  0.120
Tip2       4.9  2.234  10  0.337  0.425
34
MiniTab
Hypotheses Tests for Two Samples
Stat -> Basic Statistics -> Paired t
Samples in different columns
Options: Confidence level: 95.0, Alternative: not equal
35
Hypotheses Tests for Two Paired Samples
Paired T for Tip1 - Tip2
             N       Mean     StDev   SE Mean
Tip1        10    4.80000   2.39444   0.75719
Tip2        10    4.90000   2.23358   0.70632
Difference  10  -0.100000  1.197219  0.378594
95% CI for mean difference: (-0.956439, 0.756439)
T-Test of mean difference = 0 (vs not = 0): T-Value = -0.26  P-Value = 0.798

Two-sample T for Tip1 vs Tip2
       N  Mean  StDev  SE Mean
Tip1  10  4.80   2.39     0.76
Tip2  10  4.90   2.23     0.71
Difference = mu (Tip1) - mu (Tip2)
Estimate for difference: -0.100000
95% CI for difference: (-2.284675, 2.084675)
T-Test of difference = 0 (vs not =): T-Value = -0.10  P-Value = 0.924  DF = 17
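The contrast between the paired and the two-sample analysis of the hardness data can be sketched as follows. Both statistics are computed from the same ten specimens; only the treatment of the pairing differs. The unpaired statistic uses the unpooled standard error, an assumption that matches the reported T-Value.

```python
from math import sqrt
from statistics import fmean, stdev, variance

# Hardness readings for the two tips on the same ten specimens (Ex.5).
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]
n = len(tip1)

# Paired analysis: work with the within-specimen differences.
d = [a - b for a, b in zip(tip1, tip2)]
d_bar = fmean(d)
s_d = stdev(d)
t_paired = d_bar / (s_d / sqrt(n))

# Unpaired analysis: treats the two columns as independent samples.
se_unpaired = sqrt(variance(tip1) / n + variance(tip2) / n)
t_unpaired = (fmean(tip1) - fmean(tip2)) / se_unpaired

print(round(d_bar, 1), round(s_d, 4), round(t_paired, 2), round(t_unpaired, 2))
```

Pairing removes the specimen-to-specimen variation from the standard error (about 0.38 versus about 1.04), which is why the paired confidence interval is much narrower than the two-sample one.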