Power and Sample Size - Pyzdek Institute


Analyze Phase 369

The team met to discuss these results. They decided to set all factors that were not found to be statistically significant to the levels that cost the least to operate, and factors B and D at their midpoints. The process would be monitored at these settings for a while to determine that the results were similar to what the team expected based on the experimental analysis. While this was done, another series of experiments would be planned to further explore the significant effects uncovered by the screening experiment.

Based on the screening experiment, the linear model for estimating the defect rate was found from the coefficients in Table 10.10 to be

Defect rate = 70.375 + 4B - 9.25D
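The fitted model is just an arithmetic formula, so predictions at candidate settings are easy to sketch. This minimal Python version assumes the usual -1/+1 coded units for the factors in a screening design; the example settings are ours, not from the text:

```python
def defect_rate(B, D):
    """Screening-experiment model from Table 10.10.

    Factors are assumed to be in coded units (-1 = low, +1 = high);
    B = D = 0 corresponds to the midpoints chosen by the team.
    """
    return 70.375 + 4 * B - 9.25 * D

print(defect_rate(0, 0))    # 70.375, predicted rate at the midpoints
print(defect_rate(-1, 1))   # 57.125, the lowest predicted rate in the coded range
```

Because B enters with a positive coefficient and D with a negative one, the lowest predicted defect rate over the coded range is at B = -1, D = +1.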

Power and Sample Size

The term power of a statistical test refers to the probability that the test will lead to correctly rejecting a false Null Hypothesis, that is, 1 - β, where β (beta) is the probability of failing to reject the false Null Hypothesis. Generally, the power of a statistical test is improved when:

• There is a large difference between the null and alternative conditions,

• The population sigma is small,

• The sample size is large; or,

• The significance level (α) is large.

Many statistical software packages provide Power and Sample Size calculations. Minitab's Power and Sample Size option in the Stat menu can estimate these for a variety of test formats.

Example

Consider a one-way ANOVA test of the hypothesis that four populations have equal means. A sample of n = 5 is taken from each population, whose historical standard deviation is 2.0. If we are interested in detecting a difference of 3 units in the means, the software can estimate the power of the test after completing the Power and Sample Size for one-way ANOVA dialog box as:

• Number of levels: 4

• Sample sizes: 5

• Values of the maximum difference between means: 3

• Standard deviation: 2

• Significance level (in the Options dialog): 0.05

The probability that the assumption of equal means is rejected is found to be about 39% in this case. Note that if the sample size is increased to 10, the power is improved to 77%.
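The same calculation can be sketched outside Minitab. This Python version uses scipy's noncentral F distribution and assumes Minitab's convention for "maximum difference between means": two group means sit at the extremes of the difference while the remaining means sit at the grand mean.

```python
from scipy import stats

def anova_power(k, n, max_diff, sigma, alpha=0.05):
    """Power of a one-way ANOVA F-test with k groups of n observations each.

    Assumes the 'maximum difference' mean configuration: two means are
    max_diff apart and the remaining means sit at the grand mean.
    """
    # Noncentrality parameter: lambda = n * sum((mu_i - mu_bar)^2) / sigma^2
    nc = n * 2 * (max_diff / 2) ** 2 / sigma ** 2
    dfn, dfd = k - 1, k * (n - 1)
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)   # rejection threshold
    return stats.ncf.sf(f_crit, dfn, dfd, nc)   # P(F exceeds it under the alternative)

print(round(anova_power(k=4, n=5, max_diff=3, sigma=2), 2))   # about 0.39
print(round(anova_power(k=4, n=10, max_diff=3, sigma=2), 2))  # about 0.77
```

The two printed values match the 39% and 77% figures quoted above for n = 5 and n = 10.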

Testing Common Assumptions

Many statistical tests are only valid if certain underlying assumptions are met. In most cases, these assumptions are stated in the statistical textbooks along with the descriptions


of the particular statistical technique. This chapter describes some of the more common assumptions encountered in Six Sigma project work and how to test for them. However, the subject of testing underlying assumptions is a big one and you might wish to explore it further with a Master Black Belt.

Continuous versus Discrete Data

Data come in two basic flavors: continuous and discrete, as discussed in Chap. 7. To review the basic idea, continuous data are numbers that can be expressed to any desired level of precision, at least in theory. For example, using a mercury thermometer I can say that the temperature is 75 degrees Fahrenheit. With a home digital thermometer I could say it's 75.4 degrees. A weather bureau instrument could add additional decimal places. Discrete data can only assume certain values. For example, the counting numbers can only be integers. Some survey responses force the respondent to choose a particular number from a list (pick a rating on a scale from 1 to 10).

Some statistical tests assume that you are working with either continuous or discrete data. For example, ANOVA assumes that continuous data are being analyzed, while chi-square and correspondence analysis assume that your data are counts. In many cases the tests are insensitive to departures from the data-type assumption. For example, expenditures can only be expressed to two decimal places (dollars and cents), but they can be treated as if they are continuous data. Counts can usually be treated as continuous data if there are many different counts in the data set, for example, if the data are defect counts ranging from 10 to 30 defects with all 21 counts showing up in the data (10, 11, 12, ..., 28, 29, 30).

You Have Discrete Data But Need Continuous Data

In some cases, however, the data type matters. For example, if discrete data are plotted on control charts intended for continuous data, the control limit calculations will be incorrect. Run tests and other nonparametric tests will also be affected by this. The problem of "discretized" data is often caused by rounding the data to too few decimal places when they are recorded. This rounding can be human caused, or it might be a computer program not recording or displaying enough digits. The simple solution is to record more digits. The problem may be caused by an inadequate measurement system. This situation can be identified by a measurement system analysis (see Chap. 9). The problem can be readily detected by creating a dot plot of the data.

You Have Continuous Data But Need Discrete Data

Let's say you want to determine if operator experience has an impact on the defects. One way to analyze this is to use a technique such as regression analysis to regress X = years of experience on Y = defects. Another would be to perform a chi-square analysis on the defects by experience level. To do this you need to put the operators into discrete categories, then analyze the defects in each category. This can be accomplished by "discretizing" the experience variable. For example, you might create the following discrete categories:

Experience (years)    Experience Category
Less than 1           New
1 to 2                Moderately experienced
3 to 5                Experienced
More than 5           Very experienced
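The discretizing step can be expressed directly in code. This is a minimal Python sketch; note that the table leaves gaps (e.g., 2.5 years falls between "1 to 2" and "3 to 5"), so the boundary handling below is an assumption:

```python
def experience_category(years):
    """Map years of experience to the discrete categories above.

    Assumption: values falling in the gaps the table leaves unspecified
    (e.g., 2.5 years) are rolled into the next category up.
    """
    if years < 1:
        return "New"
    elif years <= 2:
        return "Moderately experienced"
    elif years <= 5:
        return "Experienced"
    else:
        return "Very experienced"

print([experience_category(y) for y in [0.5, 1.5, 4.0, 8.0]])
# ['New', 'Moderately experienced', 'Experienced', 'Very experienced']
```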


The newly classified data are now suitable for chi-square analysis or other techniques that require discrete data.

Independence Assumption

Statistical independence means that two values are not related to one another. In other words, knowing one value provides no information as to what the other value is. If you throw two dice and I tell you that one of them is a 4, that information doesn't help you predict the value on the other die. Many statistical techniques assume that the data are independent. For example, if a regression model fits the data adequately, then the residuals will be independent. Control charts assume that the individual data values are independent; that is, knowing the diameter of piston 100 doesn't help me predict the diameter of piston 101, nor does it tell me what the diameter of piston 99 was. If I don't have independence, the results of my analysis will be wrong: I will believe that the model fits the data when it does not, and I will tamper with controlled processes.

Independence can be tested in a variety of ways. If the data are normal (testing the normality assumption is discussed below) then the run tests described for control charts can be used.

A scatter plot can also be used: let Y = X_{t-1} (the previous observation) and plot X_t versus Y. You will see random patterns if the data are independent. Software such as Minitab offers several ways of examining independence in time series data. Note: lack of independence in time series data is called autocorrelation.
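The quantity behind that lag plot, the lag-1 correlation, is simple to compute numerically. Here is a standard-library Python sketch on simulated data (the drift magnitude is an arbitrary choice for illustration):

```python
import random

def lag1_autocorrelation(x):
    """Sample correlation of x[t] with x[t-1]; near zero for independent data."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

random.seed(1)
independent = [random.gauss(0, 1) for _ in range(500)]
drifting = [v + 0.02 * t for t, v in enumerate(independent)]  # add a slow drift

print(round(lag1_autocorrelation(independent), 2))  # near 0
print(round(lag1_autocorrelation(drifting), 2))     # strongly positive
```

The drifting series illustrates the point in the text: a process that wanders produces autocorrelated data even though the underlying noise is independent.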

If you don't have independence you have several options. In many cases the best course of action is to identify the reason why the data are not independent and fix the underlying cause. If the residuals are not independent, add terms to the model. If the process is drifting, add compensating adjustments.

If fixing the root cause is not a viable option, an alternative is to use a statistical technique that accounts for the lack of independence, for example, the EWMA control chart or a time series analysis that can model autocorrelated data. Another is to modify the technique to work with your autocorrelated data, such as using sloped control limits on the control chart. If data are cyclical you can create uncorrelated data by using a sampling interval equal to the cycle length. For example, you can create a control chart comparing performance on Monday mornings.
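As a sketch of the first option, the EWMA statistic and its steady-state control limits follow the textbook formulas z_t = λx_t + (1 - λ)z_{t-1} and μ ± Lσ√(λ/(2 - λ)); the λ = 0.2, L = 3 defaults below are common conventions, not values from the text:

```python
import math

def ewma_chart(x, mean, sigma, lam=0.2, L=3.0):
    """EWMA statistics and steady-state control limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, starting from z_0 = mean;
    limits are mean +/- L * sigma * sqrt(lam / (2 - lam)).
    """
    z = mean
    zs = []
    for v in x:
        z = lam * v + (1 - lam) * z
        zs.append(z)
    half_width = L * sigma * math.sqrt(lam / (2 - lam))
    return zs, mean - half_width, mean + half_width

zs, lcl, ucl = ewma_chart([10.2, 9.9, 10.1, 10.4, 10.0], mean=10, sigma=1)
print(round(lcl, 3), round(ucl, 3))  # 9.0 11.0
```

Because the EWMA smooths over recent history, it tolerates mild autocorrelation better than an individuals chart with the same data.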

Normality Assumption

Statistical techniques such as t-tests, Z-tests, ANOVA, and many others assume that the data are at least approximately normal. This assumption is easily tested using software. There are two approaches to testing normality: graphical and statistical.

Graphical Evaluation of Normality

One graphical approach involves plotting a histogram of the data, then superimposing a normal curve over the histogram. This approach works best if you have at least 200 data points, and the more the merrier. For small data sets the interpretation of the histogram is difficult; the usual problem is seeing a lack of fit when none exists. In any case, the interpretation is subjective and two people often reach different conclusions when viewing the same data. Figure 10.28 shows four histograms for normally distributed data with mean = 10, sigma = 1, and sample sizes ranging from 30 to 500.

An alternative to the histogram/normal curve approach is to calculate a "goodness-of-fit" statistic and a P-value. This gives an unambiguous acceptance criterion; usually the researcher rejects the assumption of normality if P < 0.05.
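In code, the statistical route looks like the following Python sketch using scipy's Shapiro-Wilk test. (Minitab's default normality test is Anderson-Darling, which scipy also provides as stats.anderson; Shapiro-Wilk is used here simply because it returns a P-value directly. The data are simulated.)

```python
import random
from scipy import stats

random.seed(3)
normal_sample = [random.gauss(10, 1) for _ in range(100)]
skewed_sample = [random.expovariate(1.0) for _ in range(100)]

# Shapiro-Wilk returns a test statistic and a P-value; reject the
# normality assumption when P < 0.05.
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    stat, p = stats.shapiro(sample)
    print(name, "P-value:", round(p, 3))
```

The skewed (exponential) sample should produce a P-value far below 0.05, while the normal sample will usually pass.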

FIGURE 10.28 Histograms with normal curves for different sample sizes.

However, it has the disadvantage of being nongraphical. This violates the three rules of data analysis:

1. Plot the data

2. Plot the data

3. Plot the data

To avoid violating these important rules, the usual approach is to supplement the statistical analysis with a probability plot. The probability plot is scaled so that normally distributed data will plot as a straight line. Figure 10.29 shows the probability plots that correspond to the histograms and normal curves in Fig. 10.28. The table below Fig. 10.29 shows that the P-values are all comfortably above 0.05, leading us to conclude that the data are reasonably close to the normal distribution.

N          30      100     200     500
P-Value    0.139   0.452   0.816   0.345

What to Do If the Data Aren't Normal

When data are not normal, the following steps are usually pursued:

• Do nothing: Often the histogram or probability plot shows that the normal model fits the data well "where it counts." If the primary interest is in the tails,

FIGURE 10.29 Normal probability plots and goodness of fit tests.

for example, and the curve fits the data well there, then proceed to use the normal model despite the fact that the P-value is less than 0.05. Or if the model fits the middle of the distribution well and that's your focus, go with it. Likewise, if you have a very large sample you may get P-values less than 0.05 even though the model appears to fit well everywhere. I work with clients who routinely analyze data sets of 100,000+ records. Samples this large will flag functionally and economically unimportant departures from normality as "statistically significant," but it isn't worth the time or the expense to do anything about it.

• Transform the data: It is often possible to make the data normal by performing a mathematical operation on the data. For example, if the data distribution has very long tails to the high side, taking the logarithm often creates data that are normally distributed. Minitab's control chart feature offers the Box-Cox normalizing power transformation that works with many data distributions encountered in Six Sigma work. The downside to transforming is that data have to be returned to the original measurement scale before being presented to nontechnical personnel. Some statistics can't be directly returned to their original units; for example, if you use the log transform, then you can't find the mean of the original data by taking the inverse log of the mean of the transformed data.

• Use averages: Averages are a special type of transformation because averages of subgroups always tend to be normally distributed, even if the underlying


data are not. Sometimes the subgroup sizes required to achieve normality can be quite small.

• Fit another statistical distribution: The normal distribution isn't the only game in town. Try fitting other curves to the data, such as the Weibull or the exponential. Most statistics packages, such as Minitab, have the ability to do this. If you have a knack for programming spreadsheets, you can use Excel's Solver add-in to evaluate the fit of several distributions.

• Use a non-parametric technique: There are statistical methods, called non-parametric methods, that don't make any assumptions about the underlying distribution of the data. Rather than evaluating the differences of parameters such as the mean or variance, non-parametric methods use other comparisons. For example, if the observations are paired they may be compared directly to see if the after is different than the before. Or the method might examine the pattern of points above and below the median to see if the before and after values are randomly scattered in the two regions. Or ranks might be analyzed. Non-parametric statistical methods are discussed later in this chapter.
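For instance, the paired before/after comparison described in the last bullet can be sketched with scipy's Wilcoxon signed-rank test; the measurements below are invented for illustration:

```python
from scipy import stats

# Hypothetical paired measurements on eight units, before and after a change.
before = [12.1, 11.8, 12.5, 13.0, 12.2, 12.7, 11.9, 12.4]
after = [11.5, 11.2, 12.0, 12.1, 11.6, 12.2, 11.4, 11.8]

# The signed-rank test ranks the paired differences; it makes no normality
# assumption about the underlying measurements.
stat, p = stats.wilcoxon(before, after)
print(p < 0.05)  # True: the shift is consistent across all eight pairs
```

Because every pair moved in the same direction, the test rejects the hypothesis of no change even with only eight pairs.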

Equal Variance Assumption

Many statistical techniques assume equal variances. ANOVA tests the hypothesis that the means are equal, not that variances are equal. In addition to assuming normality, ANOVA assumes that variances are equal for each treatment. Models fitted by regression analysis are evaluated partly by looking for equal variances of residuals for different levels of the Xs and Y.

Minitab's test for equal variances is found in Stat > ANOVA > Test for Equal Variances. You need a column containing the data and one or more columns specifying the factor level for each data point. If the data have already passed the normality test, use the P-value from Bartlett's test to test the equal variances assumption. Otherwise, use the P-value from Levene's test. The test shown in Fig. 10.30 involved five factor levels, and Minitab shows a confidence interval bar for sigma of each of the five samples; the tick mark in the center of the bar represents the sample sigma. These are the data from the sample of 100 analyzed earlier and found to be normally distributed, so Bartlett's test can be used. The P-value from Bartlett's test is 0.182, indicating that we can expect this much variability from populations with equal variances 18.2% of the time. Since this is greater than 5%, we fail to reject the null hypothesis of equal variances. Had the data not been normally distributed we would've used Levene's test, which has a P-value of 0.243 and leads to the same conclusion.
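The same two tests are available outside Minitab, e.g., in scipy. This sketch uses simulated data (not the data from the figure) to show the decision rule:

```python
import random
from scipy import stats

random.seed(5)
# Five factor levels with equal sigma, plus a sixth group with inflated sigma.
equal = [[random.gauss(10, 1) for _ in range(20)] for _ in range(5)]
unequal = equal + [[random.gauss(10, 4) for _ in range(20)]]

# Bartlett's test assumes normal data; Levene's test is robust to non-normality.
print("equal groups, Bartlett P:", round(stats.bartlett(*equal)[1], 3))
print("unequal groups, Bartlett P:", round(stats.bartlett(*unequal)[1], 3))
print("unequal groups, Levene P:", round(stats.levene(*unequal)[1], 3))
```

With the inflated-sigma group included, both tests should return P-values well below 0.05 and reject the equal-variances hypothesis.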

Linear Model Assumption

Many types of associations are nonlinear. For example, over a given range of x values, y might increase, and for other x values, y might decrease. This curvilinear relationship is shown in Fig. 10.31.

Here we see that y increases when x increases and is less than 1, and decreases as x increases when x is greater than 1. Curvilinear relationships are valuable in the design of robust systems. A wide variety of processes produces such relationships.

It is often helpful to convert these nonlinear forms to linear form for analysis using standard computer programs or scientific calculators. Several such transformations are shown in Table 10.11.


Test for equal variances: 95% confidence intervals for sigmas at factor levels 1 through 5.

Bartlett's test: Test statistic = 6.233, P-value = 0.182
Levene's test: Test statistic = 1.391, P-value = 0.243

FIGURE 10.30 Output from Minitab's test for equal variances.

FIGURE 10.31 Scatter diagram of a curvilinear relationship.


If the Relationship Is    Plot the Transformed      Convert Straight-Line Constants
of the Form:              Variables                 (b0 and b1) to Original Constants
                          Y_T        X_T            b0         b1
Y = a + b/X               Y          1/X            a          b
1/Y = a + bX              1/Y        X              a          b
Y = X/(a + bX)            X/Y        X              a          b
Y = ab^X                  log Y      X              log a      log b
Y = ae^(bX)               log Y      X              log a      b log e
Y = aX^b                  log Y      log X          log a      b
Y = a + bX^n (n known)    Y          X^n            a          b

(From Natrella (1963), pp. 5-31)

TABLE 10.11 Some Linearizing Transformations

Fit the straight line Y_T = b0 + b1*X_T using the usual linear regression procedures (see below). In all formulas, substitute Y_T for Y and X_T for X. A simple method for selecting a transformation is to program the transformations into a spreadsheet and run regressions using every transformation, then select the transformation which gives the largest value for the statistic R^2.
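As a sketch of the transform-then-regress recipe, here is the Y = aX^b row of Table 10.11 carried out in Python on simulated data (true a = 2.0 and b = 1.5; natural logs are used, which works as long as the back-conversion matches):

```python
import math
import random
from scipy import stats

random.seed(11)
# Simulated data from Y = a * X^b with a = 2.0, b = 1.5 and mild noise.
xs = [0.5 + 0.1 * i for i in range(30)]
ys = [2.0 * x ** 1.5 * math.exp(random.gauss(0, 0.05)) for x in xs]

# Linearize per Table 10.11: log Y = log a + b * log X.
fit = stats.linregress([math.log(x) for x in xs], [math.log(y) for y in ys])

# Convert the straight-line constants back to the original constants.
a_hat = math.exp(fit.intercept)  # b0 = log a, so a = exp(b0)
b_hat = fit.slope                # b1 = b
print(round(a_hat, 1), round(b_hat, 1))  # close to 2.0 and 1.5
```

The fitted constants land close to the true values, and the R^2 of this fit could be compared against the other transformations in the table to select the best one.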

There are other ways of analyzing nonlinear responses. One common method is to break the response into segments that are piecewise linear, and then to analyze each piece separately. For example, in Fig. 10.31 Y is roughly linear and increasing over the range 0 < x < 1 and linear and decreasing over the range x > 1. Of course, if the analyst has access to powerful statistical software, nonlinear forms can be analyzed directly.

Analysis of Categorical Data

Making Comparisons Using Chi-Square Tests

In Six Sigma, there are many instances when the analyst wants to compare the percentage of items distributed among several categories. The things might be operators, methods, materials, or any other grouping of interest. From each of the groups a sample is taken, evaluated, and placed into one of several categories (e.g., high quality, marginal quality, reject quality). The results can be presented as a table with m rows representing the groups of interest and k columns representing the categories. Such tables can be analyzed to answer the question "Do the groups differ with regard to the proportion of items in the categories?" The chi-square statistic can be used for this purpose.
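As a sketch, the m x k table analysis is a one-liner in most statistics packages. In Python, with invented counts for three operator groups across three quality categories:

```python
from scipy import stats

# Rows: three groups (e.g., operators); columns: counts of items judged
# high quality, marginal quality, and reject quality. Invented numbers.
table = [
    [50, 30, 20],
    [45, 35, 20],
    [20, 30, 50],
]

chi2, p, dof, expected = stats.chi2_contingency(table)
print("degrees of freedom:", dof)     # (m-1)*(k-1) = 4
print("groups differ:", p < 0.05)
```

The third group's counts are shifted heavily toward rejects, so the test rejects the hypothesis that the three groups share the same category proportions.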
