proportion testing

Proportion Testing

October 2, 2009

Statistics Symposium


Outline

1. What we are covering and what we are not covering today

2. Virtual Scavenger Hunt

3. Statistical Decisions and Risk

4. Six Sigma DMAIC application

5. The Business Approach

6. Hypothesis Test Approach

7. Understanding Distributions

8. Sample Size

9. Test of Independence

10. Example 1: Regulatory Compliance Documentation

11. Example 2: Workload Balance (Productivity)

12. References and Web Sites

13. Q&A

Hypothesis Tests: What are we covering?

Continuous Data

1 sample t-test: Δ mean from known test mean

2 sample t-test: Δ mean between 2 independent sample means

Paired t-test: Δ mean between 2 dependent sample means

One Way ANOVA: At least 1 sample mean Δ between 3 or more samples

Kruskal Wallis & Mood’s Median: At least 1 sample median Δ between 3 or more samples

F-test, Levene’s test, & Bartlett’s test: At least 1 sample standard deviation Δ between 2 or more samples

Correlation/Regression/DOE: 2 or more factors are correlated / Predictor affects the sampled process

Attribute Data

1 proportion test: A sample proportion Δ against a known value

2 proportion test: Proportions from the two samples are different

Chi Square test: At least one sample proportion Δ from others

Scavenger Hunt

Find another person who can sign off on these statements. Each person can only sign once.

1. Has used Chi-Square or Proportion Test

2. Has more than $50 on them

3. Used Minitab to determine sample size

4. Worked on a project with a value proposition >$1 million

5. Knows Chris Connors' middle name

6. Has more than three children

7. Has met a movie star or celebrity (and was not arrested)

8. Knows the difference between a confidence interval and a confidence level

9. Knows what a quark or a fantod is

10. Has more than one academic degree, license or certification


Statistical Decision: Setting up your risk level

Type I and II errors

There are two kinds of errors that can be made in significance testing:

(1) a true null hypothesis can be incorrectly rejected and

(2) a false null hypothesis can fail to be rejected. The former error is called a Type I error and the latter error is called a Type II error. These two types of errors are defined in the table.

The probability of a Type I error is designated by the Greek letter alpha (α) and is called the Type I error rate; the probability of a Type II error (the Type II error rate) is designated by the Greek letter beta (β). A Type II error is only an error in the sense that an opportunity to reject the null hypothesis correctly was lost. It is not an error in the sense that an incorrect conclusion was drawn since no conclusion is drawn when the null hypothesis is not rejected.

                        True State of the Null Hypothesis
Statistical Decision    H0 True         H0 False
Reject H0               Type I error    Correct
Do not Reject H0        Correct         Type II error

Six Sigma DMAIC method: Hypothesis Tests

Six Sigma DMAIC method has 5 phases:

1. Define Opportunity/Problem

2. Measure Performance

3. Analyze Process and Performance

4. Improve Process and Performance

5. Control Process and Performance

I typically use this diagram to depict the continuous focus of measurement in the Six Sigma method by placing Measure in the center of the DMAIC method.

[Diagram: Measure at the center, surrounded by Define, Analyze, Improve, and Control]

6S Black Belt Level of Cognition for Hypothesis Testing

Topic                                         Level of Cognition    My Development
Introduction to Statistical Comparisons       2*
Normality and Transformation                  2
Correlation Analysis                          3
Regression Analysis                           3
Introduction to Multiple Linear Regression    1
t-tests                                       3
ANOVA                                         3
1 and 2 proportion test                       3
Chi-Square Analysis                           3
Binary Logistic Regression                    1

*1 = learned, 2 = know, 3 = used, 4 = taught

6S BB Level of Cognition for Hypothesis Testing

Topic                                                                          Level of Cognition    My Development
Introduction to Experimental Design                                            2*
Background on Experimental Design                                              2
DOE Designs and terminology                                                    2
Full Factorial design                                                          2
Half factorial designs                                                         2
Robust Designs                                                                 2
Checklists for designing and conducting DOE / BBC exercise: DOE Simulation     2
Results of DOE Simulation                                                      2

*1 = learned, 2 = know, 3 = used, 4 = taught

The Business Approach

Proportion Tests

When we want to make a statistical comparison of a discrete variable with a target, or between two discrete variables, Proportion Tests should be used.

[Diagram: Business Problem → Statistical Problem → Statistical Solution → Business Solution, with the milestones “Potential Root Causes Identified” and “Root Causes Verified”]

Selecting the Right Statistical Tool

                 X Discrete            X Continuous
Y Discrete       Proportion Tests      Logistic Regression
Y Continuous     t test, ANOVA, DOE    Correlation, Regression

Tests of Proportion

Determine if a statistically significant difference of proportion exists between:

- A sample and a target
- Two independent samples
- More than two samples

Comparing Proportions

1 Sample: 1 Proportion Test
2 Samples: 2 Proportion Test
More Than 2 Samples: Chi-Square Test

Use samples to make inferences about population proportions

Proportion Test Approach

1. State the null and alternative hypotheses

Null H0          Alternative Ha    Number of tails
P1 - P2 = 0      P1 - P2 ≠ 0       2
P1 - P2 ≥ 0      P1 - P2 < 0       1 (left)
P1 - P2 ≤ 0      P1 - P2 > 0       1 (right)

2. Formulate an analysis plan: 1 Proportion to known value (z) or 2 Proportions test

3. Analyze sample data
a. Independence Test: Fisher’s, Barnard’s, G-Test
b. Pooled sample proportion to compute standard error
c. P value for test statistic

4. Interpret results: for a statistical decision (hopefully a business decision, but not always)

If P is low, H0 must be no go

One Tail or Two Tails: Placing the Alpha Risk


Useful Discrete Distributions

Binomial distribution for:

The number X of successes (or failures!) in n trials when p is the chance of success (or failure!) on each trial.

Examples:

• number X of faulty expense reports out of n=100 submitted in a particular month, when the faulty expense report rate typically runs at p=0.03 (i.e., 3%)

• number of voters out of a random sample of n=800 expressing approval of the President’s performance, when the approval rating in the entire population of voters is p=0.42 (i.e., 42%)

X is discrete: it must be one of 0, 1, 2, … , n

Binomial - key facts

Applications: attribute data

$\hat{p} = X/n$ estimates $p$

means: $\mu_X = np$ and $\mu_{\hat{p}} = p$

sd's: $\sigma_X = \sqrt{np(1-p)}$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$

Useful fact: X (and therefore $\hat{p}$) has approximately a normal distribution when n is large (more than 25 or 30) and np and n(1-p) are not too small (say >5).

Binomial - Normal Approximation

X would be approximately normal, with mean and sd given by:

$\mu_X = np = 750 \times 0.03 = 22.5$

$\sigma_X = \sqrt{np(1-p)} = \sqrt{750 \times 0.03 \times 0.97} = 4.6717$

Approximate area under curve left of X = 12 can be found from the z-score:

$z = \frac{12 - 22.5}{4.6717} = -2.248$

From tables (or Minitab), the area under the normal curve to the left of -2.248 is 0.0123. The exact probability that X ≤ 12 can also be found in Minitab to be 0.0109, so the normal approximation gives a very close answer. What are the chances the site would be so low in “defects” if it were following the defect rate? Pretty low! What would you conclude?
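The comparison on this slide can be reproduced with a short script. This is a minimal sketch using only the Python standard library, not the Minitab procedure itself; the helper name `normal_cdf` is introduced here for illustration.

```python
from math import comb, erf, sqrt

# Values from the slide: n trials, defect rate p, observed count x
n, p, x = 750, 0.03, 12

mean = n * p                    # mean of X under the binomial model
sd = sqrt(n * p * (1 - p))      # standard deviation of X
z = (x - mean) / sd             # z-score for X = 12 (no continuity correction)

def normal_cdf(value):
    """Standard normal cumulative probability (hypothetical helper)."""
    return 0.5 * (1 + erf(value / sqrt(2)))

# Normal approximation to P(X <= 12)
approx = normal_cdf(z)

# Exact binomial P(X <= 12): sum the probability of each count 0..12
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

print(f"mean={mean:.1f} sd={sd:.4f} z={z:.3f}")
print(f"normal approx={approx:.4f} exact={exact:.4f}")
```

Both probabilities are small, which is the point of the slide: a count this low would be unusual if the site really followed the 3% defect rate.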


Histogram: n=20

[Figure: Histogram of 20 observations with fitted normal curve. X-axis: 85 to 115; y-axis: Frequency (0 to 7). Mean 101.1, StDev 7.423, N 20.]

Histogram: n=100

[Figure: Histogram of 100 observations with fitted normal curve. X-axis: 82.5 to 120.0; y-axis: Frequency (0 to 20). Mean 100.6, StDev 6.913, N 100.]

Sample Size

General Guidelines (if not followed, test may not run):

• Each Sample includes at least 10 failures and 10 successes (some texts say 5)

• The population is at least 10 times the size of the sample

• Use Minitab sample size calculator

• Use TI 83 or TI 84 Graphing Calculator (see web)
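The first two guidelines can be checked before running a test. The sketch below is only an illustration: the function name `meets_sample_guidelines` and its parameters are hypothetical, and the threshold of 10 can be lowered to 5 for texts that use that rule.

```python
def meets_sample_guidelines(successes, n, population_size=None, min_count=10):
    """Check the rule-of-thumb guidelines for a proportion test.

    successes: number of "success" outcomes in the sample
    n: sample size
    population_size: optional; if given, must be at least 10 times n
    min_count: minimum successes and failures (10 here; some texts say 5)
    """
    failures = n - successes
    enough_events = successes >= min_count and failures >= min_count
    if population_size is None:
        return enough_events
    return enough_events and population_size >= 10 * n

# A sample of 130 records with 74 successes leaves 56 failures
print(meets_sample_guidelines(74, 130))
```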


Hypothesis testing - terms

Null hypothesis (H0) – e.g., µ1 = µ2 - this is the hypothesis to be tested and should be in the form of a true/false statement. This hypothesis states that there is NO DIFFERENCE between the data sets or samples or populations. Null hypotheses are never accepted – we either reject them or fail to reject them. The null hypothesis has PRIORITY and should not be rejected unless there is strong statistical evidence to do so.

Alternate hypothesis (H1, HA) – e.g., µ1 ≠ µ2 - the alternative to the null hypothesis – states that there IS A DIFFERENCE between the data sets or populations.

Type 1 error – rejecting the null hypothesis when it is really true – e.g., “convicting the innocent”

Type 2 error – failing to reject the null hypothesis when it really is false – e.g., “letting the guilty go free”

Level (or size) of a test = Alpha (α) – is the probability of a type 1 error – default = 5%

Beta (β) – is the probability of a type 2 error – default = 10%

Power of a test or power – is the probability of correctly rejecting a false null hypothesis. Since β is the probability of a type 2 error, power is calculated by the formula (1 - β). Power = (1 - β) when the null hypothesis is false. The default value for power is 90%. This means that you have a 90% chance of detecting a difference when one really exists.

Critical region (rejection region) – set of values of the test statistic that cause the null hypothesis to be rejected. If the test statistic falls into the rejection region, the null hypothesis is rejected.


Hypothesis testing steps

• State the null hypothesis H0 and the alternate hypothesis HA (e.g., the mean income of college graduates does not equal that of other people)

• Choose the level of significance, alpha (α default = 0.05) and the sample size (default n = 25)

• Choose the appropriate statistical technique (t test, Chi-square, etc.) and test statistic (e.g., mean)

• Collect the data and calculate the sample value of the test statistic

• Calculate the p value based on the test statistic and compare it with alpha (α = 0.05)

• Make a statistical decision – if the p value is greater than or equal to alpha, fail to reject the null hypothesis. If the p value is less than alpha, reject the null hypothesis.

Hypothesis tests are either one-tailed or two-tailed tests

One-tailed test - Answers only ONE question - is the test statistic less than or greater than the known distribution

[Figure: one-tailed rejection regions; “Reject H0” lies in a single tail at the 1% or 5% significance level, and “Fail to Reject H0” covers the rest of the distribution]

Two-tailed test – Only asks if the test statistic is different from the known distribution – HA usually has “not equal to” in the wording

[Figure: two-tailed rejection regions; “Reject H0” lies in both tails at a 2.5% significance level each, and “Fail to Reject H0” lies in the middle]

Clinical Testing One-tailed example by hand

The “Feel Good” Drug company has discovered a new drug which prevents acne. Since the market for skin care products is larger for women than men, the company would like to be able to show a treatment advantage for women vs men. The company statistician chooses a simple random sample of 110 women and 207 men from a population of 100,000 healthy volunteers. After 6 months, 48% of women had no acne, vs 61% of men. Can the company claim a benefit for women vs men at the 0.01 level of significance?

1)What are the hypotheses?

2)Calculate the pooled sample proportion and the Standard Error and consult the z-score statistic

3)What do the results tell us?


Clinical Testing One-tailed example by hand

1) What are the hypotheses?

H0: P1 ≥ P2    Ha: P1 < P2

The null hypothesis will be rejected if the sample proportion for women (p1 = 0.48) is substantially smaller than the sample proportion for men (p2 = 0.61).

2) Calculate the pooled sample proportion and the Standard Error and consult the z-score statistic:

p = (p1 * n1 + p2 * n2)/(n1 + n2) = [(0.48 * 110) + (0.61 * 207)]/(110 + 207) = (52.8 + 126.3)/317 = 0.565

SE = sqrt{ p * (1 - p) * [(1/n1) + (1/n2)] } = sqrt[ 0.565 * 0.435 * (1/110 + 1/207) ] = sqrt[ 0.2458 * 0.01392 ] = 0.0585

z = (p1 - p2)/SE = (0.48 - 0.61)/0.0585 = -2.22

3) What do the results tell us?

Since this is a one-tailed test, the P value is the probability that the z-score is less than -2.22. The Normal distribution calculator gives P(z < -2.22) = 0.013, so the P value is 0.013. Since 0.013 is greater than the chosen significance level (0.01), WE FAIL TO REJECT THE NULL HYPOTHESIS – THERE IS NO STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE POPULATIONS
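The hand calculation can be checked with a few lines of code. This is a sketch of a pooled two-proportion z-test using only the standard library; the function names are hypothetical. It carries unrounded intermediate values, so the results may differ in the second decimal place from a calculation done with rounded figures.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability (hypothetical helper)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def pooled_two_proportion_z(p1, n1, p2, n2):
    """Return (pooled proportion, standard error, z) for H0: P1 = P2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return pooled, se, z

# Women: 48% of 110; men: 61% of 207 (values from the example)
pooled, se, z = pooled_two_proportion_z(0.48, 110, 0.61, 207)

# Left-tailed test (Ha: P1 < P2): P value is the area to the left of z
p_value = normal_cdf(z)
alpha = 0.01
reject_null = p_value < alpha

print(f"pooled={pooled:.4f} SE={se:.4f} z={z:.2f} p={p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```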


Test of Independence

Fisher’s Exact Test is most commonly used for 2 x 2 tables to determine if there is a nonrandom relationship between two categorical variables. Fisher’s test calculates the probability of the observed table conditional on the row and column totals.

Fisher’s exact test in Minitab:

Trials   Events   %
200      120      60.0%
300      210      70.0%

counts   adverse   drug
120      y         old
80       n         old
210      y         new
90       n         new

Rows: adverse Columns: drug

new old All

n 90 80 170

y 210 120 330

All 300 200 500

Cell Contents: Count

Fisher's exact test: P-Value = 0.0265193
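For readers without Minitab, the test can be sketched directly from the hypergeometric distribution. The code below assumes the common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one); whether Minitab uses exactly this definition is an assumption, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # given fixed row and column totals
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Small tolerance guards against floating-point ties
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Rows: adverse n / y; columns: drug new / old (counts from the slide)
p_value = fisher_exact_two_sided(90, 80, 210, 120)
print(f"Fisher's exact test: P-Value = {p_value:.4f}")
```

The result should be close to the P-Value in the Minitab output above.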


Regulatory Compliance Documentation Sample Size: Minitab


1-Proportion Test


Regulatory Compliance Documentation Example

A Black Belt is studying the company’s ability to get regulatory compliance documentation to the record center within 5 days of project completion.

What is the binomial characteristic?

A random sample of 130 project documentation records showed that 74 of them met the 5 day deadline.

The business was heard saying “at least we’re over the half way mark!”

Test the hypothesis at 95% confidence that more than 50% of engagements met the deadline.

What is the Null Hypothesis?


Regulatory Compliance Documentation Example - Hypothesis

Ho : The proportion of compliance documentation filed at the record center on time is 50% (interim target value).

Ha : The proportion of compliance documentation filed at the record center on time is greater than 50%.

Note: Typically the alternative is stated as “there is a difference.”

Why does this example state “greater than?”


Compliance Documentation Example – Minitab Commands

Tool Bar Menu > Stat > Basic Statistics > 1 Proportion Analysis

Compliance Documentation Example – Minitab Results

What’s our interpretation?

Test and CI for One Proportion

Test of p = 0.5 vs p > 0.5

                               95% Lower   Exact
Sample   X    N     Sample p   Bound       P-Value
1        74   130   0.569231   0.493309    0.068
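The exact P-Value in this output is an upper-tail binomial probability, which can be sketched as follows. This is a standard-library illustration with a hypothetical function name; it does not reproduce the exact lower confidence bound.

```python
from math import comb

def binomial_upper_tail(x, n, p0):
    """Exact P(X >= x) when X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(x, n + 1))

# 74 of 130 records met the deadline; test p = 0.5 vs p > 0.5
x, n, p0 = 74, 130, 0.5
sample_p = x / n
p_value = binomial_upper_tail(x, n, p0)
alpha = 0.05

print(f"Sample p = {sample_p:.6f}, exact P-Value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Because the P-Value is above 0.05, the sample does not support the claim that more than half of the records met the deadline, even though the sample proportion is about 57%.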


Regulatory Compliance Documentation Sample Size

Power and Sample Size

Test for Two Proportions

Testing proportion 1 = proportion 2 (versus <)

Calculating power for proportion 2 = 0.7

Alpha = 0.05

               Sample   Target
Proportion 1   Size     Power    Actual Power
0.6            388      0.9      0.900148
0.6            281      0.8      0.800923

The sample size is for each group.
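A textbook normal-approximation formula gives sample sizes in line with this output. The sketch below is an assumption about the method, not a description of Minitab's internal calculation; the function name and parameters are hypothetical. It uses the pooled variance under H0 and the unpooled variance under the alternative, for a one-sided test.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.9):
    """Per-group sample size for a one-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)        # quantile for target power
    p_bar = (p1 + p2) / 2                       # pooled proportion, equal groups
    spread_null = sqrt(2 * p_bar * (1 - p_bar))
    spread_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (z_alpha * spread_null + z_beta * spread_alt) ** 2 / (p1 - p2) ** 2
    return ceil(n)

# Proportion 1 = 0.6, proportion 2 = 0.7, alpha = 0.05
for target_power in (0.9, 0.8):
    n = sample_size_two_proportions(0.6, 0.7, alpha=0.05, power=target_power)
    print(f"power={target_power}: n per group = {n}")
```

With only 130 records sampled in the example, sample sizes of this order suggest the study may be underpowered to detect a 10-point difference.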

Is the sample size a concern?


2-Proportion Test


Analysis of Proportions for Workload Balance

Jack Lairdieson, MBB, Vanguard

[Figure: August WLB In-Range Proportions with 95% Confidence Bands. Y-axis: Proportion (0.40 to 0.54). X-axis categories: SPA, West, Southeast, Northeast, NY, Central, Total. Labels: Total, Region 5, Region 6, Region 3, Region 1, Region 3, Region 2.]

Interpret as an Interval Plot for Multiple Proportions

Workload Balance Example

The Workload Balance (WLB) metrics were being discussed at a regional meeting. The Region 1 representative scoffed that Region 2’s “In-range” WLB performance metrics were at the “bottom of the barrel”. The Region 2 representative quickly responded, “Really, Region 1 is no better than Region 2.”

Once back at the office, the concerned Region 1 representative gave the following Workload Balance data to a Black Belt.

WLB Stats   In-Range   Staff
Region 1    663        1411
Region 2    141        353

Should Region 1 be concerned about his conclusion? What is the null hypothesis?


Workload Balance Example - Hypothesis

Ho : The proportion of Region 1 “In-Range” staff is

equal to the proportion of Region 2 “In-Range” staff.

Ha : The proportion of Region 1 “In-Range” staff is not

equal to the proportion of Region 2 “In-Range” staff.

or

Ha : The proportion of Region 1 “In-Range” staff is

greater than the proportion of Region 2 “In-Range” staff.


Workload Balance Example – Minitab Commands

Tool Bar Menu > Stat > Basic Statistics > 2 Proportion

Analysis through MINITAB™


Workload Balance Example – Minitab Results

Session Window Output

What’s our interpretation? What Hypothesis did we choose to test?

Is the sample size a concern?

Test and CI for Two Proportions

Sample   X     N      Sample p
1        663   1411   0.469880
2        141   353    0.399433

Difference = p (1) - p (2)
Estimate for difference: 0.0704461
95% lower bound for difference: 0.0223190
Test for difference = 0 (vs > 0): Z = 2.41 P-Value = 0.008
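The figures in this output are consistent with a z-test that uses the unpooled standard error. The sketch below makes that assumption explicit; it is a standard-library illustration rather than the Minitab algorithm.

```python
from math import sqrt
from statistics import NormalDist

# In-Range counts and staff totals from the example
x1, n1 = 663, 1411   # Region 1
x2, n2 = 141, 353    # Region 2

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Unpooled standard error of the difference (assumed method)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = diff / se

# Right-tailed test (Ha: p1 > p2)
p_value = 1 - NormalDist().cdf(z)

# One-sided 95% lower confidence bound for the difference
lower_bound = diff - NormalDist().inv_cdf(0.95) * se

print(f"Estimate for difference: {diff:.7f}")
print(f"95% lower bound for difference: {lower_bound:.7f}")
print(f"Z = {z:.2f}  P-Value = {p_value:.3f}")
```

Since the lower bound is above zero and the P-Value is below 0.05, the data support the one-sided alternative that Region 1's In-Range proportion is greater than Region 2's.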


Sample Size: Minitab

Testing proportion 1 = proportion 2 (versus >)

Calculating power for proportion 2 = 0.399

Alpha = 0.05

               Sample   Target
Proportion 1   Size     Power    Actual Power
0.469          857      0.9      0.900072
0.469          619      0.8      0.800094

The sample size is for each group.


References

Fisher RA (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh

Barnard GA (1945). A new test for 2 x 2 tables. Nature 156:177

Chan I (1998) Exact tests of equivalence and efficacy with non-zero lower bound for comparative studies. Statistics in Medicine 17, 1403-1413

Mehta CR and Senchaudhuri P (2003). Conditional versus unconditional tests for comparing two binomials. Cytel Software.

Web Sites:

http://www.minitab.com/support/documentation/answers/SampleSize2p.pdf

www.statsoft.com/textbook/stathome

http://sofia.fhda.edu/gallery/statistics/lessons/lesson10-2


Six Sigma Links

Six Sigma

• Motorola, Inc. - Motorola University
• Six Sigma - What is Six Sigma?
• i Six Sigma - Six Sigma Quality Resources for Achieving Six Sigma Results
• General Electric : Our Company : What is Six Sigma?

Quality

• American Society for Quality - ASQ
• TQM Virtual CoursePack
• SPC Press - Home

Statistics

• http://www.statsoft.com/textbook/stathome.html
• Penn State Statistical Education Resource Kit--Overview of Statistics Data
• Statistics Video Course
• The Sofia Open Content Initiative - Elementary Statistics
• Resource: Learning Math: Data Analysis, Statistics, and Probability

Lean Six Sigma

• Kaizen and Lean Manufacturing Consulting: Gemba Research - | Kaizen Products
• Conquering Complexity, Fast Innovation, Lean Six Sigma Quality. George Group Consulting Six Sigma Training Book
• LEAN.org - Lean Enterprise Institute| Lean Production| Lean Manufacturing| LEI| Lean Services| Lean Enterprise Training Course| Lean Consumption| Lean Resources| Lean Experts| Lean Healthcare| Lean in Healthcare| Training on Lean Manufacturing| Lean Business

Excel Statistics Add on

• http://www.qimacros.com/
