Engineering Statistics (ANOVA)


Upload: muhamad-haffiez

Post on 07-Apr-2015


DESCRIPTION

Analysis of Variance (ANOVA)

TRANSCRIPT

Page 1: Engineering Statistics (ANOVA)

1

Page 2: Engineering Statistics (ANOVA)

2

Page 3: Engineering Statistics (ANOVA)

• The one-way analysis of variance is used to test the claim that three or more population means are equal.

• It is an extension of the two-independent-samples t-test.

• One-way ANOVA: an analysis of variance procedure using one dependent variable and one independent variable.

Page 4: Engineering Statistics (ANOVA)

• The response variable is the variable you are comparing.

• The factor variable is the categorical variable used to define the groups.
◦ We will assume k samples (groups).

• It is called “one-way” because each value is classified in exactly one way.
◦ Examples include comparisons by gender, race, political party, color, etc.

Page 5: Engineering Statistics (ANOVA)

To use the one-way ANOVA test, the following assumptions must hold:

◦ The populations under study are normally distributed.

◦ The samples are drawn randomly, and each sample is independent of the other samples.

◦ All the populations from which the sample values are obtained have the same unknown population variance; that is, for k populations,

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$$

Page 6: Engineering Statistics (ANOVA)

• There is a “family” of F distributions.

• Each member of the family is determined by two parameters:
◦ the numerator degrees of freedom
◦ the denominator degrees of freedom

• F cannot be negative, and it is a continuous distribution.

• The F distribution is positively skewed.

• Its values range from 0 to ∞.

• As F → ∞, the curve approaches the X-axis.
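The properties listed for the F distribution can be checked numerically. The following is a minimal sketch that evaluates the standard F density using only the Python standard library; the function name `f_pdf` and the degrees of freedom (4, 20) are illustrative choices, not taken from the slides.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 numerator and d2 denominator df."""
    if x <= 0:
        # F cannot be negative; x = 0 is treated as density 0 here for simplicity.
        return 0.0
    # Work on the log scale to avoid overflow for large degrees of freedom.
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)

# The density is positive for x > 0 and decays toward the X-axis as x grows.
for x in (0.5, 1.0, 2.0, 5.0, 20.0):
    print(x, round(f_pdf(x, 4, 20), 5))
```

Changing the two degrees-of-freedom arguments selects a different member of the family.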

Page 7: Engineering Statistics (ANOVA)

• Only one classification factor is considered.

[Diagram: one-way data layout. Rows are the treatments 1, 2, ..., i (the levels of the factor); columns are the replicates 1, …, j (the objects assigned to a given treatment); the cells hold the response/outcome/dependent variable (samples).]

Page 8: Engineering Statistics (ANOVA)

• H0: µ1 = µ2 = µ3 = ... = µk
◦ All population means are equal
◦ No treatment effect

• Ha: Not all µi are equal
◦ At least 2 population means are different
◦ Treatment effect
◦ Writing Ha as µ1 ≠ µ2 ≠ ... ≠ µk is wrong: Ha does not require every mean to differ from every other.

[Figure: two sketches of densities f(X) against X. In one, the curves coincide (µ1 = µ2 = µ3) and the spread is labeled “Mean square (variance) within”. In the other, the curves are centered at different means µ1, µ2, µ3 and the spread between them is labeled “Mean square among”.]

Page 9: Engineering Statistics (ANOVA)

• If the null hypothesis is true,
◦ we would expect all the sample means to be close to one another (and, as a result, close to the grand mean).

• If the alternative hypothesis is true,
◦ at least some of the sample means would differ.

Page 10: Engineering Statistics (ANOVA)

• Variation
◦ Variation is the sum of the squares of the deviations between each value and the mean of the values.

• As long as the values are not identical, there will be variation.

• Denoted SS, for Sum of Squares.

Page 11: Engineering Statistics (ANOVA)

• Are all of the values identical?
◦ No, so there is some variation in the data.
◦ This is called the total variation.
◦ Denoted SS(Total), for the total Sum of Squares (variation).
◦ Sum of Squares is another name for variation.

Page 12: Engineering Statistics (ANOVA)

• VARIATION BETWEEN GROUPS
◦ Are all of the sample means identical?
    • No, so there is some variation between the groups.
    • For each data value, look at the difference between its group mean and the overall mean. This is called the between-group variation.
    • Sometimes called the variation due to the factor.
    • Denoted SS(A), for Sum of Squares (variation) between the groups.

Each data value contributes $(\bar{x}_i - \bar{x})^2$ to SS(A).

Page 13: Engineering Statistics (ANOVA)

• VARIATION WITHIN GROUPS
◦ Are all of the values within each group identical?
    • No, there is some variation within the groups.
    • For each data value, look at the difference between that value and the mean of its group. This is called the within-group variation.
    • Sometimes called the error variation.
    • Denoted SS(E), for Sum of Squares (variation) within the groups.

Each data value contributes $(x_{ij} - \bar{x}_i)^2$ to SS(E).

Page 14: Engineering Statistics (ANOVA)

Variation is described by sums of squares.

The total variation is partitioned as follows:

SS TOTAL = SS BETWEEN + SS WITHIN
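The partition of the total sum of squares can be seen on a small data set. The sketch below computes the three sums of squares directly from their deviation definitions; the toy data are hypothetical and not taken from the slides.

```python
# Hypothetical toy data: three groups of unequal size (not from the slides).
groups = {"g1": [4.0, 5.0, 6.0], "g2": [7.0, 9.0], "g3": [1.0, 2.0, 3.0, 2.0]}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Total variation: every value against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

ss_between = 0.0
ss_within = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    # Between: each value contributes (group mean - grand mean)^2.
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    # Within: each value against its own group mean.
    ss_within += sum((x - group_mean) ** 2 for x in values)

print(ss_total, ss_between, ss_within)
print(abs(ss_total - (ss_between + ss_within)) < 1e-9)
```

The final line checks that the between and within parts add up to the total, apart from floating-point rounding.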

Page 15: Engineering Statistics (ANOVA)

• ONE-WAY ANOVA TABLE

Source            SS  df  MS  F
Between (Factor)
Within (Error)
Total

Page 16: Engineering Statistics (ANOVA)

One-way Analysis of Variance

Source  DF      SS      MS      F      P
Factor   2  2510.5  1255.3  93.44  0.000
Error   12   161.2    13.4
Total   14  2671.7

“Source” means “find the components of variation in this column”

“DF” means “degrees of freedom”

“SS” means “sums of squares”

“MS” means “mean square”

“F” means “F test statistic”

“P” means “p-value of the F test”

Page 17: Engineering Statistics (ANOVA)

One-way Analysis of Variance

Source  DF      SS      MS      F      P
Factor   2  2510.5  1255.3  93.44  0.000
Error   12   161.2    13.4
Total   14  2671.7

“Factor” means “Variability between groups” or “Variability due to the factor of interest”

“Error” means “Variability within groups” or “unexplained random variation”

“Total” means “Total variation from the grand mean”

Page 18: Engineering Statistics (ANOVA)

One-way Analysis of Variance

Source  DF   SS           MS   F
Factor  a-1  SS(Between)  MSA  MSA/MSE
Error   n-a  SS(Error)    MSE
Total   n-1  SS(Total)

Here a is the number of groups (factor levels) and n is the total number of observations.

MSA = SS(Between)/(a-1)
MSE = SS(Error)/(n-a)
n-1 = (a-1) + (n-a)
SS(Total) = SS(Between) + SS(Error)

Page 19: Engineering Statistics (ANOVA)

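$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}$$

$$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2$$

$$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}$$

$$SST = SSA + SSE; \qquad MS = \frac{SS}{DF}; \qquad F = \frac{MSA}{MSE}$$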
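The computational form of SST, the sum of the squared values minus the squared grand total over n, follows from expanding the square. A short derivation is sketched here, writing $\bar{x} = \frac{1}{n}\sum_i \sum_j x_{ij}$ for the grand mean:

```latex
\begin{aligned}
SST &= \sum_i \sum_j (x_{ij} - \bar{x})^2 \\
    &= \sum_i \sum_j x_{ij}^2 - 2\bar{x} \sum_i \sum_j x_{ij} + n\bar{x}^2 \\
    &= \sum_i \sum_j x_{ij}^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
    &= \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}
\end{aligned}
```

The same expansion, applied to the group means weighted by their sample sizes $n_i$, gives the computational form of SSA.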

Page 20: Engineering Statistics (ANOVA)

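If the means are equal, F = MST / MSE ≈ 1. Only reject if F is large! (MST, the mean square for treatments, is the MSA of the earlier table.)

Always one-tail!

[Figure: F density curve starting at 0. The area α in the right tail, beyond the critical value F(α; k – 1, n – k), is the “Reject H0” region; the area to its left is the “Do Not Reject H0” region.]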

© 1984-1994 T/Maker Co.

If MST is close to MSE then both have same source of variation

20

Page 21: Engineering Statistics (ANOVA)

As production manager, you want to see if three filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, 5 per machine, to the machines. At the 5% level of significance, is there a difference in mean filling times?

Mach1  Mach2  Mach3
25.40  23.40  20.00
26.31  21.80  22.20
24.10  23.50  19.75
23.74  22.75  20.60
25.10  21.60  20.40

Page 22: Engineering Statistics (ANOVA)

The summary statistics for the three filling machines are shown in the table below.

Row          Mach 1  Mach 2  Mach 3
Sample size       5       5       5
Total        124.65  113.05  102.95

Page 23: Engineering Statistics (ANOVA)

• The H0 is that the means are all equal:

◦ H0: All machines have equal mean filling times

• The alternative hypothesis is that at least one of the means is different:

◦ H1: Not all machines have equal mean filling times

Page 24: Engineering Statistics (ANOVA)

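$$
\begin{aligned}
SSA &= \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} \\
    &= \left(\frac{124.65^2}{5} + \frac{113.05^2}{5} + \frac{102.95^2}{5}\right) - \frac{340.65^2}{15} \\
    &= 7783.326 - 7736.162 \\
    &= 47.164
\end{aligned}
$$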

Page 25: Engineering Statistics (ANOVA)

$$
\begin{aligned}
SST &= \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} \\
    &= \left[25.4^2 + 26.31^2 + 24.1^2 + \dots + 20.4^2\right] - 7736.162 \\
    &= 7794.379 - 7736.162 \\
    &= 58.2172
\end{aligned}
$$

Page 26: Engineering Statistics (ANOVA)

$$
\begin{aligned}
SST &= SSA + SSE \\
SSE &= SST - SSA \\
    &= 58.2172 - 47.164 \\
    &= 11.0532
\end{aligned}
$$
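The hand calculation of SSA, SST and SSE can be cross-checked in a few lines. This is a minimal sketch using the filling-time data from the example and the computational formulas above; the variable names are illustrative.

```python
# Filling times for the three machines, as given in the example.
machines = {
    "Mach1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Mach2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Mach3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

n = sum(len(v) for v in machines.values())             # 15 observations
grand_total = sum(sum(v) for v in machines.values())   # sum of all values
correction = grand_total ** 2 / n                      # (sum of all x)^2 / n

# Between-group sum of squares from the group totals.
ssa = sum(sum(v) ** 2 / len(v) for v in machines.values()) - correction
# Total sum of squares from the raw values.
sst = sum(x ** 2 for v in machines.values() for x in v) - correction
# Within-group (error) sum of squares by subtraction.
sse = sst - ssa

print(f"SSA = {ssa:.4f}, SST = {sst:.4f}, SSE = {sse:.4f}")
```

Up to rounding, the printed values should agree with the 47.164, 58.2172 and 11.0532 obtained by hand.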

Page 27: Engineering Statistics (ANOVA)

Source              SS       df  MS  F
Between (Machines)  47.1640
Within (Error)      11.0532
Total               58.2172

Page 28: Engineering Statistics (ANOVA)

• Filling in the degrees of freedom gives this …

Source              SS       df           MS  F
Between (Machines)  47.1640  3 - 1 = 2
Within (Error)      11.0532  15 - 3 = 12
Total               58.2172  15 - 1 = 14

Page 29: Engineering Statistics (ANOVA)

• Completing the MS gives …

Source              SS       df           MS       F
Between (Machines)  47.1640  3 - 1 = 2    23.5820
Within (Error)      11.0532  15 - 3 = 12  0.9211
Total               58.2172  15 - 1 = 14

Page 30: Engineering Statistics (ANOVA)

• Adding F to the table …

Source              SS       df           MS       F
Between (Machines)  47.1640  3 - 1 = 2    23.5820  25.60
Within (Error)      11.0532  15 - 3 = 12  0.9211
Total               58.2172  15 - 1 = 14

Page 31: Engineering Statistics (ANOVA)

H0: µ1 = µ2 = µ3

H1: Not all means are equal

• α = .05

• ν1 = 2, ν2 = 12

Critical Value(s): F(.05; 2, 12) = 3.89

[Figure: F density starting at 0, with the rejection region of area α = .05 to the right of 3.89.]

Test Statistic:

$$F = \frac{MST}{MSE} = \frac{23.5820}{.9211} = 25.6$$

Decision: Reject H0 at α = .05

Conclusion: There is evidence that the three filling machines have different mean filling times.
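The last steps of the test (mean squares, F statistic, decision) are plain arithmetic on the ANOVA table. The sketch below takes the sums of squares and the critical value 3.89 from the example. The p-value line is an addition that is not on the slides: it uses a closed form for the upper tail of the F distribution that is valid only when the numerator degrees of freedom equal 2, as they do here.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ssa, sse = 47.1640, 11.0532
df_between, df_within = 3 - 1, 15 - 3
f_critical = 3.89  # F(.05; 2, 12), as read from the F table in the example

msa = ssa / df_between   # mean square between (MST on the slide)
mse = sse / df_within    # mean square error
f_stat = msa / mse

# Assumption-labeled extra: for numerator df = 2 only,
# P(F > f) = (1 + 2 f / d2) ** (-d2 / 2).
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

reject_h0 = f_stat > f_critical  # one-tailed test: reject only for large F
print(f"MSA = {msa:.4f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
print(f"p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

For other numerator degrees of freedom the tail area has no such simple form, and a table or statistical library would be needed instead.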

Page 32: Engineering Statistics (ANOVA)

One-way ANOVA: time versus Machine

Source   DF      SS      MS      F      P
Machine   2  47.164  23.582  25.60  0.000
Error    12  11.053   0.921
Total    14  58.217

S = 0.9597   R-Sq = 81.01%   R-Sq(adj) = 77.85%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev
1      5  24.930  1.032
2      5  22.610  0.882
3      5  20.590  0.959

[Character plot of the three intervals omitted; axis ticks at 20.8, 22.4, 24.0, 25.6]

Pooled StDev = 0.960
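The summary line of the output (S, R-Sq, R-Sq(adj)) can be recovered from the ANOVA table. The sketch below assumes the usual definitions: S is the square root of MSE, R-Sq is the share of the total sum of squares explained by the factor, and R-Sq(adj) compares MSE with the total mean square. These definitions are an assumption about how the software computes the line, although the numbers agree with the output.

```python
import math

# Values copied from the ANOVA table in the output.
ss_factor, ss_error, ss_total = 47.164, 11.053, 58.217
df_error, df_total = 12, 14

s = math.sqrt(ss_error / df_error)                            # pooled standard deviation
r_sq = ss_factor / ss_total                                   # explained share of variation
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)  # adjusted for degrees of freedom

print(f"S = {s:.4f}, R-Sq = {r_sq:.2%}, R-Sq(adj) = {r_sq_adj:.2%}")
```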

Page 33: Engineering Statistics (ANOVA)

33

Page 34: Engineering Statistics (ANOVA)

• An experiment was performed to determine whether the annealing temperature of ductile iron affects its tensile strength. Five specimens were annealed at each of four temperatures. The tensile strength (in ksi) was measured for each temperature. The results are presented in the following table. Can you conclude that there are differences among the mean strengths?

Temperature (°C)  Sample Values
750               19.72  20.88  19.63  18.68  17.89
800               16.01  20.04  18.10  20.28  20.53
850               16.66  17.38  14.49  18.21  15.58
900               16.93  14.49  16.15  15.53  13.25

Page 35: Engineering Statistics (ANOVA)

Temperature (°C)  Sample size (n)  Total
750
800
850
900
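The sample sizes and totals asked for in this table can be tabulated from the raw strengths. A minimal sketch, assuming the data as listed in the exercise; the dictionary layout and the added mean column are illustrative.

```python
# Tensile strengths (ksi) by annealing temperature (deg C), from the exercise.
strength = {
    750: [19.72, 20.88, 19.63, 18.68, 17.89],
    800: [16.01, 20.04, 18.10, 20.28, 20.53],
    850: [16.66, 17.38, 14.49, 18.21, 15.58],
    900: [16.93, 14.49, 16.15, 15.53, 13.25],
}

# summary[temperature] = (sample size, total, mean)
summary = {
    temp: (len(values), sum(values), sum(values) / len(values))
    for temp, values in strength.items()
}

for temp, (n_i, total, mean) in summary.items():
    print(f"{temp}  n = {n_i}  total = {total:.2f}  mean = {mean:.3f}")
```

The totals feed the computational formula for SSA in the same way as in the filling-machine example.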

Page 36: Engineering Statistics (ANOVA)

36

Page 37: Engineering Statistics (ANOVA)

One-way ANOVA: strength versus Temperature

Source       DF     SS     MS     F      P
Temperature   3  58.65  19.55  8.49  0.001
Error        16  36.84   2.30
Total        19  95.49

S = 1.517   R-Sq = 61.42%   R-Sq(adj) = 54.19%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev
750    5  19.360  1.133
800    5  18.992  1.924
850    5  16.464  1.467
900    5  15.270  1.439

[Character plot of the four intervals omitted; axis ticks at 14.0, 16.0, 18.0, 20.0]

Pooled StDev = 1.517

Page 38: Engineering Statistics (ANOVA)

38

Page 39: Engineering Statistics (ANOVA)

Confidence interval for each mean, µi:

$$\bar{x}_i \pm t_{\alpha/2,\, n-a} \sqrt{\frac{MSE}{n_i}}$$
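As an illustration of this interval, the sketch below applies it to machine 1 of the filling example (mean 24.930, n = 5, MSE = 0.9211 with 12 error degrees of freedom). The critical value 2.179 for t(0.025; 12) is not given on the slides; it is an assumed input taken from a standard t table, and the helper name `mean_ci` is hypothetical.

```python
import math

def mean_ci(xbar, mse, n_i, t_crit):
    """Interval xbar +/- t_crit * sqrt(MSE / n_i) for one group mean."""
    half_width = t_crit * math.sqrt(mse / n_i)
    return xbar - half_width, xbar + half_width

# t_crit = 2.179 is an assumed table value for alpha/2 = 0.025 and 12 df.
low, high = mean_ci(xbar=24.930, mse=0.9211, n_i=5, t_crit=2.179)
print(f"95% CI for the mean of machine 1: ({low:.3f}, {high:.3f})")
```

Because MSE pools all groups, every group with the same sample size gets an interval of the same width.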

Page 40: Engineering Statistics (ANOVA)

$$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

• where t is obtained from the t table with (n - k) degrees of freedom.

• MSE = SSE/(n - k)
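The interval for a difference of two means can be sketched the same way, here for machines 1 and 2 of the filling example. As before, the critical value 2.179 (t table, 12 degrees of freedom, 95% two-sided) is an assumed input that is not stated on the slides, and `diff_ci` is a hypothetical helper name.

```python
import math

def diff_ci(xbar1, xbar2, mse, n1, n2, t_crit):
    """Interval (xbar1 - xbar2) +/- t_crit * sqrt(MSE * (1/n1 + 1/n2))."""
    half_width = t_crit * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

# Means 24.930 and 22.610, five observations each, MSE = 0.9211.
low, high = diff_ci(24.930, 22.610, 0.9211, 5, 5, t_crit=2.179)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
print("interval contains 0:", low <= 0.0 <= high)
```

This is an individual (one comparison at a time) interval; it does not control the error rate across all pairs.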

Page 41: Engineering Statistics (ANOVA)

• When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different.

• Two statistical inference procedures geared at doing this are presented:
◦ “regular” confidence interval calculations
◦ Tukey test

Page 42: Engineering Statistics (ANOVA)

• Two means are considered different if the confidence interval for the difference between the corresponding sample means does not contain 0.

• In this case the larger sample mean is believed to be associated with a larger population mean.

Page 43: Engineering Statistics (ANOVA)

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Machine

Individual confidence level = 97.94%

Machine = 1 subtracted from:

Machine    Lower   Center    Upper
2        -3.9381  -2.3200  -0.7019
3        -5.9581  -4.3400  -2.7219

Machine = 2 subtracted from:

Machine    Lower   Center    Upper
3        -3.6381  -2.0200  -0.4019

[Character plots of the intervals omitted; axis ticks at -5.0, -2.5, 0.0, 2.5]
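The rule that two means differ when the interval for their difference excludes 0 can be applied mechanically to this output. A minimal sketch, with the interval limits copied from the Tukey output; the dictionary layout is illustrative.

```python
# (subtracted machine, other machine): (lower, center, upper) from the Tukey output.
tukey = {
    (1, 2): (-3.9381, -2.3200, -0.7019),
    (1, 3): (-5.9581, -4.3400, -2.7219),
    (2, 3): (-3.6381, -2.0200, -0.4019),
}

# A pair is declared different when its interval does not contain 0.
different = {
    pair: not (lower <= 0.0 <= upper)
    for pair, (lower, _, upper) in tukey.items()
}

for (subtracted, other), (lower, center, upper) in tukey.items():
    # A negative center means the subtracted machine has the larger sample mean.
    larger = subtracted if center < 0 else other
    print(f"machines {subtracted} and {other}: different = "
          f"{different[(subtracted, other)]}, larger mean: machine {larger}")
```

Here every interval lies entirely below 0, so all three pairs are declared different.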

Page 44: Engineering Statistics (ANOVA)

44

Page 45: Engineering Statistics (ANOVA)

Only two classification factors are considered.

[Diagram: two-way data layout. Rows are the levels 1, 2, ..., i of Factor A; columns are the levels 1, 2, ..., j of Factor B.]

Page 46: Engineering Statistics (ANOVA)

• The standard two-way ANOVA tests are valid under the following conditions:

◦ The design must be complete.
    • Observations are taken on every possible treatment.

◦ The design must be balanced.
    • The number of replicates is the same for each treatment.

◦ The number of replicates per treatment, k, must be at least 2.

◦ Within any treatment, the observations are a simple random sample from a normal population.

◦ The sample observations are independent of each other (the samples are not matched or paired in any way).

◦ The population variance is the same for all treatments.

Page 47: Engineering Statistics (ANOVA)

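Source              Df          Sum of Squares (SS)  Mean Square (MS)  F Value
A (row effect)      a - 1       SSA                  MSA               MSA/MSE
B (column effect)   b - 1       SSB                  MSB               MSB/MSE
Interaction effect  (a-1)(b-1)  SSAB                 MSAB              MSAB/MSE
Error               ab(n-1)     SSE                  MSE
Total               abn - 1     SST

Here a and b are the numbers of levels of factors A and B, n is the number of replicates per treatment, and a dot in a subscript denotes a total over that index.

$$SSA = \frac{1}{bn} \sum_{i=1}^{a} x_{i..}^2 - \frac{x_{...}^2}{abn}, \qquad MSA = \frac{SSA}{a-1}, \qquad F_{test} = \frac{MSA}{MSE}$$

$$SSB = \frac{1}{an} \sum_{j=1}^{b} x_{.j.}^2 - \frac{x_{...}^2}{abn}, \qquad MSB = \frac{SSB}{b-1}, \qquad F_{test} = \frac{MSB}{MSE}$$

$$SSAB = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} x_{ij.}^2 - \frac{x_{...}^2}{abn} - SSA - SSB, \qquad MSAB = \frac{SSAB}{(a-1)(b-1)}, \qquad F_{test} = \frac{MSAB}{MSE}$$

$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}^2 - \frac{x_{...}^2}{abn}$$

$$SSE = SST - SSA - SSB - SSAB, \qquad MSE = \frac{SSE}{ab(n-1)}$$

$$\hat{\sigma}^2_{Error} = MSE = \frac{SSE}{N - k}$$

In the last expression N = abn is the total number of observations and k = ab is the number of treatments, so N - k = ab(n - 1).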

Page 48: Engineering Statistics (ANOVA)

• A chemical engineer is studying the effects of various reagents and catalysts on the yield of a certain process. Yield is expressed as a percentage of a theoretical maximum. Four runs of the process were made for each combination of 3 reagents and 4 catalysts. Construct an ANOVA table and test whether there is an interaction effect between reagents and catalysts.

Catalyst  Reagent 1            Reagent 2            Reagent 3
A         86.8 82.4 86.7 83.5  93.4 85.2 94.8 83.1  77.9 89.6 89.9 83.7
B         71.9 72.1 80.0 77.4  74.5 87.1 71.9 84.1  87.5 82.7 78.3 90.1
C         65.5 72.4 76.6 66.7  66.7 77.1 76.7 86.1  72.7 77.8 83.5 78.8
D         63.9 70.4 77.2 81.2  73.7 81.6 84.2 84.9  79.8 75.7 80.5 72.9
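One way to build the requested ANOVA table is to code the two-way sums of squares directly from the cell, row and column totals. The sketch below rests on two assumptions that the slides do not state explicitly: the three groups of four values listed under each catalyst correspond to reagents 1, 2 and 3 in that order, and catalyst is treated as factor A (rows) with reagent as factor B (columns). The interaction sum of squares is taken as the cell sum of squares minus SSA and SSB.

```python
# yields[catalyst][reagent] -> four replicate yields (percent of theoretical maximum).
# Assumption: the rows under each catalyst are reagents 1, 2, 3 in order.
yields = {
    "A": {1: [86.8, 82.4, 86.7, 83.5], 2: [93.4, 85.2, 94.8, 83.1], 3: [77.9, 89.6, 89.9, 83.7]},
    "B": {1: [71.9, 72.1, 80.0, 77.4], 2: [74.5, 87.1, 71.9, 84.1], 3: [87.5, 82.7, 78.3, 90.1]},
    "C": {1: [65.5, 72.4, 76.6, 66.7], 2: [66.7, 77.1, 76.7, 86.1], 3: [72.7, 77.8, 83.5, 78.8]},
    "D": {1: [63.9, 70.4, 77.2, 81.2], 2: [73.7, 81.6, 84.2, 84.9], 3: [79.8, 75.7, 80.5, 72.9]},
}

a = len(yields)            # levels of factor A (catalyst)
b = len(yields["A"])       # levels of factor B (reagent)
n = len(yields["A"][1])    # replicates per treatment

cells = [cell for row in yields.values() for cell in row.values()]
grand_total = sum(sum(cell) for cell in cells)
correction = grand_total ** 2 / (a * b * n)

row_totals = [sum(sum(cell) for cell in row.values()) for row in yields.values()]
col_totals = [sum(sum(yields[c][r]) for c in yields) for r in (1, 2, 3)]

ss_a = sum(t ** 2 for t in row_totals) / (b * n) - correction
ss_b = sum(t ** 2 for t in col_totals) / (a * n) - correction
ss_ab = sum(sum(cell) ** 2 for cell in cells) / n - correction - ss_a - ss_b
ss_t = sum(x ** 2 for cell in cells for x in cell) - correction
ss_e = ss_t - ss_a - ss_b - ss_ab

df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
ms_e = ss_e / df["E"]
f_ab = (ss_ab / df["AB"]) / ms_e   # test statistic for the interaction effect

print(f"{'Source':<12}{'Df':>4}{'SS':>10}{'MS':>10}{'F':>8}")
for name, ss, d in (("Catalyst", ss_a, df["A"]), ("Reagent", ss_b, df["B"]),
                    ("Interaction", ss_ab, df["AB"])):
    print(f"{name:<12}{d:>4}{ss:>10.2f}{ss / d:>10.3f}{(ss / d) / ms_e:>8.2f}")
print(f"{'Error':<12}{df['E']:>4}{ss_e:>10.2f}{ms_e:>10.3f}")
print(f"{'Total':<12}{a * b * n - 1:>4}{ss_t:>10.2f}")
```

The interaction F statistic `f_ab` would then be compared with the F critical value for (a-1)(b-1) and ab(n-1) degrees of freedom at the chosen significance level.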