l12_anova
Lecture notes
Analysis of Variance
David Chow
Nov 2014
Chap 11-1
Learning Objectives
In this chapter, you learn:
• The basic concepts of experimental design
• How to use one-way analysis of variance (ANOVA)
• How to use two-way analysis of variance and interpret the interaction effect
General ANOVA Setting
• Researcher designs an experiment, collects data, and draws conclusions
• Researcher controls one or more factors of interest
• Observe effects on the dependent variable
• Main question: Are the groups (populations) the same?
• Each factor (independent variable) contains two or more treatments (levels)
• Levels can be numerical or categorical
• Different levels give different groups, with each group representing a population
Completely Randomized Design
• CRD is the simplest experimental design
• Only one factor under consideration
• Test subjects (assumed to be homogeneous) are randomly assigned to different treatment levels
Eg: A medical experiment
Treatment (level) | Placebo (P) | Vaccine (V)
                  | 300         | 300
• Subjects are randomly assigned to get one treatment (either P or V)
• Dependent variable = number of colds reported
• Fewer colds in the vaccine group?
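The random assignment step of a completely randomized design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject IDs 1 to 600 are hypothetical, the group size of 300 is read from the "300 300" row on the slide, and the function name `completely_randomized_design` is made up for this sketch.

```python
import random


def completely_randomized_design(subject_ids, treatments, seed=None):
    """Randomly split subjects into equal-sized groups, one per treatment level."""
    subjects = list(subject_ids)
    if len(subjects) % len(treatments) != 0:
        raise ValueError("subjects must divide evenly among the treatments")
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    rng.shuffle(subjects)          # random order = random assignment
    size = len(subjects) // len(treatments)
    return {
        treatment: subjects[k * size:(k + 1) * size]
        for k, treatment in enumerate(treatments)
    }


# Hypothetical subject IDs; "P" = placebo, "V" = vaccine as on the slide.
assignment = completely_randomized_design(range(1, 601), ["P", "V"], seed=2014)
print({treatment: len(ids) for treatment, ids in assignment.items()})
# {'P': 300, 'V': 300}
```

Shuffling once and slicing guarantees that every subject lands in exactly one group and that the groups have equal size, which is what the design requires.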
One-Way ANOVA: Assumptions
• Evaluate the difference among the means of three or more groups
  Eg1: Accident rates for 1st, 2nd, and 3rd shift
  Eg2: Expected mileage for five brands of tires
• Assumptions
  Populations are normally distributed
  Populations have equal variances
  Samples are randomly and independently drawn
Setting Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$
• All population means are equal (c = number of groups)
• i.e., no factor effect
$H_1$: Not all of the population means are the same
• At least one pair with different population means
• i.e., there is a factor effect
Graphical Presentation
[Figure: population distributions for three groups. Panel "H0 is True": $\mu_1 = \mu_2 = \mu_3$. Panel "H0 NOT true": two alternative cases in which the three means are not all equal.]
Idea: Partitioning the Variation
Total variation can be split into two parts:
SST = SSA + SSW
• SST = Total Sum of Squares (total variation)
• SSA = Sum of Squares Among Groups (among-group variation, due to factor)
• SSW = Sum of Squares Within Groups (within-group variation, due to ____)
Obtaining the Mean Squares
The mean squares are obtained by dividing the various sums of squares by their associated degrees of freedom:
Mean Square Among (d.f. = c - 1): $MSA = \frac{SSA}{c - 1}$
Mean Square Within (d.f. = n - c): $MSW = \frac{SSW}{n - c}$
Mean Square Total (d.f. = n - 1): $MST = \frac{SST}{n - 1}$
One-Way ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square (Variance) | F
Among Groups        | SSA            | c - 1              | MSA = SSA / (c - 1)    | FSTAT = MSA / MSW
Within Groups       | SSW            | n - c              | MSW = SSW / (n - c)    |
Total               | SST            | n - 1              |                        |
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom (df1 = c - 1, df2 = n - c)
Interpreting F Statistic
• The F statistic is the ratio of two variance estimates: among groups to within groups
• The ratio must always be positive
• df1 = c - 1 will typically be small
• df2 = n - c will typically be large
One-tail F-test decision rule: Reject H0 if FSTAT > Fα, otherwise do not reject H0
[Figure: F distribution starting at 0, with the "Do not reject H0" region to the left of Fα and the "Reject H0" region (area α) to the right of Fα.]
Eg: Are the Clubs Different?
When three different golf clubs are used, they hit the ball different distances. You randomly select five measurements for each club. At the 0.05 significance level, is there a difference in mean distance?
Club 1 | Club 2 | Club 3
254    | 234    | 200
263    | 218    | 222
241    | 235    | 197
237    | 227    | 206
251    | 216    | 204
Computations by EXCEL
Excel Output
SUMMARY
Groups | Count | Sum  | Average | Variance
Club 1 | 5     | 1246 | 249.2   | 108.2
Club 2 | 5     | 1130 | 226     | 77.5
Club 3 | 5     | 1029 | 205.8   | 94.2
ANOVA
Source of Variation | SS     | df | MS     | F      | P-value  | F crit
Between Groups      | 4716.4 | 2  | 2358.2 | 25.275 | 4.99E-05 | 3.89
Within Groups       | 1119.6 | 12 | 93.3   |        |          |
Total               | 5836.0 | 14 |        |        |          |
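The Excel figures for the golf club example can be reproduced by hand. The following is a minimal pure-Python sketch, not Excel's own procedure; the function name `one_way_anova` and the dictionary keys are chosen here for illustration. It computes SSA, SSW and SST directly from their definitions, so SST = SSA + SSW can be checked numerically.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for a list of samples (one list per group)."""
    c = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Among-group variation: group means around the grand mean, weighted by size.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total variation: observations around the grand mean.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "SST": sst,
            "df1": c - 1, "df2": n - c,
            "MSA": msa, "MSW": msw, "F": msa / msw}


clubs = [
    [254, 263, 241, 237, 251],   # Club 1
    [234, 218, 235, 227, 216],   # Club 2
    [200, 222, 197, 206, 204],   # Club 3
]
table = one_way_anova(clubs)
print(round(table["SSA"], 1), round(table["SSW"], 1), round(table["SST"], 1))
# 4716.4 1119.6 5836.0
print(table["df1"], table["df2"], round(table["F"], 3))
# 2 12 25.275
```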
Statistical Decision
H0: μ1 = μ2 = μ3
H1: μj not all equal
α = 0.05
df1 = ___ df2 = ___
Test Statistic: $F_{STAT} = \frac{MSA}{MSW} = \frac{2358.2}{93.3} = 25.275$
Critical Value: Fα = 3.89
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence that at least one μj differs from the rest
[Figure: F distribution with α = .05. Fα = 3.89 separates "Do not reject H0" from "Reject H0", and FSTAT = 25.275 falls in the rejection region.]
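The critical value and p-value in this example can be checked without statistical tables. The sketch below relies on a special case: when df1 = 2 (that is, exactly three groups), the upper-tail probability of the F distribution has the closed form P(F > f) = (1 + 2f/df2)^(-df2/2). It is not a general F-distribution routine and is not how Excel computes its output; the function names are made up for this illustration.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with df1 = 2 and df2 denominator d.f."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Value F_alpha with upper-tail area alpha, for df1 = 2 only."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


alpha, df2 = 0.05, 12          # golf club example: c = 3 groups, n = 15
f_stat = 25.275

f_crit = f_critical_df1_2(alpha, df2)
p_value = f_upper_tail_df1_2(f_stat, df2)

print(round(f_crit, 2))        # 3.89
print(f"{p_value:.2e}")        # 4.99e-05
print("Reject H0" if f_stat > f_crit else "Do not reject H0")
# Reject H0
```

Both decision rules agree here: FSTAT exceeds the critical value, and the p-value is far below α = 0.05.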
Scatter Plot
[Figure: scatter plot of Distance (axis from 190 to 270) against Club (1, 2, 3) for the golf club data, marking the group means $\bar{X}_1 = 249.2$, $\bar{X}_2 = 226.0$, $\bar{X}_3 = 205.8$ and the grand mean $\bar{\bar{X}} = 227.0$.]
ANOVA Assumptions
• Randomness and Independence
  Select random samples from the c groups (or randomly assign the levels)
• Normality
  The sample values for each group are from a normal population
• Homogeneity of Variance
  All populations sampled from have the same variance
Chapter Summary
• One-way ANOVA
  Its logic & assumptions
  F-test for difference in c means
• (Below are not covered)
  If H0 rejected: Tukey-Kramer procedure for multiple comparisons
  Assumption check: Levene test for homogeneity of variance
  Another experimental design: randomized block design
  Two-way analysis of variance
    Examined effects of multiple factors
    Examined interaction between factors
Appendix: Math Details
Total Sum of Squares
SST = SSA + SSW
$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2$
Where:
SST = Total sum of squares
c = number of groups or levels
$n_j$ = number of observations in group j
$X_{ij}$ = ith observation from group j
$\bar{\bar{X}}$ = grand mean (mean of all data values)
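Total Variation
$SST = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \cdots + (X_{n_c c} - \bar{\bar{X}})^2$
[Figure: Response X plotted for Group 1, Group 2 and Group 3, with the grand mean $\bar{\bar{X}}$ marked.]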
Among-Group Variation
SST = SSA + SSW
$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$
Where:
SSA = Sum of squares among groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$\bar{\bar{X}}$ = grand mean (mean of all data values)
Among-Group Variation
$SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$
Variation due to differences among groups
Mean Square Among = SSA / degrees of freedom: $MSA = \frac{SSA}{c - 1}$
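Among-Group Variation
$SSA = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c (\bar{X}_c - \bar{\bar{X}})^2$
[Figure: Response X for Group 1, Group 2 and Group 3, showing the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ and the grand mean $\bar{\bar{X}}$.]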
Within-Group Variation
SST = SSA + SSW
$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$
Where:
SSW = Sum of squares within groups
c = number of groups
$n_j$ = sample size from group j
$\bar{X}_j$ = sample mean from group j
$X_{ij}$ = ith observation in group j
Within-Group Variation
$SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$
Summing the variation within each group and then adding over all groups
Mean Square Within = SSW / degrees of freedom: $MSW = \frac{SSW}{n - c}$
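Within-Group Variation
$SSW = (X_{11} - \bar{X}_1)^2 + (X_{12} - \bar{X}_2)^2 + \cdots + (X_{n_c c} - \bar{X}_c)^2$
[Figure: Response X for Group 1, Group 2 and Group 3, showing each observation's deviation from its own group mean $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$.]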
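The slides state SST = SSA + SSW without showing why. A short derivation sketch follows, using the appendix definitions and writing $\bar{X}_j$ for the mean of group j and $\bar{\bar{X}}$ for the grand mean (this bar notation is an assumption of the sketch).

```latex
\begin{align*}
X_{ij} - \bar{\bar{X}} &= (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{\bar{X}}) \\
SST &= \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2 \\
    &= \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2
     + \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2
     + 2 \sum_{j=1}^{c} (\bar{X}_j - \bar{\bar{X}}) \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j) \\
    &= SSW + SSA + 0
\end{align*}
```

The cross term vanishes because, within each group, the deviations from the group mean sum to zero: $\sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j) = 0$.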
Eg: Car Wax Effectiveness
• The number of times each car went through the carwash before its wax deteriorated is shown on the next slide
• The wax producer must decide which wax to market
• Are the three waxes equally effective?
Factor: Car wax
Treatments (Levels): Type 1, Type 2, Type 3
Subjects: Cars
Response variable: Number of washes
Eg: Car Wax Effectiveness
Observation     | Wax Type 1 | Wax Type 2 | Wax Type 3
1               | 27         | 33         | 29
2               | 30         | 28         | 28
3               | 29         | 31         | 30
4               | 28         | 30         | 32
5               | 31         | 30         | 31
Sample Mean     | 29.0       | 30.4       | 30.0
Sample Variance | 2.5        | 3.3        | 2.5
Eg: Car Wax Effectiveness
• Hypotheses
H0: μ1 = μ2 = μ3
H1: Not all the means are equal
where:
μ1 = mean number of washes using Type 1 wax
μ2 = mean number of washes using Type 2 wax
μ3 = mean number of washes using Type 3 wax
Eg: Car Wax Effectiveness
• ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F | p-Value
Treatments          | 5.2            | a                  | c            | e | .42
Error               | 33.2           | b                  | d            |   |
Total               | 38.4           | 14                 |              |   |
• Rejection Rule (given α = 0.05)
p-Value Approach: Reject H0 if p-value < .05
Critical Value Approach: Reject H0 if F > F.05 = h
ANSWER
• ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F       | p-Value
Treatments          | 5.2            | a=2                | c=2.60       | e=0.939 | .42
Error               | 33.2           | b=12               | d=2.77       |         |
Total               | 38.4           | 14                 |              |         |
Critical Value: F.05 = 3.89
ANSWER
• Conclusion
p-value approach: From the F-table, the p-value is greater than 0.10, because F = 0.939 is below F.10 = 2.81. (Excel gives an exact p-value of 0.42.) Do not reject H0
Critical value approach: FTEST = 0.939 < F.05, do not reject H0
There is insufficient evidence to conclude that the mean number of washes for the three wax types is not the same
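The answer table can be cross-checked from the summary rows alone (sample means and sample variances), without the raw observations. The sketch below assumes equal-variance one-way ANOVA as in the slides and uses the identities SSA = Σ n_j (mean_j - grand mean)² and SSW = Σ (n_j - 1) s_j²; the function name `anova_from_summary` is made up for this illustration.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA table entries from per-group sizes, means and sample variances."""
    c = len(sizes)
    n = sum(sizes)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = sum(nj * m for nj, m in zip(sizes, means)) / n
    ssa = sum(nj * (m - grand_mean) ** 2 for nj, m in zip(sizes, means))
    # Each sample variance is SS_j / (n_j - 1), so SS_j = (n_j - 1) * s_j^2.
    ssw = sum((nj - 1) * s2 for nj, s2 in zip(sizes, variances))
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return {"a": c - 1, "b": n - c, "SSA": ssa, "SSW": ssw,
            "c": msa, "d": msw, "e": msa / msw}


# Summary rows from the car wax data slide (five cars per wax type).
wax = anova_from_summary(
    sizes=[5, 5, 5],
    means=[29.0, 30.4, 30.0],
    variances=[2.5, 3.3, 2.5],
)
print(wax["a"], wax["b"])                          # 2 12
print(round(wax["SSA"], 1), round(wax["SSW"], 1))  # 5.2 33.2
print(round(wax["c"], 2), round(wax["d"], 2))      # 2.6 2.77
```

The resulting F ratio is about 0.94, in line with the slide's e = 0.939 and well below the critical value of 3.89.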