  • Analysis of Variance (ANOVA)

    Lecture 26

    March 30, 2017

    Four Stages of Statistics

    • Data Collection

    • Displaying and Summarizing Data

    • Probability

    • Inference

    • One Quantitative

    • One Categorical

    • One Categorical and One Quantitative

    • Matched Pairs Test

    • Difference of Two Means Test

    • ANOVA and Multiple Comparisons

    • Two Categorical

    • Two Quantitative

    F-Distribution

    • F-Distribution: continuous probability distribution with the following properties:

    • Unimodal and right-skewed

    • Always non-negative

    • Two parameters for degrees of freedom

    • One for numerator, one for denominator

    • Changes shape depending on degrees of freedom
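
The properties above can be checked numerically. The following is a minimal sketch, assuming the standard F density formula and using only Python's standard library; the helper names `f_pdf` and `peak_location` and the three df pairs are illustrative choices, not part of the lecture.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F-distribution with d1 (numerator) and d2 (denominator) df."""
    if x <= 0:
        # No probability below zero; the single point x = 0 is treated as 0 for simplicity.
        return 0.0
    a, b = d1 / 2, d2 / 2
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_pdf = (a * math.log(d1 / d2) + (a - 1) * math.log(x)
               - (a + b) * math.log(1 + d1 * x / d2) - log_beta)
    return math.exp(log_pdf)

def peak_location(d1, d2, step=0.01, upper=5.0):
    """Grid search for the x value where the density is highest (the mode)."""
    grid = [i * step for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda x: f_pdf(x, d1, d2))

# The peak moves as the degrees of freedom change (hypothetical df pairs).
for d1, d2 in [(6, 11), (10, 30), (3, 5)]:
    print(d1, d2, round(peak_location(d1, d2), 2))
```

For each df pair the density has a single peak close to zero and a long tail to the right, and it is zero for negative values.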

    Examples of F-Distribution

  • Example of F-Table

    Example #1: F-Table

    • Question: What is the F-statistic with 6 df in the numerator, 11 df in the denominator, and 5% of the area in the upper tail?

    • Answer: ____________________

    _______
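
An F-table lookup like this one can also be approximated numerically. The sketch below assumes both df are at least 2, integrates the density with Simpson's rule, and finds the cutoff by bisection; the function names `f_upper_tail` and `f_critical` are hypothetical helpers, and a statistics package would normally be used instead.

```python
import math

def f_upper_tail(x, d1, d2, steps=2000):
    """Approximate P(F > x) for an F(d1, d2) variable, assuming d1 >= 2 and d2 >= 2."""
    a, b = d1 / 2, d2 / 2
    z = d1 * x / (d1 * x + d2)  # maps x in [0, inf) to z in [0, 1)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def g(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule on [0, z] with an even number of steps.
    h = z / steps
    total = g(0.0) + g(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    cdf = total * h / 3 / math.exp(log_beta)
    return 1 - cdf

def f_critical(alpha, d1, d2):
    """Bisection search for the value with upper-tail area alpha."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_upper_tail(mid, d1, d2) > alpha:
            lo = mid  # too much area to the right, so move right
        else:
            hi = mid
    return (lo + hi) / 2

# 6 df in the numerator, 11 df in the denominator, 5% in the upper tail.
print(round(f_critical(0.05, 6, 11), 2))  # about 3.09, matching a standard F-table
```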

    Motivation: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Does it appear all the means are equal or is there some difference?

    • Answer:

    • Means appear ________________

    _____________________

    • But…____________ exists within each group

    • Need inferential technique that can handle _______________

    Analysis of Variance (ANOVA)

    • Analysis of Variance (ANOVA): statistical technique used to compare the means of three or more populations

    • Different from difference of two means test

    • Use two sources of variability to compare means

    • Between group

    • Within group

  • Types of Variation

    • Between Group Variation: measures the amount of variation between the means of the individual groups

    • “How different are the sample means from one another?”

    • Within Group Variation: measures the amount of variation that exists in the samples

    • “How different are the observations from each other within the individual samples?”

    Comparing Types of Variation

    Small Between Group

    • Means ____________________

    Large Within Group

    • Observations within groups ____________________

    Large Between Group

    • Means _____________________

    Small Within Group

    • Observations within groups ____________________

    ANOVA: Hypotheses and Conditions

    • Used For:

    • Testing whether the means of three or more independent groups are all equal or whether at least two of them differ

    • Hypotheses:

    • $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$

    • $H_a$: At least two means are not equal.

    • $k$: Number of groups being compared

    • Conditions:

    • Data come from normally distributed populations

    • Variances of populations approximately equal

    • Sampled observations independent

    Shortcut: Assume all of the conditions for ANOVA hold.

    Example #2: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Hypothesis Test:

    1. Hypotheses:

    • $H_0$: _____________________________________________________

    • $H_a$: _____________________________________________________

    2. Conditions: _____________________________________________

  • ANOVA: Types of Variation

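    • Between Group Variation:

    $SST = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$

    • $n_i$: Sample Size of group $i$; $\bar{x}_i$: Sample Mean of group $i$

    • Grand Mean: mean of all observations

    $\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$

    • Within Group Variation:

    $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$

    • $n_i$: Sample Size of group $i$; $s_i^2$: Sample Variance of group $i$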
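
The two sources of variation can be sketched in code. The example below uses Python's standard `statistics` module and small made-up data (not from the lecture) so the arithmetic is easy to follow; the function names are hypothetical, and "between" and "within" follow the definitions on this slide.

```python
from statistics import mean, variance

def grand_mean(groups):
    """Mean of all observations, weighting each group mean by its sample size."""
    total_n = sum(len(g) for g in groups)
    return sum(len(g) * mean(g) for g in groups) / total_n

def between_group_variation(groups):
    """Sum over groups of (sample size) * (sample mean - grand mean)^2."""
    gm = grand_mean(groups)
    return sum(len(g) * (mean(g) - gm) ** 2 for g in groups)

def within_group_variation(groups):
    """Sum over groups of (sample size - 1) * (sample variance)."""
    return sum((len(g) - 1) * variance(g) for g in groups)

# Made-up toy data: group means are 2, 3, and 7, and the grand mean is 4.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
print(grand_mean(toy))               # 4.0
print(between_group_variation(toy))  # 3*(2-4)^2 + 3*(3-4)^2 + 3*(7-4)^2 = 42.0
print(within_group_variation(toy))   # 2*1 + 2*1 + 2*1 = 6.0
```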

    Example #2: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Are the mean math SAT scores the same in all three majors?

    • Data:

    Comp. Sci.    Economics    History
    680           600          410
    800           680          480
    660           550          650
    750           730          540
    710           640          570

    Example #2: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Statistics:

    Statistic      Comp. Sci.    Economics    History
    Mean           720           640          530
    Std. Dev.      56.12         69.64        90.83
    Sample Size    5             5            5

    Example #2: ANOVA

    • Grand Mean:

    $\bar{\bar{x}}$ = ___________________________________________________

    • Between Group Variation:

    $SST$ = ____________________________________________

    = ____________________________________________

    = ____________

    • Within Group Variation:

    $SSE$ = ___________________________________________

    = ___________________________________________

    = ____________
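
One way to check the hand calculation is to plug the summary statistics from the table into the definitions. This is a sketch under two assumptions: the variable names are illustrative, and `sst` and `sse` stand for the between group and within group variation. Because the standard deviations are rounded to two decimals, the within group result is only approximate.

```python
# Summary statistics from the example (mean, standard deviation, sample size).
means = {"Comp. Sci.": 720, "Economics": 640, "History": 530}
std_devs = {"Comp. Sci.": 56.12, "Economics": 69.64, "History": 90.83}
sizes = {"Comp. Sci.": 5, "Economics": 5, "History": 5}

n_total = sum(sizes.values())

# Grand mean: sample-size-weighted average of the group means.
grand_mean = sum(sizes[g] * means[g] for g in means) / n_total

# Between group variation: n_i * (group mean - grand mean)^2, summed over groups.
sst = sum(sizes[g] * (means[g] - grand_mean) ** 2 for g in means)

# Within group variation: (n_i - 1) * (sample variance), summed over groups.
sse = sum((sizes[g] - 1) * std_devs[g] ** 2 for g in means)

print(grand_mean)  # 630.0
print(sst)         # 91000.0
print(round(sse))  # about 65,000 (slightly less because the std. devs. are rounded)
```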

  • ANOVA: Test Statistic

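    • Mean Squared Treatment

    $MST = \frac{SST}{k - 1}$

    • Mean Squared Error

    $MSE = \frac{SSE}{n - k}$

    • Test Statistic:

    $F = \frac{MST}{MSE}$

    • Follows F-distribution with $k - 1$ df in numerator and $n - k$ df in denominator

    • $k$: number of groups; $n$: total number of observations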
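
The test statistic can be sketched as a small function. The name `anova_f_statistic` is a hypothetical helper, and the inputs in the usage lines (between group variation 91000, within group variation about 65000, 3 groups, 15 observations) are assumed values worked out from the summary statistics of the SAT example, not figures stated on the slides.

```python
def anova_f_statistic(sst, sse, k, n):
    """Return (MST, MSE, F, numerator df, denominator df).

    sst: between group variation, sse: within group variation,
    k: number of groups, n: total number of observations.
    """
    df_num = k - 1
    df_den = n - k
    mst = sst / df_num  # mean squared treatment
    mse = sse / df_den  # mean squared error
    return mst, mse, mst / mse, df_num, df_den

# Assumed inputs for the SAT example (3 majors, 5 students each).
mst, mse, f_stat, df_num, df_den = anova_f_statistic(91000, 65000, k=3, n=15)
print(mst, round(mse, 2), round(f_stat, 2), df_num, df_den)
```

Dividing each source of variation by its degrees of freedom puts the two on a comparable per-df scale before the ratio is taken.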

    Example #2: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Hypothesis Test:

    3. Test Statistic: Need:

    • Degrees of freedom

    • Numerator: ______________________________

    • Denominator: ______________________________

    • Between group variation: ____________________

    • Within group variation: ____________________

    Example #2: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Hypothesis Test:

    3. Test Statistic:

    $F$ = _________________________________________________

    4. Critical Value: ____________________

    ANOVA: Decision and Conclusion

    • Decision: Reject $H_0$ for large values of the test statistic

    • Implies between group variation (difference between means) is large relative to within group variation.

    • Conclusion: Either…

    • No difference between any means

    • At least two means differ

  • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Hypothesis Test:

    5. Decision: _____________________ (_______________________)

    6. Conclusion: ____________________________________________ __________________________________________________________
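
The decision step can be sketched as a comparison between the test statistic and the critical value. The sketch relies on a closed form for the upper-tail area that holds only when the numerator has exactly 2 df (three groups), so it is not a general F-table replacement. The test statistic value 8.4 is an assumption worked out from the example's summary statistics, and the function names are hypothetical.

```python
def upper_tail_area(f_value, denom_df):
    """P(F > f_value) when the numerator has exactly 2 df (closed form)."""
    return (1 + 2 * f_value / denom_df) ** (-denom_df / 2)

def critical_value(alpha, denom_df):
    """Value with upper-tail area alpha, again only for 2 numerator df."""
    return (denom_df / 2) * (alpha ** (-2 / denom_df) - 1)

alpha = 0.05
f_stat = 8.4   # assumed test statistic for the SAT example
denom_df = 12  # n - k = 15 - 3

crit = critical_value(alpha, denom_df)
p_value = upper_tail_area(f_stat, denom_df)
reject = f_stat > crit  # reject the null hypothesis for large test statistics

print(round(crit, 2))     # roughly 3.89
print(round(p_value, 4))  # roughly 0.0052
print("reject H0" if reject else "fail to reject H0")
```

Under these assumed numbers the test statistic falls in the rejection region, which corresponds to concluding that at least two of the majors have different mean math SAT scores.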

    Example #2: ANOVA

    [Figure: F-distribution with the Rejection Region marked; Test Statistic: _______________]

    Drawback of ANOVA

    • When we reject $H_0$, we only conclude “at least two means are not equal”

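    • Problem: Many different ways of rejecting $H_0$

    • $\mu_1 \neq \mu_2$, $\mu_1 = \mu_3$, $\mu_2 = \mu_3$

    • $\mu_1 = \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 = \mu_3$

    • $\mu_1 = \mu_2$, $\mu_1 = \mu_3$, $\mu_2 \neq \mu_3$

    • $\mu_1 \neq \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 = \mu_3$

    • $\mu_1 \neq \mu_2$, $\mu_1 = \mu_3$, $\mu_2 \neq \mu_3$

    • $\mu_1 = \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 \neq \mu_3$

    • $\mu_1 \neq \mu_2 \neq \mu_3$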

    Looking Ahead: ________________________ (next class) will tell us which of these scenarios is true.

    _______________________ not equal

    _______________________ not equal

    ________________ are equal

    Example #3: Role of Mean

    • Scenario: Boxplots could have same spreads but different sample means.

    • Question: What impact does having more spread out means have on rejecting $H_0$?

    • Answer: _____________________________________________

    • Between group variation ______________________

    • Within group variation ______________________

    • Test statistic ______________

    Example #4: Role of Standard Deviation

    • Scenario: Boxplots could have means of 30, 40, and 50 on left or right depending on std. devs.

    • Question: What impact does having smaller standard deviations have on rejecting $H_0$?

    • Answer: _____________________________________________

    • Between group variation ______________________

    • Within group variation ______________________

    • Test statistic ______________
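
The effect of the means and of the standard deviations on the test statistic can be explored numerically. In the sketch below only the means 30, 40, and 50 come from the scenario above. The standard deviations (5 and 15), the group size of 10, the closer-together means (35, 40, 45), and the helper name `f_from_summary` are all hypothetical values chosen for illustration.

```python
def f_from_summary(means, std_devs, sizes):
    """ANOVA test statistic computed from group means, std. devs., and sample sizes."""
    k = len(means)
    n = sum(sizes)
    grand = sum(ni * m for ni, m in zip(sizes, means)) / n
    between = sum(ni * (m - grand) ** 2 for ni, m in zip(sizes, means))
    within = sum((ni - 1) * s ** 2 for ni, s in zip(sizes, std_devs))
    return (between / (k - 1)) / (within / (n - k))

sizes = [10, 10, 10]  # hypothetical group sizes

# Same spread within groups, means closer together vs. further apart.
f_close_means = f_from_summary([35, 40, 45], [15, 15, 15], sizes)
f_spread_means = f_from_summary([30, 40, 50], [15, 15, 15], sizes)

# Same means, larger vs. smaller standard deviations.
f_large_sd = f_from_summary([30, 40, 50], [15, 15, 15], sizes)
f_small_sd = f_from_summary([30, 40, 50], [5, 5, 5], sizes)

print(round(f_close_means, 2), round(f_spread_means, 2))
print(round(f_large_sd, 2), round(f_small_sd, 2))
```

In both comparisons the second value is larger: spreading the means apart raises the between group variation, and shrinking the standard deviations lowers the within group variation.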

  • Summary

    • ANOVA: used to compare means of three or more independent populations

    • Compare between group and within group variation to calculate test statistic

    • Test Statistic: $F = \frac{MST}{MSE}$ with $k - 1$ and $n - k$ df

    • Conclusion: either all means are equal or at least two means differ

    • Drawback: Cannot tell which means differ (yet)

