Application of ANOVA
Business Statistics Presentation
Presented by:
1. Siddharth Nahata
2. Rohit Patidar
3. Deepali Agarwal
4. Rajat Srivastava
5. Prachi Mandhani
6. Sumant Singh
STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS?
1. Examine Strength and Direction of Relationships
a. Bivariate (e.g., Pearson correlation, r): between one variable and another:
r_xy or Y = a + b1 x1
b. Multivariate (e.g., Multiple Regression Analysis): between one dependent variable and each of several independent variables, while holding all other independent variables constant:
Y = a + b1 x1 + b2 x2 + b3 x3 + . . . + bk xk
2. Compare Groups
a. Compare Proportions (e.g., Chi-Square Test, χ²): H0: P1 = P2 = P3 = … = Pk
b. Compare Means (e.g., Analysis of Variance): H0: µ1 = µ2 = µ3 = … = µk
ONE-WAY ANOVA
When Do You Use ANOVA?
• To compare the mean values of a certain characteristic among two or more groups.
• To see whether two or more groups are equal (or different) on a given metric characteristic.
ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a British statistician and geneticist/evolutionary biologist.
ONE-WAY ANOVA
H0 in ANOVA?
H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal): H0: µ1 = µ2 = µ3 = … = µk
H1 (conclusion if H0 is rejected)? Not all group means are equal (i.e., at least one group mean is different from the rest).
ONE-WAY ANOVA
The number of steps involved in ANOVA depends on whether we are comparing 2 groups or more than 2 groups:
• Scenario 1. When comparing 2 groups, it is a one-step test: 2 Groups: A B
Step 1: Check whether the two groups are different and, if so, how.
• Scenario 2. When comparing 3 or more groups, if H0 is rejected, it is a two-step test: 3 or more Groups: A B C
Step 1: Overall test that examines whether all groups are equal. If not all are equal (H0 rejected), then:
Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e., among which groups) the differences exist, and how.
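To get a feel for how much work Step 2 involves, here is a minimal sketch that lists the pairs to be compared: with K groups there are K(K - 1)/2 of them. The Bonferroni-adjusted significance level is an assumption of this sketch (one common correction); the slides do not name a specific post-hoc procedure, and `pairwise_plan` is a hypothetical helper name.

```python
from itertools import combinations

def pairwise_plan(group_names, alpha=0.05):
    """List every pair of groups for Step 2 and a Bonferroni-adjusted alpha.

    Bonferroni (alpha divided by the number of pairs) is an assumption of
    this sketch; the presentation does not prescribe a post-hoc method.
    """
    pairs = list(combinations(group_names, 2))
    adjusted_alpha = alpha / len(pairs) if pairs else alpha
    return pairs, adjusted_alpha

pairs, adjusted_alpha = pairwise_plan(["A", "B", "C"])
print(pairs)            # the pairs A-B, A-C, B-C
print(adjusted_alpha)   # 0.05 split across the three comparisons
```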
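ANOVA TABLE
Typical solutions presented in statistics classes require:
• Constructing an ANOVA TABLE

Source                                     Sum of Squares   df      Mean Squares          F-Ratio
Between Groups (Between Groups Sum of Squares)   SSB        K – 1   MSB = SSB / (K – 1)   F = MSB / MSW
Within Groups (Within Groups Sum of Squares)     SSW        N – K   MSW = SSW / (N – K)
Total (Total Sum of Squares)                     SST        N – 1

$$\text{MSW} = \frac{\sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \dots + \sum_i (x_{ik} - \bar{x}_k)^2}{N - K}$$

$$\text{MSB} = \frac{n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2}{K - 1}$$

Test Statistic: F = MSB / MSW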
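The table entries above can be sketched as a small function. This is an illustrative sketch, not the presenters' code: `one_way_anova` is a hypothetical name, and the three toy groups are made-up numbers chosen only so the arithmetic is easy to follow by hand.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of groups.

    `groups` is a list of lists of numbers, one inner list per group.
    """
    k = len(groups)                                  # K: number of groups
    n_total = sum(len(g) for g in groups)            # N: total sample size
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # SSB: squared deviations of group means from the grand mean,
    # each weighted by its group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSW: squared deviations of each observation from its own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {
        "SSB": ssb, "SSW": ssw,
        "SST": ssb + ssw,            # total, via SST = SSB + SSW
        "df_between": k - 1, "df_within": n_total - k,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }

# Toy numbers, made up for illustration only (not from the presentation)
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table["MSB"], table["MSW"], table["F"])
```

For the toy groups the means are 2, 3 and 6, so SSW is 2 + 2 + 2 = 6 and MSW is 6 / 6 = 1, which makes the F-ratio equal to MSB.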
ONE-WAY ANOVA
EXAMPLE: Was the average earnings per share (EPS) for commercial banks, retailing operations, and utility companies (variable Industry) the same last year?
• Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.
• Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries

Banking   Retailing   Utility
6.42      3.52        3.55
2.83      4.21        2.13
8.94      4.36        3.24
6.80      2.67        6.47
5.70      3.49        3.06
4.65      4.68        1.80
6.20      3.30        5.29
2.71      2.68        2.96
8.34      7.25        2.90
-----     0.16        1.73

nB = 9   nR = 10   nU = 10   n = 29

H0: There were no differences in average EPS of Banks, Utilities, and Retailers.
First logical thing you do? Compute the group means and the grand mean:
x̄_B = 5.84   x̄_R = 3.63   x̄_U = 3.31   grand mean = 4.21
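The group means and the grand mean can be reproduced directly from the Table 1 values. A minimal sketch; the variable names are illustrative choices, and the data are the EPS figures listed in Table 1.

```python
# EPS values from Table 1
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Mean of each industry, and the grand mean over all 29 firms
group_means = {name: sum(v) / len(v) for name, v in eps.items()}
all_values = [x for v in eps.values() for x in v]
grand_mean = sum(all_values) / len(all_values)

for name, m in group_means.items():
    print(f"{name}: n = {len(eps[name])}, mean = {m:.2f}")
print(f"Grand mean (n = {len(all_values)}): {grand_mean:.2f}")
```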
[Figure: EPS in various sectors. Series: Banking, Retailing, Utility; horizontal axis 1 to 10, vertical axis 0.00 to 9.00]
ONE-WAY ANOVA
Why is it called ANOVA?
• Differences in EPS (the dependent variable) among all 29 firms have two components: differences among the groups and differences within the groups. That is,
a. There are some differences in EPS among the three groups of firms (Banks vs. Retailers vs. Utilities), and
b. There are also some differences/variations in EPS of the firms within each of these groups (among banks themselves, among retailers themselves, and among utilities themselves).
• ANOVA partitions/analyzes the variance of the dependent variable (i.e., the differences in EPS) and traces it to its two components/sources, i.e., to differences between groups vs. differences within groups.
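In sum-of-squares terms, this partition is the standard identity SST = SSB + SSW. The indexing is an assumption of this sketch: x_ij is the EPS of firm i in group j, n_j is the size of group j, and K is the number of groups. With the figures reported in this example, it reads 122.509 = 35.397 + 87.112.

```latex
\underbrace{\sum_{j=1}^{K}\sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2}_{\text{SST}}
= \underbrace{\sum_{j=1}^{K} n_j (\bar{x}_j - \bar{\bar{x}})^2}_{\text{SSB}}
+ \underbrace{\sum_{j=1}^{K}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}_{\text{SSW}}
```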
ONE-WAY ANOVA
Total WITHIN Group Variance (or Mean Square WITHIN)?
Using the Table 1 data (nB = 9, nR = 10, nU = 10, n = 29) and the group means x̄_B = 5.84, x̄_R = 3.63, x̄_U = 3.31:

$$\text{MSW} = \frac{(6.42-5.84)^2 + \dots + (8.34-5.84)^2 + (3.52-3.63)^2 + \dots + (0.16-3.63)^2 + (3.55-3.31)^2 + \dots + (1.73-3.31)^2}{(9+10+10)-3} = \frac{87.112}{26} = 3.350$$

Mean Square WITHIN Groups (MSW):
Let's see what we just did:

MSW = (Sum of Squared Deviations of All Observations From Their Respective Group Means) / (Total Sample Size - Number of Groups) = SS Within / (N - K)

The denominator is called "Degrees of Freedom" = (nB-1)+(nR-1)+(nU-1).

The generic mathematical formula for MSW:

$$\text{MSW} = \frac{\sum_i (x_{iB}-\bar{x}_B)^2 + \sum_i (x_{iR}-\bar{x}_R)^2 + \sum_i (x_{iU}-\bar{x}_U)^2}{N-K}$$
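The MSW arithmetic can be checked with a short sketch on the Table 1 values. It uses unrounded group means, so it should agree with the slide's 87.112 and 3.350 to three decimals; `sum_sq_within` is a hypothetical helper name.

```python
# EPS values from Table 1
banking   = [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34]
retailing = [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16]
utility   = [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73]
groups = [banking, retailing, utility]

def sum_sq_within(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

ssw = sum(sum_sq_within(g) for g in groups)
df_within = sum(len(g) - 1 for g in groups)   # (nB-1)+(nR-1)+(nU-1) = N - K
msw = ssw / df_within
print(f"SSW = {ssw:.3f}, df = {df_within}, MSW = {msw:.3f}")
```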
ONE-WAY ANOVA
Let's now compute the BETWEEN Group Variance (Mean Square BETWEEN, MSB)
Using the Table 1 group sizes (nB = 9, nR = 10, nU = 10) and means (x̄_B = 5.84, x̄_R = 3.63, x̄_U = 3.31, grand mean = 4.21):

$$\text{MSB} = \frac{9(5.84-4.21)^2 + 10(3.63-4.21)^2 + 10(3.31-4.21)^2}{3-1} = \frac{35.397}{2} = 17.698$$

Mean Square BETWEEN Groups (MSB):
Let's see what we just did:

MSB = (Sum of Squared Deviations of Group Means from the Grand Mean, weighted by respective group sizes) / (Number of Groups - 1) = SS Between / (K - 1)

The denominator, K - 1, is called "Degrees of Freedom".

Mathematical formula for MSB:

$$\text{MSB} = \frac{n_B(\bar{x}_B-\bar{\bar{x}})^2 + n_R(\bar{x}_R-\bar{\bar{x}})^2 + n_U(\bar{x}_U-\bar{\bar{x}})^2}{K-1}$$
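A matching sketch for MSB, again on the Table 1 values. Note that the slide's 35.397 comes from unrounded means; plugging the two-decimal means into the formula by hand gives a slightly different sum, which is why the sketch computes the means itself.

```python
# EPS values from Table 1
groups = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

all_values = [x for v in groups.values() for x in v]
grand_mean = sum(all_values) / len(all_values)

# Each group's squared distance from the grand mean, weighted by group size
ssb = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())
df_between = len(groups) - 1                  # K - 1
msb = ssb / df_between
print(f"SSB = {ssb:.3f}, df = {df_between}, MSB = {msb:.3f}")
```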
ONE-WAY ANOVA
Mean Square Between Groups = MSB = 17.698
MSB represents the portion of the total differences/variations in EPS (the dependent variable) that is attributable to (or explained by) differences BETWEEN groups (e.g., industries).
• That is, the part of the differences in companies' EPS that results from whether they are banks, retailers, or utilities.
ONE-WAY ANOVA
Mean Square Within Groups (MS Residual/Error) = MSW = 3.35
MSW represents:
a. The differences in EPS (the dependent variable) that are due to all other factors that are not examined and not controlled for in the study (e.g., diversification level, firm size, etc.)
Plus . . .
b. The natural variability of EPS (the dependent variable) among members within each of the comparison groups (note that even banks with the same size and same level of diversification would have different EPS levels).
ONE-WAY ANOVA
Now, let's compare MSB & MSW:
MSB = 17.698 and MSW = 3.35.
QUESTION: Based on the logic of ANOVA, when would we consider two (or more) groups as different/unequal?
When MSB is significantly larger than MSW.
QUESTION: What would be a reasonable index (a single number) that will show how large MSB is compared to MSW (i.e., a single number that will show if MSB is larger than, equal to, or smaller than MSW)?
Compare BETWEEN and WITHIN Group Variances/Mean Squares: Compute the F-Ratio
• Ratio of MSB and MSW (call it the F-Ratio):

$$F = \frac{\text{MSB}}{\text{MSW}} = \frac{17.698}{3.350} = 5.282$$

• What can we infer when the F-ratio is close to 1?
  • MSB and MSW are likely to be equal and, thus, there is a strong likelihood that NO difference exists among the comparison groups.
• How about when the F-ratio is significantly larger than 1?
  • The more the F-ratio exceeds 1, the larger MSB is compared to MSW and, thus, the stronger the likelihood/evidence that group difference(s) exist.
• Results of the above computations are usually summarized in an ANOVA TABLE such as the one that follows:
ONE-WAY ANOVA
ANOVA TABLE

Source           Sum of Squares   df           Mean Squares          F
Between Groups   35.397           K – 1 = 2    35.397 / 2 = 17.698   17.698 / 3.350 = 5.282
Within Groups    87.112           N – K = 26   87.112 / 26 = 3.350
Total            122.509          N – 1 = 28
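The full ANOVA table, including the F-ratio, can be rebuilt from the Table 1 values in one pass. This is a sketch under the same assumptions as the slides (one-way layout, three industries); the total sum of squares is computed directly from the grand mean so that it serves as a check on SSB + SSW.

```python
# EPS values from Table 1
groups = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

values = [x for g in groups.values() for x in g]
n_total, k = len(values), len(groups)
grand_mean = sum(values) / n_total

ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
sst = sum((x - grand_mean) ** 2 for x in values)   # computed directly, as a check

msb, msw = ssb / (k - 1), ssw / (n_total - k)
f_ratio = msb / msw

print(f"{'Source':<16}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between Groups':<16}{ssb:>10.3f}{k - 1:>5}{msb:>10.3f}{f_ratio:>8.3f}")
print(f"{'Within Groups':<16}{ssw:>10.3f}{n_total - k:>5}{msw:>10.3f}")
print(f"{'Total':<16}{sst:>10.3f}{n_total - 1:>5}")
```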
Interpretation and Conclusion:
QUESTION: What does the F = 5.28 mean, intuitively?
For our sample companies, the EPS difference across the three industries (MSB) is more than 5 times the EPS difference among firms within the industries (MSW).
• QUESTION: What is our null hypothesis?
• QUESTION: Is the above F-ratio of 5.28 large enough to warrant rejecting the null?
• ANSWER: It would be if the chance of being wrong (in rejecting the null) does not exceed 5%.
• So, look up the F-value in the table of the F-distribution (under the appropriate degrees of freedom) to find out what the α-level will be if, given this F-value, we decide to reject the null.
• Degrees of Freedom: v1 = K – 1 = 2, v2 = N – K = 26
F = 4.27 is significant at α = 0.025. That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.
But our F = 5.28 > 4.27.
So, what can we say about our α-level? Will it be larger or smaller than 0.025?
ONE-WAY ANOVA
• Our F = 5.28 > 4.27
• The odds of being wrong, if we decide to reject the null, would be less than 2.5% (i.e., < 0.025).
Would rejecting the null be a safe bet? Conclusion?
Reject the null and conclude that the average EPS is NOT EQUAL FOR ALL GROUPS (industries) being compared.
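Instead of reading the F-table, the tail probability can be computed directly in this particular case. The sketch relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here: P(F > f) = (1 + 2f / v2)^(-v2 / 2). The function name is hypothetical. For other degrees of freedom a general routine would be needed, for example SciPy's `scipy.stats.f.sf`.

```python
def f_upper_tail_df1_is_2(f_value, df2):
    """Upper-tail probability P(F > f_value) for an F distribution with
    numerator df = 2 and denominator df = df2.

    Uses the closed form (1 + 2f/df2) ** (-df2/2), which is valid only
    when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)

p_at_critical = f_upper_tail_df1_is_2(4.27, 26)    # table value for alpha = 0.025
p_observed = f_upper_tail_df1_is_2(5.282, 26)      # our F-ratio

print(f"P(F > 4.27)  = {p_at_critical:.4f}")
print(f"P(F > 5.282) = {p_observed:.4f}")
print("Reject H0 at the 0.025 level:", p_observed < 0.025)
```

The first probability should come out close to 0.025, matching the table lookup, and the second should be smaller, which is the same conclusion the slides reach.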