Application of ANOVA

Business Statistics Presentation

Presented by:
1. Siddharth Nahata
2. Rohit Patidar
3. Deepali Agarwal
4. Rajat Srivastava
5. Prachi Mandhani
6. Sumant Singh

Upload: rohit-patidar

Post on 08-Aug-2015



STATISTICAL DATA ANALYSIS

COMMON TYPES OF ANALYSIS?

1. Examine Strength and Direction of Relationships

a. Bivariate (e.g., Pearson correlation, r): between one variable and another:

r_xy or Y = a + b1 x1

b. Multivariate (e.g., multiple regression analysis): between one dependent variable and each of several independent variables, while holding all other independent variables constant:

Y = a + b1 x1 + b2 x2 + b3 x3 + . . . + bk xk

2. Compare Groups

a. Compare proportions (e.g., Chi-Square test, χ²): H0: P1 = P2 = P3 = … = Pk

b. Compare means (e.g., Analysis of Variance): H0: µ1 = µ2 = µ3 = … = µk

ONE-WAY ANOVA

When Do You Use ANOVA?

• To compare the mean values of a certain characteristic among two or more groups.

• To see whether two or more groups are equal (or different) on a given metric characteristic.

ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a British statistician and geneticist/evolutionary biologist.

H0 in ANOVA?

H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal): H0: µ1 = µ2 = µ3 = … = µk

H1 (conclusion if H0 is rejected): Not all group means are equal (i.e., at least one group mean is different from the rest).

The number of steps involved in ANOVA depends on whether we are comparing 2 groups or more than 2 groups:

• Scenario 1. When comparing 2 groups, it is a one-step test: 2 Groups: A B

Step 1: Check to see if the two groups are different or not, and if so, how.

• Scenario 2. When comparing 3 or more groups, if H0 is rejected, it is a two-step test: 3 or More Groups: A B C

Step 1: Overall test that examines if all groups are equal or not. And, if not all are equal (H0 rejected), then:

Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e., among which groups) the differences exist, and how.

ANOVA TABLE

Typical solutions presented in statistics classes require constructing an ANOVA TABLE:

Sum of Squares                          df      Mean Squares          F-Ratio (Test Statistic)
SSB (Between Groups Sum of Squares)     K - 1   MSB = SSB / (K - 1)   F = MSB / MSW
SSW (Within Groups Sum of Squares)      N - K   MSW = SSW / (N - K)
SST (Total Sum of Squares)              N - 1

$$\text{MSW} = \frac{\sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \dots + \sum_i (x_{ik} - \bar{x}_k)^2}{N - K}$$

$$\text{MSB} = \frac{n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2}{K - 1}$$
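The entries of the ANOVA table can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the original slides: the helper name `one_way_anova` and the three small groups of numbers are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from statistics import fmean


def one_way_anova(groups):
    """Compute one-way ANOVA table entries for a list of samples (hypothetical helper)."""
    k = len(groups)                      # K: number of groups
    n = sum(len(g) for g in groups)      # N: total sample size
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]

    # SSB: squared deviations of group means from the grand mean, weighted by group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSW: squared deviations of each observation from its own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # SST: squared deviations of every observation from the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "df_between": k - 1, "df_within": n - k, "df_total": n - 1,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Made-up data for illustration only (not from the presentation)
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(f"{name}: {value}")
```

For these made-up groups the group means are 2, 3, and 6, so SSB = 26, SSW = 6, and F = 13 / 1 = 13.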

ONE-WAY ANOVA

EXAMPLE: Was the average earnings per share (EPS) for commercial banks, retailing operations, and utility companies (variable Industry) the same last year?

• Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.

• Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries

Banking   Retailing   Utility
6.42      3.52        3.55
2.83      4.21        2.13
8.94      4.36        3.24
6.80      2.67        6.47
5.70      3.49        3.06
4.65      4.68        1.80
6.20      3.30        5.29
2.71      2.68        2.96
8.34      7.25        2.90
-----     0.16        1.73

nB = 9    nR = 10    nU = 10    n = 29

H0: There were no differences in average EPS of Banks, Utilities, and Retailers.

First logical thing you do?

$\bar{x}_B = 5.84 \quad \bar{x}_R = 3.63 \quad \bar{x}_U = 3.31 \quad \bar{\bar{x}} = 4.21$
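As a quick check on the group means and the grand mean quoted for Table 1, the following sketch recomputes them from the EPS values. The variable names (`eps`, `group_means`, `grand_mean`) are illustrative choices, not part of the presentation.

```python
from statistics import fmean

# EPS values copied from Table 1
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

group_means = {name: fmean(values) for name, values in eps.items()}
all_values = [x for values in eps.values() for x in values]
grand_mean = fmean(all_values)  # mean of all 29 observations, not the mean of the three means

for name, mean in group_means.items():
    print(f"{name}: n = {len(eps[name])}, mean = {mean:.2f}")
print(f"All firms: n = {len(all_values)}, grand mean = {grand_mean:.2f}")
```

Because the groups have unequal sizes, the grand mean is taken over all 29 firms; averaging the three group means would give a slightly different number.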

[Figure: EPS in various sectors. Chart of EPS (vertical axis, 0.00 to 9.00) for sample firms 1 to 10 (horizontal axis); series: Banking, Retailing, Utility.]

Why is it called ANOVA?

• Differences in EPS (the dependent variable) among all 29 firms have two components: differences among the groups and differences within the groups. That is,

a. There are some differences in EPS among the three groups of firms (Banks vs. Retailers vs. Utilities), and

b. There are also some differences/variations in EPS of the firms within each of these groups (among banks themselves, among retailers themselves, and among utilities themselves).

• ANOVA partitions (analyzes) the variance of the dependent variable (i.e., the differences in EPS) and traces it to its two components/sources, i.e., to differences between groups vs. differences within groups.
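The partition described above can be checked numerically: the total sum of squares should equal the between-groups sum of squares plus the within-groups sum of squares. The sketch below does this for the Table 1 data; the variable names are illustrative assumptions.

```python
from statistics import fmean

# EPS values copied from Table 1
groups = [
    [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],        # Banking
    [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],  # Retailing
    [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],  # Utility
]

all_values = [x for g in groups for x in g]
grand_mean = fmean(all_values)

# Total variation: every firm's deviation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)
# Between-groups variation: group means vs. grand mean, weighted by group size
ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
# Within-groups variation: each firm vs. its own industry mean
ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)

print(f"SST = {sst:.3f}")
print(f"SSB = {ssb:.3f}")
print(f"SSW = {ssw:.3f}")
print(f"SSB + SSW = {ssb + ssw:.3f}")
```

The two components add up to the total (up to floating-point rounding), which is the sense in which ANOVA "analyzes" the variance.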


Total WITHIN Group Variance (or Mean Square WITHIN)?

$$\text{MSW} = \frac{(6.42 - 5.84)^2 + \dots + (8.34 - 5.84)^2 + (3.52 - 3.63)^2 + \dots + (0.16 - 3.63)^2 + (3.55 - 3.31)^2 + \dots + (1.73 - 3.31)^2}{(9 + 10 + 10 - 3)}$$

$$\text{MSW} = \frac{87.112}{26} = 3.350$$

Mean Square WITHIN Groups (MSW):

Let’s see what we just did:

$$\text{MSW} = \frac{\text{Sum of Squared Deviations of All Observations From Their Respective Group Means}}{\text{Total Sample Size} - \text{Number of Groups}} = \frac{SS_{\text{Within}}}{N - K}$$

The denominator is called “Degrees of Freedom” = (nB-1)+(nR-1)+(nU-1).

The generic mathematical formula for MSW:

$$\text{MSW} = \frac{\sum_i (x_{iB} - \bar{x}_B)^2 + \sum_i (x_{iR} - \bar{x}_R)^2 + \sum_i (x_{iU} - \bar{x}_U)^2}{N - K}$$
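One way to read MSW is as a pooled variance: each group's sample variance, weighted by its own degrees of freedom (n - 1). The sketch below computes MSW that way for the Table 1 data, using `statistics.variance` (the sample variance with an n - 1 denominator). The variable names are illustrative assumptions.

```python
from statistics import variance

# EPS values copied from Table 1
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

ss_within = 0.0
df_within = 0
for name, values in eps.items():
    df = len(values) - 1              # degrees of freedom contributed by this group
    ss = variance(values) * df        # sum of squared deviations from the group mean
    print(f"{name}: df = {df}, SS = {ss:.3f}")
    ss_within += ss
    df_within += df

msw = ss_within / df_within           # (nB-1)+(nR-1)+(nU-1) = N - K in the denominator
print(f"SS Within = {ss_within:.3f}, df = {df_within}, MSW = {msw:.3f}")
```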


Let’s now compute the BETWEEN Group Variance (Mean Square BETWEEN, MSB):

$$\text{MSB} = \frac{9(5.84 - 4.21)^2 + 10(3.63 - 4.21)^2 + 10(3.31 - 4.21)^2}{3 - 1} = \frac{35.397}{2} = 17.698$$

Mean Square BETWEEN Groups (MSB):

Let’s see what we just did:

$$\text{MSB} = \frac{\text{Sum of Squared Deviations of Group Means from the Grand Mean}}{\text{Number of Groups} - 1} = \frac{SS_{\text{Between}}}{K - 1}$$

The denominator is called Degrees of Freedom.

Mathematical formula for MSB:

$$\text{MSB} = \frac{n_B(\bar{x}_B - \bar{\bar{x}})^2 + n_R(\bar{x}_R - \bar{\bar{x}})^2 + n_U(\bar{x}_U - \bar{\bar{x}})^2}{K - 1}$$

Each squared deviation is weighted by its respective group size.
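A small sketch of the MSB calculation follows. It also illustrates a rounding detail: plugging the two-decimal means shown on the slide into the formula gives a between-groups sum of squares of about 35.376, while the quoted 35.397 matches what the unrounded means produce, so the slide's figure appears to have been computed at full precision. Variable names are illustrative assumptions.

```python
from statistics import fmean

# EPS values copied from Table 1
groups = [
    [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],        # Banking
    [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],  # Retailing
    [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],  # Utility
]
k = len(groups)

# Full-precision calculation: group means weighted by group size
grand_mean = fmean([x for g in groups for x in g])
ssb_exact = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
msb_exact = ssb_exact / (k - 1)

# Same formula with the two-decimal means as printed on the slide
ssb_rounded = 9 * (5.84 - 4.21) ** 2 + 10 * (3.63 - 4.21) ** 2 + 10 * (3.31 - 4.21) ** 2

print(f"SSB (unrounded means) = {ssb_exact:.3f}, MSB = {msb_exact:.3f}")
print(f"SSB (rounded means)   = {ssb_rounded:.3f}")
```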


Mean Square Between Groups = MSB = 17.698

MSB represents the portion of the total differences/variations in EPS (the dependent variable) that is attributable to (or explained by) differences BETWEEN groups (e.g., industries).

• That is, the part of the differences in companies’ EPS that results from whether they are banks, retailers, or utilities.

Mean Square Within Groups (MS Residual/Error) = MSW = 3.35

MSW represents:

a. The differences in EPS (the dependent variable) that are due to all other factors that are not examined and not controlled for in the study (e.g., diversification level, firm size, etc.)

Plus . . .

b. The natural variability of EPS (the dependent variable) among members within each of the comparison groups (note that even banks with the same size and same level of diversification would have different EPS levels).

Now, let’s compare MSB & MSW:

MSB = 17.698 and MSW = 3.35.

QUESTION: Based on the logic of ANOVA, when would we consider two (or more) groups as different/unequal?

When MSB is significantly larger than MSW.

QUESTION: What would be a reasonable index (a single number) that will show how large MSB is compared to MSW (i.e., a single number that will show if MSB is larger than, equal to, or smaller than MSW)?

Compare BETWEEN and WITHIN Group Variances/Mean Squares: Compute the F-Ratio

• Ratio of MSB and MSW (call it the F-Ratio):

$$F = \frac{\text{MSB}}{\text{MSW}} = \frac{17.698}{3.350} = 5.282$$

• What can we infer when the F-ratio is close to 1?
MSB and MSW are likely to be equal and, thus, there is a strong likelihood that NO difference exists among the comparison groups.

• How about when the F-ratio is significantly larger than 1?
The more the F-ratio exceeds 1, the larger MSB is compared to MSW and, thus, the stronger would be the likelihood/evidence that group difference(s) exist.

• Results of the above computations are usually summarized in an ANOVA TABLE such as the one that follows:

ANOVA TABLE

Source           Sum of Squares   df           Mean Squares           F
Between Groups   35.397           K - 1 = 2    35.397 / 2 = 17.698    17.698 / 3.350 = 5.282
Within Groups    87.112           N - K = 26   87.112 / 26 = 3.350
Total            122.509          N - 1 = 28

$$\text{MSB} = \frac{9(5.84 - 4.21)^2 + 10(3.63 - 4.21)^2 + 10(3.31 - 4.21)^2}{3 - 1} = \frac{35.397}{2} = 17.698$$

$$\text{MSW} = \frac{(6.42 - 5.84)^2 + \dots + (8.34 - 5.84)^2 + (3.52 - 3.63)^2 + \dots + (0.16 - 3.63)^2 + (3.55 - 3.31)^2 + \dots + (1.73 - 3.31)^2}{(9 + 10 + 10 - 3)} = \frac{87.112}{26} = 3.350$$
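Given only the two sums of squares and the sample sizes, the remaining cells of the ANOVA table follow by arithmetic. The sketch below fills them in from the values quoted in the table; the layout and variable names are illustrative assumptions.

```python
# Values quoted in the ANOVA table
ssb, ssw = 35.397, 87.112   # between-groups and within-groups sums of squares
k, n = 3, 29                # number of groups (industries), total number of firms

df_between, df_within, df_total = k - 1, n - k, n - 1
msb = ssb / df_between      # mean square between groups
msw = ssw / df_within       # mean square within groups
f_ratio = msb / msw         # test statistic

rows = [
    ("Between Groups", ssb, df_between, msb, f_ratio),
    ("Within Groups", ssw, df_within, msw, None),
    ("Total", ssb + ssw, df_total, None, None),
]

print(f"{'Source':<16}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
for source, ss, df, ms, f in rows:
    ms_text = f"{ms:.3f}" if ms is not None else ""
    f_text = f"{f:.3f}" if f is not None else ""
    print(f"{source:<16}{ss:>10.3f}{df:>5}{ms_text:>10}{f_text:>8}")
```

Note that the degrees of freedom add up in the same way the sums of squares do: 2 + 26 = 28.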

ONE-WAY ANOVA

For our sample companies, the EPS difference across the three industries (MSB) is more than 5 times the EPS difference among firms within the industries (MSW).

• QUESTION: What is our null hypothesis?

• QUESTION: Is the above F-ratio of 5.28 large enough to warrant rejecting the null?

• ANSWER: It would be if the chance of being wrong (in rejecting the null) does not exceed 5%.

• So, look up the F-value in the table of the F-distribution (under the appropriate degrees of freedom) to find out what the α-level will be if, given this F-value, we decide to reject the null.

• Degrees of Freedom: v1 = k - 1 = 2 and v2 = n - k = 26

Interpretation and Conclusion:

QUESTION: What does the F = 5.28 mean, intuitively?

F = 3.37 is significant at α = 0.05 (if F = 3.37 and we reject H0, there is a 5% chance of being wrong).

F = 4.27 is significant at α = 0.025. That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.

But our F = 5.28 > 4.27.

So, what can we say about our α-level? Will it be larger or smaller than 0.025?

• Our F = 5.28 > 4.27

• The odds of being wrong, if we decide to reject the null, would be less than 2.5% (i.e., α < 0.025).

Would rejecting the null be a safe bet? Conclusion?

Reject the null and conclude that the average EPS is NOT EQUAL FOR ALL GROUPS (industries) being compared.
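The table lookup can be cross-checked without an F-table in this particular case. When the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has a simple closed form, P(F > f) = (1 + 2f / v2)^(-v2 / 2). The sketch below uses that special case; the function name `f_upper_tail_df1_2` is a hypothetical helper, and the formula does not apply to other numerator degrees of freedom.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom.

    Closed form valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


df_within = 26  # v2 = n - k

# Critical values quoted from the F-table, plus the observed F-ratio
p_at_337 = f_upper_tail_df1_2(3.37, df_within)
p_at_427 = f_upper_tail_df1_2(4.27, df_within)
p_observed = f_upper_tail_df1_2(5.28, df_within)

for f_value, p in [(3.37, p_at_337), (4.27, p_at_427), (5.28, p_observed)]:
    print(f"F = {f_value:.2f}: chance of being wrong if we reject H0 = {p:.4f}")

reject_null = p_observed < 0.05
print("Reject H0" if reject_null else "Do not reject H0")
```

Under this formula the two tabled values land very close to 0.05 and 0.025, and the observed F = 5.28 gives a probability of roughly 0.012, consistent with the conclusion that the α-level is below 0.025.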


Thank You