prof. saibal chattopadhyay iim calcutta

IIMC Long Duration Executive Education

Executive Programme in Business Management

Statistics for Managerial Decisions

Advanced Statistical Inference

Prof. Saibal Chattopadhyay

IIM Calcutta

A Brief Review

• Uncertainty and Randomness: Theory of Probability
• Decision Making Under Uncertainty: Utility Theory
• Random Variables & Probability Distributions: Binomial, Poisson, Normal, Exponential
• Joint Distribution of Two Random Variables: Marginal Distributions, Mean, Variance, Covariance, Correlation Coefficient, Independence of random variables
• Regression Approach to the analysis of bivariate data: Curve fitting and Least Squares Principle
• Sampling Theory: SRS, Stratified RS, Systematic Sampling, Central Limit Theorem, Multistage Sampling, Chi-Square, t and F distributions

Statistical Inference

• Sample based Inference about a population
• Estimation (Point and Interval)
• Hypothesis Testing

Characteristics of Interest:
• Population Mean
• Population SD
• Population Proportion

One Sample Problems: Mean (SD known or unknown; n large or small)

Two Sample Problems:
• Difference of two means
• Ratio of two SDs
• Difference of two proportions
• Case Studies 1-5

Some other Inference problems

• Categorical Data Analysis

Variable is categorical in nature: Information available in terms of frequencies (number of individuals) belonging to different categories

Example: 100 randomly selected items returned to a department store are categorized as:
Cash Refund: 34
Credit to Charge Account: 18
Merchandise Exchange: 31
Return Refused: 17

Categorical Data Analysis

Research Question: Do these four possible dispositions for a return request occur with equal frequency?

We need a hypothesis test to assess whether the data (four frequencies: 34, 18, 31, 17) support the theory that the probabilities of an observation falling in each of these four categories are all equal

P1, P2, P3, and P4 are these probabilities, with P1 + P2 + P3 + P4 = 1

To test Ho: P1 = P2 = P3 = P4

Hypothesis-testing for categorical data

• What is the alternative hypothesis?

Ha: Not all Pi’s are equal

How to proceed?

With 2 such categories, no problem: the test is the equality of two proportions

With multiple categories?

• Goodness of fit tests for Ho versus Ha

An extension for testing equality of proportions from several populations

Goodness-of-fit test

General idea:
• k categories
• P1, P2, …, Pk: true unknown proportions for these k categories; P1 + P2 + … + Pk = 1
• Ho: P1 = P1o; P2 = P2o; …; Pk = Pko
• Ha: Ho not true; at least one Pi differs from the corresponding hypothesized value
• Level of significance = $\alpha$ = 0.05 or 0.01
• Data given: Observed frequencies f1, f2, …, fk for these k categories; f1 + f2 + … + fk = n = sample size


• Calculate the ‘expected frequencies’ for these k categories if Ho is true; Under Ho,

Expected Frequency = Probability*Sample Size

• fe1 = n.P1o; fe2 = n.P2o; … ; fek = n.Pko

• fe1 + fe2 + … + fek = n = total frequency

• Examine how closely these correspond to the actual observed frequencies

• If they match closely, do not reject Ho

• Reject Ho otherwise


How to judge: Test Statistic?

$\chi^2 = \sum (\text{obs. freq.} - \text{exp. freq.})^2 / (\text{exp. freq.}) = \sum_{i=1}^{k} (f_i - f_{ei})^2 / f_{ei}$

A Chi-square based on frequencies, both observed and expected (under Ho): a Frequency Chi-Square Test

• Distribution of this Chi-square?
• Approximately Chi-square with (k-1) d.f. provided all expected frequencies are 'large'
• How large: all $f_{ei} \ge 5$

Goodness-of-fit Chi-Square Test

• If Ho is true, discrepancies are small and so Chi-Square value is ‘small’

• Reject Ho if $\chi^2$ is 'large': $\chi^2 > C$
• How large is large? Use level $\alpha$ = 0.05 or 0.01; $C = \chi^2_{\alpha}$: upper $\alpha$-point of $\chi^2$ (d.f. = k - 1), from the Table

Back to the Example:
• k = 4 (number of categories)
• Ho: P1 = P2 = P3 = P4 = ¼; Ha: Not Ho
• Obs. Freq: f1 = 34, f2 = 18, f3 = 31, f4 = 17
• n = total frequency = 100

Goodness-of-fit Chi-Square

• Expected Frequencies: fe1 = fe2 = fe3 = fe4 = 100 × ¼ = 25

$\chi^2 = (34 - 25)^2/25 + (18 - 25)^2/25 + (31 - 25)^2/25 + (17 - 25)^2/25 = 9.2$

• Suppose $\alpha$ = 0.05 (to test at 5% level)
• $\chi^2$ value from table (d.f. = k - 1 = 3) = 7.815
• Observed $\chi^2$ = 9.2 > 7.815: Reject Ho
• Return of merchandise not equally frequent over the different categories, at 5% level
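The arithmetic of the returned-items example can be reproduced with a short sketch. It uses only the Python standard library; the function name `gof_chi_square` is made up for illustration, and the critical value 7.815 is copied from the table value quoted in the text rather than computed.

```python
# Goodness-of-fit chi-square for the returned-items example.
# Assumption: 7.815 is the tabled critical value quoted in the text
# (d.f. = 3, alpha = 0.05); it is hard-coded, not computed.

def gof_chi_square(observed, hypothesized_probs):
    """Return (expected frequencies, chi-square statistic, degrees of freedom)."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]  # fei = n * Pio
    chi_sq = sum((f - fe) ** 2 / fe for f, fe in zip(observed, expected))
    return expected, chi_sq, len(observed) - 1


observed = [34, 18, 31, 17]          # cash refund, credit, exchange, refused
probs_h0 = [0.25, 0.25, 0.25, 0.25]  # Ho: P1 = P2 = P3 = P4 = 1/4

expected, chi_sq, df = gof_chi_square(observed, probs_h0)
CRITICAL_5PCT_DF3 = 7.815            # tabled value quoted in the text

reject_h0 = chi_sq > CRITICAL_5PCT_DF3
print(f"expected = {expected}, chi-sq = {chi_sq:.2f}, d.f. = {df}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

Passing different hypothesized probabilities (any values summing to 1) covers the general case Ho: P1 = P1o, …, Pk = Pko.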

Another Application – Test of Homogeneity

• 2 or more similarly classified populations
• Data: Frequencies falling in each category are known from each population
• To test if the populations are identical

2 populations, k classes each
P1, P2, …, Pk: Probabilities for Population 1
P1*, P2*, …, Pk*: Probabilities for Population 2
Ho: P1 = P1*, P2 = P2*, …, Pk = Pk*
Ha: They are not all equal

Case Study 6: Right of Advertising

• A study of consumers' and dentists' attitudes toward advertising of dental services

“Should Dentists Advertise?” - Journal of Advertising Research, June 1982, 33-38.

Two samples: 101 consumers (population 1) & 124 dentists (population 2) were asked to respond to the following statement:

“I favour the use of advertising by dentists to attract new patients”

Possible responses are: strongly agree, agree, neutral, disagree, strongly disagree

Should Dentists Advertise?

• Data table

            Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
Consumers         34           49       9          4              5
Dentists           9           18      23         28             46


Research Question: Do the two groups, consumers and dentists, differ in their attitudes toward advertising?

Probability Table:

            Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree   Total
Consumers        P1            P2      P3         P4             P5             1
Dentists         P1*           P2*     P3*        P4*            P5*            1


To Test Ho: P1=P1*, …, P5 = P5*

Expected Cell Count Formula:

$\text{Exp} = (\text{Row marginal total}) \times (\text{Col. marginal total}) / (\text{Grand Total})$

$\chi^2 = \sum (\text{obs. freq.} - \text{exp. freq.})^2 / (\text{exp. freq.})$

DF = (# Rows - 1)(# Columns - 1)

Reject Ho if observed Chi-sq > tabled Chi-sq.

(Assumption: all expected frequencies $\ge 5$)
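As a worked instance of the expected cell count formula, take the (Consumers, Strongly Agree) cell of the data table. The row total is 34 + 49 + 9 + 4 + 5 = 101, the column total is 34 + 9 = 43, and the grand total is 101 + 124 = 225. The symbol $E_{11}$ is notation introduced here for illustration only.

```latex
E_{11} = \frac{101 \times 43}{225} = \frac{4343}{225} \approx 19.30,
\qquad
\text{DF} = (2 - 1)(5 - 1) = 4
```

The remaining nine expected counts follow in the same way from their own row and column totals.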


Table of observed (expected) counts:

            Strongly Agree   Agree        Neutral      Disagree     Strongly Disagree   Total
Consumers   34 (19.30)       49 (30.08)    9 (14.36)    4 (14.36)    5 (22.89)          101 (101.00)
Dentists     9 (23.70)       18 (36.92)   23 (17.64)   28 (17.64)   46 (28.11)          124 (124.00)
Total       43 (43.00)       67 (67.00)   32 (32.00)   32 (32.00)   51 (51.00)          225 (225.00)


Calculation of the Test Statistic:

Here all expected frequencies are $\ge 5$.

$\chi^2 = (34 - 19.30)^2/19.30 + \dots + (46 - 28.11)^2/28.11 = 84.47$

Degrees of freedom = (2-1)(5-1) = 4
Use alpha = 0.05
Chi-sq from table = 9.488
Reject Ho if obs. Chi-sq > 9.488


Conclusion: Since the observed value of Chi-sq = 84.47 > 9.488, we reject Ho at the 5% level of significance. Thus, in the light of the given data, it appears that the two groups (consumers and dentists) differ significantly in their attitudes toward advertising.
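The homogeneity test for this case study can be checked with a minimal sketch in plain Python. The helper name `chi_square_two_way` is hypothetical, and the critical value 9.488 is the tabled figure quoted in the text, not a computed one. Because the sketch keeps the expected counts unrounded, the statistic comes out near 84.50, slightly above the 84.47 obtained from expected counts rounded to two decimals; the decision is unaffected.

```python
# Chi-square test of homogeneity for the "Should Dentists Advertise?" data.
# Assumption: 9.488 is the tabled critical value quoted in the text
# (d.f. = 4, alpha = 0.05); it is hard-coded, not computed.

def chi_square_two_way(table):
    """Return (expected counts, chi-square statistic, d.f.) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total)(column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi_sq, df


attitudes = [
    [34, 49, 9, 4, 5],    # consumers: strongly agree ... strongly disagree
    [9, 18, 23, 28, 46],  # dentists
]
expected, chi_sq, df = chi_square_two_way(attitudes)
min_expected = min(min(row) for row in expected)  # check the ">= 5" assumption
CRITICAL_5PCT_DF4 = 9.488                         # tabled value quoted in the text

reject_h0 = chi_sq > CRITICAL_5PCT_DF4
print(f"chi-sq = {chi_sq:.2f}, d.f. = {df}, smallest expected = {min_expected:.2f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```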

A Test for Independence

• Two attributes A and B
• A has k levels A1, A2, …, Ak
• B has l levels B1, B2, …, Bl
• Data available on k.l level combinations: fij = number of observations (frequency) belonging to (Ai, Bj), n = total frequency
• To test Ho: A and B are independent
• Alternative Ha: they are associated

Case Study 7: TV viewing and Fitness

“Television viewing and Physical fitness in adults”: Research Quarterly for Exercise and Sport (1990), 315-320.

A: Physical Fitness has k = 2 levels
A1 = physically fit, A2 = not physically fit

B: TV viewing time (in hours per day, rounded to the nearest hour) has l = 4 levels
B1 = 0, B2 = (1-2), B3 = (3-4), B4 = (5 or more)

TV viewing and Physical Fitness

• A survey of 1200 adult males gave the following counts:

TV viewing (hours/day)   Physically fit   Not physically fit   Row marginal total
0                              35                147                  182
1-2                           101                629                  730
3-4                            28                222                  250
5 or more                       4                 34                   38
Col marginal total            168               1032                 1200


Ho: TV viewing and Physical fitness are independent attributes

Ha: They are associated

Expected Cell Counts under Ho:

$\text{Exp} = (\text{Row total}) \times (\text{Column total}) / (\text{Total frequency})$

$\chi^2 = \sum (\text{obs.} - \text{exp.})^2 / \text{exp.}$

Degrees of freedom = (k-1)(l-1)

Reject Ho if observed Chi-sq > tabled Chi-sq.


Table of Observed (Expected) Frequencies

TV Group        Physically Fit   Not Physically Fit   Row totals
0                35 (25.5)        147 (156.5)          182 (182.0)
1-2             101 (102.2)       629 (627.8)          730 (730.0)
3-4              28 (35.0)        222 (215.0)          250 (250.0)
5 or more         4 (5.3)          34 (32.7)            38 (38.0)
Column totals   168 (168.0)      1032 (1032.0)        1200 (1200.0)


• All expected frequencies are $\ge 5$; so we may use the goodness-of-fit chi-square

Degrees of Freedom = (2-1)(4-1) = 3

$\chi^2 = (35 - 25.5)^2/25.5 + \dots + (34 - 32.7)^2/32.7 = 6.13$

At 5% level, tabled Chi-sq = 7.815
Decision Rule: Reject Ho if Chi-sq > 7.815

TV Viewing and Physical Fitness

• Conclusion: Since the observed Chi-sq = 6.13 is less than the tabled value 7.815, we fail to reject Ho at the 5% level. This means that the given data provide no significant evidence of an association between Physical Fitness and TV viewing; the data are consistent with the two attributes being independent.
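The value 6.13 depends on the expected counts being rounded to one decimal place, as they are printed in the table of observed (expected) frequencies. The sketch below, in plain Python with a hypothetical helper `independence_chi_square`, computes the statistic both ways; the critical value 7.815 is the tabled figure quoted in the text. Under these assumptions the unrounded statistic is about 6.16, and either way it stays below the critical value.

```python
# Chi-square test of independence for the TV viewing / fitness counts,
# computed with unrounded expected counts and with expected counts
# rounded to one decimal place (as printed in the table).
# Assumption: 7.815 is the tabled critical value quoted in the text
# (d.f. = 3, alpha = 0.05); it is hard-coded, not computed.

def independence_chi_square(table, ndigits=None):
    """Chi-square statistic; expected counts optionally rounded to ndigits."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            if ndigits is not None:
                exp = round(exp, ndigits)
            chi_sq += (obs - exp) ** 2 / exp
    return chi_sq


counts = [
    [35, 147],   # 0 hours: physically fit, not physically fit
    [101, 629],  # 1-2 hours
    [28, 222],   # 3-4 hours
    [4, 34],     # 5 or more hours
]
CRITICAL_5PCT_DF3 = 7.815  # tabled value quoted in the text

chi_exact = independence_chi_square(counts)
chi_rounded = independence_chi_square(counts, ndigits=1)

print(f"unrounded expected counts: chi-sq = {chi_exact:.2f}")
print(f"rounded expected counts:   chi-sq = {chi_rounded:.2f}")
print("Reject Ho" if chi_exact > CRITICAL_5PCT_DF3 else "Fail to reject Ho")
```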

References

Text Book for the Course

• Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited)

Suggested Reading

• Complete Business Statistics: Aczel, A.D. & Sounderpandian, J. – Fifth Edition (Tata McGraw-Hill)
