IIMC Long Duration Executive Education
Executive Programme in Business Management
Statistics for Managerial Decisions
Advanced Statistical Inference
Prof. Saibal Chattopadhyay
IIM Calcutta
A Brief Review
• Uncertainty and Randomness: Theory of Probability
• Decision Making Under Uncertainty: Utility Theory
• Random Variables & Probability Distributions: Binomial, Poisson, Normal, Exponential
• Joint Distribution of Two Random Variables: Marginal Distributions, Mean, Variance, Covariance, Correlation Coefficient, Independence of random variables
• Regression Approach to the analysis of bivariate data: Curve fitting and Least Squares Principle
• Sampling Theory: SRS, Stratified RS, Systematic Sampling, Central Limit Theorem, Multistage Sampling, Chi-Square, t and F distributions
Statistical Inference
• Sample based Inference about a population
• Estimation (Point and Interval)
• Hypothesis Testing
Characteristics of Interest:
• Population Mean
• Population SD
• Population Proportion
One sample problems: Mean (SD known or unknown; n large or small)
Two Sample Problems:
• Difference of two means
• Ratio of Two SD’s
• Difference of two proportions
• Case Studies 1-5
Some other Inference problems
• Categorical Data Analysis
Variable is categorical in nature: Information available in terms of frequencies (number of individuals) belonging to different categories
Example: 100 randomly selected items returned to a department store are categorized as:
Cash Refund: 34
Credit to Charge Account: 18
Merchandise Exchange: 31
Return Refused: 17
Categorical Data Analysis
Research Question: Do these four possible dispositions for a return request occur with equal frequency?
We need a hypothesis test to assess whether the data (four frequencies: 34, 18, 31, 17) support the theory that the probabilities of an observation falling in these four categories are all equal
P1, P2, P3, and P4 are these probabilities, with P1 + P2 + P3 + P4 = 1
To test Ho: P1 = P2 = P3 = P4
Hypothesis-testing for categorical data
• What is the alternative hypothesis?
Ha: Not all Pi’s are equal
How to proceed?
With 2 such categories, no problem: the test is the equality of two proportions
With multiple categories?
• Goodness of fit tests for Ho versus Ha
An extension for testing equality of proportions from several populations
Goodness-of-fit test
General idea:
• k categories
• P1, P2, …, Pk: true unknown proportions for these k categories; P1 + P2 + … + Pk = 1
• Ho: P1 = P1o; P2 = P2o; …; Pk = Pko
• Ha: Ho not true; at least one Pi differs from the corresponding hypothesized value
• Level of significance = α = 0.05 or 0.01
• Data given: Observed frequencies f1, f2, …, fk for these k categories; f1 + f2 + … + fk = n = sample size
Goodness-of-fit test
• Calculate the ‘expected frequencies’ for these k categories if Ho is true; Under Ho,
Expected Frequency = Probability*Sample Size
• fe1 = n.P1o; fe2 = n.P2o; … ; fek = n.Pko
• fe1 + fe2 + … + fek = n = total frequency
• Examine how closely these correspond to the actual observed frequencies
• If they match closely, do not reject Ho
• Reject Ho otherwise
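The expected-frequency rule above (fei = n × Pio) can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and the sample numbers (200 observations split 50% / 30% / 20%) are assumptions made for the example and do not come from the course material.

```python
def expected_frequencies(n, hypothesized_probs):
    """Expected count in each category if Ho is true: fei = n * Pio."""
    # The hypothesized probabilities P1o, ..., Pko must add up to 1.
    if abs(sum(hypothesized_probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    return [n * p for p in hypothesized_probs]


# Hypothetical illustration: n = 200 observations, Ho says 50% / 30% / 20%.
fe = expected_frequencies(200, [0.5, 0.3, 0.2])
print(fe)       # one expected count per category
print(sum(fe))  # adds up to n, the total frequency (up to rounding error)
```

Because the expected frequencies always add up to n, comparing them with the observed frequencies category by category is a like-for-like comparison.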
Goodness-of-fit test
How to judge: Test Statistic?
χ² = Σ (obs. freq. – exp. freq.)² / (exp. freq.) = Σ (fi – fei)² / fei, summed over the k categories
A Chi-square based on frequencies, both observed and expected (under Ho): A Frequency Chi-Square Test
• Distribution of this Chi-square?
• Approximately Chi-square with (k – 1) d.f. provided all expected frequencies are ‘large’
• How large: all fei ≥ 5
Goodness-of-fit Chi-Square Test
• If Ho is true, discrepancies are small and so the Chi-Square value is ‘small’
• Reject Ho if χ² is ‘large’: χ² > C
• How large is large? Use level α = 0.05 or 0.01
C = χ²α: upper α-point of χ² (d.f. = k – 1): Table
Back to the Example:
• k = 4 (number of categories)
• Ho: P1 = P2 = P3 = P4 = ¼; Ha: Not Ho
• Obs. Freq: f1 = 34, f2 = 18, f3 = 31, f4 = 17
• n = total frequency = 100
Goodness-of-fit Chi-Square
• Expected Frequencies: fe1 = fe2 = fe3 = fe4 = 100 × ¼ = 25
χ² = (34 – 25)²/25 + (18 – 25)²/25 + (31 – 25)²/25 + (17 – 25)²/25 = 9.2
• Suppose α = 0.05 (to test at 5% level)
χ² value from table (d.f. = k – 1 = 3) = 7.815
• Observed χ² = 9.2 > 7.815: Reject Ho
• Return of merchandise is not equally frequent over the different categories, at 5% level
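The arithmetic of the returned-merchandise example can be reproduced with a short Python sketch. The function name `chi_square_gof` is an assumption made for illustration; the observed counts and the tabled value 7.815 (3 d.f., 5% level) are taken from the example itself.

```python
def chi_square_gof(observed, hypothesized_probs):
    """Frequency chi-square: sum of (fi - fei)^2 / fei over all categories."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]  # fei = n * Pio
    stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return stat, expected


# Cash refund, credit to charge account, merchandise exchange, return refused
observed = [34, 18, 31, 17]
stat, expected = chi_square_gof(observed, [0.25, 0.25, 0.25, 0.25])

df = len(observed) - 1   # k - 1 degrees of freedom
critical_5pct = 7.815    # tabled upper 5% point for 3 d.f., as quoted in the example
reject_ho = stat > critical_5pct

print(f"chi-square = {stat:.2f}, d.f. = {df}, reject Ho: {reject_ho}")
```

The sketch gives the same statistic as the hand calculation (9.2) and the same decision: Ho is rejected at the 5% level.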
Another Application – Test of Homogeneity
• 2 or more similarly classified populations
• Data: Frequencies falling in each category are known from each population
• To test if the populations are identical
2 populations, k classes each
P1, P2, …, Pk: Probabilities for Population 1
P1*, P2*, …, Pk*: Probabilities for Population 2
Ho: P1 = P1*, P2 = P2*, …, Pk = Pk*
Ha: They are not all equal
Case Study 6: Right of Advertising
• A study of consumers’ and dentists’ attitudes toward advertising of dental services
“Should Dentists Advertise?” - Journal of Advertising Research, June 1982, 33-38.
Two samples: 101 consumers (population 1) and 124 dentists (population 2) were asked to respond to the following statement:
“I favour the use of advertising by dentists to attract new patients”
Possible responses are: strongly agree, agree, neutral, disagree, strongly disagree
Should Dentists Advertise?
• Data table
            Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
Consumers         34           49       9          4              5
Dentists           9           18      23         28             46
Should Dentists Advertise?
Research Question: Do the two groups, consumers and dentists, differ in their attitudes toward advertising?
Probability Table:
            Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree   Total
Consumers         P1           P2      P3         P4             P5             1
Dentists          P1*          P2*     P3*        P4*            P5*            1
Should Dentists Advertise?
To Test Ho: P1 = P1*, …, P5 = P5*
Expected Cell Count Formula:
Exp = (Row marginal total) × (Col. marginal total) / (Grand Total)
Chi-sq = Σ (obs. freq. – exp. freq.)² / (exp. freq.), summed over all cells
DF = (# Rows – 1)(# Columns – 1)
Reject Ho if observed Chi-sq > tabled Chi-sq.
(Assumption: all expected frequencies ≥ 5)
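As a rough sketch, the expected cell count formula and the degrees of freedom can be computed from the observed table alone. The function name `expected_counts` is an assumption for illustration; the counts are the consumer and dentist responses from the data table.

```python
def expected_counts(table):
    """Expected count per cell: (row total) * (column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df


# Rows: consumers, dentists.
# Columns: strongly agree, agree, neutral, disagree, strongly disagree.
observed = [
    [34, 49, 9, 4, 5],
    [9, 18, 23, 28, 46],
]

expected, df = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
print("d.f. =", df)

# The chi-square approximation assumes every expected count is at least 5.
all_large_enough = all(e >= 5 for row in expected for e in row)
print("all expected counts >= 5:", all_large_enough)
```

Each row of expected counts adds up to the observed row total, and each column to the observed column total, which is a quick check on the calculation.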
Should Dentists Advertise?
Table of observed (expected) counts:
            Strongly Agree   Agree        Neutral      Disagree     Strongly Disagree   Total
Consumers   34 (19.30)       49 (30.08)   9 (14.36)    4 (14.36)    5 (22.89)           101 (101.00)
Dentists    9 (23.70)        18 (36.92)   23 (17.64)   28 (17.64)   46 (28.11)          124 (124.00)
Total       43 (43.00)       67 (67.00)   32 (32.00)   32 (32.00)   51 (51.00)          225 (225.00)
Should Dentists Advertise?
Calculation of the Test Statistic:
Here all expected frequencies are ≥ 5.
Chi-sq = (34 – 19.30)²/19.30 + … + (46 – 28.11)²/28.11 = 84.47
Degrees of freedom = (2 – 1)(5 – 1) = 4
Use alpha = 0.05
Chi-sq from table = 9.488
Reject Ho if obs. Chi-sq > 9.488
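The tabled value can be sanity-checked without a printed table. The sketch below computes the upper-tail probability of a chi-square distribution using the standard closed-form expressions for integer degrees of freedom; the function name `chi_square_upper_tail` is an assumption for illustration, and only Python's standard `math` module is used. It is a check on the table, not a replacement for the procedure described above.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom exceeds x), for x >= 0, integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("need x >= 0 and df >= 1")
    half = x / 2.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum over j = 0 .. df/2 - 1 of (x/2)^j / j!
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd d.f.: erfc(sqrt(x/2)) + exp(-x/2) * sum over j = 1 .. (df-1)/2
    # of (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# The tabled 5% point for 4 d.f. should leave about 5% in the upper tail.
print(chi_square_upper_tail(9.488, 4))
# Tail probability beyond the observed statistic (a p-value).
print(chi_square_upper_tail(84.47, 4))
```

The first value comes out very close to 0.05, confirming the table entry, and the second is far below 0.05, which matches the decision to reject Ho.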
Should Dentists Advertise?
Conclusion: Since the observed value of Chi-sq = 84.47 > 9.488, we reject Ho at the 5% level of significance. Thus, in the light of the given data, it appears that the two groups (consumers and dentists) differ significantly in their attitudes toward advertising.
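A short Python sketch of the homogeneity test for this case study is given below. The function name `homogeneity_chi_square` is an assumption for illustration. It also reports how much each response category contributes to the statistic, which is an extra step not shown on the slides. Because the sketch uses unrounded expected counts, its statistic (about 84.5) differs slightly from the hand-computed 84.47.

```python
responses = ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]
consumers = [34, 49, 9, 4, 5]
dentists = [9, 18, 23, 28, 46]


def homogeneity_chi_square(sample_a, sample_b):
    """Chi-square for two similarly classified samples, plus per-category contributions."""
    n_a, n_b = sum(sample_a), sum(sample_b)
    n = n_a + n_b
    contributions = []
    for f_a, f_b in zip(sample_a, sample_b):
        col_total = f_a + f_b
        e_a = n_a * col_total / n  # expected count for sample A in this category
        e_b = n_b * col_total / n  # expected count for sample B in this category
        contributions.append((f_a - e_a) ** 2 / e_a + (f_b - e_b) ** 2 / e_b)
    return sum(contributions), contributions


stat, contributions = homogeneity_chi_square(consumers, dentists)
df = (2 - 1) * (len(responses) - 1)
critical_5pct = 9.488  # tabled upper 5% point for 4 d.f., as quoted in the case study

print(f"chi-square = {stat:.2f}, d.f. = {df}, reject Ho: {stat > critical_5pct}")
for name, c in zip(responses, contributions):
    print(f"{name}: {c:.2f}")
```

The per-category breakdown indicates where the two groups disagree most; here the largest contributions come from the two ends of the scale.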
A Test for Independence
• Two attributes A and B
• A has k levels A1, A2, …, Ak
• B has l levels B1, B2, …, Bl
• Data available on k × l level combinations: fij = number of observations (frequency) belonging to (Ai, Bj), n = total frequency
• To test Ho: A and B are independent
• Alternative Ha: they are associated
Case Study 7: TV viewing and Fitness
“Television viewing and Physical fitness in adults”: Research Quarterly for Exercise and Sport (1990), 315-320.
A: Physical Fitness has k = 2 levels
A1 = physically fit, A2 = not physically fit
B: TV viewing time (in hours per day, rounded to the nearest hour) has l = 4 levels
B1 = 0, B2 = (1-2), B3 = (3-4), B4 = (5 or more)
TV viewing and Physical Fitness
• A survey of 1200 adult males gave the following counts:
TV Group             Physically fit   Not physically fit   Row marginal total
0                          35               147                  182
1-2                       101               629                  730
3-4                        28               222                  250
5 or more                   4                34                   38
Col marginal total        168              1032                 1200
TV viewing and Physical Fitness
Ho: TV viewing and Physical fitness are independent attributes
Ha: They are associated
Expected Cell Counts under Ho: (Row total) × (Column total) / (Total frequency)
Chi-sq = Σ (obs. – exp.)² / exp., summed over all cells
Degrees of freedom = (k – 1)(l – 1)
Reject Ho if observed Chi-sq > tabled Chi-sq.
TV viewing and Physical Fitness
Table of Observed (Expected) Frequencies
TV Group        Physically Fit   Not Physically Fit   Row totals
0               35 (25.5)        147 (156.5)          182 (182.0)
1-2             101 (102.2)      629 (627.8)          730 (730.0)
3-4             28 (35.0)        222 (215.0)          250 (250.0)
5 or more       4 (5.3)          34 (32.7)            38 (38.0)
Column totals   168 (168.0)      1032 (1032.0)        1200 (1200.0)
TV viewing and Physical Fitness
• All expected frequencies are ≥ 5, so we may use the goodness-of-fit chi-square
Degrees of Freedom = (2 – 1)(4 – 1) = 3
Chi-sq = (35 – 25.5)²/25.5 + … + (34 – 32.7)²/32.7 = 6.13
At 5% level, tabled Chi-sq = 7.815
Decision Rule: Reject Ho if Chi-sq > 7.815
TV Viewing and Physical Fitness
• Conclusion: Since the observed Chi-sq = 6.13 is less than the tabled value 7.815, we fail to reject Ho at the 5% level. In the light of the given data, there is not enough evidence to conclude that Physical Fitness and TV viewing are associated.
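The independence test for this case study can be sketched in Python as follows. The function name `independence_chi_square` is an assumption for illustration. The sketch keeps the expected counts unrounded, so it gives a statistic of about 6.16 rather than the 6.13 obtained by hand from expected counts rounded to one decimal place; the decision is the same either way.

```python
tv_groups = ["0", "1-2", "3-4", "5 or more"]
# Each row: [physically fit, not physically fit] for one TV viewing group.
observed = [
    [35, 147],
    [101, 629],
    [28, 222],
    [4, 34],
]


def independence_chi_square(table):
    """Chi-square statistic, degrees of freedom and expected counts for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under Ho (independence): expected = (row total) * (column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


stat, df, expected = independence_chi_square(observed)
smallest_expected = min(min(row) for row in expected)
critical_5pct = 7.815  # tabled upper 5% point for 3 d.f., as quoted in the case study

print(f"smallest expected count = {smallest_expected:.2f}")
print(f"chi-square = {stat:.2f}, d.f. = {df}, reject Ho: {stat > critical_5pct}")
```

The smallest expected count is only just above 5 (the "5 or more" hours, physically fit cell), so the chi-square approximation is acceptable here but close to the limit of the rule of thumb.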
References
Text Book for the Course
• Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited)
Suggested Reading
• Complete Business Statistics: Aczel, A.D. & Sounderpandian, J. – Fifth Edition (Tata McGraw-Hill)