Secondary Data, Measures, Hypothesis Formulation, Chi-Square Market Intelligence Julie Edell Britton Session 3 August 21, 2009

Secondary Data, Measures, Hypothesis Formulation, Chi-Square

Market Intelligence
Julie Edell Britton

Session 3
August 21, 2009

Today’s Agenda

• Announcements
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Announcements

• National Insurance Case for Sat. 8/22
– Stephen will do a tutorial today, Friday, 8/21, from 1:00 to 2:15 in the MBA PC Lab and will be available tonight from 7 to 9 pm in the MBA PC Lab to answer questions
– Submit slides by 8:00 am on Sat. 8/22
– 2 slides with your conclusions; you may add appendices to support your conclusions

Primary vs. Secondary Data

Primary -- collected anew for current purposes
Secondary -- exists already, was collected for some other purpose

Finding Secondary Data Online @ Fuqua http://library.fuqua.duke.edu

Evaluating Sources of Secondary Data

• If you can’t find the source of a number, don’t use it. Look for further data.
• Always give sources when writing a report.
– Applies to focus group write-ups too
• Be skeptical.

Secondary Data: Pros & Cons

Advantages
• cheap
• quick
• often sufficient
• there is a lot of data out there

Disadvantages
• there is a lot of data out there
• numbers sometimes conflict
• categories may not fit your needs

Types of Secondary Data

                                                      Internal               External
Database: Can Slice/Dice; Need more processing        WEMBA_C                IMS Health, Nielsen, IRI*
Summary: Can’t change categories, get new crosstabs   Knowledge Management   Conquistador, Simmons, IRI_factbook

*IRI = Information Resources, Inc. (http://us.infores.com/)

Secondary Data Quality: KAD p. 120 & “What’s Behind the Numbers?”

• Data consistent with other independent sources?
• What are the classifications? Do they fit needs?
• When were numbers collected? Obsolete?
• Who collected the numbers? Bias, resources?
• Why were the data collected? Self-interest?
• How were the numbers generated?
– Sample size
– Sampling method
– Measure type
– Causality (MBA Marketing Timing & Internship)

It is Hard to Infer Causality from Secondary Data

Took Core Marketing   Got Desired Marketing Internship   Did Not Get Desired Marketing Internship
Term 1                76%                                24%
Term 3                51%                                49%

Today’s Agenda

• Announcements
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Measure Types

Nominal: Unordered categories.
– Male = 1; Female = 2

Ordinal: Ordered categories; intervals can’t be assumed to be equal.
– I-95 is east of I-85; I-80 is north of I-40; preference data

Interval: Equally spaced categories; 0 is arbitrary and units are arbitrary.
– Fahrenheit temperature (each degree is equal); attitudes

Ratio: Equally spaced categories; 0 on the scale means 0 of the underlying quantity.
– $ sales, market share

Meaningful Statistics & Permissible Transformations

Ratio
  Example: Q1 = bottles of wine
  Permissible transform: Q2 = b*Q1, e.g., cases sold (b = 1/12)
  Meaningful stats: all below + % change

Interval
  Example: wine rating scale, 1 = Very Bad to 20 = Very Good
  Permissible transform: Att2 = a + (b*Att1), e.g., 81 to 100 (a = 80, b = 1) or 80.5 to 90 (a = 80, b = .5)
  Meaningful stats: all below + mean

Ordinal
  Example: rank order of wines, 1 = favorite, 2 = 2nd preferred, 3 = least preferred
  Permissible transform: any order preserving, e.g., 100 = favorite, 90 = 2nd preferred, 0 = least preferred
  Meaningful stats: all below + median

Nominal
  Example: 1 = Pinot Noir, 2 = Merlot, 3 = Chardonnay
  Permissible transform: any transformation is ok, e.g., 16 = Pinot Noir, 3 = Merlot, 13 = Chardonnay
  Meaningful stats: # of cases, mode
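To make the interval and ratio rows concrete, here is a minimal Python sketch. The rating and sales values are made up for illustration and are not from the slides; only the transforms (a = 80, b = 1 and b = 1/12) come from the table. It suggests why the mean survives a linear rescaling of an interval scale while ratios do not, and why % change survives a ratio-scale rescaling.

```python
from statistics import mean

# Hypothetical ratings on the 1-20 wine scale (illustrative values only).
ratings = [4, 10, 16]

def rescale(x, a=80, b=1):
    """Interval-scale transform Att2 = a + b * Att1 (a = 80, b = 1 maps 1-20 to 81-100)."""
    return a + b * x

rescaled = [rescale(x) for x in ratings]

# The mean passes through the same linear transform, so it stays meaningful.
mean_commutes = mean(rescaled) == rescale(mean(ratings))

# Ratios do not survive the shift: 16 is 4 times 4, but 96 is not 4 times 84.
ratio_before = ratings[2] / ratings[0]
ratio_after = rescaled[2] / rescaled[0]

# Ratio-scale transform Q2 = b * Q1 (bottles to cases, b = 1/12) keeps % change.
bottles = [120, 180]  # hypothetical sales in two periods
cases = [q / 12 for q in bottles]
pct_change_bottles = (bottles[1] - bottles[0]) / bottles[0]
pct_change_cases = (cases[1] - cases[0]) / cases[0]

print(mean_commutes, ratio_before, round(ratio_after, 3))
print(pct_change_bottles, pct_change_cases)
```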

Means and Medians with Ordinal Data

Gender   Measure 1   Measure 2
M        1           1
M        2           2
F        3           3
F        4           4
F        5           5
F        6           6
M        7           107
M        8           108
M        9           109
F        10          110

Means
Measure 1: M = 5.4 < F = 5.6
Measure 2: M = 65.4 > F = 25.6

Medians
Measure 1: M = 7 > F = 5
Measure 2: M = 107 > F = 5

Ratio Scales & Index Numbers

Index = 100 * (Per Capita Segment i) / (Per Capita Ave)

Age Group   Population (000s)   Sales Units (000)   Per Capita Sales   Segment Index
<25         700                 1400                2.00               70
25-34       500                 1250                2.50               88
35-44       300                 900                 3.00               105
45-54       240                 960                 4.00               140
55+         260                 1196                4.60               161
Total       2000                5706                2.85               100
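As a check on the index column, the following Python sketch recomputes each segment index from the population and unit sales figures in the table. Variable names are illustrative; rounding to the nearest whole number is an assumption that happens to match the slide.

```python
# (population in 000s, unit sales in 000s) per age group, from the table.
segments = {
    "<25": (700, 1400),
    "25-34": (500, 1250),
    "35-44": (300, 900),
    "45-54": (240, 960),
    "55+": (260, 1196),
}

total_pop = sum(pop for pop, _ in segments.values())
total_units = sum(units for _, units in segments.values())
per_capita_avg = total_units / total_pop  # 5706 / 2000

# Index = 100 * (segment per capita sales) / (average per capita sales)
index = {
    age: round(100 * (units / pop) / per_capita_avg)
    for age, (pop, units) in segments.items()
}

for age, value in index.items():
    print(age, value)
```

An index above 100 marks a segment that buys more per person than the average customer; here per capita sales rise steadily with age.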

Today’s Agenda

• Announcements
• Southwestern Conquistador Beer Case
• Backward Market Research
• Secondary data quality
• Measure types
• Hypothesis Testing and Chi-Square

Cross Tabs of MBA Acceptance by Gender

A. Raw Frequencies

        Accept   Reject   Total
M       140      860      1000
F       60       740      800
Total   200      1600     1800

B. Cell Percentages

        Accept   Reject   Total
M       .078     .478     .556
F       .033     .411     .444
Total   .111     .889     1.0

C. Row Percentages

        Accept            Reject            Total
M       140/1000 = .140   860/1000 = .860   1.00
F       60/800 = .075     740/800 = .925    1.00

D. Column Percentages

        Accept           Reject
M       140/200 = .700   860/1600 = .538
F       60/200 = .300    740/1600 = .462
Total   1.00             1.00

Rule of Thumb

If a potential causal interpretation exists, make numbers add up to 100% at each level of the causal factor.

Above: it is possible that gender (row) causes or influences acceptance (column), but not that acceptance influences gender. Hence, row percentages (format C) would be desirable.
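A small Python sketch of the two ways to percentage the MBA acceptance crosstab is given here, using the raw frequencies from the slides. The dictionary layout and variable names are illustrative assumptions. Row percentages sum to 1 within each gender (the presumed causal factor), which is the format the rule of thumb recommends.

```python
# Raw frequencies from the MBA acceptance crosstab.
counts = {
    "M": {"Accept": 140, "Reject": 860},
    "F": {"Accept": 60, "Reject": 740},
}
outcomes = ("Accept", "Reject")

# Row percentages: each gender's counts divided by that gender's total.
row_pct = {
    g: {k: cells[k] / sum(cells.values()) for k in outcomes}
    for g, cells in counts.items()
}

# Column percentages: each outcome's counts divided by that outcome's total.
col_totals = {k: sum(counts[g][k] for g in counts) for k in outcomes}
col_pct = {g: {k: counts[g][k] / col_totals[k] for k in outcomes} for g in counts}

print(row_pct["M"]["Accept"], row_pct["F"]["Accept"])  # acceptance rate by gender
print(col_pct["M"]["Accept"], col_pct["F"]["Accept"])  # gender mix among accepted
```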

Hypothesis Formulation and Testing

Hypothesis: What you believe the relationship is between the measures.
• Theory
• Empirical Evidence
• Beliefs
• Experience

Here: Believe that acceptance is related to gender

Null Hypothesis: Acceptance is not related to gender

Logic of hypothesis testing: Negative Inference
The null hypothesis will be rejected by showing that a given observation would be quite improbable if the hypothesis were true.

Want to see if we can reject the null.

Steps in Hypothesis Testing

1. State the hypothesis in Null and Alternative Form

– Ho: There is no relationship between gender and MBA acceptance

– Ha1: Gender and Acceptance are related (2-sided)

– Ha2: Fewer Women are Accepted (1-sided)

2. Choose a test statistic

3. Construct a decision rule

Chi-Square Test

Used for nominal data, to compare the observed frequency of responses to what would be “expected” under the null hypothesis.

Two types of tests

Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists between the two variables

Goodness of fit test – Compare whether the data sampled is proportionate to some standard

Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (r-1)*(c-1) degrees of freedom

$O_i$ = Observed number in cell i
$E_i$ = Expected number in cell i under independence
k = number of cells, r = number of rows, c = number of columns

$E_i$ = Column Proportion * Row Proportion * total number observed

MBA Acceptance Data Contingency

A. Observed Frequencies

        Accept   Reject   Total
M       140      860      1000
F       60       740      800
Total   200      1600     1800

B. Cell Percentages

        Accept   Reject   Total
M       .078     .478     .556
F       .033     .411     .444
Total   .111     .889     1.0

C. Expected Frequencies

        Accept                 Reject
M       .111*.556*1800 = 111   .889*.556*1800 = 890
F       .111*.444*1800 = 89    .889*.444*1800 = 710
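The expected frequencies can also be computed directly from the marginal totals, since column proportion * row proportion * n equals (row total * column total) / n. The Python sketch below does this for the observed table; variable names are illustrative. Working from unrounded totals gives about 888.9 and 711.1 for the Reject column, so the slide's 890 and 710 appear to reflect rounding of the three-decimal proportions.

```python
# Observed frequencies: rows are M, F; columns are Accept, Reject.
observed = [[140, 860], [60, 740]]

row_totals = [sum(row) for row in observed]        # [1000, 800]
col_totals = [sum(col) for col in zip(*observed)]  # [200, 1600]
n = sum(row_totals)                                # 1800

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("M", "F"), expected):
    print(label, [round(e, 1) for e in row])
```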

Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (r-1)*(c-1) degrees of freedom

$$\chi^2 = \frac{(140-111)^2}{111} + \frac{(860-890)^2}{890} + \frac{(60-89)^2}{89} + \frac{(740-710)^2}{710} = 19.30$$

So?
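The arithmetic for the contingency statistic can be checked with a few lines of Python. This is a sketch, with `chi_square` as an illustrative helper name. It evaluates the sum once with the rounded expected counts shown on the slides (111, 890, 89, 710) and once with unrounded expected counts (row total * column total / n), which gives a slightly smaller value of roughly 19.01. Either way the statistic is far above the critical values used in the decision rule.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cells in the order M-Accept, M-Reject, F-Accept, F-Reject.
observed = [140, 860, 60, 740]

# Rounded expected counts as printed on the slide.
slide_expected = [111, 890, 89, 710]
chi2_slide = chi_square(observed, slide_expected)

# Unrounded expected counts from the marginal totals.
n = 1800
exact_expected = [1000 * 200 / n, 1000 * 1600 / n, 800 * 200 / n, 800 * 1600 / n]
chi2_exact = chi_square(observed, exact_expected)

print(round(chi2_slide, 2), round(chi2_exact, 2))
```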

3. Construct a decision rule

Decision Rule

1. Significance Level - α = .05. Probability of rejecting the Null Hypothesis when it is true.

2. Degrees of freedom - number of unconstrained data used in calculating a test statistic. For Chi-Square it is (r-1)*(c-1), so here that would be 1. When the number of cells is larger, we need a larger test statistic to reject the null.

3. Two-tailed or one-tailed test - Significance tables are (unless otherwise specified) two-tailed tables. Chi-Sq is on pg 517.
– Ha1: Gender and Acceptance are related (2-sided). Critical Value = 3.84
– Ha2: Fewer Women are Accepted (1-sided). Critical Value = 2.71

4. Decision Rule: Reject Ho if the calculated Chi-sq value (19.3) > the test critical value: 3.84 for Ha1 or 2.71 for Ha2.
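The decision rule can be sketched in Python without a printed table. With 1 degree of freedom, a chi-square value is the square of a standard normal value, so its upper-tail probability is erfc(sqrt(x/2)). The function name `p_value_df1` is an illustrative choice. The sketch suggests that 3.84 corresponds to a tail area of about .05 and 2.71 to about .10, which is the value used for a one-sided test at .05.

```python
import math

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

chi2_observed = 19.30  # calculated value from the slide

# Critical values from the slide.
critical = {"Ha1 (2-sided)": 3.84, "Ha2 (1-sided)": 2.71}

# Reject Ho when the calculated value exceeds the critical value.
decisions = {name: chi2_observed > cv for name, cv in critical.items()}

print(decisions)
print(round(p_value_df1(3.84), 3), round(p_value_df1(2.71), 3))
print(p_value_df1(chi2_observed))
```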

Chi-Square Table

Chi-Square Test

Used for nominal data, to compare the observed frequency of responses to what would be “expected” under some specific null hypothesis.

Two types of tests

Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists

Goodness of fit test – Compare whether the data sampled is proportionate to some standard

Goodness of fit – Chi-Square

Ho: Car color preferences have not shifted
Ha: Car color preferences have shifted

Color     Data   Historic Distribution   Expected # = Prob*n
Red       680    30%                     750
Green     520    25%                     625
Black     675    25%                     625
White     625    20%                     500
Tot (n)   2500

Do we observe what we expected?

Chi-Square Test

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

With (k-1) degrees of freedom

$$\chi^2 = \frac{(680-750)^2}{750} + \frac{(520-625)^2}{625} + \frac{(675-625)^2}{625} + \frac{(625-500)^2}{500} = 59.42$$
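The goodness-of-fit calculation follows the same pattern, except that the expected counts come from the historic distribution rather than from marginal totals. A minimal Python sketch with the car color data from the slides (variable names are illustrative):

```python
# Observed counts and historic proportions from the car color slide.
observed = {"Red": 680, "Green": 520, "Black": 675, "White": 625}
historic = {"Red": 0.30, "Green": 0.25, "Black": 0.25, "White": 0.20}

n = sum(observed.values())  # 2500

# Expected # = Prob * n
expected = {color: historic[color] * n for color in observed}

# Chi-square: sum of (O - E)^2 / E over the k categories.
chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

# Goodness of fit uses k - 1 degrees of freedom.
df = len(observed) - 1

print(round(chi2, 2), df)
```

White contributes the largest term (31.25 of the total), so it accounts for most of the apparent shift.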

So?

3. Construct a decision rule

Decision Rule

1. Significance Level - α = .05. Probability of rejecting the Null Hypothesis when it is true.

2. Degrees of freedom - number of unconstrained data used in calculating a test statistic. For Chi-Square it is (k-1), so here that would be 3. When the number of cells is larger, we need a larger test statistic to reject the null.

3. Two-tailed or one-tailed test - Significance tables are (unless otherwise specified) two-tailed tables. Chi-Sq is on pg 517.
– Ha: Preferences have changed (2-sided). Critical Value = 7.81

4. Decision Rule: Reject Ho if the calculated Chi-sq value (59.42) > the test critical value (7.81).
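For 3 degrees of freedom the upper-tail probability also has a closed form, erfc(sqrt(x/2)) + sqrt(2x/π) * exp(-x/2), so the critical value can be sanity-checked with the Python standard library. This is a sketch under that assumption; `p_value_df3` is an illustrative name and the formula applies only to df = 3.

```python
import math

def p_value_df3(chi2):
    """Upper-tail probability of a chi-square statistic with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

critical_value = 7.81  # from the slide, alpha = .05 with df = 3
chi2_observed = 59.42  # calculated value from the slide

# Reject Ho when the calculated value exceeds the critical value.
reject_null = chi2_observed > critical_value

print(reject_null)
print(round(p_value_df3(critical_value), 3))
print(p_value_df3(chi2_observed))
```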

Chi-Square Table

Recap

• Finding & Evaluating Secondary Data
• Measure Types
– Permissible transformations
– Meaningful statistics
– Index #s
• Crosstabs
– Casting right direction
• Chi-square statistic
– Contingency Test
– Goodness of Fit Test