market intelligence session 6 kerlander soup case, survey design
TRANSCRIPT
Market Intelligence Session 6
Kerlander Soup Case, Survey Design
Kerlander Soup
• What situation is Kerlander facing – what motivated research?
• What action should Kerlander take based on the research findings?
Action Alternatives
regular creamy extra creamy
A (recommendation)
B
C
D
E
F
G* (status quo)
Focus Groups
• Appropriate use?
• Implementation?
• How they used insights?
Focus group insight
Purchase decision is a function of:
• Taste
Construct validity
Purchase decision is a function of:
• Taste (creaminess)
• Packaging
• Price
• Ingredients
• Nutrition
• Availability
• Awareness
• Label
• Shelf position
• Promotions
• Etc.
Construct validity
• Focus group: direction of bias?
Imagine they confirmed that taste was main driver of purchase…
• What do you think of taste test?
Reliability
         Time 1  Time 2  Time 3
Soup A     1       2       1
Soup B     2       1       3
Soup C     3       5       2
Soup D     4       3       5
Soup E     5       4       4
Reliability: correlations (Spearman’s Rho)
         Time 1  Time 2  Time 3
Time 1     1
Time 2             1
Time 3                     1
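The off-diagonal cells can be worked out from the rank table for soups A to E. A minimal sketch, assuming no tied ranks (so the shortcut formula rho = 1 - 6·Σd² / (n(n² - 1)) applies); the function name `spearman_rho` and the dictionary layout are illustrative, not part of the case:

```python
from itertools import combinations

# Ranks of soups A-E at each tasting round, copied from the reliability table.
ranks = {
    "Time 1": [1, 2, 3, 4, 5],
    "Time 2": [2, 1, 5, 3, 4],
    "Time 3": [1, 3, 2, 5, 4],
}

def spearman_rho(x, y):
    """Spearman's rho via the shortcut formula; valid only without tied ranks."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Print one correlation per pair of rounds.
for a, b in combinations(ranks, 2):
    print(f"{a} vs {b}: rho = {spearman_rho(ranks[a], ranks[b]):.2f}")
```

On these five soups the sketch gives 0.60 (Time 1 vs Time 2), 0.80 (Time 1 vs Time 3), and 0.10 (Time 2 vs Time 3). With only five items each value is noisy, so it is a cue to data quality rather than a verdict.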
What would this indicate?
Time 1 Time 2 Time 3
Time 1 1
Time 2 0.85 1
Time 3 0.14 0.08 1
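Burnout: Times 1 and 2 agree closely (0.85), but Time 3 matches neither, which suggests respondents had tired by the final round.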
What would this indicate?
Time 1 Time 2 Time 3
Time 1 1
Time 2 0.04 1
Time 3 0.11 0.91 1
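Learning: Time 1 matches neither later round, while Times 2 and 3 agree closely (0.91), which suggests respondents were still learning the task in the first round.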
Reliability
• Missed opportunity to test reliability of data using correlations
• With 15 tastings, there is reason to suspect the data may not be reliable!
Imagine we know data are reliable…
• How do we decide which option to recommend? What stats should we look at?
Preference Data: Descriptive Stats (taste test ranks)
              Mean   Median   Mode
KerRegular    3.40   4.5      5
FishDelight   2.75   2.0      2
KerCreamy     2.40   3.0      3
Cape Cod      2.90   3.0      4
KerExtra      3.55   4.5      5
Any problems with these?
Do segments exist?
[Figure: preference plots for Kerlander Regular, Kerlander Creamy, and Kerlander Extra Creamy]
Totally Disaggregate Choice Modeling
• Alternative to central tendency (useful when segments exist)
• Taking 1 respondent at a time, what would they choose in a given context (with certain options on market)?
• Forecast market share from there
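The procedure in these bullets can be sketched in a few lines. This is an illustration under stated assumptions, not the case's own calculation: the four respondents below are hypothetical, the helper name `simulate_shares` is made up, and each respondent is assumed to buy whichever available product they ranked best (1 = most preferred).

```python
from collections import Counter

# HYPOTHETICAL rankings (1 = most preferred), for illustration only;
# these are not the Kerlander respondents.
respondents = [
    {"KR": 1, "FD": 3, "KC": 2, "CC": 5, "KEC": 4},
    {"KR": 4, "FD": 5, "KC": 3, "CC": 1, "KEC": 2},
    {"KR": 3, "FD": 1, "KC": 2, "CC": 5, "KEC": 4},
    {"KR": 2, "FD": 5, "KC": 1, "CC": 3, "KEC": 4},
]

def simulate_shares(respondents, available):
    """Each respondent 'buys' the best-ranked product among those on the market."""
    choices = Counter(
        min(available, key=lambda brand: r[brand]) for r in respondents
    )
    return {brand: choices[brand] / len(respondents) for brand in available}

# Scenario with only Kerlander Regular vs. one that adds Kerlander Creamy.
current = simulate_shares(respondents, ["KR", "FD", "CC"])
with_creamy = simulate_shares(respondents, ["KR", "FD", "KC", "CC"])
print(current)
print(with_creamy)
```

Comparing the two dictionaries shows where a new product's share comes from: in this toy data Kerlander Creamy takes its buyer from Kerlander Regular, which is the cannibalization question in miniature.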
How to do it by hand
Summary Share Simulation Results
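Scenario      Kerlander   Fisherman’s   Kerlander   Cape Cod   Kerlander      Kerlander
              Regular     Delight       Creamy                 Extra Creamy   Total
A                 -           40            25         35           -             25
B                 -           55             -         20          25             25
C                 -           40            25         10          25             50
D                30           10            25         35           -             55
E                30           25             -         20          25             55
F                30           10            25         10          25             80
G (current)      30           25             -         45           -             30
(Shares in %; "-" = product not offered in that scenario; Kerlander Total = sum of the Kerlander products.)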
How to decide between options?
• Cannibalization?
• Can we introduce all 3?
• If disaggregate choice modeling indicates tie, what should you do?
External validity
• How much confidence do you have in using these data to make the recommendation for Kerlander?
• What, if anything, can you do to test whether confidence in these data is warranted?
External validity
• Can you use the model to forecast the current (known) market shares?
• Additional information:
  – Current market share: Kerlander (45%); Fisherman’s Delight (10%); Cape Cod (45%)
• Preference data: impute the brand chosen by each subject in a three-brand market (KR, FD, CC) by looking to see which of the three is highest ranked.
• Example three-brand imputed purchase:
          KR  FD  KC  CC  KEC
  Subj 1   1   2   3   4   5   -- buy KR
  Subj 4   5   4   3   2   1   -- buy CC
• Of the subjects shown:
  – KR 6/20 (30%)
  – FD 5/20 (25%)
  – CC 9/20 (45%)
External validity
            Observed Frequency   Expected Frequency
Ker Reg     6 (30%)              (45%) x 20 = 9
Fish Del    5 (25%)              (10%) x 20 = 2
Cape Cod    9 (45%)              (45%) x 20 = 9
Null hypothesis? Which statistic?
Chi-square p < .05. Implication?
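A sketch of the goodness-of-fit calculation on the 20 tabulated subjects, in plain Python. The null hypothesis is that the imputed purchases follow the known market shares; the variable names are illustrative. On these rounded counts the statistic comes out at 5.5 with 2 degrees of freedom, p ≈ 0.064, marginally above .05, so the p < .05 on the slide presumably rests on figures not shown here (that is an assumption). One expected count is also only 2, below the usual rule-of-thumb minimum of 5.

```python
import math

# Imputed purchases among the 20 subjects shown, and the known market shares.
observed = {"Ker Reg": 6, "Fish Del": 5, "Cape Cod": 9}
market_share = {"Ker Reg": 0.45, "Fish Del": 0.10, "Cape Cod": 0.45}

n = sum(observed.values())
expected = {brand: share * n for brand, share in market_share.items()}

# Pearson chi-square: sum of (observed - expected)^2 / expected over brands.
chi_square = sum(
    (observed[b] - expected[b]) ** 2 / expected[b] for b in observed
)
df = len(observed) - 1

# For df = 2 the chi-square upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)
print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.3f}")
```

Whatever the exact p-value, the gap is concentrated in Fisherman’s Delight (5 observed vs. 2 expected), which is where the direction-of-bias question starts.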
Direction of bias?
• Can we make sense of it?
  – Distribution
  – Price
  – Awareness
Kerlander Soup Takeaways
• Levels of measurement:
  – Can’t take mean of ordinal data
• Measures of central tendency are misleading when segments exist
  – Disaggregate Choice Modeling Approach
• Examine cues to data quality
  – Reliability: repeatability, consistent results (missed opportunity to look at correlations)
  – External Validity: generalizability to larger population (does model forecast current mkt share?)
  – Construct Validity: measures what it purports to measure (soup pref not just based on taste)
• Why are forecasts not accurate?
  – Distribution
  – Price
  – Awareness
Survey Research, Measurement
Representative sampling: The beginning
• 1916: Literary Digest starts to poll Americans to predict voting behavior
• Becomes the go-to source for presidential and other election predictions
1936 presidential election
Prediction
• Landon: 57%
• Roosevelt: 43%
They were confident: “if past experience is a criterion, the country will know to within a fraction of 1% the actual popular vote of 40 Million voters”
Results
• Landon: 38%
• Roosevelt: 62%
Prediction error: 19 percentage points
1936 Presidential Election
• Sample size: 2.4 million
• Sampling method: created mailing list of 10 million names (1 of 4 voters) pulled from:
  – telephone directory, lists of magazine subscribers, rosters of clubs and associations, automobile registry
  – Sent mock ballot to return to magazine
1936 Presidential Election
• Sampling bias
  – Selection bias: sample list slanted toward middle and upper class voters.
  – Nonresponse bias: 2.4 out of 10 million responses (1/4). People who respond to surveys are different from those who don’t
But as one star falls, another rises
• Gallup also did a poll in 1936
• Sample size: 5,000
• Sampling method: representative sample
• Prediction:
  – Roosevelt 56%; Landon 44%
• Introduced modern era of public opinion polls
But 12 years later…
Common Pitfalls when writing questions
• Complex questions
• Ambiguous questions
• Leading questions
• Loaded questions
• Double-barreled questions
Complex questions
• Vague
• Lacks context
• Use simple, ordinary words and wording
• No jargon
• Literacy
  – Reading level of average American: 7th-8th grade
Complex
• Vague
• Lacks context
• Whose point of view?
• Jargon
Ambiguous questions
• Avoid words or phrases with multiple meanings
• Specify the context of the question
• Watch for similar spellings or pronunciations of key words
• Be direct about what you're asking
• Back translation
Leading questions
• Leading the respondent to a particular answer
• Introduces bias
• There should not be a “right” or “wrong” answer
• Data will be unreliable
VW example
Loaded questions
• Emotionally charged
• Heated topic
• Do: make all answers equally acceptable
• Don’t: induce social pressure
Double-barreled questions
• Asking 2 questions at once
Key to success in survey writing
The ability to anticipate:
1. The cognitive processes of the respondent
2. The analytical work that might be performed on responses
3. The “so what” (decision value) attached to statistical results
4. The likely distribution of responses under different wording conditions
How often do you exercise?
A.
  0 times a week
  1 time a week
  2 times a week
  3 or more times a week
B.
  0-1 times a week
  2-3 times a week
  4-5 times a week
  6 or more times a week
A. How much would you be willing to pay for an ice cream maker?
  1. $0-10
  2. $11-20
  3. $21-30
  4. $31-40
  5. more than $40
B. How much would you be willing to pay for an ice cream maker?
  1. $0-25
  2. $26-50
  3. $51-75
  4. $76-100
  5. more than $100
A. I always recycle
   1   2   3   4   5
   totally disagree          totally agree
B. I sometimes recycle
   1   2   3   4   5
   totally disagree          totally agree
C. I never recycle
   1   2   3   4   5
   totally disagree          totally agree
Creating the Survey: Question Order
• Put difficult or sensitive questions well into the interview
• Demographics questions typically last
• Usually funnel questions general to specific
  – Use product category?
  – Use Brand X?
  – Do you like Brand X?
  – Why?
Creating the Survey: Types of Scales
• Some Examples
  – Likert scale (Agree-Disagree)
  – Other rating scales
  – Semantic Differential (opposites)
  – Rankings
  – Constant sum
  – Purchase Intent
Likert scale (agree-disagree)
• Ask respondents the extent to which they agree or disagree with a statement (usually a 5 or 7 point scale)
Now we would like to find out your impressions about Colgate Combo. Please indicate your opinion below.
               Strongly             Neither Agree            Strongly
               Disagree             Nor Disagree             Agree
Expensive         1          2           3           4          5
Convenient        1          2           3           4          5
High Quality      1          2           3           4          5
Appealing         1          2           3           4          5
Can also be on a sliding scale for more gradations
Other rating scales
• Can have 5 or 7 point scales other than agreement:
   1   2   3   4   5
   Not at all important          Important
Other option:
   -2   -1   0   1   2
   Very unimportant          Very important
Unipolar vs. bipolar scales?
• In general, unipolar usually better than bipolar
  – Less mentally taxing; only have to consider 1 attribute instead of 2
  – More streamlined with fewer choices (5-7 vs. 11)
  – Many bipolar scales actually only measure 1 dimension. Ex: “Not at all important” vs. “very unimportant”
• Exception: semantic differential
Semantic Differential (opposites)
• Typically bipolar adjectives, endpoints only are labeled
• Typically 7 or 11 point scale (usually coded -3 to 3, -5 to 5)
• Typically treated as interval scales
This website looks:
   boring    -3 -2 -1 0 1 2 3   fun
   amateur   -3 -2 -1 0 1 2 3   professional
   complex   -3 -2 -1 0 1 2 3   simple
Semantic differential vs Likert?
• SDS advantages
  – Can tap specific emotional responses
  – 1 question with multiple scales, less to read
• SDS drawbacks
  – Not always easy to find pairs (opposite of fun?)
  – Takes longer to answer
    • 2 steps: (1) direction (+ or -), (2) magnitude
    • Scale endpoints change each time so need to recalibrate
    • More #s/options so harder to choose
  – May strongly think not A but not B, so they mark midpoint even though they’re extreme on A
Pictorial Scale
Rankings
Constant sum
• Often used to measure Importance – e.g., allocate 100 points across various features based on importance
• Generally do not want to have more than 5-7 features or the allocation process gets too hard.
Example: How important are the following attributes to you in choosing a laptop computer? Please allocate 20 points across the various features
_____ warranty
_____ battery life
_____ screen size
_____ processing speed
_____ price
Purchase intent
• Statement of likelihood of making a future purchase
• Example: “How likely are you to purchase Colgate Combo in the future?”
   1   2   3   4   5   6   7
   Not at all likely          Very likely
Quadrant Analysis
• Useful when liking/preference for a company or brand is based on various attributes.
• Gives you a big-picture view of your strengths and weaknesses
• Follow up with more specific quant analysis
• Steps
  – Generate list of key attributes from focus groups, previous surveys, etc.
  – Ask consumers 2 questions on survey (rating scales so you can take the mean)
    • Rate importance of attributes (importance)
    • Rate company/brand on attributes (evaluation)
Importance
• How important is each of the following financial service attributes to you?
   not at all                    extremely
   1    2    3    4    5    6    7
______ Flexible business hours
______ Simple paperwork
______ Flexible payment plans
Evaluation
• Please rate Merrill Lynch on each of the listed attributes using the scale below.
   poor                          excellent
   1    2    3    4    5    6    7
______ Flexible business hours
______ Simple paperwork
______ Flexible payment plans
Quadrant Analysis
• Steps (continued)
  – Take the means (unless segments?)
  – Label axes and plot attributes in 2 dimensional space
[Figure: Financial Services Attributes. Importance and performance evaluation for one brand. Y axis: importance (low to high). X axis: mean performance rating (1 = poor, 2 = fair, 3 = good, 4 = v. good, 5 = excellent). Attributes plotted: flexible payment plans; accuracy of product info; ease of scheduling an appt; ability of consultant to answer questions; simple paperwork; flexible business hours; convenient office locations; ability to obtain product info.]
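[Figure: quadrant grid. Y axis: importance (low to high). X axis: performance (poor to strong). Quadrants: “Areas for Improvement” (high importance, poor performance); “Keep up the Good Work” (high importance, strong performance); “Lowest Priority” (low importance, poor performance); “Possible Overkill” (low importance, strong performance).]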
Quadrant Analysis
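The take-the-means-and-plot steps can be sketched as a simple classification. This is only an illustration: the ratings are hypothetical, the helper `quadrant` is made up, and the cut-off between low and high is assumed to be the midpoint of the 1-7 scale (an analyst might instead split at the grand mean or median).

```python
from statistics import mean

# HYPOTHETICAL 1-7 ratings from three respondents, for illustration only.
importance = {
    "Flexible business hours": [6, 7, 6],
    "Simple paperwork": [3, 2, 3],
    "Flexible payment plans": [6, 5, 7],
}
evaluation = {
    "Flexible business hours": [2, 3, 2],
    "Simple paperwork": [6, 6, 5],
    "Flexible payment plans": [6, 7, 6],
}

MIDPOINT = 4  # assumed cut-off: middle of the 1-7 scale

def quadrant(mean_importance, mean_evaluation, cut=MIDPOINT):
    """Map a pair of mean ratings to one of the four quadrant labels."""
    if mean_importance >= cut:
        if mean_evaluation >= cut:
            return "Keep up the Good Work"
        return "Areas for Improvement"
    if mean_evaluation >= cut:
        return "Possible Overkill"
    return "Lowest Priority"

labels = {
    attr: quadrant(mean(importance[attr]), mean(evaluation[attr]))
    for attr in importance
}
for attr, label in labels.items():
    print(f"{attr}: {label}")
```

Taking means first is only sensible if respondents do not split into segments with opposite views, which is the same caution raised for the soup rankings.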
Single Brand Evaluation
Be careful, though: if there is a minimum level that customers expect on an attribute, it could still be a priority.
What else to know before making changes?
• Perceptions versus reality?
  – If perceptions, creating better awareness is key
• Cost of improving different attributes?
• Possibility of changing importance instead?
Competitive analysis variant
• Comparing your brand to 1 competitor
• Main difference: Evaluation ratings will be relative
  – Need to take difference scores (focal brand – competitor)
  – X axis will be negative (left) to positive (right), 0 (no difference) in middle
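The difference-score step can be sketched as follows. The ratings are hypothetical and the helper `difference_scores` is illustrative; it assumes each respondent rated both brands, so differences are taken per respondent before averaging.

```python
from statistics import mean

# HYPOTHETICAL 1-7 evaluation ratings from three respondents, for illustration only.
focal = {"response time": [5, 6, 5], "qual of replacement parts": [3, 4, 3]}
competitor = {"response time": [4, 4, 5], "qual of replacement parts": [5, 6, 5]}

def difference_scores(focal, competitor):
    """Mean of per-respondent (focal brand - competitor) ratings per attribute."""
    return {
        attr: mean(f - c for f, c in zip(focal[attr], competitor[attr]))
        for attr in focal
    }

diffs = difference_scores(focal, competitor)
for attr, d in diffs.items():
    side = "better than" if d > 0 else "worse than" if d < 0 else "at parity with"
    print(f"{attr}: {d:+.2f} ({side} competitor)")
```

The sign only places an attribute left or right of zero on the X axis; judging whether a gap is significantly better or worse would need a statistical test on the paired differences, which this sketch does not attempt.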
[Figure: Quadrant Analysis: Competitive Analysis (Competitive Analysis Variant). Performance evaluation is the difference between one brand and another brand. Y axis: importance (low to high). X axis: significantly worse than competitor, at parity, significantly better than competitor. Zones along the X axis: priorities for improvement, pre-emption opportunities, competitive strengths. Attributes plotted: freq of communication; total time to resolve problem; technician’s knowledge of my needs; qual of replacement parts; efficiency of service call handling; effectiveness of customer training; response time; timeliness of invoicing for services.]
• How important to you are each of the following car attributes?
   not at all                    extremely
   1    2    3    4    5    6    7
Sporty Styling   _______
Handling         _______
Cost             _______
Comfort          _______
Sound System     _______
• Please rate the following vehicles on each of the listed attributes using the scale below.
   poor                          excellent
   1    2    3    4    5    6    7
                 Toyota Camry   Chevy Corvette
Sporty Styling   _______        _______
Handling         _______        _______
Cost             _______        _______
Comfort          _______        _______
Sound System     _______        _______
• Note: you would take the means, but let’s plot one person’s data
[Figure: Quadrant Analysis: Competitive Analysis, plotting one person’s data. Performance evaluation is the difference between one brand and another brand. Y axis: importance (low to high). X axis: significantly worse than competitor to significantly better than competitor.]