Proprietary and Confidential © 2011 Maritz
The Reliability and Validity of Alternative Customer Satisfaction
and Loyalty Measurements
Keith Chrzan and Ted Saunders
Maritz Research
Background
• From its commercial inception in the late 1970s, customer satisfaction and loyalty research has become a multi-billion dollar industry
• Most applications include two key evaluative measures
– Overall satisfaction
– Loyalty
Background
• Despite the importance marketers place on measuring satisfaction and loyalty, the measures themselves have not recently enjoyed the attention of rigorous psychometric science
– Much of the research on satisfaction measurement is proprietary, not shared with the industry
– Some scholarly work exists on the measurement of customer satisfaction (Westbrook 1980, Westbrook and Oliver 1981, Hausknecht 1990, Wittink and Bayer 1994), but none of it is definitive, recent or about web-based surveys
– Applied researchers and consultants often focus more on supporting preferred metrics than on exposing them to wide-ranging, discriminating testing
Research design – initial survey
• Web-based initial survey fielded 11/28 to 12/5/11
– 1,505 respondents (40% completed on PC web browsers, 60% on mobile phone web browsers)
– Respondents had rented videos in the past 90 days (e.g. from Netflix, iTunes, Redbox)
– Topic brand randomly selected from among recently used rental services
– Measures included
• 4 overall satisfaction questions (cumulative, not transactional)
• 3 loyalty metrics
• 9 attributes known to drive satisfaction
• Median interview length 3.2 minutes
Research design – re-contact survey
• Re-contact survey fielded 12/5 to 12/12/11
– Previous respondents were re-contacted seven days after the initial survey, a lag that is
• Short enough to allow a fair assessment of test-retest reliability
• Long enough to allow post-survey behaviors to occur
– 1,187 respondents (79%) completed the follow-up survey
– 61% (724) of these follow-up respondents rented videos in the intervening seven days
• Metrics included
– The same 4 measures of overall satisfaction
– Number of times rented from each service in the past seven days
• Median interview lasted 1.2 minutes
Satisfaction Test
Five point fully-labeled unipolar scale
• A variant of a scale from Aiello and Czepiel (1979)
• Given the almost universal extreme negative skew in distribution of overall satisfaction, we prefer the unipolar scale because it spreads responses more than does a five point bipolar (completely dissatisfied to completely satisfied) scale
Overall, how satisfied are you with your video services provided by <SERVICE>? Would you say you are . . .
Not at all Satisfied
Slightly Satisfied
Somewhat Satisfied
Very Satisfied
Completely Satisfied
o o o o o
11 point percentage labeled scale
• A thermometer scale (Westbrook 1980)
Using the scale below, please indicate how satisfied you are with <SERVICE>.
Not at all satisfied
Completely satisfied
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
D-T scale
• Borrowed from the life satisfaction literature (Andrews and Withey 1974)
• Balanced, bipolar scale explicitly extends the range of emotional response beyond satisfaction and dissatisfaction
Overall, how do you feel about <SERVICE>? Would you say you feel . . .
Terrible
Unhappy
Mostly Dissatisfied
Mixed (about equally satisfied and dissatisfied)
Mostly Satisfied
Pleased
Delighted
Neither (neither satisfied nor dissatisfied)
I never thought about it
Binary satisfied/not scale
• Plausibly preferable for small screen mobile-web survey environments
Overall, are you satisfied with <SERVICE> or not?
Not Satisfied Satisfied
o o
Attributes that drive satisfaction
• Nine attributes identified in previous research in this category
Using a scale where 1 means “Poor” and 5 means “Excellent,” please indicate how well <SERVICE> performs on the following aspects:
Poor Fair Good Very Good Excellent
Reliability
Convenience
Availability
Picture Quality
Variety
Selection Quality
Portfolio
Customer Service
Fees
Subscription Terms
Test-retest reliability
• If we ask the respondent the same question on a different day, do we get the same answer?
• The correlation between each satisfaction measure in the initial survey and the same measure in the follow-up survey gives its test-retest reliability (higher correlations mean more reliability; rule of thumb: >.70)
• 11 point percentage scale has greater reliability than the 5 point and binary scales (p<.01); greater than D-T at p=.10
• D-T outperforms 5 point and binary scales (p<.05)
• Results hold for both PC and mobile web respondents
Scale                  Reliability
5 point unipolar       .66
11 point percentage    .75
D-T                    .72
Binary                 .68
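The reliability figures above are correlations between the same respondents' answers in the two waves. A minimal sketch of that calculation follows. It assumes a Pearson correlation (the slides do not say which coefficient was used); the function name `pearson_r` and the ratings are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six respondents' 0-100% ratings in each wave.
initial = [90, 70, 100, 50, 80, 60]
recontact = [80, 70, 90, 60, 80, 50]

# Test-retest reliability is the wave 1 vs. wave 2 correlation.
reliability = pearson_r(initial, recontact)

# Rule of thumb from the slide: reliability above .70 is acceptable.
meets_rule_of_thumb = reliability > 0.70
```

Each respondent must appear at the same position in both lists, so respondents who skipped the re-contact survey have to be dropped before the calculation.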
Convergent validity
• Convergent validity measures the extent to which each satisfaction measure agrees with the other three
• Measured by average inter-item correlation
• 11 point percentage scale performs better than 5 point and binary at p<.01
Scale                  Mean inter-item correlation
5 point unipolar       .70
11 point percentage    .77
D-T                    .75
Binary                 .62
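The mean inter-item correlation can be sketched as follows: for each measure, correlate it with each of the other three and average the results. This is an illustrative sketch only. It assumes Pearson correlations, and the function names and response data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_inter_item_correlation(items):
    """For each item, average its correlation with every other item."""
    result = {}
    for name, scores in items.items():
        others = [
            pearson_r(scores, other_scores)
            for other_name, other_scores in items.items()
            if other_name != name
        ]
        result[name] = sum(others) / len(others)
    return result


# Hypothetical answers from five respondents on the four measures.
items = {
    "5 point unipolar": [4, 3, 5, 2, 4],
    "11 point percentage": [80, 60, 100, 40, 70],
    "D-T": [6, 5, 7, 3, 5],
    "Binary": [1, 1, 1, 0, 1],
}

convergent = mean_inter_item_correlation(items)
```

A measure with a higher average agrees more closely with the other three, which is the sense in which the slide uses "convergent validity".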
Criterion validity – prediction from antecedents
• All regression models make sense and produce similar weights
• They differ, however, in their ability to explain satisfaction (R²)
• The binary scale fares by far the worst on this measure but the other three scales do not differ significantly in terms of their relationship to the drivers of satisfaction
• D-T performs best among PC web respondents, the 5 point unipolar scale performs best among mobile web respondents
Scale                  R²
5 point unipolar       .51
11 point percentage    .49
D-T                    .51
Binary                 .22
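For readers who want to see how an R² of this kind is obtained, here is a rough sketch. It assumes an ordinary least squares regression of the satisfaction score on the attribute ratings with an intercept (the slides do not state the estimation details), and it uses two hypothetical attributes rather than the study's full attribute set. All names and data are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def r_squared(predictors, outcome):
    """R^2 of an OLS regression of outcome on predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: two attribute ratings (1-5) per respondent and
# that respondent's overall satisfaction on a 5 point scale.
attributes = [(5, 4), (3, 3), (4, 5), (2, 1), (4, 2), (1, 2)]
satisfaction = [5, 3, 4, 1, 4, 2]

fit = r_squared(attributes, satisfaction)
```

Running the same regression once per satisfaction scale, with the same attribute ratings as predictors, gives one R² per scale to compare.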
Criterion validity – prediction of consequents
• For this measure of criterion validity we look at correlations of each satisfaction measure with a measure of advocacy and three loyalty metrics (an attitude, a behavioral prediction and a behavioral measure)
• The binary measure performs the worst while the other three satisfaction measures perform comparably
Scale                  Advocacy   Intent to Return   Probability Allocation   Post-survey behavior
5 point unipolar       .67        .77                .41                      .28
11 point percentage    .71        .79                .44                      .27
D-T                    .71        .79                .44                      .29
Binary                 .54        .57                .35                      .20
Sensitivity
• All else being equal, a better satisfaction scale will discriminate more among brands’ performance than will a worse satisfaction scale
• Analysis of variance produces an F-statistic that indicates how well each of our measures differentiates the performance of the 10 video brands (higher F indicates more sensitivity)
• The binary scale again performs by far the worst while the 11 point percentage scale has the greatest sensitivity
Scale                  F
5 point unipolar       10.1
11 point percentage    12.9
D-T                    9.6
Binary                 4.1
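The F-statistic used here comes from a one-way analysis of variance with brand as the grouping factor. A minimal sketch of the calculation follows, with three hypothetical brands instead of the study's ten and made-up ratings; the function name is illustrative.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; groups is a list of lists of scores."""
    all_scores = [s for g in groups for s in g]
    n_total = len(all_scores)
    k = len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of brand means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of respondents around their own brand's mean.
    ss_within = sum(
        (s - m) ** 2 for g, m in zip(groups, group_means) for s in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical 0-100% satisfaction ratings, grouped by brand rated.
ratings_by_brand = {
    "Brand A": [80, 90, 100, 90],
    "Brand B": [60, 70, 80, 70],
    "Brand C": [70, 80, 90, 80],
}

f_stat = one_way_anova_f(list(ratings_by_brand.values()))
```

A larger F means the brand means are spread further apart relative to the respondent-level noise, which is why it serves as the sensitivity measure.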
Distribution
• Finally, our customers need scales that leave room for their scores to grow, so that they can track performance improvements
• Scales with fewer responses in the top box leave more room for improvement
– Again the binary scale performs poorly, with almost nine respondents in 10 selecting the top box response
– Interestingly, the D-T scale, with its extreme “delighted” endpoint leaves significantly more room for growth, despite the fact that it uses a bipolar format instead of the unipolar format of the 5 and 11 point scales
Scale                  Top box %
5 point unipolar       16
11 point percentage    16
D-T                    10
Binary                 87
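Top box percentage is simply the share of respondents choosing the highest category on a scale. A short sketch with made-up responses (the function name and the numeric coding of the scales are assumptions for illustration):

```python
def top_box_percent(responses, top_value):
    """Percentage of responses equal to the scale's highest category."""
    if not responses:
        raise ValueError("no responses")
    hits = sum(1 for r in responses if r == top_value)
    return 100.0 * hits / len(responses)


# Hypothetical responses from ten respondents.
five_point = [5, 4, 4, 3, 5, 4, 2, 4, 3, 4]  # 5 = "Completely Satisfied"
binary = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]      # 1 = "Satisfied"

five_point_top = top_box_percent(five_point, top_value=5)
binary_top = top_box_percent(binary, top_value=1)
```

The lower the top box percentage, the more headroom a brand has to show improvement on that scale.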
And the winner is . . .
• The 11 point percentage scale performs best across most measures, including the test-retest reliability; D-T finishes a strong second
• No results differed by PC versus mobile web survey environment
[Summary table: rows are the four scales (5 point unipolar, 11 point percentage, D-T, Binary); columns are test-retest reliability, convergent validity, criterion validity, sensitivity and distribution. Cell contents not recoverable.]
And the binary scale worst . . .
[Summary table: rows are the four scales (5 point unipolar, 11 point percentage, D-T, Binary); columns are test-retest reliability, convergent validity, criterion validity, sensitivity and distribution. Cell contents not recoverable.]
Loyalty Test
Intent to return scale
• A standard behavioral intentions scale, a good measure of intentional loyalty
How likely are you to continue using <SERVICE>?
Not at all likely
Not very likely
Somewhat likely
Very likely
Extremely likely
o o o o o
Intent to recommend scale
• Often used as a measure of loyalty, this is a valid measure only of advocacy
If asked for a recommendation, how likely are you to recommend <SERVICE> to a friend or colleague?
Not at all likely
Not very likely
Somewhat likely
Very likely
Extremely likely
o o o o o
Probability allocation
• In previous studies, this measure dramatically outperformed rating scale loyalty measures in terms of predicting behavior
Think of the next 10 times you will use a video service to watch a movie.
How many of those 10 times do you think you will use <SERVICE>?
______ times
Re-contact behavioral measure of loyalty
• Focus on just the time between initial and re-contact survey
Thinking about just the past 7 days, how many movies have you rented from each
of these movie services? If you haven’t rented any movies in the past 7 days,
please check the appropriate box at the bottom.
___ Cable Pay per view
___ Satellite Pay per view
___ Hulu Plus
___ Netflix Streaming
___ Netflix DVDs in mail
___ Crackle.com
___ Redbox
___ Blockbuster
___ iTunes video rental service
___ Amazon Instant Video
None, I have not rented any movies in the past 7 days
Loyalty: predictive validity
• The behavioral measure allows us to measure the correlation of the various loyalty metrics with post-survey behavior
• The probability allocation dramatically outperforms either rating scale measure of loyalty in terms of its linkage with post-survey behavior (likewise it outperforms indices built from loyalty ratings and satisfaction)
• Intent to return also significantly outperforms intent to recommend as a loyalty metric (p=.05)
Scale                             Correlation with post-survey share
Advocacy (intent to recommend)    .31
Intent to return                  .38
Probability allocation            .66
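The slides do not define "post-survey share". The sketch below assumes it is the topic brand's rentals divided by all rentals reported for the seven days, computed only for respondents who rented at least once, and that the comparison uses a Pearson correlation. The data and names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical respondents:
# (times out of 10 allocated to the topic brand in the initial survey,
#  rentals from the topic brand in the following 7 days,
#  total rentals from all services in the following 7 days)
respondents = [
    (8, 3, 4),
    (2, 0, 2),
    (10, 2, 2),
    (5, 1, 3),
    (0, 0, 1),
    (6, 2, 5),
    (7, 0, 0),  # rented nothing, so no share can be computed
]

# Assumption: only respondents with at least one rental are analysed.
renters = [r for r in respondents if r[2] > 0]

predicted_share = [alloc / 10 for alloc, _, _ in renters]
actual_share = [topic / total for _, topic, total in renters]

predictive_validity = pearson_r(predicted_share, actual_share)
```

The same `actual_share` list can be correlated with the intent to return and intent to recommend ratings to compare the three loyalty measures on equal terms.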
Loyalty: sensitivity
• The probability allocation is the most sensitive loyalty measure
Scale                             F
Advocacy (intent to recommend)    10.7
Intent to return                  13.4
Probability allocation            33.6
Loyalty summary
• Across the board, the probability allocation measure outperforms both intent to recommend and intent to return rating scale measures
• Intent to return also outperforms intent to recommend, plausibly because the latter specifically measures advocacy rather than loyalty
Recommendations
• Satisfaction: 11 point percentage-labeled thermometer scale
• Loyalty: probability allocation
Limitations
• This is just one study; despite its strong results we want to see
– Evidence from more categories
– On more scales
– With different kinds of respondents
– In different survey contexts (e.g. branded survey)
– Using different data collection modalities
Bibliography
• Aiello, A. and J.A. Czepiel (1979) “Customer Satisfaction in a Catalog Type Retail Outlet: Exploring the Effect of Product, Price and Attributes,” in New Dimensions in Consumer Satisfaction and Complaining Behavior, eds. R.L. Day and H.K. Hunt, Bloomington: Indiana University, 29-135.
• Andrews, F.M. and S.B. Withey (1974) “Developing Measures of Perceived Life Quality,” Social Indicators Research, 1, 1-26.
• Hausknecht, D.R. (1990) “Measurement Scales in Consumer Satisfaction/Dissatisfaction,” Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 3, 1-11.
• Westbrook, R.A. (1980) “A Rating Scale for Measuring Product/Service Satisfaction,” Journal of Marketing, 44, 68-72.
• Westbrook, R.A. and R.L. Oliver (1981) “Developing Better Measures of Consumer Satisfaction: Some Preliminary Results,” in Advances in Consumer Research, 8, ed. Kent B. Monroe, Ann Arbor: Association for Consumer Research, 94-99.
• Wittink, D.R. and L.R. Bayer (1994) “The Measurement Imperative,” Marketing Research, 6, 14-22.