
Proprietary and Confidential © 2011 Maritz

The Reliability and Validity of Alternative Customer Satisfaction and Loyalty Measurements

Keith Chrzan and Ted Saunders

Maritz Research



Background

• Since its commercial inception in the late 1970s, customer satisfaction and loyalty research has grown into a multi-billion-dollar industry

• Most applications include two key evaluative measures

– Overall satisfaction

– Loyalty



Background

• Despite the importance marketers place on measuring satisfaction and loyalty, the measures themselves have not recently enjoyed the attention of rigorous psychometric science

– Much of the research on satisfaction measurement is proprietary, not shared with the industry

– Some scholarly work exists on measuring customer satisfaction (Westbrook 1980, Westbrook and Oliver 1981, Hausknecht 1990, Wittink and Bayer 1994), but none of it is definitive, recent or about web-based surveys

– Applied researchers and consultants often focus more on supporting preferred metrics than on exposing them to wide-ranging, discriminating testing



Research design – initial survey

• Web-based initial survey fielded 11/28 to 12/5/11

– 1,505 respondents (40% completed on PC web browsers, 60% on mobile phone web browsers)

– Respondents had rented videos in the past 90 days (e.g., from Netflix, iTunes, Redbox)

– Topic brand randomly selected from among recently used rental services

– Measures included

• 4 overall satisfaction questions (cumulative, not transactional)

• 3 loyalty metrics

• 9 attributes known to drive satisfaction

• Median interview length 3.2 minutes



Research design – re-contact survey

• Re-contact survey fielded 12/5 to 12/12/11

• Previous respondents were re-contacted seven days after the initial survey, a lag short enough to allow a fair assessment of test-retest reliability yet long enough to allow post-survey behaviors to occur

– 1,187 respondents (79%) completed the follow-up survey

– 61% (724) of these follow-up respondents rented videos in the intervening seven days

• Metrics included

– The same 4 measures of overall satisfaction

– Number of times rented from each service in the past seven days

• Median interview lasted 1.2 minutes



Satisfaction Test



Five point fully-labeled unipolar scale

• A variant of a scale from Aiello and Czepiel (1979)

• Given the almost universal extreme negative skew in the distribution of overall satisfaction, we prefer the unipolar scale because it spreads responses more than a five point bipolar (completely dissatisfied to completely satisfied) scale does


Overall, how satisfied are you with your video services provided by <SERVICE>? Would you say you are . . .

Not at all Satisfied / Slightly Satisfied / Somewhat Satisfied / Very Satisfied / Completely Satisfied


11 point percentage labeled scale

• A thermometer scale (Westbrook 1980)


Using the scale below, please indicate how satisfied you are with <SERVICE>.

0% (Not at all satisfied) 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (Completely satisfied)


D-T (Delighted-Terrible) scale

• Borrowed from the life satisfaction literature (Andrews and Withey 1974)

• Balanced, bipolar scale explicitly extends the range of emotional response beyond satisfaction and dissatisfaction


Overall, how do you feel about <SERVICE>? Would you say you feel . . .

Terrible / Unhappy / Mostly Dissatisfied / Mixed (about equally satisfied and dissatisfied) / Mostly Satisfied / Pleased / Delighted

Also offered: Neither (neither satisfied nor dissatisfied) / I never thought about it
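The slides do not say how D-T responses were converted to numbers for analysis. The sketch below assumes a 1 (Terrible) to 7 (Delighted) coding. It also treats the two extra options as missing rather than as scale points. Both choices are assumptions for illustration, not the study's documented scoring.

```python
# Hypothetical numeric coding for the D-T (Delighted-Terrible) scale.
# ASSUMPTION: 1-7 codes, and the two extra options are treated as missing.
DT_CODES = {
    "Terrible": 1,
    "Unhappy": 2,
    "Mostly Dissatisfied": 3,
    "Mixed (about equally satisfied and dissatisfied)": 4,
    "Mostly Satisfied": 5,
    "Pleased": 6,
    "Delighted": 7,
}

# Options that sit outside the bipolar continuum.
OFF_SCALE = {
    "Neither (neither satisfied nor dissatisfied)",
    "I never thought about it",
}


def code_dt(response):
    """Return the assumed 1-7 code for a D-T response, or None if off-scale."""
    if response in OFF_SCALE:
        return None
    if response not in DT_CODES:
        raise ValueError(f"unrecognized D-T response: {response!r}")
    return DT_CODES[response]


answers = ["Pleased", "Mostly Satisfied", "I never thought about it", "Delighted"]
coded = [code_dt(a) for a in answers]
scored = [c for c in coded if c is not None]  # drop off-scale answers
print(coded)                      # [6, 5, None, 7]
print(sum(scored) / len(scored))  # 6.0
```

Treating the off-scale options as missing keeps them from being averaged as if they were the scale midpoint.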


Binary satisfied/not scale

• Plausibly preferable for small screen mobile-web survey environments


Overall, are you satisfied with <SERVICE> or not?

Not Satisfied / Satisfied


Attributes that drive satisfaction

• Nine attributes identified in previous research in this category


Using a scale where 1 means “Poor” and 5 means “Excellent,” please indicate how well <SERVICE> performs on the following aspects:

1 Poor / 2 Fair / 3 Good / 4 Very Good / 5 Excellent

Reliability

Convenience

Availability

Picture Quality

Variety

Selection Quality

Portfolio

Customer Service

Fees

Subscription Terms


Test-retest reliability

• If we ask the respondent the same question on a different day, do we get the same answer?

• The correlation between each satisfaction measure's initial and follow-up scores measures its test-retest reliability (higher correlations mean more reliability; rule of thumb: >.70)

• The 11 point percentage scale is more reliable than the 5 point and binary scales (p<.01) and more reliable than D-T at p=.10

• D-T outperforms 5 point and binary scales (p<.05)

• Results hold for both PC and mobile web respondents


Scale Reliability

5 point unipolar .66

11 point percentage .75

D-T .72

Binary .68
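Test-retest reliability is the correlation between the same respondents' initial and follow-up answers. The slides do not name the coefficient, so this minimal sketch assumes a Pearson correlation on numerically coded responses. The ratings are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up 5 point ratings from the same six respondents, seven days apart.
initial = [5, 4, 4, 3, 2, 5]
follow_up = [5, 4, 3, 3, 2, 4]
r = pearson_r(initial, follow_up)
print(round(r, 2))
print("meets .70 rule of thumb:", r > 0.70)
```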


Convergent validity

• Convergent validity measures the extent to which each satisfaction measure agrees with the other three

• Measured by average inter-item correlation

• 11 point percentage scale performs better than 5 point and binary at p<.01


Scale Mean inter-item correlation



D-T .75

Binary .62
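Mean inter-item correlation can be sketched in three steps:

- Correlate every pair of satisfaction measures across respondents.
- For each measure, collect its correlations with the other three.
- Average those correlations.

The function name and toy data are hypothetical, and Pearson correlation is again an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_inter_item_r(measures):
    """Average correlation of each measure with every other measure.

    `measures` maps a scale name to its scores, aligned by respondent.
    """
    names = list(measures)
    r = {}
    for a, b in combinations(names, 2):
        r[(a, b)] = r[(b, a)] = pearson_r(measures[a], measures[b])
    return {
        a: sum(r[(a, b)] for b in names if b != a) / (len(names) - 1)
        for a in names
    }


# Invented scores from six respondents on all four satisfaction scales.
measures = {
    "5 point unipolar": [5, 4, 3, 4, 2, 5],
    "11 point percentage": [100, 80, 60, 70, 40, 90],
    "D-T": [7, 6, 4, 6, 3, 6],
    "Binary": [1, 1, 1, 1, 0, 1],
}
result = mean_inter_item_r(measures)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```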


Criterion validity – prediction from antecedents

• All four regression models (each satisfaction measure regressed on the satisfaction drivers) are sensible and produce similar attribute weights

• However, they differ in how much of the variance in satisfaction they explain (R²)

• The binary scale fares by far the worst on this measure but the other three scales do not differ significantly in terms of their relationship to the drivers of satisfaction

• D-T performs best among PC web respondents; the 5 point unipolar scale performs best among mobile web respondents


Scale R²



D-T .51

Binary .22
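R² here is the share of variance in a satisfaction measure explained by a regression on the driver attributes. This is a minimal pure-Python sketch that assumes ordinary least squares with an intercept, since the slides do not name the estimator. The three attribute columns and all ratings are invented for illustration.

```python
def ols_r_squared(X, y):
    """R-squared from an OLS fit of y on the columns of X plus an intercept.

    X holds one row per respondent and one column per attribute rating.
    The normal equations (X'X)b = X'y are solved by Gauss-Jordan elimination
    with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical 1-5 ratings for eight respondents on three illustrative
# attributes (reliability, convenience, fees) and overall satisfaction.
X = [[5, 4, 3], [4, 4, 2], [3, 2, 3], [5, 5, 4],
     [2, 3, 1], [4, 3, 3], [3, 3, 2], [5, 4, 5]]
satisfaction = [5, 4, 3, 5, 2, 3, 3, 4]
r2 = ols_r_squared(X, satisfaction)
print(round(r2, 2))
```

In practice a statistics package would be used. The hand-rolled solver only keeps the example dependency-free.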


Criterion validity – prediction of consequents

• For this measure of criterion validity we look at correlations of each satisfaction measure with a measure of advocacy and three loyalty metrics (an attitude, a behavioral prediction and a behavioral measure)

• The binary measure performs the worst while the other three satisfaction measures perform comparably


Scale | Advocacy | Intent to return | Probability allocation | Post-survey behavior

5 point unipolar | .67 | .77 | .41 | .28

11 point percentage | .71 | .79 | .44 | .27

D-T | .71 | .79 | .44 | .29

Binary | .54 | .57 | .35 | .20


Sensitivity

• All else being equal, a better satisfaction scale will discriminate more among brands’ performance than will a worse satisfaction scale

• Analysis of variance produces an F-statistic that indicates how well each of our measures differentiates the performance of the 10 video brands (higher F indicates more sensitivity)

• The binary scale again performs by far the worst while the 11 point percentage scale has the greatest sensitivity


Scale F

5 point unipolar 10.1

11 point percentage 12.9

D-T 9.6

Binary 4.1
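The F-statistic compares how far the brand means spread apart with how much respondents vary within each brand. A minimal sketch of the one-way ANOVA F follows. The brand names and scores are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    `groups` maps each brand to the satisfaction scores its users gave.
    F = (between-brand sum of squares / (k - 1))
        / (within-brand sum of squares / (n - k)).
    """
    all_scores = [s for scores in groups.values() for s in scores]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    means = {g: sum(s) / len(s) for g, s in groups.items()}
    ss_between = sum(len(s) * (means[g] - grand_mean) ** 2 for g, s in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, s in groups.items() for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical 5 point satisfaction scores for users of three brands.
scores_by_brand = {
    "Brand A": [5, 4, 4, 5, 3],
    "Brand B": [3, 3, 4, 2, 3],
    "Brand C": [4, 4, 3, 4, 5],
}
print(round(one_way_anova_f(scores_by_brand), 2))
```

A larger F means the scale separates brands more relative to noise within brands, which is the sense of "sensitivity" used here.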


Distribution

• Finally, our customers need scales that leave room for their scores to grow, so that they can track performance improvements

• Scales with fewer responses in the top box leave more room for improvement

– Again the binary scale performs poorly, with almost nine respondents in ten selecting the top box response

– Interestingly, the D-T scale, with its extreme “delighted” endpoint, leaves significantly more room for growth, even though it uses a bipolar format rather than the unipolar format of the 5 and 11 point scales


Scale Top box %

5 point unipolar 16

11 point percentage 16

D-T 10

Binary 87
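Top box % is the share of respondents choosing a scale's highest category. This minimal sketch uses invented responses. The top codes (1 for the binary scale, 5 for the 5 point scale) are assumptions about how responses are coded.

```python
def top_box_pct(scores, top_code):
    """Percent of valid (non-missing) responses equal to the scale's top code."""
    valid = [s for s in scores if s is not None]
    return 100 * sum(1 for s in valid if s == top_code) / len(valid)


# Invented responses from ten respondents on two of the scales.
binary = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]      # assumed: 1 = Satisfied
five_point = [5, 4, 4, 3, 5, 4, 2, 4, 3, 4]  # assumed: 5 = Completely Satisfied
print(top_box_pct(binary, 1))      # 90.0
print(top_box_pct(five_point, 5))  # 20.0
```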


And the winner is . . .

• The 11 point percentage scale performs best on most measures, including test-retest reliability; D-T finishes a strong second

• No results differed between the PC and mobile web survey environments


[Summary grid: 5 point unipolar, 11 point percentage, D-T and binary scales compared on test-retest reliability, convergent validity, criterion validity, sensitivity and distribution]


And the binary scale performs worst . . .




Loyalty Test



Intent to return scale

• A standard behavioral intentions scale, a good measure of intentional loyalty


How likely are you to continue using <SERVICE>?

Not at all likely / Not very likely / Somewhat likely / Very likely / Extremely likely


Intent to recommend scale


• Although often used as a measure of loyalty, this is a valid measure only of advocacy

If asked for a recommendation, how likely are you to recommend <SERVICE> to a friend or colleague?

Not at all likely / Not very likely / Somewhat likely / Very likely / Extremely likely


Probability allocation

• In previous studies, this measure dramatically outperformed rating scale loyalty measures in terms of predicting behavior


Think of the next 10 times you will use a video service to watch a movie.

How many of those 10 times do you think you will use <SERVICE>?

______ times


Re-contact behavioral measure of loyalty

• Focuses only on the period between the initial and re-contact surveys


Thinking about just the past 7 days, how many movies have you rented from each of these movie services? If you haven’t rented any movies in the past 7 days, please check the appropriate box at the bottom.

___ Cable Pay per view

___ Satellite Pay per view

___ Hulu Plus

___ Netflix Streaming

___ Netflix DVDs in mail

___ Crackle.com

___ Redbox

___ Blockbuster

___ iTunes video rental service

___ Amazon Instant Video

None, I have not rented any movies in the past 7 days


Loyalty: predictive validity

• The behavioral measure allows us to measure the correlation of the various loyalty metrics with post-survey behavior

• The probability allocation dramatically outperforms either rating scale measure of loyalty in terms of its linkage with post-survey behavior (likewise it outperforms indices built from loyalty ratings and satisfaction)

• Intent to return also significantly outperforms intent to recommend as a loyalty metric (p=.05)


Scale Correlation with post-survey share

Advocacy (intent to recommend) .31

Intent to return .38

Probability allocation .66
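This minimal sketch shows one way to run the predictive-validity check. It rests on three assumptions:

- Post-survey share means the fraction of a respondent's past-7-day rentals that went to the topic service.
- Respondents with no rentals are dropped.
- The allocation (times out of the next 10) is divided by 10.

These definitions are assumptions, not the study's documented procedure, and the respondent records are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def post_survey_share(rentals, service):
    """Fraction of past-7-day rentals from `service` (assumed definition).

    Returns None when the respondent rented nothing, since no share is defined.
    """
    total = sum(rentals.values())
    if total == 0:
        return None
    return rentals.get(service, 0) / total


# Hypothetical records: (times out of next 10 allocated to the topic service,
# past-7-day rentals by service from the re-contact survey, topic service).
respondents = [
    (8, {"Redbox": 3, "Netflix Streaming": 1}, "Redbox"),
    (2, {"iTunes video rental service": 2}, "Redbox"),
    (5, {"Netflix Streaming": 2, "Hulu Plus": 2}, "Netflix Streaming"),
    (10, {"Amazon Instant Video": 1}, "Amazon Instant Video"),
    (6, {}, "Hulu Plus"),  # rented nothing: excluded below
    (3, {"Redbox": 1, "Blockbuster": 3}, "Blockbuster"),
]

pairs = []
for allocation, rentals, service in respondents:
    share = post_survey_share(rentals, service)
    if share is not None:
        pairs.append((allocation / 10, share))

predicted, actual = zip(*pairs)
print(len(pairs), round(pearson_r(predicted, actual), 2))
```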


Loyalty: sensitivity

• The probability allocation is the most sensitive loyalty measure


Scale F

Advocacy (intent to recommend) 10.7

Intent to return 13.4

Probability allocation 33.6


Loyalty summary

• Across the board, the probability allocation measure outperforms both intent to recommend and intent to return rating scale measures

• Intent to return also outperforms intent to recommend, plausibly because the latter specifically measures advocacy rather than loyalty



Recommendations

• Satisfaction: 11 point percentage-labeled thermometer scale

• Loyalty: probability allocation



Limitations

• This is just one study; despite its strong results, we want to see

– Evidence from more categories

– On more scales

– With different kinds of respondents

– In different survey contexts (e.g. branded survey)

– Using different data collection modalities



Bibliography

• Aiello, A. and J.A. Czepiel (1979) “Customer Satisfaction in a Catalog Type Retail Outlet: Exploring the Effect of Product, Price and Attributes,” in New Dimensions in Consumer Satisfaction and Complaining Behavior, eds. R.L. Day and H.K. Hunt, Bloomington: Indiana University, 29-135.

• Andrews, F.M. and S.B. Withey (1974) “Developing Measures of Perceived Life Quality,” Social Indicators Research, 1, 1-26.

• Hausknecht, D.R. (1990) “Measurement Scales in Consumer Satisfaction/Dissatisfaction,” Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 3, 1-11.

• Westbrook, R.A. (1980) “A Rating Scale for Measuring Product/Service Satisfaction,” Journal of Marketing, 44, 68-72.

• Westbrook, R.A. and R.L. Oliver (1981) “Developing Better Measures of Consumer Satisfaction: Some Preliminary Results,” in Advances in Consumer Research, 8, ed. Kent B. Monroe, Ann Arbor: Association for Consumer Research, 94-99.

• Wittink, D.R. and L.R. Bayer (1994) “The Measurement Imperative,” Marketing Research, 6, 14-22.
