Measuring Customer Satisfaction and Loyalty: Improving the “Net-Promoter” Score
Schneider, D., Berent, M., Thomas, R., & Krosnick, J. (2008)
They argue that unipolar constructs are measured most reliably and validly by offering five scale points.
Placing the “neutral” label on the midpoint is problematic because “neutral” represents a lack of evaluation rather than a 50% chance of recommending a company. See our earlier analysis on the effects of the neutral label on the NPS.
They conducted two studies.
In study 1 they recruited 2,227 participants from the US who were randomly assigned to one of four conditions.
- Eleven-point standard likelihood-to-recommend (LTR) item with a labeled "neutral" midpoint.
- Seven-point version of the LTR item (partially labeled, with a "neutral" midpoint).
- Seven-point fully labeled (not at all likely, slightly likely, somewhat likely, likely, very likely, remarkably likely, extremely likely).
- Five-point fully labeled (not at all likely, slightly likely, moderately likely, very likely, extremely likely).
In addition, they asked about:
- Customers: whether they had been customers in the past two years (airlines and rental cars) or the past six months (drug stores, supermarkets, home improvement and hardware, pet supply, electronics).
- Recommend Rate: how many times they had recommended the company in the last six months (capped at 20 recommendations).
- Satisfaction: an eleven-point scale with the extremes and midpoint labeled.
- Liking: a seven-point scale (dislike a great deal, dislike a moderate amount, dislike a little, neither like nor dislike, like a little, like a moderate amount, like a great deal).
They used these items to predict respondents' self-reported past recommendations.
They found that noncustomers tended either to pick "not at all likely" or to gravitate to "neutral"; 78.96% chose the "neutral" midpoint of the scale.
Both the original 11-point scale and the partially labeled 7-point scale (with a "neutral" midpoint label) predicted the reported historical recommendation rate better than the fully labeled scales.
After accounting for the nonlinear relationship with recommendations, they again found that the partially labeled seven-point scale was the best predictor.
They also found "liking" and satisfaction to be better predictors of past recommendations, but liking was generally better for noncustomers.
Study 2 was conducted from January to February 2008. The authors asked 4,883 respondents about eight brands (automotive manufacturers and airlines), including whether they were familiar with each brand and whether they were customers.
In addition to the four scales used in study 1, they added:
- Five-point fully labeled scale on “recommending against.”
- Seven-point (bipolar fully labeled) (extremely likely to recommend against, moderately likely to recommend against, slightly likely to recommend against, neither likely to recommend nor recommend against, slightly likely to recommend, moderately likely to recommend, extremely likely to recommend).
- Seven-point fully labeled liking scale (dislike to like), as in study 1.
- Five-point fully labeled (do not like at all, like a little, like a moderate amount, like a lot, like a great deal).
- Five-point liking (unipolar).
- Five-point disliking (unipolar).
Stated intention to purchase in the next five years:
- Five-point scale (not likely at all, slightly likely, moderately likely, very likely, extremely likely).
What they heard about a company:
- Seven-point scale (all good things, mostly good things, a few bad things, about equal numbers of good and bad things, mostly bad things, a few good things, all bad things).
As validation criteria, they used the increase in passengers by airline from 2007 to 2008 and the change in cars sold from March 2007 to March 2008 (historical data running up to, and slightly past, the survey period).
They found that noncustomers generally gravitate toward the neutral point. They also found that the partially labeled 11- and 7-point scales show a pattern that fits the detractors vs. promoters framework Reichheld uses to describe the Net Promoter Score: respondents below the neutral point are more likely to give negative recommendations.
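The promoter/detractor split can be written as a small function. The standard NPS cut-offs on the 0 to 10 scale treat 0 to 6 as detractors and 9 to 10 as promoters, and NPS is the percentage of promoters minus the percentage of detractors. In this sketch the cut-offs are parameters (`detractor_max` and `promoter_min` are names chosen here, not from the paper), so alternative splits like the ones the authors tried can be compared. The ratings are hypothetical.

```python
def nps(ratings, detractor_max=6, promoter_min=9):
    """Net Promoter Score: % promoters minus % detractors.

    Defaults follow the standard 0-10 cut-offs (detractors 0-6,
    promoters 9-10); ratings in between count as passives.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= promoter_min)
    detractors = sum(1 for r in ratings if r <= detractor_max)
    return 100.0 * (promoters - detractors) / len(ratings)


ratings = [10, 10, 9, 8, 7, 6, 2, 10]  # hypothetical responses
print(nps(ratings))                    # 25.0 (standard cut-offs)
print(nps(ratings, detractor_max=3))   # 37.5 (detractors only at 0-3)
```

Moving the detractor cut-off from 6 down to 3 reclassifies the rating of 6 as passive, which is why the score rises; this is the kind of sensitivity the authors explored when re-cutting detractors and promoters.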
All scales have a smooth relationship to the likelihood of future purchases, possibly because respondents are "tapping into" similar or identical concepts when they formulate their responses to questions about LTR and likelihood to buy (LTBuy).
As in study 1, the partially labeled 11- and 7-point scales are almost identical and better than the fully labeled scales.
They found that likelihood of negative recommendations was the weakest predictor, worse than the original NPS item. See also East (2011) for a similar comment about negative word of mouth.
The original Net Promoter scale and the seven-point scale with partial labels are still better predictors of negative recommendations than other alternatives.
The five-point and seven-point scales do best when predicting the likelihood of future purchases among customers.
They found mixed results: sometimes liking was a better predictor, sometimes satisfaction, and sometimes likelihood to recommend; all were very similar.
When predicting the number of positive recommendations, likelihood of recommending emerged as the strongest predictor, stronger than both liking and satisfaction, but it generally did poorly at predicting the number of negative recommendations (p. 46). LTR was also the best predictor of future purchase intent when satisfaction and liking were included in the same regression equation.
The effect of satisfaction appears to be mediated by liking and/or likelihood of recommending.
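A simple way to probe this kind of mediation is the classic regression comparison: regress purchase intent on satisfaction alone, then on satisfaction plus liking. If satisfaction's coefficient shrinks toward zero once liking is in the model, that is consistent with liking mediating satisfaction. This is a sketch with fabricated data built so that satisfaction acts only through liking; it is not the authors' analysis and omits significance tests. The `ols` helper is a plain least-squares solver written here to avoid dependencies.

```python
def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(X[0])
    # Build X'X and X'y.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    v = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Fabricated data: liking = satisfaction + noise; intent depends only on liking.
satisfaction = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
noise = [1, -1, 0, 2, -2, 1, -1, 0, 2, -2]
liking = [s + e for s, e in zip(satisfaction, noise)]
intent = [2 * like for like in liking]

_, b_sat_alone = ols(intent, satisfaction)
_, b_sat_ctrl, b_like = ols(intent, satisfaction, liking)
print(f"satisfaction alone: {b_sat_alone:.2f}")
print(f"satisfaction with liking: {b_sat_ctrl:.2f}, liking: {b_like:.2f}")
```

On its own, satisfaction looks like a strong predictor of intent; with liking in the model its coefficient drops to roughly zero, which is the signature of full mediation in this constructed example.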
LTR and liking were also better predictors of future purchase intent than perceptions of word of mouth (what they heard about a company).
When predicting historical car sales, they found mixed or no correlations for noncustomers on most scales. When limiting the analysis to customers, they found generally strong correlations, with the strength depending on where they placed the detractor and promoter cut-offs (for example, at 3 instead of at 6). However, they report:
The result for the recommended combination of cut-off points still produced a positive relationship with a fairly convincing R² of .39 (b = .24; p = .13; N = 8). It seems that likelihood of recommending works much better for customers of car companies than for non-customers.
For airlines, likelihood-to-recommend items again performed best when the analysis was limited to customers. Again, alternative cut points produced the highest R², but the original cut points (6 and 9) still produced an impressive R² of .72. Using a log transformation didn't affect the results.
We did find that reducing the number of scale points to 7-points generally improved the validity of the measurement. However, contrary to our expectations, assigning full-labels did not improve the validity, it rather produced weaker relationships between the scales and the validity criteria. … Our results show that different measures such as likelihood of recommendation, satisfaction and liking are interrelated and might be acting within causal chains.
Takeaway: Similar to other studies, they found that the NPS and several other metrics correlated with historical growth rates, and that changing the number of scale points and labels improved the correlations, but not by much.