A good measure of customer loyalty should be valid, reliable, and sensitive to changes in customer attitudes.

For the most part, the Net Promoter Score achieves this (although it does have its drawbacks).

One shortcoming of the Net Promoter Score is that its scoring approach adds “noise” to the customer loyalty signal.

The process of subtracting detractors from promoters may be “executive friendly” but has the unfortunate side effect of increasing measurement error.
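As a minimal sketch of that scoring step, assuming the standard cutoffs for the 0-to-10 scale (9 and 10 are promoters, 0 through 6 are detractors): the function name `net_promoter_score` and the sample responses are made up for illustration and are not data from this study.

```python
def net_promoter_score(ratings):
    """Return the NPS, in percentage points, for 0-10 likelihood-to-recommend ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)   # 9s and 10s
    detractors = sum(1 for r in ratings if r <= 6)  # 0 through 6
    # Passives (7s and 8s) count only toward the denominator.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical responses, for illustration only.
sample = [10, 9, 8, 7, 6, 3, 10, 5, 9, 8]
print(net_promoter_score(sample))  # 4 promoters, 3 detractors, 10 responses -> 10.0
```

Note that a respondent moving from an 8 to a 9, or from a 7 to a 6, shifts the score by a full response, while a move from 0 to 6 changes nothing.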

In an earlier article we reviewed the effects of changing the NPS scale from the original 11 points to 10 or even 5 points. We collected data from 520 U.S.-based participants and had them reflect on one of 11 brands/products. Participants answered all three variations of the NPS question with different scales (11, 10, and 5) presented in randomized order.

The results showed that changing the scale did change the Net Promoter Scores, but only a little. The differences were more noticeable for individual brands than in the aggregate. The Net Promoter Scores for each brand under each scale are shown in Table 1 below.

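| Brand | NPS 11 | NPS 10 | NPS 5 |
| --- | --- | --- | --- |
| American | -8% | -6% | -8% |
| Comcast | -55% | -53% | -63% |
| Delta | -10% | 4% | -8% |
| DirecTV | -14% | -12% | -20% |
| Dish | -16% | -8% | -16% |
| Facebook | 29% | 31% | 21% |
| iTunes | 8% | 2% | 2% |
| Lynda | -15% | -15% | -21% |
| Netflix | 63% | 63% | 61% |
| Udemy | 0% | 3% | 6% |
| United | -15% | -8% | -13% |

Table 1: Net Promoter Scores for 11 brands with different scale points (11, 10, and 5).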

Table 2 below highlights the differences between the 11 vs. 10 point and 11 vs. 5 point variations by company/brand and the total mean differences across all 11 companies/brands.

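| Brand | 11 vs 10 | 11 vs 5 |
| --- | --- | --- |
| Mean Difference | 2% | -2% |
| American | 2% | 0% |
| Comcast | 2% | -8% |
| Delta | 14% | 2% |
| DirecTV | 2% | -6% |
| Dish | 8% | 0% |
| Facebook | 2% | -8% |
| iTunes | -6% | -6% |
| Lynda | 0% | -6% |
| Netflix | 0% | -2% |
| Udemy | 3% | 6% |
| United | 8% | 2% |

Table 2: Differences in Net Promoter Scores by brand.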

The most notable difference came from Delta, which had a 14-point difference between the 11-point scale and the 10-point scale, moving from a -10% NPS to a 4% NPS (p = .09 for the difference in detractors). The Net Promoter Scores for Comcast, Dish, Facebook, and United also fluctuated by a noticeable 8 percentage points depending on the scale used, although none of those differences were statistically significant at p < .10. Recall that participants in the study responded to the same question in randomized order with only the number of scale options changed. But are these noticeable changes really meaningful?

Inherent Scoring Variability

While the NPS might be executive friendly, converting 11, 10, or 5 points into essentially a two-point scale (promoters minus detractors) increases the variability of the scores. That is, the reduced scale produces noticeable swings in the scores that are not necessarily statistically or practically significant. Only the Delta difference reached a threshold of statistical significance (p < .10).
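One way to see the extra variability is to put the standard errors of the two scoring methods on a common footing. The sketch below is an illustration under assumptions, not the analysis used for this article: it recodes each 0-to-10 rating as +100 (promoter), 0 (passive), or -100 (detractor), treats the NPS as the mean of those recoded values, and expresses each standard error as a percentage of its scale's range (200 points for the NPS, and 11 points for the raw ratings, following the convention used later in this article). The ratings and the helper names `recode` and `standard_error` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def recode(rating):
    """Map a 0-10 rating to +100 (promoter), 0 (passive), or -100 (detractor)."""
    if rating >= 9:
        return 100
    if rating <= 6:
        return -100
    return 0


def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    return stdev(values) / sqrt(len(values))


# Hypothetical ratings, for illustration only.
ratings = [10, 9, 9, 8, 8, 7, 7, 6, 5, 3, 10, 2]

nps_codes = [recode(r) for r in ratings]
nps = mean(nps_codes)  # the NPS is the mean of the recoded values

# Express each standard error as a percentage of its scale's range.
se_nps_pct = 100 * standard_error(nps_codes) / 200
se_mean_pct = 100 * standard_error(ratings) / 11

print(f"NPS = {nps}, SE as % of range = {se_nps_pct:.1f}%")
print(f"Mean = {mean(ratings)}, SE as % of range = {se_mean_pct:.1f}%")
```

For this made-up sample, the standard error of the NPS takes up a noticeably larger share of its range than the standard error of the mean does; real data will vary, but the recoding discards information in the same way.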

Using Means Instead

To avoid the potential pitfall of being misled by these swings in percentages (noise) and not detecting differences when they do exist (signal), use the mean scores when making comparisons.

We can better see how changing the number of scale points affects the raw responses without using the Net Promoter scoring system. To do so, we can interpolate raw values onto the same 11-point scale using the following conversion formulas for the 5- and 10-point scales:

Converted 5-point score = (Raw - 1) * 10/4
Converted 10-point score = (Raw - 1) * 10/9
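Both formulas are the same linear rescaling, which can be sketched as a single function. This assumes, as the formulas imply, that the 5- and 10-point scales start at 1; the name `to_eleven_point` is made up for illustration.

```python
def to_eleven_point(raw, n_points):
    """Linearly rescale a rating from a 1..n_points scale onto the 0-10 scale."""
    if not 1 <= raw <= n_points:
        raise ValueError(f"raw must be between 1 and {n_points}")
    # (raw - 1) shifts the scale to start at 0; 10 / (n_points - 1) stretches it to end at 10.
    return (raw - 1) * 10 / (n_points - 1)


print(to_eleven_point(3, 5))    # midpoint of the 5-point scale maps to 5.0
print(to_eleven_point(10, 10))  # top of the 10-point scale maps to 10.0
```

The lowest response always maps to 0 and the highest to 10, so converted means can be compared directly with means from the original 11-point item.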

Table 3 below shows the converted mean values for the 11-point compared to the converted 10- and 5-point versions.

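| Brand | 11 Point (Original) | 10 Point (Converted) | 5 Point (Converted) |
| --- | --- | --- | --- |
| Mean Total | 6.8 | 6.7* | 6.7 |
| American Airlines | 7.0 | 6.9 | 6.9 |
| Comcast | 4.1 | 3.9* | 3.9 |
| Delta Airlines | 6.8 | 6.8 | 6.9 |
| DirecTV | 6.4 | 6.2 | 6.1 |
| Dish Network | 6.0 | 6.1 | 6.1 |
| Facebook | 7.9 | 7.9 | 7.9 |
| iTunes | 7.3 | 7.1* | 7.2 |
| Lynda | 6.5 | 6.2* | 6.4 |
| Netflix | 8.9 | 8.8 | 8.9 |
| Udemy | 7.3 | 7.2* | 7.3 |
| United Airlines | 6.5 | 6.4 | 6.7 |

Table 3: Mean likelihood to recommend scores for the three scale options. An asterisk (*) indicates a statistically significant difference at p < .05 between the 10- and 11-point versions.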

Now four of the brands have statistically different means (Comcast, iTunes, Lynda, and Udemy), indicating that changing the number of response options changes the scoring, though again only slightly, as most scores are within one tenth of a point of each other.

To better interpret the differences, though, it helps to express both the raw mean differences and the raw NPS % differences as a percentage of the maximum range of each scale. For the NPS % scale, the maximum range is from -100% to 100% (a 200-point range). For the mean, the maximum possible difference is 11 points (0 to 10 scale).

For example, the 14-percentage-point difference for Delta between the 11- and 10-point variations is a 7% error (14%/200% = 7%). The error from the mean on Delta, though, is only .05 points, or about .45% (6.84 - 6.79 = .05, and .05/11 is about .45%). Table 4 below shows these % differences by product and the average and maximum % difference observed.
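The Delta arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above: the helper name `pct_of_range` is made up, and the 11-point range for the mean follows the convention stated in this article.

```python
NPS_RANGE = 200.0   # -100% to 100%
MEAN_RANGE = 11.0   # the figure this article uses for the 0-to-10 scale


def pct_of_range(diff, full_range):
    """Express an absolute difference as a percentage of the scale's full range."""
    return 100.0 * abs(diff) / full_range


# Delta, 11-point vs. 10-point versions (values from Tables 1 and 3 and the text above).
delta_nps_error = pct_of_range(4 - (-10), NPS_RANGE)      # 14 points out of 200
delta_mean_error = pct_of_range(6.84 - 6.79, MEAN_RANGE)  # .05 points out of 11

print(f"NPS error:  {delta_nps_error:.2f}%")
print(f"Mean error: {delta_mean_error:.2f}%")
```

Putting both differences on a percent-of-range footing is what allows a swing in percentage points to be compared with a shift in a mean rating.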

| Brand | 11 vs 10 (NPS %) | 11 vs 10 (Mean %) | 11 vs 5 (NPS %) | 11 vs 5 (Mean %) |
| --- | --- | --- | --- | --- |
| Avg % Diff | 2.10% | 1.20% | 2.10% | 0.60% |
| American Airlines | 1.00% | 1.50% | 0.00% | 1.40% |
| Comcast | 1.00% | 2.10% | 4.00% | 1.40% |
| Delta Airlines | 7.00% | 0.40% | 1.00% | 0.90% |
| DirecTV | 1.00% | 2.10% | 2.90% | 3.20% |
| Dish Network | 4.10% | 0.60% | 0.00% | 0.90% |
| Facebook | 1.00% | 0.40% | 4.00% | 0.70% |
| iTunes | 3.00% | 2.20% | 3.00% | 1.70% |
| Lynda | 0.00% | 2.20% | 3.00% | 1.10% |
| Netflix | 0.00% | 1.00% | 1.00% | 0.40% |
| Udemy | 1.40% | 1.40% | 2.90% | 0.50% |
| United Airlines | 3.80% | 0.20% | 0.90% | 2.00% |

Table 4: Differences in error between 11 vs 10 and 11 vs 5 point variations when comparing means versus the NPS system.

The average % difference using the NPS % system is larger than with the mean: about 1.75 times as large for the 10-point comparison (2.1% vs. 1.2%) and 3.5 times as large for the 5-point comparison (2.1% vs. .6%). The NPS % also has a larger maximum difference, 7% and 4% for the 10- and 5-point versions respectively, versus maximum errors of 2.2% and 3.2% using the mean.

Summary and Takeaways

This analysis shows a number of things about both changing the NPS scales and interpreting differences in Net Promoter Scores:

  1. Increased variability with traditional Net Promoter Scoring: Be aware that scoring using top two box minus bottom six increases the variability and adds more noise in the data. This has two unfortunate consequences: differences are more noticeable, even if they aren’t necessarily meaningful, and fewer differences will be statistically significant.
  2. The mean detects smaller differences: While this analysis compared changing only the scale, using the mean detected more statistically significant differences than the top two box minus bottom six scoring system. You’re less likely to be fooled by fluctuations when the mean is used.
  3. Small differences when changing scale options: While changing the NPS scale from 11 to 10 or 5 points produced statistically significant differences with the mean, the differences are probably not practically significant as they are on average only 1% different from each other.
  4. Don’t throw the NPS baby out with the measurement bath water: The Likelihood to Recommend question is meaningful to the customer, executive friendly, and most important, has historical data and benchmarks to make it meaningful. Just because it has shortcomings doesn’t mean you reject it entirely or stop using it.
  5. Best of both worlds: Consider combining approaches. Use the traditional scoring for the executive dashboards and use the mean for making statistical (and practical) decisions about changes over time or against benchmarks and competitors.

Thanks to Jim Lewis for commenting on an earlier version of this article.