It seems that the pendulum has started to swing against the Net Promoter Score.

Three years ago, I often heard how everything in a company mattered only as long as it contributed to a better Net Promoter Score.

This new metric that the management had just bought into was how everything was measured, including employee bonuses.

In case you aren’t familiar with the NPS, it’s a score based on a single question, “How likely are you to recommend a product to a friend or colleague?” Participants respond on an 11 point scale (0 = not at all likely to recommend and 10 = extremely likely to recommend). Respondents who answer 9 or 10 are considered “promoters”, 7 or 8 “passives” and 0 to 6 “detractors.” Detractors are customers who are likely saying bad things about your product or service and even discouraging others from using it. Promoters are the ones considered the most likely to spread positive word of mouth.

The “Net” in Net Promoter Score comes from subtracting the percentage of detractors from the percentage of promoters. A negative score means you have more detractors than promoters and a positive score means there are more promoters (that is, more positive word of mouth than negative word of mouth).
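As a minimal sketch of the arithmetic above (the function name and the sample ratings are made up for illustration, not taken from any real survey):

```python
def net_promoter_score(ratings):
    """Return the NPS, in percentage points, for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)   # 9s and 10s
    detractors = sum(1 for r in ratings if r <= 6)  # 0 through 6
    # Passives (7s and 8s) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical responses: 4 promoters, 3 passives, 3 detractors.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 5, 8]
print(net_promoter_score(sample))  # 10.0
```

Here 40% promoters minus 30% detractors gives a score of +10.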

The NPS system was and still is sold to executives across every industry. Starting with the 2003 HBR article, it was positioned as the one number a company needs to grow. In the decade leading up to the introduction of the Net Promoter Score, there was a shift from measuring customer satisfaction (or measuring nothing at all) to measuring customer loyalty. The decade from 2003 to 2013 will likely go down as the decade of the Net Promoter Score (at least in customer analytics circles).

In the last year, I’ve worked with three large companies that were once big NPS shops. Increasingly, however, they have pivoted away from the NPS as if it were a passing fad (like TQM or the South Beach Diet). While the same criticisms cropped up when the NPS replaced other measurement systems, some companies are beginning to take them more seriously.

In fact, these criticisms were some of the same ones I made in 2003 when the NPS was introduced at Intuit. However, we shouldn’t throw the baby out with the bathwater. Despite its shortcomings, the idea of measuring word of mouth and having a simple metric that companies can compare provides more gain than pain.

Here are five of the most common criticisms of the NPS and my thoughts on why they aren’t fatal flaws, at least not once you make some adjustments.

1.    The NPS correlates with customer satisfaction

Customer loyalty and customer satisfaction are related ideas. They are in fact correlated too, as are usability and satisfaction, and usability and loyalty. But when one measure is correlated with another, it does not mean that one is a replacement for the other (unless the correlation is very high or perfect, that is, r = 1 or r = -1).

It’s certainly unlikely to have customers that are loyal but unsatisfied, but can you have satisfied customers who aren’t loyal? It would seem like you couldn’t. However, one of the impetuses behind the NPS and the shift toward loyalty measures was that customer satisfaction seemed to become uncoupled from the things companies care most about: profits and long term growth.

In The Loyalty Effect, Fred Reichheld (the creator of the NPS) makes the case that 60 to 80 percent of customers who defected or didn’t repurchase a product were in fact satisfied or very satisfied! In the auto industry, for example, he cited figures which showed a 90% satisfaction rate and yet, on average, only 40% of customers repurchased the same brand of car.

So, while satisfaction relates to loyalty, they aren’t the same thing. It’s usually best to measure both and not rely on one. Measuring both requires minimal additional effort to record and analyze. If you drop the NPS but care about customer loyalty, then you should at least consider another way to measure loyalty, like past purchases or likelihood to repurchase (which will both correlate with the NPS!).

2.    You shouldn’t use an 11 point scale

The number of response options matters most when you have only a few items in a questionnaire. The NPS is effectively a single item questionnaire and therefore, all other things being equal, the more response options, the better. So, reducing the number of scale steps to 9, 7 or 5 points actually causes more harm than good (albeit slightly). But more importantly, by changing the number of response options, you lose the ability to compare your NPS with published benchmarks. Don’t forget that comparability is one of the strongest arguments for using the Net Promoter Score! If you’re going to ask users how likely it is they’d recommend a product or service, then you should stick with the 11 point scale.

3.    People are poor predictors of their future behavior

That’s true. Predicting the future is notoriously difficult, and it’s difficult even when we try to predict our own behavior. Just because people say they will do something (recommend or repurchase or show up somewhere) doesn’t mean they will. But the same problem exists with, say, predicting voter turnout. Elections are usually decided not on the proportion of public opinion but on the proportion who bother to vote! But who will actually show up and vote? Well, we don’t know, but we can ask and assume that people who answer “yes, definitely” are more likely (but not guaranteed) to vote than people who answer “probably not.”

The same idea applies to the likelihood to recommend question. The reason promoters are only the top two boxes on the 11 point scale is that these customers are the most likely to recommend. Will they actually recommend? It’s hard to say, but our data suggests most, but not all, will. We call this differential “promoter efficiency.” To gauge how many of the customers who say they will recommend a product or website actually do, we ask customers if they have recommended it in the last 12 months. We see this percentage fluctuate quite substantially. For example, in consumer software we found 96% of Dropbox promoters reported recommending in the last year whereas only 58% of Norton Anti-Virus promoters did.
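One way to read promoter efficiency is as the share of promoters who report having recommended in the last 12 months. The sketch below assumes that reading; the function name, the data layout and the responses are hypothetical, and it is not necessarily the exact calculation behind the Dropbox and Norton figures.

```python
def promoter_efficiency(responses):
    """Percentage of promoters (9s and 10s) who report having recommended.

    `responses` is a list of (rating, recommended) pairs, where `recommended`
    is True if the customer says they recommended in the last 12 months.
    """
    promoter_flags = [recommended for rating, recommended in responses
                      if rating >= 9]
    if not promoter_flags:
        return None  # no promoters in the sample, so the ratio is undefined
    return 100.0 * sum(promoter_flags) / len(promoter_flags)


# Hypothetical survey responses, not real data.
survey = [(10, True), (9, True), (9, False), (10, True), (8, True), (4, False)]
print(promoter_efficiency(survey))  # 75.0
```

Three of the four promoters in this made-up sample say they recommended, so the efficiency is 75%. Passives and detractors who recommended are ignored here on purpose.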

Remember, it’s less important to nail down the exact number of customers that will actually recommend than to understand how that estimate, with its imperfections, differs across companies, products or over time. As with most measures, scores make much more sense when compared. To understand how reliable a measure this is within your organization, you can track customers over time and see how many of the customers who say they will recommend actually do.

4.    You should use another question

There are other ways of measuring customer loyalty. The most basic loyalty question is whether customers will continue to repurchase or reuse your product or service. But repurchase rates only speak to your current customer base. While you don’t want to lose customers, this measure of churn won’t tell you much about whether there’s positive word of mouth. I encourage clients to ask both about future purchases and customers’ intent to recommend.

If you’re interested in estimating how many customers are likely to talk positively or negatively about your company, take a look at Larry Freed’s recent book on another way to measure word of mouth. His word of mouth index (WoMI) distinguishes between positive and negative word of mouth by asking customers both the likelihood to recommend question and whether they would discourage others from doing business with a company. But while Freed’s book is critical of the Net Promoter Score as the best metric, his objection is not to using the NPS. He just thinks it can be improved by adding an additional question.

The likelihood to recommend question is not a replacement for customer satisfaction or repurchase intention, but it will almost surely correlate highly with other ways of measuring word of mouth. In fact, using the data in Freed’s book for the top 100 US brands, the WoMI score and the Net Promoter Score have a high average correlation of r = .8.

5.    The scoring inflates the margin of error

By converting an 11 point scale into a 2 point scale of detractors and promoters (throwing out the passives), information is lost. What’s more, this new binary scale doubles the margin of error around the net score (promoters minus detractors). Unfortunately, that means that if you want to show an improvement in Net Promoter Scores over time, it takes a sample size that is around twice as big to detect the difference; otherwise the difference won’t be distinguishable from sampling error. This has been one of my biggest complaints about the way the NPS is calculated since I was first introduced to it.

I’ve seen organizations looking at NPS dashboards and investigating why the NPS has gone up or down over a period of time. In too many cases, adding some error bars to the graphs shows that the changes are within the margin of error. The simple workaround is to use the raw mean and standard deviation when running statistical comparisons. We found that the mean likelihood to recommend response can predict the Net Promoter Score quite well. You can use the net scoring system for the executives but use the raw data for the statistics.
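To put rough error bars on both numbers, here is a minimal sketch. It assumes a normal approximation with z = 1.96 for about 95% confidence, and it scores each respondent as +1 (promoter), 0 (passive) or -1 (detractor) to get the variance of the net score. That is one common approach rather than the only one, and the function names and ratings are hypothetical.

```python
import math
import statistics


def nps_with_margin(ratings, z=1.96):
    """Return (NPS, margin of error), both in percentage points."""
    n = len(ratings)
    p_pro = sum(1 for r in ratings if r >= 9) / n
    p_det = sum(1 for r in ratings if r <= 6) / n
    nps = p_pro - p_det
    # Variance of a +1 / 0 / -1 score: E[x^2] - (E[x])^2.
    variance = p_pro + p_det - nps ** 2
    return 100 * nps, 100 * z * math.sqrt(variance / n)


def mean_with_margin(ratings, z=1.96):
    """Return (mean rating, margin of error) on the raw 0-10 scale."""
    n = len(ratings)
    margin = z * statistics.stdev(ratings) / math.sqrt(n)
    return statistics.mean(ratings), margin


# Hypothetical ratings, not real data.
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 8]
nps, nps_margin = nps_with_margin(ratings)
mean, mean_margin = mean_with_margin(ratings)
print(f"NPS  = {nps:.1f} +/- {nps_margin:.1f} points")
print(f"Mean = {mean:.2f} +/- {mean_margin:.2f} on the 0-10 scale")
```

With only ten made-up responses the interval around the net score is enormous, which is the point: a dashboard change smaller than the margin is not distinguishable from sampling error.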

The Net Promoter Scoring system certainly has its problems and my friends at Bain and Satmetrix have probably oversold it. Yet, despite its problems, many of which I’ve outlined here, the NPS works fine as a measure if you understand the shortcomings and make some adjustments. It’s never a good idea to put all your measurement eggs in one basket. Consider multiple measures of both satisfaction and loyalty. You should also look to understand how well the NPS correlates with (or even predicts) revenue and growth in your organization and understand how other measures may do a better job.

One benefit of the NPS is that it’s gotten executives and entire organizations thinking about a metric that is meaningful to the customer and user. What’s more, because of its ubiquity, comparisons to competitors or even products in the same company are much easier. And like many heated discussions around rating scales, different variations in response options or question wording usually provide similar results.

The challenge is not in how to write the questions or response options; the challenge is in actually doing something with the information.