So, you’re planning to collect data and you want to know whether your Net Promoter Score (NPS) is significantly above 50%.
Established benchmarks can help research teams know if they’ve reached acceptable thresholds, such as a high Net Promoter Score (e.g., more than 50%). A high NPS is associated with successful product launches.
But an NPS of 55% from a sample of 20 participants has a lot less precision than one with 2,000 participants.
How large a sample do you need to know that your NPS exceeds a benchmark beyond the effects of sampling error?
In an earlier article, we described methods for comparing Net Promoter Scores with benchmarks using significance testing and confidence intervals.
In this article, we complete the set of common analyses for NPS benchmark testing with a description of newly developed sample size estimation procedures.
Sample Size Estimation for a One-Tailed Test of NPS
Earlier, we provided the details on how to compute the sample size when comparing Net Promoter Scores from two samples of data. A benchmark comparison test is slightly different in that it has only one sample of data, which is compared to a fixed value. It also differs in that researchers almost always want to know whether they’ve exceeded a benchmark, not whether their observed NPS is different from a benchmark. Consequently, as we described earlier, for a benchmark comparison we conduct a one-tailed instead of a two-tailed test.
The sample size estimation formula for comparing an NPS to a benchmark is
$$n = \frac{s^2 Z^2}{d^2} - 3$$
In the formula,
- d is the minimum difference between the observed NPS and the benchmark with which you can reject the null hypothesis.
- s is an estimate of the standard deviation (so s² is the variance).
- Z is a Z-score whose value is the sum of two component Z-scores. One controls the likelihood of a Type I error (Zα, where 1 − α is the confidence level) and the other controls the likelihood of a Type II error (Zβ, where 1 − β is the level of power).
The process for computing a sample size starts with identifying a reasonable minimum difference you hope to detect between the sample and the benchmark. Then we work backward from the adjusted-Wald formula we used for the confidence interval around an NPS difference by solving for the sample size algebraically.
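One way to sketch that algebra, assuming the standard error takes the adjusted-Wald form $s/\sqrt{n+3}$ (an assumption on our part that is consistent with the −3 in the formula), is to set the critical difference equal to Z standard errors and solve for n:

$$
d = Z\,\frac{s}{\sqrt{n+3}}
\;\Longrightarrow\;
n + 3 = \frac{s^2 Z^2}{d^2}
\;\Longrightarrow\;
n = \frac{s^2 Z^2}{d^2} - 3,
\qquad Z = Z_\alpha + Z_\beta
$$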
At a high level, the sample size estimation process has four steps:
- Start with deciding the minimum difference you need to detect (d).
- Estimate the standard deviation (s = the square root of the variance).
- Determine the appropriate level of confidence (Zα).
- Determine the appropriate level of power (Zβ).
After completing these steps, compute the sample size or use our lookup table (Table 1) to avoid the messy math. Next, we describe each of the four steps in detail, including three ways to estimate the standard deviation (variance).
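As a minimal sketch of how the four inputs combine, the formula can be written as a small Python function. The function name and argument names are hypothetical (they are not part of any published tool), and the example call uses the maximum possible variance of 1 purely for illustration.

```python
import math

def nps_benchmark_sample_size(d, variance, z_alpha, z_beta):
    """Sample size for a one-tailed NPS benchmark test: n = s^2 * Z^2 / d^2 - 3.

    d        -- minimum difference to detect, as a proportion (0.05 = 5 points)
    variance -- estimate of s^2 (e.g., 1, .67, or var.adj from prior data)
    z_alpha  -- one-tailed Z-score for the confidence level
    z_beta   -- one-tailed Z-score for the level of power
    """
    if d <= 0:
        raise ValueError("d must be a positive proportion, e.g. 0.05")
    z = z_alpha + z_beta                 # combined Z-score
    n = variance * z ** 2 / d ** 2 - 3   # the -3 undoes the adjusted-Wald adjustment
    return math.ceil(n)                  # round up to a whole respondent

# Illustration only: s^2 = 1, 95% confidence, 80% power, d = 10%
print(nps_benchmark_sample_size(d=0.10, variance=1.0, z_alpha=1.645, z_beta=0.842))
```

Rounding up rather than to the nearest integer keeps the estimate from falling just short of the planned confidence and power.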
1. Decide the minimum difference to detect (d).
This decision is the most important one for sample size estimation. Everything else being equal, detecting smaller differences requires larger sample sizes. Detecting very small differences requires very large sample sizes. There is no official designation, but it’s handy to think of small differences as being less than 10% and large differences as being greater than 25% (keeping in mind that context matters).
2. Estimate the standard deviation (s).
The standard deviation (the square root of the variance) is the most common way of measuring variability. Researchers often don’t know the standard deviation ahead of time. However, because we have a growing NPS database that includes standard deviations, we can make accurate estimates. There are three approaches:
- A conservative approach: assume the greatest possible variability (maximum variance), which requires a large sample size.
- A less conservative approach: assume a more realistic estimate of maximum variance, which requires a smaller sample size.
- A data-based approach: use a known variance from prior data.
2a. Simple Approximate Formula Assuming Maximum Variance
If you have no idea about the variability of your NPS data, here’s a simple estimate that guarantees your sample size will be large enough to reject the null hypothesis for the target difference. Just set
$$s^2 = 1$$
This estimate guarantees an adequate sample size for the goals of the study, but because it assumes maximum variance, it very likely will recommend a sample size that is much larger than you actually need.
2b. Simple Approximate Formula Assuming Maximum Realistic Variance
The next estimate is still simple but has a slight modification that reduces the overestimation of the previous one by replacing 1 with .67, which reduces the sample size estimate by about 33%.
You might wonder where the .67 came from. When half the respondents are detractors and the other half are promoters, the variability of NPS is at its maximum of 1. In our experience, it is unlikely to get an exact split like this. Our data from previous research on 18 sets of NPS data shows variances ranging from .40 to .76 with a mean (and 99% confidence interval) of .61 ± .06. Given an upper limit of .67 for the 99% confidence interval, we settled on using .67 as a reasonable estimate of maximum realistic variance.
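To see why 1 is the ceiling, it can help to write the variance out. The expression below is an assumption on our part (this article does not spell it out); it is the variance that results from coding promoters as +1, passives as 0, and detractors as −1:

$$
s^2 = p_{pro} + p_{det} - (p_{pro} - p_{det})^2
$$

With $p_{pro} = p_{det} = .5$ this gives $s^2 = 1 - 0 = 1$. Any passives, or any imbalance between promoters and detractors, pulls the value below 1.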
2c. Maximum Accuracy Method Given Estimates of Variance
If you know something about the magnitude of adjusted variance for the NPS (from previous research or a pilot study), you can get a more accurate estimate of the required sample size.
To do this, compute an adjusted-Wald confidence interval for the previous/pilot data using the steps in our previous article, “Confidence Intervals for Net Promoter Scores.” That will give you a value for var.adj, which you then use as the variance estimate:
$$s^2 = \text{var.adj}$$
3. Set the confidence level to control the Type I Error (Zα).
Zα is the value associated with statistical confidence and the α criterion used to determine statistical significance (control of the Type I Error). Commonly used values for Zα are 1.645 (for one-tailed tests with α = .05) and 1.28 (for one-tailed tests with α = .10).
4. Set the level of power to control the Type II Error (Zβ).
Zβ is the value associated with statistical power and the β criterion used to control the Type II error. Always use one-tailed values of Zβ. Common values for Zβ are 0 (for 50% power, β = .5) and .842 (for 80% power, β = .2).
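If you need Z-scores for other values of α or β, they can be looked up from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `one_tailed_z` is hypothetical.

```python
from statistics import NormalDist

def one_tailed_z(tail_probability):
    """Z-score that leaves `tail_probability` in the upper tail of a standard normal."""
    return NormalDist().inv_cdf(1 - tail_probability)

z_alpha = one_tailed_z(0.05)   # alpha = .05 (95% confidence), about 1.645
z_beta = one_tailed_z(0.20)    # beta = .20 (80% power), about 0.842

print(round(z_alpha, 3), round(z_beta, 3))
```

The same helper serves both steps because one-tailed values are used for Zα in a benchmark test and always for Zβ.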
When you have an estimate of variance
In a UX survey of an email application conducted in early 2020, we collected likelihood-to-recommend ratings from 107 respondents. The estimated NPS was 42% (61 promoters, 30 passives, and 16 detractors); so the proportion of promoters (ppro) = 61/107 = .570 and the proportion of detractors (pdet) = 16/107 = .150 (a difference of .42), with an adjusted variance (var.adj) of .546.
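The adjusted variance for this example can be checked with a short sketch. It assumes the adjusted-Wald adjustment adds 3 to the sample size (0.75 to promoters, 1.5 to passives, and 0.75 to detractors), as we understand the method from the confidence interval article; treat that adjustment, the variance expression, and the function name as assumptions rather than a definitive implementation.

```python
def nps_adjusted_variance(promoters, passives, detractors):
    """Adjusted variance (var.adj) of an NPS from raw counts.

    ASSUMPTION: the adjustment adds 3 to n, with 0.75 going to promoters,
    0.75 to detractors, and the remaining 1.5 to passives.
    """
    n_adj = promoters + passives + detractors + 3
    p_pro = (promoters + 0.75) / n_adj    # adjusted proportion of promoters
    p_det = (detractors + 0.75) / n_adj   # adjusted proportion of detractors
    nps_adj = p_pro - p_det               # adjusted NPS as a proportion
    return p_pro + p_det - nps_adj ** 2

# Email study counts: 61 promoters, 30 passives, 16 detractors
print(round(nps_adjusted_variance(61, 30, 16), 3))
```

Under these assumptions the email study counts give a value that rounds to .546, matching the var.adj reported for the study.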
Suppose we wanted to know what sample size we needed, if all else stayed the same, to be able to detect a difference of 5% in the email study. (In other words, to be able to claim that an observed NPS of 42% is significantly better than a benchmark of 37%.)
Assume we decided to use α = .05 (Zα = 1.645) to control the Type I error, β = .20 (Zβ = .842) to control the Type II error, var.adj = .546 from the email study, and to set d = .05. Given these assumptions, the estimated sample size requirement is
$$n = \frac{(1.645 + .842)^2 (.546)}{.05^2} - 3 = 1348 \text{ (rounded up)}$$
When you don’t have an estimate of variance
When you don’t have an estimate of the variance, we recommend using .67, our estimate of the maximum realistic variance for the NPS. You could use the maximum possible variance for the NPS of 1, but the conditions under which that variance could happen (exactly equal proportions of detractors and promoters) aren’t realistic and substantially overestimate the required sample size.
If we keep everything the same as in the previous example except for the estimate of adjusted variance, our sample size required would be
$$n = \frac{(1.645 + .842)^2 (.67)}{.05^2} - 3 = 1655 \text{ (rounded up)}$$
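The arithmetic for both examples can be reproduced with a few lines of Python. This is only a sketch using the rounded Z-scores quoted in the text (1.645 and .842); the helper name `raw_n` is hypothetical. Before rounding up, the two results are roughly 1347.8 and 1654.6.

```python
import math

Z = 1.645 + 0.842   # Z_alpha + Z_beta for 95% confidence and 80% power
D = 0.05            # minimum difference to detect (5 points)

def raw_n(variance, z=Z, d=D):
    """Unrounded sample size from n = s^2 * Z^2 / d^2 - 3."""
    return variance * z ** 2 / d ** 2 - 3

for label, variance in [("email study var.adj (.546)", 0.546),
                        ("maximum realistic variance (.67)", 0.67)]:
    n = raw_n(variance)
    print(f"{label}: raw n = {n:.1f}, rounded up = {math.ceil(n)}")
```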
Sample Size Lookup Table for NPS Benchmark Comparison
Sample size estimates are best when you have some idea about expected variance, but that isn’t always possible. Table 1 shows sample sizes computed with the maximum realistic variance estimate (.67) for a range of values of d (shown as percentages), for two α criteria (.10 and .05, i.e., 90% and 95% confidence), and two β criteria (.50 and .20, i.e., 50% and 80% power). If you need an estimate that isn’t in the table, use the formula.
| Difference to Detect | n (90% Confidence, 50% Power) | n (95% Confidence, 50% Power) | n (90% Confidence, 80% Power) | n (95% Confidence, 80% Power) |
For example, suppose you want to see whether an NPS exceeds 50%, a benchmark considered “excellent” by some for new products. You have no estimate of adjusted variability, so you decide to use Table 1. The larger you set the value of the critical difference (d), the smaller the required sample size will be, but the more difficult it will be to get an observed NPS that high.
If you set d to 5%, then you’re targeting an observed NPS of 55%. For the relatively low criteria of 90% confidence and 50% power, n = 438; for the more stringent criteria of 95% confidence and 80% power, n = 1655. The smaller sample size (n = 438) matches a plan in which your alpha criterion for significance is .10, and if your observed NPS is less than 55%, you probably will not get a statistically significant result. For the larger sample size (n = 1655), the alpha criterion is .05, and because the study has more statistical power, even if the observed NPS falls a bit short of 55%, you may still get a statistically significant result.
If the value of d is relaxed to 10%, the required sample size for 90% confidence and 50% power is n = 108; for 95% confidence and 80% power, n = 412. These sample sizes are substantially lower than those needed for d = 5%, but now the target value for the observed NPS is 60%, which is more difficult to achieve than 55%.
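For values of d, confidence, or power that are not in the table, a lookup of this kind can be sketched in Python. The function name `lookup_n` is hypothetical, and the sketch assumes the maximum realistic variance of .67 unless you pass your own estimate. Because it uses unrounded Z-scores from `statistics.NormalDist`, an entry can differ by one respondent from a value computed with Z rounded to three decimals (for example, 1654 versus 1655 at d = 5%, 95% confidence, and 80% power).

```python
import math
from statistics import NormalDist

MAX_REALISTIC_VARIANCE = 0.67   # estimate recommended in this article

def lookup_n(d, confidence, power, variance=MAX_REALISTIC_VARIANCE):
    """Sample size for a one-tailed NPS benchmark test with unrounded Z-scores."""
    z_alpha = NormalDist().inv_cdf(confidence)   # one-tailed Z for confidence
    z_beta = NormalDist().inv_cdf(power)         # one-tailed Z for power (0 at 50%)
    return math.ceil(variance * (z_alpha + z_beta) ** 2 / d ** 2 - 3)

# Same column order as Table 1: (confidence, power)
columns = [(0.90, 0.50), (0.95, 0.50), (0.90, 0.80), (0.95, 0.80)]
for d in (0.05, 0.10):
    row = [lookup_n(d, conf, pwr) for conf, pwr in columns]
    print(f"d = {d:.0%}: {row}")
```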
Sometimes UX researchers need to compare an estimated NPS with a benchmark. An important part of planning that research is to estimate the sample size needed to achieve its confidence, power, and precision objectives, considering the likely variability of the data. A special consideration for benchmark evaluation is to use one-tailed instead of two-tailed testing.
When there is an available estimate of adjusted variance (var.adj) from prior or pilot studies, researchers can use this information to tune a sample size estimate using the formulas in this article. If there is no estimate of adjusted variance, Table 1 can help researchers estimate sample size requirements.
When conducting a study in two or more stages, researchers can start with an estimate from Table 1. Use data from that first stage to compute var.adj, and then use that estimated variance to improve the accuracy of the initial sample size estimate for the second stage of the study.
This article is the last of an eight-part series in which we
- Described and initially evaluated three methods for computing confidence intervals for NPS.
- Conducted a comprehensive evaluation of the three confidence interval methods, concluding that the adjusted-Wald approach was the most sensitive and had accurate coverage.
- Developed a significance test for the comparison of two NPS based on adjusted-Wald confidence intervals.
- Conducted a comprehensive evaluation of three significance testing methods for NPS, concluding that the adjusted-Wald approach was the most sensitive.
- Developed a sample size estimation procedure for NPS confidence intervals based on the adjusted-Wald approach.
- Developed a sample size estimation procedure for comparing two NPS based on the adjusted-Wald approach.
- Adapted the previous work on confidence intervals and significance tests to the comparison of NPS with a benchmark.
- Developed a sample size estimation procedure for the comparison of NPS with benchmarks (this article).