Every estimate we make from a sample of customer data contains error. Confidence intervals tell us how much faith we can have in our estimates.
Confidence intervals quantify the most likely range for the unknown value we’re estimating. For example, if we observe 27 out of 30 users (90%) completing a task, we can be 95% confident that between 74% and 97% of all real-world users can complete that task.
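As a minimal sketch, the 27-out-of-30 example can be reproduced with the adjusted-Wald (Agresti-Coull) method. The choice of method is an assumption here, since the text does not say which formula produced the 74% to 97% range, and the function name adjusted_wald_interval is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(successes, n, confidence=0.95):
    """Adjusted-Wald (Agresti-Coull) interval for a binomial proportion."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Add z^2/2 successes and z^2/2 failures before computing the proportion.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


low, high = adjusted_wald_interval(27, 30)
print(f"{low:.0%} to {high:.0%}")  # 74% to 97%
```

Other interval methods (for example, the Wilson score interval) give slightly different endpoints that round to the same percentages for this sample.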
But what exactly does a confidence level of 95% mean?
Well, if you were to take a sample from the same customer population 100 times and compute a confidence interval around the task-completion rate each time, you would expect about 95 of those intervals to contain the true task-completion rate. The other 5 or so would miss it. You can never know for certain whether the particular interval you computed is one of the hits or one of the misses. Statistics is about understanding and managing the risk of being wrong.
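The repeated-sampling idea can be checked with a small simulation. This is a sketch under stated assumptions: the true completion rate (90%), the sample size (30), the number of repetitions (2,000), and the adjusted-Wald formula are all chosen for illustration and do not come from real data.

```python
import random
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(successes, n, confidence=0.95):
    """Adjusted-Wald (Agresti-Coull) interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def simulate_coverage(true_rate=0.90, n=30, trials=2000, confidence=0.95, seed=1):
    """Share of simulated intervals that contain the (assumed) true rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each simulated user completes the task with probability true_rate.
        successes = sum(rng.random() < true_rate for _ in range(n))
        low, high = adjusted_wald_interval(successes, n, confidence)
        hits += low <= true_rate <= high
    return hits / trials


coverage = simulate_coverage()
print(f"{coverage:.1%} of intervals contained the true rate")
```

The share should land near the nominal 95% rather than exactly on it. With small samples the count of successes can take only a few values, and this particular method tends to run somewhat above its nominal level.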
Confidence and the p-value
The confidence level is related to the p-value obtained when conducting statistical comparisons. We usually consider something “statistically significant” if its p-value is less than 0.05 (or 5%).
Both the confidence level and the threshold a p-value must fall below to count as statistically significant are set ahead of time through what we call the alpha level. If we choose an alpha level of 0.05, for example, then a p-value smaller than 0.05 is considered statistically significant, and our confidence level (1 − alpha) is 0.95.
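One way to see the link between alpha, the p-value, and the confidence level is to compare a sample against a benchmark using both a test and an interval. The sketch below uses the simple Wald test and Wald interval because the two agree exactly with each other; it is not a recommendation for small samples, and the 70% and 85% benchmarks are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_test_and_interval(successes, n, benchmark, alpha=0.05):
    """Two-sided Wald p-value against a benchmark, plus the (1 - alpha) interval."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error from the sample
    z_stat = (p_hat - benchmark) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    interval = (p_hat - z_crit * se, p_hat + z_crit * se)
    return p_value, interval


for benchmark in (0.70, 0.85):
    p_value, (low, high) = wald_test_and_interval(27, 30, benchmark)
    significant = p_value < 0.05
    outside = not (low <= benchmark <= high)
    # The two flags always match: significant at alpha exactly when the
    # benchmark lies outside the (1 - alpha) interval.
    print(f"benchmark {benchmark:.0%}: significant={significant}, outside interval={outside}")
```

With 27 of 30 completions, the 70% benchmark falls outside the 95% interval and the p-value is below 0.05, while the 85% benchmark falls inside the interval and the p-value is above 0.05.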
Although we most often set alpha to 0.05, it can take any value from just above 0 (e.g., 0.00001) to just below 1 (e.g., 0.99999). I’m often asked what the best level of confidence to use is. The answer is that it depends on the consequences of being wrong. To put that into context, here are thresholds commonly used for confidence (and p-values). Select the level that most closely matches your situation.
Pharmaceutical Confidence: 99%+
When a bad decision can lead to injury or death (say, when you’re evaluating clinical trials and drug interactions), you want a high level of confidence in your intervals and a high standard for declaring statistical significance. Of course, higher levels of confidence come with higher costs; testing in the pharmaceutical environment often involves sample sizes with thousands of participants.
Publication Confidence: 95%+
Peer-reviewed journals and high-level political polls typically require a confidence level of 95% (and corresponding p-value of less than 0.05). When you choose to break with the 0.05 norm, plan to defend your choice. And it’s going to be difficult to publish your research if your p-values exceed 0.05.
Industrial Confidence: 90%+
We often use a 90% confidence level in our client analysis when analyzing both survey data and usability benchmarks because 90% confidence for a two-sided statement equates to 95% confidence for a one-sided statement (e.g., at least 75% of users can complete a task). In many environments, dipping below 90% takes your stakeholders out of their comfort zone. See Chapter 4 in our book, Quantifying the User Experience, for more discussion on one- and two-sided confidence intervals.
Exploratory Confidence: 80%+
When you need only reasonable evidence (when, for example, you’re looking at product prototypes, early-stage designs, or the general sentiments from customers), the 80% level of confidence is often sufficient. When your sample sizes are smaller, confidence intervals widen and you rarely get statistically significant results with high confidence. When you relax your alpha to 0.20, you’ll be fooled more often by chance variation. When the consequences of being wrong are not dire, though, then this may be a sufficient level.
Casino Confidence: 51%+
What happens in Vegas stays in Vegas, and that’s the case with your money too. In a casino, the longer you play the less likely you’re going home a winner. Games of chance are rigged to give the house a small edge. Where your business is on the line, we don’t recommend declaring statistical significance with p-values of 0.49. But if you find that there’s little or no downside to being wrong, and if you have to pick among poor alternatives, 51% confidence is at least better than flipping a coin.
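To make the trade-off between these levels concrete, the sketch below recomputes the interval for the earlier 27-out-of-30 example at each level, and checks the claim that a 90% two-sided statement uses the same critical value as a 95% one-sided statement. The adjusted-Wald formula and the helper names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z(confidence):
    """Critical value when alpha is split across both tails."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def one_sided_z(confidence):
    """Critical value when all of alpha sits in one tail."""
    return NormalDist().inv_cdf(confidence)


def adjusted_wald_interval(successes, n, confidence):
    """Adjusted-Wald (Agresti-Coull) interval for a binomial proportion."""
    z = two_sided_z(confidence)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


widths = {}
for confidence in (0.99, 0.95, 0.90, 0.80, 0.51):
    low, high = adjusted_wald_interval(27, 30, confidence)
    widths[confidence] = high - low
    print(f"{confidence:.0%} confidence: {low:.1%} to {high:.1%}")

# Both values are the 95th percentile of the standard normal distribution.
print(f"two-sided 90%: z = {two_sided_z(0.90):.3f}")
print(f"one-sided 95%: z = {one_sided_z(0.95):.3f}")
```

The interval narrows as the confidence level drops: lower confidence buys a more precise-looking range from the same 30 users, at the cost of being wrong more often.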
While you may want to go for high confidence in every situation to minimize the risk of being wrong, the price of high confidence is a large sample size. If you demand high confidence from a small sample, you raise the risk of a different error, the false negative: failing to declare statistical significance when there actually is a difference. But that’s a topic for a future post.
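The cost of higher confidence can be sketched with the standard normal-approximation formula for the sample size needed to reach a given margin of error around a proportion, n = z² · p(1 − p) / d². The target margin of ±5 percentage points and the conservative p = 0.5 are assumptions for illustration. The formula addresses interval precision only and does not account for false negatives, which would require a power analysis.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(confidence, margin, expected_rate=0.5):
    """Approximate n needed for a two-sided interval of +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, since a fraction of a participant cannot be recruited.
    return ceil(z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2)


sizes = {c: sample_size_for_margin(c, margin=0.05) for c in (0.99, 0.95, 0.90, 0.80)}
for confidence, n in sizes.items():
    print(f"{confidence:.0%} confidence: about {n} participants")
```

Under these assumptions, moving from 80% to 99% confidence roughly quadruples the required sample.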