There are a lot of misconceptions about when it is and when it is not appropriate to test with five users.
There’s no reason to take an extreme position on this issue and insist that five users is never enough or always the right number.
Instead you should understand what you can and cannot learn from just a handful of users in a usability test.
Five Reasons You Should Test With Five Users
- Five users will uncover most of the obvious issues: If a usability problem affects at least 31% of all users of your application or website, then you’re very likely to see it in a usability test. The key thing to remember is that you are limited to seeing issues that affect many users, where ‘many’ means about 1 in 3. With just five users there is some chance you will see less frequent issues (e.g. problems that affect 1 in 10 users), but don’t plan on it.
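The chance of seeing a problem at least once follows directly from the binomial formula: if a problem affects a proportion p of users, the probability that at least one of n test users encounters it is 1 − (1 − p)^n. A minimal sketch (the function name `discovery_probability` is just illustrative; the 31% figure comes from the text above):

```python
def discovery_probability(p: float, n: int) -> float:
    """Probability that at least one of n test users encounters a
    problem that affects a proportion p of the whole user population."""
    return 1 - (1 - p) ** n

# A problem affecting about 31% of users (roughly 1 in 3):
print(round(discovery_probability(0.31, 5), 2))  # -> 0.84

# A less frequent problem (1 in 10 users):
print(round(discovery_probability(0.10, 5), 2))  # -> 0.41
```

This is where the familiar “five users find ~85% of (obvious) problems” figure comes from: 1 − (1 − 0.31)^5 ≈ 0.84.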
- You can test five users in one day: With the popularity of agile and Lean UX, it’s often imperative to get feedback very quickly. Schedules are hard to coordinate, and getting apps stable and accessible is a challenge. Plan for one day of testing, ideally scheduled regularly. In an eight-hour day with one-hour testing sessions, bathroom breaks, tardy users, technical issues, and a debrief with the team, five users is about all you should schedule, and about all your brain as a facilitator can handle.
- Identify high-frequency problems: If you observe five out of five users having a problem with an interface element, then it’s highly improbable that fewer than 71% of all users would also have that problem. That’s compelling statistical evidence, even at a very small sample size, that the interface needs to be fixed. This value is obtained by looking at the lower boundary of a binomial confidence interval.
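One plausible way to reconstruct that bound (an assumption on my part; the article doesn’t spell out its exact interval) is a one-sided 90% adjusted-Wald binomial interval, which for 5 of 5 affected users gives a lower bound of about 71%:

```python
import math
from statistics import NormalDist

def adjusted_wald_lower(successes: int, n: int, confidence: float = 0.90) -> float:
    """One-sided lower bound of an adjusted-Wald (Agresti-Coull-style)
    binomial confidence interval. Assumed reconstruction, not the
    article's stated method."""
    z = NormalDist().inv_cdf(confidence)      # ~1.28 for 90% one-sided
    n_adj = n + z * z                         # adjusted sample size
    p_adj = (successes + z * z / 2) / n_adj   # adjusted proportion
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - z * se

# All 5 of 5 users hit the problem:
print(round(adjusted_wald_lower(5, 5), 2))  # -> 0.71
```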
- It’s better to identify issues and fix them: If you have the budget in early stage testing for 10 or 15 users, it’s better to test them in two or three batches of five users. When you see issues that affect most of the five users, or issues that affect only one but are easy to fix, then try to have them all addressed before the next iteration of testing.
A lot of planning goes into recruiting, developing tasks and getting a stable testing environment. It can be a waste of time to watch the same users encounter the same problems over and over again if you know they should be fixed. Even worse, some usability issues may actually block other issues from being uncovered as users may be unable to progress through a task. Five users offer a good stopping point for you to learn lessons and make necessary adjustments for the next round of testing.
- If all five users complete a task, at least 70% of all users can complete the task: While it’s hard to show that an application or website is usable with a small sample, there are cases where we have enough evidence that it is usable enough. When all five users complete a task, we can be 90% confident that between 71% and 100% of all users would also be able to complete it. This is the same mathematical approach we took when estimating problem occurrences in the population: the confidence interval.
While it doesn’t happen for every task, in most usability tests we do reach this level of confidence for at least one task. At this point we usually have enough evidence that the task is usable enough and effort is better spent having users attempt another task or explore another part of the interface in the next iteration.
Five Reasons You Should Not Test With Five Users
- You want to detect less obvious problems: By definition, if a problem in an interface affects a smaller portion of your user population, then it’s more difficult to detect with just a few users in a usability test.
At a sample size of five, there’s only a 23% chance you’ll see problems that affect 5% of all users. Also at a sample size of five, you only have a 5% chance of seeing problems that just 1 out of 100 users will encounter.
It’s not impossible to observe these infrequently occurring problems, it’s just not likely, and not something you should plan on with just five users. This is the most common misunderstanding of the five-user rule: five users won’t uncover 85% of all problems, just 85% of the most obvious issues, where ‘obvious’ means affecting about 1 in 3 users.
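Those figures can be checked with the same 1 − (1 − p)^n formula, and the formula can be inverted to ask how many users you’d need for a given chance of discovery. A sketch, assuming an 85% target discovery rate (the helper name `users_needed` is mine):

```python
import math

def users_needed(p: float, chance: float = 0.85) -> int:
    """Smallest n such that a problem affecting proportion p of users has
    at least `chance` probability of being seen at least once."""
    return math.ceil(math.log(1 - chance) / math.log(1 - p))

# Chance of seeing low-frequency problems with only five users:
print(round(1 - (1 - 0.05) ** 5, 2))  # -> 0.23  (problems affecting 5% of users)
print(round(1 - (1 - 0.01) ** 5, 2))  # -> 0.05  (problems affecting 1% of users)

# Users needed for an 85% chance of seeing each kind at least once:
print(users_needed(0.05))  # -> 37
print(users_needed(0.01))  # -> 189
```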
- You want to see if one design has a higher completion rate: At a sample size of five, even the largest difference in completion rates between designs won’t be statistically significant. If you need to compare completion rates between designs you should plan on a higher sample size. For example, to detect a difference of 10 percentage points (e.g. between an 85% and 95% completion rate), you should plan on a sample size of 224 (112 users in each group).
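One way to reproduce that planning number (an assumed reconstruction; the article doesn’t state its formula) is the pooled normal-approximation sample-size formula n = (z_α + z_β)² · 2p̄(1 − p̄) / d² for a one-sided two-proportion test at α = 0.05 with 80% power; other formula choices give somewhat different n:

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a one-sided two-proportion
    comparison, using the pooled normal approximation (an assumption,
    not necessarily the article's exact method)."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                  # pooled proportion
    d = abs(p1 - p2)                       # difference to detect
    return math.ceil((z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar) / d ** 2)

# Detecting an 85% vs 95% completion-rate difference:
print(n_per_group(0.85, 0.95))  # -> 112 per group (224 total)
```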
- You want a small margin of error around your metrics: If you want to estimate the percentage of all users that would complete a task or have a problem with a design element, the margin of error at a sample size of five is around +/- 34%. At a sample size of 20 you’ll have a margin of error of around 20%, and to cut your margin of error in half you’ll need to roughly quadruple your sample size. If you want to estimate the completion rate to within +/- 10%, you should plan on testing around 80 users.
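These margins can be sketched under the worst-case assumption p = 0.5 with a two-sided 95% adjusted-Wald interval (an assumption; the article doesn’t state its exact method, and rounding differs slightly). This gives roughly ±33%, ±20%, and ±11% at n = 5, 20, and 80, close to the figures above:

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of a two-sided adjusted-Wald binomial interval,
    evaluated at the worst-case proportion p = 0.5 by default."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95%
    n_adj = n + z * z
    p_adj = (n * p + z * z / 2) / n_adj
    return z * math.sqrt(p_adj * (1 - p_adj) / n_adj)

for n in (5, 20, 80):
    print(n, round(margin_of_error(n), 2))
```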
- You want to be sure at least 90% or more of users can complete a task: Even if all five users complete a task, you can only be around 75% confident at least 90% of users will complete the task. While that’s much better than a 50/50 chance, it’s not a strong level of confidence to ensure a larger majority of users can complete the task.
This is an example of the principle that it’s harder to show that a task or website is usable with a small sample than it is to show that a task or website is unusable. To ensure a higher percentage of users in the entire population can complete a task, you’ll need to test with more than five users (a sample size of 30 would be the minimum you’d need, and only if all 30 users complete the task).
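The logic behind that minimum can be sketched with the exact binomial bound: if all n users succeed, you can claim the true completion rate is at least p* with confidence 1 − (p*)^n, and solving for n gives the smallest “perfect run” needed. This exact bound lands in the high 20s; the adjusted-Wald approach used elsewhere in the article gives a slightly different n, which is why figures around 30 are quoted (the helper name `perfect_runs_needed` is mine):

```python
import math

def perfect_runs_needed(target_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that n successes out of n attempts gives at least
    `confidence` that the true completion rate is >= target_rate,
    using the exact one-sided binomial bound."""
    return math.ceil(math.log(1 - confidence) / math.log(target_rate))

# 95% confidence that at least 90% of users can complete the task:
print(perfect_runs_needed(0.90))  # -> 29 with this exact bound
```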
- You want to detect the most severe usability issues: A small sample size by definition will uncover the most frequent usability problems, but will the first few users uncover the most severe usability issues?
Don’t confuse how many users a problem affects with how much harm the problem does to the user experience. Our analysis suggests that problem frequency and severity are independent: issues with a more detrimental impact on the experience are just as likely to be low-frequency. While you are likely to see a good mix of critical and cosmetic issues in any sample of five users, you can’t reliably expect the first few users to surface the most severe ones.