If you’re familiar with usability testing then you’re familiar with the magic number 5.
Five users will, on average, find most of the problems that affect one-third or more of your users. If problems are less common, you will need to test more users to find and fix them.
On many high-traffic websites, usability problems affect fewer than 1 out of 10 users, so a larger sample size is needed. In business applications, problems are about three times more common, so smaller sample sizes will often suffice to find an equal number of problems.
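The arithmetic behind these figures can be sketched with the binomial discovery formula commonly used in this context, 1 - (1 - p)^n, where p is the proportion of users a problem affects and n is the number of users tested. The formula is not spelled out in the text above, so treat it as an assumption, along with the 85% discovery goal and the function names, which are only illustrative.

```python
def discovery_rate(p: float, n: int) -> float:
    """Chance that a problem affecting a proportion p of users
    shows up at least once when n users are tested."""
    return 1 - (1 - p) ** n


def users_needed(p: float, goal: float = 0.85) -> int:
    """Smallest number of users for which discovery_rate(p, n) reaches goal.

    The 0.85 default is an assumed target, not a figure from the article.
    """
    if not 0 < p <= 1 or not 0 < goal < 1:
        raise ValueError("p must be in (0, 1] and goal in (0, 1)")
    n = 1
    while discovery_rate(p, n) < goal:
        n += 1
    return n


# A problem that affects one-third of users, tested with five users
print(round(discovery_rate(1 / 3, 5), 2))  # 0.87

# Problems affecting 1 in 10 users (high-traffic website)
print(users_needed(0.10))  # 19

# Problems about three times more common (business application)
print(users_needed(0.30))  # 6
```

Under these assumptions, rarer problems push the required sample from a handful of users to roughly twenty, which matches the direction of the argument above.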
For comparing products or estimating completion rates or task times, the sample size formulas are different and depend on how precise you need to be and on the variability in the sample.
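To make the link between precision, variability and sample size concrete, here is a minimal sketch using the textbook normal-approximation formulas for a proportion and a mean. These are not necessarily the formulas the author has in mind (small-sample or adjusted methods would give somewhat different numbers), and the function names, the 95% confidence level and the example inputs are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def z_for(confidence: float) -> float:
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_completion_rate(margin: float, expected_rate: float = 0.5,
                          confidence: float = 0.95) -> int:
    """Users needed to estimate a completion rate within +/- margin.

    expected_rate = 0.5 is the most conservative (highest variability) guess.
    """
    z = z_for(confidence)
    return math.ceil(z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2)


def n_for_task_time(margin: float, std_dev: float,
                    confidence: float = 0.95) -> int:
    """Users needed to estimate a mean task time within +/- margin,
    given an assumed standard deviation in the same units."""
    z = z_for(confidence)
    return math.ceil((z * std_dev / margin) ** 2)


# Completion rate within +/- 20 and +/- 10 percentage points
print(n_for_completion_rate(0.20))  # 25
print(n_for_completion_rate(0.10))  # 97

# Mean task time within +/- 10 seconds, assuming a 30-second standard deviation
print(n_for_task_time(10, 30))  # 35
```

Halving the margin of error roughly quadruples the sample size, which is why precise benchmarks need far more users than problem discovery does.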
Despite the formulas and controversy, people ultimately decide on how many users they test.
To find out how many users people test, I asked subscribers to measuringu.com last year how many users they tested in their most recent Formative and Summative usability tests. In total, I received 130 responses and here’s what people said.
Number of Users tested in Formative tests
There was a lot of variability in the responses. Of the 95 people who reported conducting a Formative test in the last two years, the median number of users tested was 10. Most people (82%) reported testing fewer than 15 users in total.
The mean number tested was 24 with a standard deviation of 54 (min 2 and max 350). These numbers are the total number of users tested, as many respondents reported testing multiple iterations of users. There were five tests with reported sample sizes above 100 that skewed the mean upward. These tests were of either websites or consumer products where a remote unmoderated test was conducted (e.g. to test the information architecture of a website). 81% reported testing more than 5 users.
Average Number of Users in Summative tests
Somewhat surprisingly, the number of users reported for Summative tests wasn’t much different from the Formative number. Of the 68 respondents who had conducted a Summative usability test, most (70%) reported testing fewer than 15 users. The median number of users tested was 12.
The mean number tested was 27 with a standard deviation of 46 (min 4 and max 245). The largest differences between the Formative and Summative sample sizes were found between the smallest sample sizes (2-5 users) and largest sample sizes (20+ users).
Summative Tests have 3x more users
There are many variables that affect the number of users people test such as the product type, budgets and industry. To attempt to control somewhat for this variability I compared the sample sizes for the 33 respondents who reported conducting both Formative and Summative tests.
On average these respondents reported testing almost 3 times as many users on their most recent Summative test compared to their most recent Formative test (95% CI between 1.5 times and 4 times higher). So the graphs above mask this interesting relationship.
In generalizing results from a sample to the larger population, representativeness is more important than sample size. Subscribers to my email newsletter are perhaps more quantitatively focused, and that might bias their sample sizes upward. I looked at two other data sources to get an idea of how representative this data was.
In 2009 Jim Lewis and I reviewed 97 Summative datasets. We found the median number of users per test was 10, ranging from 4 to 296. Sixty-four percent of the tests had between 8 and 12 users and 80% had fewer than 20. These numbers are virtually identical to the Summative survey sample presented here.
In 2007, Hornbaek and Law reviewed dozens of datasets that appeared in HCI publications. The average number of users per study was 32 with a standard deviation of 29 (min 6 and max 181). They didn’t distinguish between Formative and Summative tests in their analysis, but this figure also isn’t far off from the sample data.
Both sources suggest this sample is reasonably representative of the larger population of usability tests.
People Test More than 5 Users: Evaluators typically test more than 5 users. In this sample 81% tested more than 5 users in Formative tests and 91% tested more than 5 users in Summative tests.
Benchmark Your Sample Sizes: You can use this data as another benchmark when planning your next sample size. For example, if you plan on testing four rounds of four users (16 total) in a Formative usability test, you’d have a larger sample size than about 80% of the Formative usability tests reported here.
Remote-Testing will change the numbers: Cheap unmoderated usability testing services are allowing for much larger sample sizes, even in Formative usability tests. The data analyzed here is a year old and such tools have continued to proliferate in the last year. I suspect we’ll see an increase in the average sample size, especially for Summative/benchmarking studies.
The difference between Formative and Summative Tests is blurring: There is a common belief that Formative tests (which inform design decisions) are small sample qualitative studies and Summative tests are large sample quantitative studies. The many large sample Formative tests here suggest this distinction is blurring.
Surveys are reliable ways of evaluating sample sizes: It is a lot easier to just ask people how many users they tested than to go through the trouble of actually reviewing dozens of reports. One concern I had was that people might inflate their sample size because it sounds better. The data here suggests that surveys can be a reliable method for estimating sample sizes.