{"id":359,"date":"2009-08-06T17:38:53","date_gmt":"2009-08-06T17:38:53","guid":{"rendered":"http:\/\/measuringu.com\/test-margin\/"},"modified":"2021-01-28T06:30:24","modified_gmt":"2021-01-28T06:30:24","slug":"test-margin","status":"publish","type":"post","link":"https:\/\/measuringu.com\/test-margin\/","title":{"rendered":"Margins of Error in Usability Tests"},"content":{"rendered":"
How many users will complete the task, and how long will it take them? If you need to benchmark an interface, a summative usability test is one way to answer these questions. Summative tests are the gold standard for usability measurement. But just how precise are the metrics?<\/p>\n
Just as a presidential poll uses a sample to estimate outcomes for the entire population, usability tests estimate the population task time and completion rate from a sample of users. And as with presidential polls, our sample estimates won’t exactly match the entire population. Instead, the estimates have a margin of error. If we were to sample 20 users and compute the mean time and completion rate, the results would differ by some amount from the real average time and completion rate.<\/p>\n
The margin of error<\/a> is half the width of the confidence interval, and the confidence interval tells us the likely range in which the population mean or proportion falls. Most presidential polls have a margin of error between +\/- 3% and +\/- 5%. That means if they sampled another set of likely voters, the proportion saying they would vote a certain way would be expected to fall within a range 6 to 10 percentage points wide. So what is the typical margin of error in a usability test?<\/p>\n To find out, I examined a large set of data from an earlier analysis Jim Lewis and I conducted<\/a>. We collected data from 100 summative usability tests across a dozen companies taking place over the last 25 years. In total there are data from more than 2,000 users and 1,000 tasks. Most of the 1,000 tasks (64%) had sample sizes between 8 and 12 users. Eighty percent of the tasks had fewer than 20 users. For each task I computed the confidence interval and then halved its width to generate the margin of error around the average task time and completion rate. For task time I excluded users who failed the task. The results are shown in the graph below.<\/p>\n Figure 1 below shows the 95% confidence interval around the average margin of error for each sample size. The margin of error was calculated by transforming the raw times using the natural log and computing a t-confidence interval.<\/a> For example, at a sample size of 10, the average margin of error is between 34% and 38% of the mean. So if you had 10 users complete a task and you observed a mean time of 100 seconds, the mean of the entire population will likely be between 66 seconds and 134 seconds. Put another way, if you were to test another 10 users on the same task, their average time would most likely fall between 66 and 134 seconds.<\/p>\nMargin of Error for Task Times<\/h3>\n
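The task-time calculation described above (log-transform the raw times, compute a t-confidence interval, then halve its width) can be sketched in Python. This is a minimal illustration and not necessarily the exact procedure used in the analysis: the sample times are made up, the function names are hypothetical, the critical values are standard two-sided 95% t-table entries, and expressing the margin as a percentage of the geometric mean is an assumption.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard table values; only a few sample sizes are covered in this sketch.
T_CRIT_95 = {7: 2.365, 9: 2.262, 11: 2.201, 19: 2.093}


def log_t_interval(times, t_crit):
    # Return (lower, center, upper) for the average task time.
    # The interval is computed on natural-log times and transformed back,
    # so the center is the geometric mean of the raw times.
    logs = [math.log(t) for t in times]
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    lower = math.exp(mean_log - t_crit * se_log)
    upper = math.exp(mean_log + t_crit * se_log)
    return lower, math.exp(mean_log), upper


def margin_of_error_pct(times, t_crit):
    # Half the interval width, as a percentage of the center (an assumption).
    lower, center, upper = log_t_interval(times, t_crit)
    return 100 * (upper - lower) / 2 / center


# Hypothetical task times in seconds for 10 users who completed the task.
times = [62, 75, 81, 88, 94, 101, 110, 123, 140, 165]
t_crit = T_CRIT_95[len(times) - 1]  # degrees of freedom = n - 1

low, mid, high = log_t_interval(times, t_crit)
moe = margin_of_error_pct(times, t_crit)
print(f'interval: {low:.1f} to {high:.1f} seconds, center {mid:.1f}')
print(f'margin of error: {moe:.1f}% of the center')
```

Because the interval is built on the log scale, it is not symmetric around the center once transformed back to seconds, which is why the margin here is taken as half the total width.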