Surprisingly, there is very little published on the frequency of usability problems. Part of the reason is that most usability testing happens early in development and is, at best, documented for an internal audience.
Once a website is launched or a product is released, what little usability testing is done typically focuses more on benchmarking than on finding and fixing problems.
I wanted to get an idea of how likely a user is to encounter a problem in completed software, so I reviewed usability publications and a collection of usability reports from companies. I only included tests on completed applications and live websites, and excluded those that were in the design phase and had no current users at the time of testing.
24 Usability Tests with Problem Frequencies
My investigation turned up 24 usability tests covering a wide range of products and websites. Examples include rental car websites, business applications (financial and HR) and consumer productivity software (calendars, spreadsheets and word processors). I didn’t include data from heuristic evaluations or cognitive walkthroughs. While these methods are effective for identifying usability problems, I wanted to focus only on problems that users actually experienced.
Finding the Average Usability Problem Frequency
To get an idea of how common problems are, I took each problem reported in each of the 24 tests and divided the number of users who encountered it by the total number of users in that test. I then averaged these problem frequencies within each test. For example, in one usability test with 8 users, 9 total problems were found. The number of users that encountered each problem is shown in the table below.
The average problem frequency for this test is .639. In other words, on average there’s about a 64% chance that a user will encounter any given problem with this application and set of tasks.
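The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the only inputs are a list of per-problem counts (how many users hit each problem) and the number of users tested; the function name `average_problem_frequency` is hypothetical. Because every problem is divided by the same number of users, the average reduces to total user-problem encounters divided by (problems × users), so the .639 in the example corresponds to about 46 encounters across 9 problems and 8 users (46 / 72 ≈ .639).

```python
def average_problem_frequency(counts, n_users):
    """Return the mean share of users who encountered each problem.

    counts  -- users who encountered each problem (one entry per problem)
    n_users -- total number of users in the test
    """
    if n_users <= 0 or not counts:
        raise ValueError("need at least one user and one problem")
    if any(c < 0 or c > n_users for c in counts):
        raise ValueError("each count must be between 0 and n_users")
    # Frequency of one problem = users who hit it / users tested.
    frequencies = [c / n_users for c in counts]
    # Average those frequencies across all problems found in the test.
    return sum(frequencies) / len(frequencies)


# Hypothetical counts for a 3-problem, 8-user test (not data from this article).
print(round(average_problem_frequency([8, 4, 2], 8), 3))  # 0.583
```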
Average Problem Frequencies Are Inflated
One drawback of this calculation is that the result is somewhat inflated. Small samples tend to surface the more obvious problems, while detecting problems that affect only a small percentage of users takes a much larger sample. If more users were tested, more idiosyncratic problems would turn up and lower the average problem frequency (a sort of long tail of usability problems).
Fortunately, there is a way to adjust for this bias and get a more accurate picture of the average problem frequency (Good-Turing and normalization methods). As a shortcut, multiplying the average frequency by .9 and subtracting .04 tends to work just as well. For the example problem set above, that gives an adjusted average problem frequency of .9 × .639 – .04 ≈ .54. In other words, a user has about a 54% chance of encountering any given usability problem in this interface. For this application, usability problems are unfortunately common.
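One way to see why the average is inflated: if a problem affecting a share p of users is seen at least once among n users with probability 1 - (1 - p)^n, then common problems are far more likely than rare ones to show up in the results. The sketch below uses a hypothetical long-tailed set of true problem frequencies (not data from this article) and compares their plain mean with a detection-weighted mean of the problems a test of size n would tend to surface. The weighted figure is a rough ratio-of-expectations approximation, not an exact expected value; the function names are hypothetical.

```python
def detection_probability(p, n_users):
    """Chance that a problem affecting a share p of users is seen at least once."""
    return 1 - (1 - p) ** n_users


def expected_frequency_of_detected(true_freqs, n_users):
    """Approximate average frequency among the problems a test would detect.

    Each problem's true frequency is weighted by its detection probability,
    so common problems count for more than rare ones (ratio-of-expectations
    approximation).
    """
    weights = [detection_probability(p, n_users) for p in true_freqs]
    return sum(w * p for w, p in zip(weights, true_freqs)) / sum(weights)


# Hypothetical long-tailed set of true problem frequencies.
true_freqs = [0.60, 0.40, 0.25, 0.10, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01]
plain_mean = sum(true_freqs) / len(true_freqs)

for n in (5, 20, 100):
    observed = expected_frequency_of_detected(true_freqs, n)
    print(f"n={n:>3}: detected-problem average {observed:.3f} vs. true mean {plain_mean:.3f}")
```

As n grows, rare problems become detectable and the detected-problem average drifts down toward the true mean, which is the long-tail effect described above.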
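The shortcut adjustment fits in a one-line function. This sketch applies only the .9 × p - .04 rule of thumb; it is not an implementation of the Good-Turing or normalization procedures themselves. The function name is hypothetical, and clamping the result at zero is an added assumption so that very small inputs do not produce negative frequencies.

```python
def adjusted_problem_frequency(avg_freq):
    """Shortcut deflation of a small-sample average problem frequency.

    Applies the rule of thumb 0.9 * p - 0.04. Clamping at 0 is an added
    assumption so tiny inputs do not yield negative frequencies.
    """
    if not 0 <= avg_freq <= 1:
        raise ValueError("avg_freq must be a proportion between 0 and 1")
    return max(0.0, 0.9 * avg_freq - 0.04)


print(round(adjusted_problem_frequency(0.639), 2))  # 0.54, matching the worked example
```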
Business Applications Have More Usability Problems than Consumer Software and Websites
The problem frequencies for the 24 studies are shown in the table below. If the data came from a publication, it is noted with a superscript reference number.
| | Business Applications | Consumer Software | Websites |
|---|---|---|---|
| Mean (SD) | .37 (.19) | .23 (.11) | .04 (.02) |
While there are a couple of exceptions, this data tells us something that may seem obvious: on average, users are more likely to encounter usability problems in business software than in consumer software and websites [F(2,21) = 10.74; p < .01].
What wasn’t obvious was the magnitude of the difference. Usability problems are almost ten times as common in business applications as on websites (.37 vs. .04). Between business applications and consumer software, the ratio is smaller, roughly 1.6 to 1 (.37 vs. .23).
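The ratios follow directly from the group means in the summary table; the sketch below only does that arithmetic, so any rounding in the published means carries through to the ratios.

```python
# Mean problem frequencies by product type, from the summary table above.
means = {
    "business applications": 0.37,
    "consumer software": 0.23,
    "websites": 0.04,
}

business_vs_websites = means["business applications"] / means["websites"]
business_vs_consumer = means["business applications"] / means["consumer software"]

print(f"business vs. websites: {business_vs_websites:.2f} to 1")  # 9.25 to 1
print(f"business vs. consumer: {business_vs_consumer:.2f} to 1")  # 1.61 to 1
```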
Business applications are typically customized to integrate into enterprise systems. They are often purchased to reduce costs and improve productivity, and their users often receive some training or have specialized skills. Business software typically contains a lot of complex functionality, so it makes sense that there are more things that can impede a good user experience.
Consumer software, and especially websites, on the other hand, are typically self-service. If users can’t walk up and use a website, they’re gone. Switching costs for consumer software and websites are low, so there is little tolerance for a poor user experience. These products also often have a fraction of the functionality of large-scale business applications.
Not All Usability Problems Have the Same Impact
Of course, usability problems aren’t all created equal. Their impact can range from trivial to tragic, and in this analysis I didn’t take problem severity into account. That said, there’s good reason to believe that problem frequency and severity are independent, meaning that how often a problem occurs tells you little about whether its impact on the user experience is major or minor.
Sample Sizes for Usability Testing
Much of the controversy around how many users to test stems from the variability in problem frequency. If users are less likely to encounter a problem, you need to test more users to find it. Even with this limited sample of usability tests, you can see that testing live websites calls for a larger sample size than testing most live business applications.
The public-facing websites in this sample have tens of thousands of daily users, and most of their usability problems have already been addressed. If you want to find more problems that affect at least 4% of users, then you’ll need to plan on testing about 46 users. In comparison, for many of the business applications in the sample, testing only 4 users would uncover most of the problems that affect at least 37% of users.
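These sample sizes line up with the common binomial discovery model, in which the chance of seeing a problem of frequency p at least once among n users is 1 - (1 - p)^n. The paragraph above does not name its formula, so treating this model as the basis is an assumption. Under it, 46 users give roughly an 85% chance of seeing a 4% problem, and 4 users give roughly an 84% chance of seeing a 37% problem. The function names are hypothetical.

```python
import math


def discovery_probability(p, n_users):
    """Chance a problem affecting a share p of users is seen at least once in n_users."""
    return 1 - (1 - p) ** n_users


def users_needed(p, goal):
    """Smallest n with 1 - (1 - p)**n >= goal, for 0 < p < 1 and 0 < goal < 1."""
    if not (0 < p < 1 and 0 < goal < 1):
        raise ValueError("p and goal must be strictly between 0 and 1")
    n = math.ceil(math.log(1 - goal) / math.log(1 - p))
    # Correct for floating-point error that could leave n off by one.
    while n > 1 and discovery_probability(p, n - 1) >= goal:
        n -= 1
    while discovery_probability(p, n) < goal:
        n += 1
    return n


# Figures from the paragraph above: 46 users at p = .04, 4 users at p = .37.
print(round(discovery_probability(0.04, 46), 2))  # 0.85
print(round(discovery_probability(0.37, 4), 2))   # 0.84
```

`users_needed` inverts the same model for planning: pick the smallest problem frequency you care about and the share of such problems you want to see, and it returns the number of users to recruit.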
Track Problem Frequency
It’s a good idea to record how many users encountered each problem and keep that information for internal benchmarking and sample size planning. At the very least, you can compare the problem frequencies from your next usability test to the benchmarks in this article. For example, if your website has an average usability problem frequency above 20%, you’ll know there are still a lot of improvements that can be made.
In addition to reduced task times and higher satisfaction ratings, a drop in the frequency of usability problems can be a great way to show how usability resources improved the user experience.
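One lightweight way to keep these counts is a small record per test. The sketch below is a hypothetical structure: the class, its field names, and the example problems are illustrations, not data from this article. It computes the average problem frequency and looks up the matching mean from the summary table. The article does not say whether the table’s means were adjusted with the .9 × p - .04 shortcut, so compare your numbers on the same basis.

```python
from dataclasses import dataclass, field
from typing import Dict

# Mean problem frequencies from the summary table in this article.
BENCHMARK_MEANS = {"business": 0.37, "consumer": 0.23, "website": 0.04}


@dataclass
class StudyRecord:
    """Hypothetical record of one usability test."""
    name: str
    product_type: str  # one of the BENCHMARK_MEANS keys
    n_users: int
    problem_counts: Dict[str, int] = field(default_factory=dict)  # problem -> users who hit it

    def average_frequency(self) -> float:
        """Mean share of users who encountered each recorded problem."""
        if self.n_users <= 0 or not self.problem_counts:
            raise ValueError("need at least one user and one problem")
        total_encounters = sum(self.problem_counts.values())
        return total_encounters / (len(self.problem_counts) * self.n_users)

    def benchmark(self) -> float:
        """Mean frequency for this product type from the summary table."""
        return BENCHMARK_MEANS[self.product_type]


# Hypothetical website test with 5 users and 3 problems.
record = StudyRecord(
    "checkout flow, round 1", "website", 5,
    {"hidden promo-code field": 2, "unclear error text": 1, "slow search": 1},
)
avg = record.average_frequency()
print(f"{avg:.2f} vs. benchmark {record.benchmark():.2f}")  # 0.27 vs. benchmark 0.04
if avg > 0.20:  # the 20% rule of thumb for websites mentioned above
    print("Still plenty of room for improvement.")
```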
- Law, L.-C., & Hvannberg, E. T. (2004). Analysis of combinatorial user effect in international usability tests. In Proceedings of the CHI Conference on Human Factors in Computing Systems (pp. 9–16). ACM.
- Lewis, J. R., Henry, S. C., & Mack, R. L. (1990). Integrated office software benchmarks: A case study. In Proceedings of HCI – INTERACT ’90 (pp. 337–343). Cambridge, England.
- Nielsen, J. (1994). Estimating the number of subjects needed for a thinking aloud test. International Journal of Human–Computer Studies, 41, 385–397.
- Spool, J., & Schroeder, W. (2001). Testing web sites: Five users is nowhere near enough. In CHI ’01 Extended Abstracts on Human Factors in Computing Systems. ACM.
- Virzi, R. A. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, 457–468.
- Woolrych, A., & Cockton, G. (2001). Why and when five test users aren’t enough. In J. Vanderdonckt, A. Blandford, & A. Derycke (Eds.), Proceedings of IHM-HCI 2001 Conference. Cépaduès.