While testing with five users might reveal 85% of the problems that impact 31% of users (given a set of tasks and a user type), it doesn’t mean you’re finding 85% of the critical problems. Are severe usability problems likely to occur more frequently, less frequently, or is severity independent of frequency?
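That 85% figure comes from the standard problem-discovery formula, 1 − (1 − p)^n, where p is the proportion of users a problem affects and n is the sample size. A minimal sketch (the function name is mine, not a standard library call):

```python
def discovery_rate(p, n):
    """Probability that a problem affecting a proportion p of users
    is seen at least once in a test with n users: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

# With 5 users and problems that impact 31% of users:
print(discovery_rate(0.31, 5))  # ~0.84, commonly rounded to 85%
```

Note that the formula says nothing about severity, which is exactly the gap discussed below.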
The data on this is mixed. Bob Virzi’s 1992 paper showed an association between frequency and severity (more severe problems were encountered by more users). A follow-up paper by Lewis (1994) found frequency and severity to be independent. There hasn’t been much additional data on the question since then.
One complication here is the definition of critical. Rating a problem’s severity is typically not an objective task. It usually involves thinking about the problem’s potential impact (losing work, crashing a system, or making a trivial and correctable error). Evaluators typically assign a severity code (1-3, 1-4, or 1-7) to the problem. These severity codes often correspond loosely to adjectives such as cosmetic, trivial, moderate, critical/severe, and my personal favorite, catastrophe.
When assigning severity codes to problems, it would be hard not to think about how many users would be impacted by the problem. That is, all things being equal, if two problems have about the same medium negative consequences, but one affected only a single user in a usability test while the other affected most users, the latter would be considered more severe. Combining severity and frequency into one rating can be called criticality (I think Jeff Rubin gets credit for that term), and I suspect many practitioners take this approach.
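As a sketch of how such a combined rating might work (the additive rule and the 1-4 scales here are assumptions for illustration; Rubin’s handbook describes one such scheme, and teams vary in the exact rule they use):

```python
def criticality(severity, frequency_rating):
    """Combine a severity rating (1-4) and a frequency rating (1-4)
    into a single criticality score. Additive combination is one
    common convention; treat the exact rule as a team decision."""
    return severity + frequency_rating

print(criticality(4, 1))  # severe but rarely seen -> 5
print(criticality(2, 4))  # moderate but very common -> 6
```

Note how the common moderate problem can outrank the rare severe one, which is precisely why combined ratings blur the severity/frequency distinction discussed here.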
Even if the people rating problem severity are kept separate from those who know the problem frequency, problem descriptions can often contain hints about total impact based on whether the functionality is obscure or common (think of a problem on a homepage vs. one on the Terms & Conditions page).
Assume severity and frequency are not related
It would be nice if more severe problems affected more users. If they did, we’d really get bang for the buck from small sample sizes. Until we have such evidence, we should assume there is no association between severity and frequency.
Think in terms of classes of problems
In addition to frequency and severity, Bob Virzi suggests we should be thinking about whether or not there are “classes of problems that, if present, tend to affect lots of people, and also tend to be hard to recover from for individuals.” An example of a class of problems would be problems related to the Microsoft Office ribbon (something that would affect a lot of users, but not necessarily be hard to recover from).
Quantify problem frequency and severity in formative tests
This is another reason to quantify and categorize the problems you observe during a formative usability test. Qualitative feedback is essential for usability improvements, but with a little more effort you should also note:
- Total number of users that experienced a problem.
- Which users experienced the problem.
- Problem severity.
Having this information will allow you and your organization to quantitatively show how usability and design changes reduced the number and severity of problems.
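The bookkeeping involved is modest. Here is a minimal sketch of tallying those three items from a test log (the problem names, user IDs, and 1-4 severity scale are all hypothetical):

```python
from collections import defaultdict

# Hypothetical formative-test observations: (user_id, problem_id, severity 1-4)
observations = [
    ("u1", "checkout-button-hidden", 4),
    ("u2", "checkout-button-hidden", 4),
    ("u2", "label-typo", 1),
    ("u3", "label-typo", 1),
    ("u4", "checkout-button-hidden", 4),
]

total_users = 5  # users who attempted the tasks

problems = defaultdict(lambda: {"users": set(), "severity": 0})
for user, problem, severity in observations:
    problems[problem]["users"].add(user)      # which users hit it (and how many)
    problems[problem]["severity"] = severity  # evaluator-assigned rating

for name, info in sorted(problems.items()):
    freq = len(info["users"]) / total_users
    print(f"{name}: severity={info['severity']}, "
          f"affected {len(info['users'])}/{total_users} users ({freq:.0%})")
```

Run this after each test round and the before/after comparison of problem counts, frequencies, and severities falls out directly.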
- Rubin, Jeff (2008). Handbook of Usability Testing (2nd Edition).
- Lewis, James (1994). “Sample Sizes for Usability Studies: Additional Considerations.” Human Factors, 36(2), 368-378.
- Virzi, Robert (1992). “Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?” Human Factors, 34, 457-468.