{"id":176,"date":"2013-07-30T22:45:00","date_gmt":"2013-07-30T22:45:00","guid":{"rendered":"http:\/\/measuringu.com\/rating-severity\/"},"modified":"2022-03-21T18:06:23","modified_gmt":"2022-03-22T00:06:23","slug":"rating-severity","status":"publish","type":"post","link":"https:\/\/measuringu.com\/rating-severity\/","title":{"rendered":"Rating the Severity of Usability Problems"},"content":{"rendered":"
If only one out of 1000 users encounters a problem with a website, then it’s a minor problem.<\/p>\n
If that sentence bothered you, it should.<\/p>\n
It could be that that single problem resulted in one visitor’s financial information being inadvertently posted to the website for the world to see.<\/p>\n
Or it could be a slight hesitation with a label on an obscure part of a website.<\/p>\n
It’s part of the responsibility of user experience professionals to help developers make decisions about what to fix.<\/p>\n
Problem frequency and severity are two critical ingredients when communicating the importance of usability problems. They are also two of the inputs needed for a Failure Mode and Effects Analysis (FMEA)<\/a>, a more structured prioritization process.<\/p>\n Measuring the frequency of a problem is generally straightforward: divide the number of users who encounter a problem by the total number of users. For example, if 1 out of 5 users encounters a problem, the problem frequency is .20, or 20%. The problem frequency can then be presented in a user-by-problem matrix<\/a>. It can also be used to estimate the sample size needed<\/a> to discover a certain percentage of the problems.<\/p>\n Rating the severity of a problem is less objective than finding the problem frequency. There are a number of ways to assign severity ratings. I’ve selected a few of the more popular approaches described in the literature, and I’ll contrast them with the method we use at Measuring Usability.<\/p>\n While the approaches differ in their details, each proposes a similar structure: a set of ordered categories reflecting the impact the problem has on the user, from minor to major.<\/p>\n Jakob Nielsen<\/a> proposed the following four-step scale a few decades ago:<\/p>\n 0 = I don’t agree that this is a usability problem at all\n In his influential 1994 book<\/a>, Jeff Rubin outlined the following scale for problem severity:<\/p>\n 4: Unusable:<\/b> The user is not able to or will not want to use a particular part of the product because of the way that the product has been designed and implemented.\n Joe Dumas and Ginny Redish, in their seminal book, A Practical Guide to Usability Testing<\/a>, offer a categorization similar to Rubin’s and Nielsen’s but add a global versus local dimension to the problems. 
The idea is that if a problem affects the global navigation of a website, it is more critical than a local problem affecting, say, only one page.<\/p>\n Level 1<\/b>: Prevents Task Completion\n Chauncey Wilson suggests<\/a> that usability severity scales should match the severity ratings of a company’s bug-tracking system. He offers a five-point scale with the following levels. Earlier, he used a similar four-point variant<\/a>[pdf]<\/span>.<\/p>\n Level 1<\/b>: Catastrophic error causing irrevocable loss of data or damage to the hardware or software. The problem could result in large-scale failures that prevent many people from doing their work. Performance is so bad that the system cannot accomplish business goals.\n The Wilson and Dumas & Redish scales assign lower numbers to more severe problems. That is because in the early days of computing, severe bugs were called “level 1 bugs” and those had to be fixed before product release (Dumas, Personal Communication 2013). In Wilson’s scale, the problems are defined in terms of data loss rather than their impact on users’ performance or emotional state.<\/p>\n Rolf Molich is famous for his series of comparative usability evaluations (CUE)<\/a>. He’s also famous for reviewing and writing (often critically) about the quality of usability reports. He and Robin Jeffries offered a three-point scale.<\/p>\n 1. Minor: <\/b>delays user briefly.\n This three-point approach is simpler than the others but tends to rely heavily on how the problem impacts time on task.<\/p>\n Originally we started with a 7-point rating scale on which evaluators assigned problem severity a value from cosmetic (1) to catastrophic (7), but we found it was difficult to distinguish easily between levels 2 and 6. 
We reduced this to a four-point scale similar to Rubin’s, Nielsen’s, and Dumas\/Redish’s above and treated the levels more as categories than as a continuum.<\/p>\n While there was much less ambiguity with four points, we still found a murky distinction between the two middle levels, both in assigning severity and in reporting the levels of problems to clients.<\/p>\n So we reduced our severity scale to just three levels, along with one category for insights, user suggestions, or positive attributes.<\/p>\n 1. Minor<\/b>: Causes some hesitation or slight irritation.\n Insight\/Suggestion\/Positive<\/b>: Users mention an idea or observation that does or could enhance the overall experience.<\/p><\/blockquote>\n I’ve put abbreviated versions of these scales in the table below to show the similarities in some of the terms and levels. I’ve also aligned the scales so higher numbers indicate more severe problems.<\/p>\nProblem Frequency<\/h2>\n
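The frequency calculation described earlier (users who encounter a problem divided by total users) is simple enough to sketch in code. A minimal illustration in Python; the user-by-problem matrix and problem names are hypothetical:

```python
# Minimal sketch: problem frequency = users who encountered the problem
# divided by the total number of users tested.
# The matrix and problem names below are hypothetical illustration data.

# Each list holds per-user encounter flags across 5 tested users:
# 1 = the user encountered the problem, 0 = they did not.
matrix = {
    "checkout button hidden": [1, 0, 0, 1, 1],
    "confusing field label":  [1, 0, 0, 0, 0],
    "broken search filter":   [0, 0, 1, 0, 0],
}

def problem_frequency(encounters):
    """Share of users who encountered the problem."""
    return sum(encounters) / len(encounters)

for problem, encounters in matrix.items():
    print(f"{problem}: {problem_frequency(encounters):.0%}")
```

For the 1-out-of-5 case above, `problem_frequency` returns .20, i.e. the 20% from the example.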
Problem Severity<\/h2>\n
Jakob Nielsen<\/h3>\n
\n1 = Cosmetic problem only: need not be fixed unless extra time is available on project
\n2 = Minor usability problem: fixing this should be given low priority
\n3 = Major usability problem: important to fix, so should be given high priority
\n4 = Usability catastrophe: imperative to fix this before product can be released<\/p><\/blockquote>\nJeff Rubin<\/h3>\n
\n3: Severe<\/b>: The user will probably use or attempt to use the product here, but will be severely limited in his or her ability to do so.
\n2: Moderate<\/b>: The user will be able to use the product in most cases, but will have to undertake some moderate effort in getting around the problem.
\n1: Irritant<\/b>: The problem occurs only intermittently, can be circumvented easily, or is dependent on a standard that is outside the product’s boundaries. Could also be a cosmetic problem.<\/p><\/blockquote>\nDumas and Redish<\/h3>\n
\nLevel 2<\/b>: Creates significant delay and frustration
\nLevel 3<\/b>: Problems have a minor effect on usability
\nLevel 4<\/b>: Subtle and possible enhancements\/suggestions<\/p><\/blockquote>\nChauncey Wilson<\/h3>\n
\nLevel 2<\/b>: Severe problem, causing possible loss of data. User has no workaround to the problem. Performance is so poor that the system is universally regarded as ‘pitiful’.
\nLevel 3<\/b>: Moderate problem causing no permanent loss of data, but wasted time. There is a workaround to the problem. Internal inconsistencies result in increased learning or error rates. An important function or feature does not work as expected.
\nLevel 4<\/b>: Minor but irritating problem. Generally, it causes no loss of data, but the problem slows users down slightly. There are minimal violations of guidelines that affect appearance or perception, and mistakes that are recoverable.
\nLevel 5<\/b>: Minimal error. The problem is rare and causes no data loss or major loss of time. Minor cosmetic or consistency issue.<\/p><\/blockquote>\nMolich & Jeffries<\/h3>\n
\n2. Serious: <\/b>delays user significantly but eventually allows them to complete the task.
\n3. Catastrophic: <\/b>prevents user from completing their task.<\/p><\/blockquote>\nOur Approach<\/h3>\n
\n2. Moderate<\/b>: Causes occasional task failure for some users; causes delays and moderate irritation.
\n3. Critical<\/b>: Leads to task failure. Causes users extreme irritation.<\/p>\nSummary<\/h3>\n
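One way to act on both ingredients is to combine them into a rough priority score, in the spirit of the FMEA-style prioritization mentioned in the introduction. A minimal sketch, assuming a simple frequency × severity weighting over our three-level scale; the weights and observed data are hypothetical, and a full FMEA would also score detectability:

```python
# Rough sketch of ranking problems by frequency x severity weight, in the
# spirit of the FMEA-style prioritization mentioned earlier. The weights and
# observed data are hypothetical; a full FMEA also scores detectability.

SEVERITY_WEIGHT = {"minor": 1, "moderate": 2, "critical": 3}

# (problem, frequency among tested users, severity rating)
observed = [
    ("confusing field label",  0.20, "minor"),
    ("broken search filter",   0.40, "moderate"),
    ("checkout button hidden", 0.60, "critical"),
]

def priority(frequency, severity):
    """Simple priority score: frequency weighted by severity level."""
    return frequency * SEVERITY_WEIGHT[severity]

# Sort highest-priority problems first for the fix list.
ranked = sorted(observed, key=lambda row: priority(row[1], row[2]), reverse=True)
for problem, freq, sev in ranked:
    print(f"{problem}: priority {priority(freq, sev):.2f} ({sev}, {freq:.0%})")
```

A frequent critical problem floats to the top of the fix list, while a rare minor one drops to the bottom, which is the communication goal the frequency and severity ratings serve.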