{"id":613,"date":"2020-10-28T02:09:43","date_gmt":"2020-10-28T02:09:43","guid":{"rendered":"http:\/\/measuringu.com\/improving-prediction-of-the-number-of-usability-problems\/"},"modified":"2021-08-09T14:45:38","modified_gmt":"2021-08-09T20:45:38","slug":"improving-prediction-of-the-number-of-usability-problems","status":"publish","type":"post","link":"https:\/\/measuringu.com\/improving-prediction-of-the-number-of-usability-problems\/","title":{"rendered":"Improving the Prediction of the Number of Usability Problems"},"content":{"rendered":"

\"\"<\/a>Paraphrasing the statistician George Box<\/a>, all models are wrong, some are useful, and some can be improved.<\/p>\n

In a recent article, we reviewed the most common way of modeling problem discovery, which is based on a straightforward application of the cumulative binomial probability formula: P(x ≥ 1) = 1 - (1 - p)^n.

Well, it's straightforward if you like playing around with these sorts of formulas like Jim and I do.

In this formula, p is the probability of an event of interest (e.g., a participant experiences a specific usability problem), n is the number of opportunities for the event to occur (sample size), and P(x ≥ 1) is the probability of the event occurring at least once in n tries. In other words, if you know n and p, you can estimate the likelihood of discovering (seeing at least once) a usability problem.
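If you want to play with the formula yourself, here is a minimal sketch in Python (our own illustration; the function name and example values are just for demonstration):

```python
def p_discovery(p, n):
    """Probability of seeing an event at least once in n tries: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

# For example, a problem that affects 31.5% of users has about an 85% chance
# of showing up at least once in a five-participant study.
print(round(p_discovery(0.315, 5), 2))  # 0.85
```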

For the formula to work, you need a value for p. In the most common method, p is estimated from a participant-by-problem matrix. When the sample size is large, binomial models using estimates of p tend to closely match empirical problem discovery. When the sample size is small, however, estimates of p tend to be inflated and lack correspondence with empirical problem discovery, so the projected rate of problem discovery would be faster than it actually is.

Another potential issue with estimates of p aggregated across a set of problems with different likelihoods of occurrence is that the binomial probability model doesn't have any parameters that account for the variability of p. This can lead to a phenomenon known as overdispersion, where the problem discovery model is overly optimistic compared to empirical problem discovery.

Despite these issues, this original approach to modeling problem discovery is usually accurate enough to be useful. In this article, we explore some approaches to improving this wrong but useful model.

Binomial p (Adjusted)

Before 2001, a common practice was to run a test with a few participants and then use those results to estimate p. With that estimate, you could use the cumulative binomial probability equation to predict how many participants you would need to achieve a specific percentage of discovery (acknowledging the generalizability limits due to product, tasks, participants, environments, and methods). Based on the pilot data, you could also predict for each sample size how many unique usability problems you would probably discover and how many would probably remain undiscovered. A flaw in this procedure was revealed by Hertzum and Jacobsen (2001*), who demonstrated that this would always overestimate the true value of p, where "true" is the value of p at the end of a large-sample study. (* This paper first appeared in 2001, but due to printing issues for some figures, was republished in 2003.)

Recalling the example from our previous article, Virzi (1990) published a participant-by-problem matrix with data showing the discovery of 40 problems in a study with 20 participants. When we estimated p from a randomly selected set of three participants, we got .59, a substantial overestimate of the value of p generated from the full study (.36). With n = 3 and p = .59, the expected percentage of discovery from the cumulative binomial probability is .93 (93%). There were 25 problems detected with those three participants, so the estimated number of problems available for discovery using that value of p is 27 (25/.93), far fewer than the 40 problems reported by Virzi.
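If you want to check that arithmetic, here is a quick sketch (our own, using the numbers above):

```python
# Naive small-sample projection using the unadjusted estimate of p
# (illustrative numbers from the Virzi example above).
p_est = 0.59       # p estimated from three randomly selected participants
n = 3
problems_seen = 25

discovery = 1 - (1 - p_est) ** n             # expected proportion discovered, about .93
estimated_available = problems_seen / discovery
print(round(discovery, 2), round(estimated_available))  # 0.93 27
```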

Inspired by Hertzum and Jacobsen, Lewis (2001) developed a systematic method for reducing small-sample estimates of p using the following equation (see the article for its derivation):

\"\"<\/a><\/p>\n

Figure 1: Formula for adjusting observed estimate of p (Lewis, 2001).

The adjusted value of p is the average of two independent ways to reduce its value. The first is driven by the sample size, and the second is an application of Good-Turing discounting based on the number of problems that were observed just once in the sample.

If we use the formula in Figure 1 to adjust that initial estimate of p (.59), we note that the first three participants encountered 11 of the 25 problems only once. With this information, the first method, which tends to bring the value of p down too much, gets p = .17. The second method, which tends to leave p too high, gets p = .41. The average of the two, .29, is the adjusted value of p.

When using this adjusted estimate of p, the predicted amount of discovery when n = 3 is .64, so the estimated number of problems available for discovery is 39 (25/.64), very close to the observed number of 40.
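Here is a small sketch of the full adjustment, assuming the formula in Figure 1 (the function and variable names are ours):

```python
def adjust_p(p_est, n, problems_seen_once, problems_total):
    """Lewis (2001) adjustment: average a sample-size-based deflation with a
    Good-Turing discount based on the proportion of problems seen only once."""
    deflated = (p_est - 1 / n) * (1 - 1 / n)                         # tends to pull p too low
    good_turing = p_est / (1 + problems_seen_once / problems_total)  # tends to leave p too high
    return (deflated + good_turing) / 2

# Virzi example: p estimated from three participants, 11 of 25 problems seen only once.
p_adj = adjust_p(0.59, 3, 11, 25)        # about 0.29
discovery = 1 - (1 - p_adj) ** 3         # about 0.64
print(round(p_adj, 2), round(discovery, 2), round(25 / discovery))  # 0.29 0.64 39
```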

Binomial p (Targeted)

Despite their general success in modeling problem discovery, methods based on estimates of binomial p, adjusted or not, can be criticized because they don't account for the variability in point estimates aggregated across problems of varying probability of occurrence. One way to avoid this issue is to pick a target value of p instead of estimating it (Lewis, 2012). Table 1 shows likelihoods of discovery for various targets of p and sample sizes, computed using the cumulative binomial probability formula.
p       n = 5    n = 10    n = 15    n = 20
0.01    0.05     0.10      0.14      0.18
0.05    0.23     0.40      0.54      0.64
0.10    0.41     0.65      0.79      0.88
0.25    0.76     0.94      0.99      1.00
0.315   0.85     0.98      1.00      1.00
0.50    0.97     1.00      1.00      1.00
0.75    1.00     1.00      1.00      1.00
0.90    1.00     1.00      1.00      1.00

Table 1: Discovery likelihood for various targets of p.

For example, suppose you want to have a pretty good (>90%) chance of discovering problems that are likely to occur 25% of the time (within the constraints of your study, as described above). Table 1 shows that you probably won't achieve this goal with n = 5, but you probably will with n = 10. For another example, suppose you want to discover 80% of the problems that will happen 5% of the time. Table 1 shows that you won't be likely to achieve this goal even when n = 20, so you'd either need to plan for a larger sample size (n = 31) or set a different target for p. We use this approach at MeasuringU when scoping projects with clients, balancing sample sizes and budgets/timelines for testing.
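The same cumulative binomial formula drives this kind of scoping; here is a brief sketch (our own illustration) that reproduces a few of the values discussed above:

```python
def p_discovery(p, n):
    """Probability that a problem occurring with probability p is seen at least once with n participants."""
    return 1 - (1 - p) ** n

# Problems occurring 25% of the time: short of 90% discovery at n = 5, above it at n = 10.
print(round(p_discovery(0.25, 5), 2), round(p_discovery(0.25, 10), 2))   # 0.76 0.94

# Problems occurring 5% of the time: 80% discovery takes roughly 31 participants.
print(round(p_discovery(0.05, 20), 2), round(p_discovery(0.05, 31), 2))  # 0.64 0.8
```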

Beta-Binomial and Other Complex Models

Another way to deal with the issue of binomial variability is to use beta-binomial modeling, which is designed to deal with the overdispersion in binomial modeling that can lead to overly optimistic estimates of problem discovery. Schmettow (2008) reported comparisons of adjusted-p and beta-binomial modeling of five problem discovery databases in which the beta-binomial model was a better fit in three cases and adjusted-p modeling was better in two cases. Other more complex approaches that can be used to model problem discovery include