{"id":314,"date":"2008-08-03T00:00:47","date_gmt":"2008-08-03T00:00:47","guid":{"rendered":"http:\/\/measuringu.com\/qualitative_sidebar\/"},"modified":"2021-01-28T06:30:12","modified_gmt":"2021-01-28T06:30:12","slug":"qualitative_sidebar","status":"publish","type":"post","link":"https:\/\/measuringu.com\/qualitative_sidebar\/","title":{"rendered":"Deriving a Problem Discovery Sample Size"},"content":{"rendered":"
<p>Nielsen derives his “five users is enough” formula from a paper he and Tom Landauer published in 1993. Before Nielsen and Landauer, James Lewis of IBM proposed a very similar problem detection formula in 1982, based on the binomial probability formula.[4] Lewis stated that:<\/p>\n<blockquote><p>The binomial probability theorem can be used to determine the probability that a problem of probability <strong>p<\/strong> will occur <strong>r<\/strong> times during a study with <strong>n<\/strong> subjects. For example, if an instruction will be confusing to 50% of the user population, the probability that one subject will be confused is .5.[4]<\/p><\/blockquote>\n<p>In 1990[15] and 1992[16] Robert Virzi outlined a predicted probability formula, 1-(1-p)<sup>n<\/sup>, where p is the probability of detecting a given problem and n is the sample size. Using the Butterfly Ballot example, we can derive the sample size from Tog’s value of p (.10), the probability that a user has some confusion about the ballot. If we want a 90% likelihood of detecting this problem at least once, we can solve for the number of users needed with the formula:<\/p>\n<p>.90 (likelihood of detection) = 1-(1-.1)<sup>n<\/sup><\/p>\n<p>Simplifying the equation:<\/p>\n<p>.90 = 1-(.9)<sup>n<\/sup><\/p>\n<p>Then isolating the variable by subtracting 1 from both sides:<\/p>\n<p>.90-1 = -(.9)<sup>n<\/sup><\/p>\n<p>Simplifying again:<\/p>\n<p>-.10 = -(.9)<sup>n<\/sup><\/p>\n<p>The negative signs cancel each other out:<\/p>\n<p>.10 = (.9)<sup>n<\/sup><\/p>\n<p>To solve for n, we take the log of both sides, which brings the exponent down as a multiplier:<\/p>\n<p>log(.10) = n(log(.90))<\/p>\n<p>Then divide both sides by log(.90) to isolate n:<\/p>\n<p>n = log(.10) \u00f7 log(.90)<\/p>\n<p>Finally we arrive at our coveted value of 21.85, or 22 users needed to have a 90% likelihood of detecting this problem at least once.<\/p>\n<p>Virzi’s formula appeared in a slightly different form in Nielsen’s Alertbox column [7] and in his Interchi article with Tom Landauer [6]:<\/p>\n<p>Problems found = 
N(1-(1-L)<sup>n<\/sup>)<\/p>\n<p>Where N is the total number of usability problems in the design, <em>L<\/em> is the proportion of usability problems discovered while testing a single user, and n is the number of users in a test. Nielsen states that the typical value for L is .31. In the Butterfly Ballot example, however, as stated, we know the value of L is .10 for this one problem. This lower value of <em>L<\/em> indicates that this problem is harder to detect than a typical usability problem. Detecting it is nonetheless critical, as the subsequent outrage over the election has shown.<\/p>\n<p>Again plugging the values into the Nielsen and Landauer formula, we get:<\/p>\n<p>.90 (likelihood of detection) = 1(1-(1-.1)<sup>n<\/sup>)<\/p>\n<p>Where N is the 1 problem we’re looking for, L is the .1 likelihood of detection, and .90 is the likelihood that at least one user will detect it.<\/p>\n<p>Simplifying the equation again:<\/p>\n<p>.90 = 1(1-(.9)<sup>n<\/sup>)<\/p>\n<p>We can drop the multiplier of 1:<\/p>\n<p>.90 = 1-(.9)<sup>n<\/sup><\/p>\n<p>Subtract 1 from both sides:<\/p>\n<p>-.10 = -(.9)<sup>n<\/sup><\/p>\n<p>Again the negative signs cancel each other out, and we take the log of each side:<\/p>\n<p>log(.10) = n(log(.90))<\/p>\n<p>Isolating n:<\/p>\n<p>n = log(.10) \u00f7 log(.90)<\/p>\n<p>Again we arrive at 21.85. Rounding up, we would again say that 22 users give us a 90% likelihood of detecting the problem at least once.<\/p>\n<p>If we stopped at only the five users Nielsen recommends, we would have only about a 41% probability of seeing that very important problem:<\/p>\n<p>Likelihood of detection (unknown) = 1(1-(1-.1)<sup>5<\/sup>) = 1-.59 = .41<\/p>\n<p>Tog doesn’t tell us where he got the 1 in 10 chance of a user having some trouble with the ballot. We do have a published empirical evaluation, derived from a statistical analysis of voting in adjacent Florida counties, showing that between .8% and 1.6% of voters failed to cast their correct vote. 
[17] You can plug in the approximate value of .01 for <em>L<\/em> or p and get a sample size of roughly 225. Lewis published [2] a quick look-up table for finding the sample size needed to detect a problem once and twice:<\/p>\n<p><strong>Table 1:<\/strong> Sample Size Requirements as a Function of Problem Detection Probability and the Cumulative Likelihood of Detecting the Problem at Least Once (Twice). <em>Reprinted with permission from the author.<\/em><\/p>\n