Confidence Interval Calculator for a Completion Rate
Explanation
The Adjusted Wald method should be used almost all the time. For exceptions, see below.
For a detailed discussion of binomial confidence intervals with small samples, see the HFES paper (Sauro & Lewis, 2005). For a discussion of the best point estimate, see the JUS paper (Lewis & Sauro, 2006).
Adjusted Wald Method
The adjusted Wald interval (also called the modified Wald interval) provides the best coverage for the specified interval when samples are smaller than about 150. In other words, if you want a 95% confidence interval, this formula will produce an interval that contains the population completion rate, on average, about 95 percent of the time. It uses the Wald formula but is "adjusted" in that it adds half of the squared z-critical value to the numerator and the entire squared z-critical value to the denominator before computing the interval, i.e. (x + z^2/2) / (n + z^2). For example, a 95% confidence level uses the z-critical value of 1.96, or approximately 2. If you observe 9 out of 10 users completing a task, this formula computes the proportion as (9 + 1.96^2/2) / (10 + 1.96^2) = 10.92/13.84, or approximately 11/14, and builds the interval using the Wald formula.
Note: Prior to March 1st 2006, this calculator computed this interval by adding one z-value to the numerator and a squared z-value to the denominator.
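The adjusted Wald calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the calculator's own code: the function name `adjusted_wald_interval` is hypothetical, and clamping the limits to the range 0 to 1 is an assumption standing in for whatever substitution the calculator applies when a limit falls outside that range.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(x, n, confidence=0.95):
    """Return (adjusted proportion, lower, upper) for x successes in n attempts."""
    # Two-sided z-critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Add z^2/2 to the successes and z^2 to the attempts.
    n_adj = n + z ** 2
    p_adj = (x + z ** 2 / 2) / n_adj
    # Ordinary Wald half-width, computed on the adjusted values.
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Assumption: limits outside [0, 1] are clamped.
    lower = max(0.0, p_adj - half_width)
    upper = min(1.0, p_adj + half_width)
    return p_adj, lower, upper


p_adj, lower, upper = adjusted_wald_interval(9, 10)
print(f"adjusted proportion = {p_adj:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

For 9 out of 10, the adjusted proportion comes out near 11/14, and the unclamped upper limit would exceed 1, which is why some substitution is needed at the top end.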
Exact Method
The Exact method was designed to guarantee at least 95% coverage, whereas the approximate methods (adjusted Wald and Score) provide an average coverage of 95% only in the long run. Use the Exact method when you need to be sure you are calculating a 95% or greater interval, erring on the conservative side. For example, at a population completion rate of 97.8%, the actual coverage of both the Score and adjusted Wald methods fell to 89%. When the risk of this level of actual coverage is unacceptable for an application, the Exact method provides the necessary guarantee.
NOTE: We have determined that there is an issue with our code for computing this interval when sample sizes are large (n>2000). Fortunately, when sample sizes are this large all methods converge on the same interval, so for large samples, use one of the other methods.
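The calculator's own code for the Exact method is not reproduced here. As a rough sketch, and assuming the Exact interval is the Clopper-Pearson interval (a common definition of the "exact" binomial interval, but an assumption about this calculator), the limits can be found by searching for the proportions at which the binomial tail probabilities equal half of the allowed error. The names `binom_cdf` and `exact_interval` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_interval(x, n, confidence=0.95):
    """Clopper-Pearson style limits for x successes in n attempts."""
    alpha = 1 - confidence

    def solve(f, target):
        # Bisection on [0, 1]; f must be increasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


lower, upper = exact_interval(9, 10)
print(f"[{lower:.4f}, {upper:.4f}]")
```

The direct summation keeps the sketch short but is only suitable for small samples; with very large n the binomial coefficients become too large to convert to floating point.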
Score Method
The Score method provides better coverage than the Exact and Wald methods but falls short of the adjusted Wald method. Its drawbacks are its computational difficulty and its poor coverage for some values when the population completion rate is around 98% or 2%, regardless of sample size (Agresti and Coull, 1998). The only advantage of the Score method is that it provides more precise endpoints when the ends of the interval are close to 0 or 1. For some values (e.g. 9/10) the adjusted Wald's crude intervals go beyond 0 or 1, and a substitution of >.999 is used. For the Score method, the upper limit is .9975.
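For comparison, here is a minimal sketch of the textbook Wilson score interval, which is the formula usually meant by the "Score" method. Whether the calculator uses exactly this form (for example, without a continuity correction) is an assumption, and the function name `score_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(x, n, confidence=0.95):
    """Wilson score limits for x successes in n attempts."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    denom = 1 + z ** 2 / n
    # The center is algebraically the same as (x + z^2/2) / (n + z^2).
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


lower, upper = score_interval(18, 20)
print(f"[{lower:.3f}, {upper:.3f}]")
```

Unlike the adjusted Wald limits, these limits always stay within 0 and 1, so no substitution is needed near the extremes.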
Wald Method
The Wald method should be avoided when calculating confidence intervals for completion rates with sample sizes less than 100. Its coverage is too far from the nominal level to provide a reliable estimate of the population completion rate. As the sample size increases above 100, all four methods converge to similar intervals. Use the Wald method as a point of reference or for larger sample sizes.
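As a point of reference, the unadjusted Wald interval is simple enough to sketch directly. The function name `wald_interval` is hypothetical, and the limits are deliberately left unclamped so that the method's behavior with small samples is visible.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, confidence=0.95):
    """Unadjusted Wald limits for x successes in n attempts."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


print(wald_interval(9, 10))   # upper limit exceeds 1
print(wald_interval(10, 10))  # zero-width interval at 100%
```

With 10 out of 10 successes the half-width collapses to zero, so the Wald interval claims complete certainty from a sample of ten users.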
* The "Margin of Error" values are half the width of the confidence intervals. For the adjusted Wald and Wald formulas, you can report the proportion +/- the margin of error. For the Exact method, the intervals become asymmetrical as the completion rate gets further from 50% (e.g. 90% or 15%). Therefore, the margin of error should be used only as an approximation for the Exact method, and the actual values above and below the proportion should be reported.
When All Users Pass or Fail
With small sample sizes, it is common for all users in the sample to complete a task (100% completion rate) or for all to fail the task (0% completion rate). For these scenarios, it is often unpalatable to report 100% or 0%. After all, how likely is it that the true population parameter is as extreme as 100% or 0%? The Best Estimate box provides the best point estimate under these conditions and uses the LaPlace method for calculation. While this value may seem too far from the observed 100%, its attractiveness is that it is a function of the sample size: the greater the sample size, the closer this value will be to 100%.
Calculation Note: When the observed completion rate is 100% or 0%, there cannot be a two-sided confidence interval (since you cannot have more than 100% or less than 0%). In these cases it is necessary to use a z-critical value for a one-sided confidence interval. For example, a 95% two-sided confidence interval uses a z-score of approximately 1.96, whereas a 95% one-sided interval uses a z-score of approximately 1.64.
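The two critical values quoted above can be reproduced with the standard normal quantile function from Python's standard library. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence=0.95, two_sided=True):
    """z-critical value for a one-sided or two-sided interval."""
    # A two-sided interval splits the error between both tails.
    tail = (1 - confidence) / 2 if two_sided else (1 - confidence)
    return NormalDist().inv_cdf(1 - tail)


print(round(z_critical(0.95, two_sided=True), 2))   # 1.96
print(round(z_critical(0.95, two_sided=False), 2))  # 1.64
```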
Likely Population Completion Rate
There are two options in this drop-down:
Between .5 and 1
If you conduct usability tests in which your task completion rates are roughly restricted to the range of .5 to 1.0, then select "Between .5 and 1" in the drop-down. See the Best Estimate section below for how the point estimate is calculated with this option.
Unknown
If your task completion rates typically take a wide range of values, uniformly distributed between 0 and 1, then select "Unknown" from the drop-down. If you don't know either way, leave it at "Unknown." This selection will use the LaPlace method for the best estimate of the completion rate.
Point Estimates
Whereas a confidence interval describes a likely range of values, a point estimate is a single value used as an estimate of an unknown population parameter. It is extremely unlikely that the sample point estimate equals the unknown population completion rate exactly. For that reason, you should always compute a confidence interval when reporting a completion rate. It is much more informative than a point estimate alone, since it provides reasonably likely bounds for the population completion rate.
Although it receives little attention in introductory statistics classes and has had little influence on measurement practices in the field of usability engineering, there is a rich history of alternative methods developed to achieve a more accurate point estimate of p than simply dividing the number of successes by the number of attempts (for example, see Chew, 1971; Laplace, 1812; Manning & Schutze, 1999). This need is most evident when there is an extreme outcome, specifically, when x=0 (0%) or x=n (100%) - especially, but not exclusively, when sample sizes are small. Four estimation methods that pertain to situations more common in usability testing are detailed below:
MLE (Maximum Likelihood Estimate): x / n
The MLE is the sample proportion, or the number of users succeeding divided by the total attempting. It is the most common point estimate reported.
LaPlace: (x+1)/(n+2)
A famous large-sample problem comes from the seminal work of Laplace in the early 1800s. He posed the question of how certain you can be that the sun will rise tomorrow, given that you know that it has risen every day for the past 5000 years (1,825,000 days). You can be pretty sure that it will rise, but you can't be absolutely sure. The sun might explode, or a large asteroid might smash the Earth into pieces. In response to this question, he proposed the Laplace Law of Succession, which is to add one to the numerator and two to the denominator ((x+1)/(n+2)). Applying this procedure, you'd be 99.999945% sure that the sun will rise tomorrow: close to 100%, but slightly backed away from that extreme. The magnitude of the adjustment is greater when sample sizes are small. For example, if you observe two out of two successes and apply the LaPlace procedure, then your estimate of p is 75% (x+1=3, n+2=4, p=3/4) rather than 100%. If you had observed two failures in two attempts, then your estimate of p would be 25% (x+1=1, n+2=4, p=1/4) rather than 0%. LaPlace in essence is saying that the next result is a toss-up, so give each alternative an equally likely chance of occurring.
Wilson: (x + z^2/2) / (n + z^2)
Wilson's point estimate is the midpoint of the adjusted Wald interval. It is derived by adding half a squared z-critical value to the numerator and a squared z-critical value to the denominator. Wilson's is the more conservative approach.
Jeffreys: (x+.5)/(n+1)
Jeffreys (1961) provided a compromise between the LaPlace and MLE methods. See the reference for technical details.
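The four point estimates above translate directly into code. The sketch below uses hypothetical function names and assumes a default z of 1.96 (95% confidence) for the Wilson estimate.

```python
def mle(x, n):
    """Maximum likelihood estimate: the observed proportion."""
    return x / n


def laplace(x, n):
    """LaPlace estimate: add one success and one failure."""
    return (x + 1) / (n + 2)


def wilson(x, n, z=1.96):
    """Wilson estimate: the midpoint of the adjusted Wald interval."""
    return (x + z ** 2 / 2) / (n + z ** 2)


def jeffreys(x, n):
    """Jeffreys estimate: add half a success and half a failure."""
    return (x + 0.5) / (n + 1)


print(laplace(2, 2))  # 0.75
print(laplace(0, 2))  # 0.25
print(f"{laplace(1_825_000, 1_825_000):.8%}")  # the sunrise example
```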
Best Estimate
The best point estimate is calculated using the following logic:
If "Unknown" is selected from the Likely Population Completion Rate drop-down, the LaPlace method is used. The smaller your sample size and the farther your initial estimate of p is from .5, the greater the benefit over the MLE.
If "Between .5 and 1" is selected from the Likely Population Completion Rate drop-down and the observed completion rate is:
- Less than or equal to .5: the Wilson method is used.
- Between .5 and .9: the MLE is used.
- Greater than .9: the LaPlace method is used (note: if the observed completion rate is greater than .9 but less than 1, the Jeffreys method is also a viable alternative).
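The selection logic above can be sketched as a single function. The function name `best_estimate` and the option strings "unknown" and "between_half_and_one" are hypothetical stand-ins for the two drop-down choices, z defaults to 1.96 as an assumption, and the boundary rates are handled as the list states (exactly .5 goes to Wilson, exactly .9 goes to the MLE).

```python
def best_estimate(x, n, likely_rate="unknown", z=1.96):
    """Pick a point estimate for x successes in n attempts."""
    if likely_rate == "unknown":
        return (x + 1) / (n + 2)  # LaPlace
    if likely_rate != "between_half_and_one":
        raise ValueError(f"unrecognized option: {likely_rate!r}")
    # Integer comparisons avoid floating-point issues at the boundaries.
    if 2 * x <= n:  # observed rate <= .5
        return (x + z ** 2 / 2) / (n + z ** 2)  # Wilson
    if 10 * x <= 9 * n:  # observed rate between .5 and .9
        return x / n  # MLE
    return (x + 1) / (n + 2)  # observed rate > .9: LaPlace


print(best_estimate(10, 10))
print(best_estimate(7, 10, "between_half_and_one"))
```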
References
- Agresti, A., & Coull, B. (1998). Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician, 52, 119-126.
- Chew, V. (1971). Point estimation of the parameter of the binomial distribution. The American Statistician, 25, 47-50.
- Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford: Clarendon Press, pp. 179-192.
- Laplace, P. S. (1812). Théorie analytique des probabilités. Paris, France: Courcier.
- Lewis, J. R., & Sauro, J. (2006). When 100% really isn't 100%: Improving the accuracy of small-sample estimates of completion rates. Journal of Usability Studies, 1(3), 136-150.
- Manning, C. D., & Schutze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
- Sauro, J., & Lewis, J. R. (2005). Estimating completion rates from small samples using binomial confidence intervals: Comparisons and recommendations. Proceedings of the Human Factors and Ergonomics Society Annual Meeting (HFES 2005), Orlando, FL.