It’s that time of year again: March Madness.
The Madness in March comes from the NCAA college basketball tournament, which packs dozens of games, along with unanticipated winners and losers, into the final days of March.
It’s also the time of year when a lot of people start working directly with probability, whether they know it or not. Individuals and groups of colleagues around the US will put together their best guesses as to which of the 64 contending teams will make the “Final Four” and, ultimately, who will win the tournament.
What does this have to do with usability testing? Well, every time researchers conduct a usability test to uncover problems they’re also working with probabilities, even if they tell you they hate math! To understand the role of probabilities in usability testing it helps to see how they are used when picking winning teams for the tournament.
The sport of picking the most winners in each divisional bracket of the tournament (“bracketology”) has itself become as interesting as the outcome of the games.
Interest this year was heightened when billionaire investor Warren Buffett pledged $1 billion to anyone who successfully predicts the outcome of ALL 63 games in the tournament. You might wonder, what are the odds, and why would someone, even one of the richest people in the world, risk losing a billion dollars? After all, like a lottery, won’t someone win?
Finding out your chances of winning will help with finding the right sample size for a usability test.
Computing The Odds
Let’s start with making the correct decision for just a single hypothetical game, say between Stanford and Virginia. There are only two possible outcomes: Stanford wins or Virginia wins. Your chances of guessing correctly with no knowledge of basketball or the teams are 50/50. It’s like tossing a coin: 1/2, or .5, or 50% when expressed as a percentage.
Now, what are the chances of guessing TWO games right? It’s just .5^2 or .25. For getting three games correct it’s 12.5%, and for four it’s about 6%. The chance of randomly guessing just 10 of the 63 games right is slightly less than .1%. In other words, from random guessing alone, there’s a 99.9% chance you’ll get at least one of the 10 wrong.
You can see that the numbers rapidly start working against you. In fact, the chance of picking all 63 games correctly (a perfect bracket) is .5^63, or 1 in 9.2 quintillion (1/9,223,372,036,854,775,808).
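To see how quickly the odds collapse, here is a minimal Python sketch of the coin-flip arithmetic above. It assumes every game is an independent 50/50 guess; the function name `chance_all_correct` is just an illustrative choice. Exact fractions are used so the 63-game figure is not rounded.

```python
from fractions import Fraction

def chance_all_correct(games: int) -> Fraction:
    """Probability of guessing every one of `games` coin-flip games correctly."""
    return Fraction(1, 2) ** games

# Walk through the game counts discussed in the text.
for games in (1, 2, 3, 4, 10, 63):
    p = chance_all_correct(games)
    print(f"{games:>2} games: 1 in {p.denominator:,} ({float(p):.4%})")
```

The denominator for 63 games is 2^63, which is where the 9.2 quintillion comes from.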
Guessing vs. Experience
Now you might think that putting together a tournament bracket involves more than just flipping a coin. Participants certainly use knowledge of teams’ performance and some historical data. Each team has a ranking from 1 to 16, called a seed, based on how well they did in the regular season.
Higher-seeded teams (those with the lower seed numbers) are more likely to win. In fact, the #1 seeded teams (each of which gets paired with a lowest seeded #16 team) have won all 116 of their first games in the tournament over the last 29 years. Some have put the odds of a perfect bracket with more “expert” knowledge at 1 in 128 billion, or even as favorable as 1 in 7 billion, as Nate Silver calculated.
So picking the higher seeded teams seems to be a strategy that would improve your odds. The trouble is, there are plenty of examples of highly seeded teams getting beaten by lower seeded teams. For example, last week the perennial tournament performer Duke, seeded #3, lost to #14 seeded Mercer in a major upset.
Such unpredictable “upsets” are common in one-game playoffs (hence the madness). Even after taking into account additional variables, Nate’s performance after the first round was 25 out of 32. Not bad, but that’s the same performance as our associate who used the “I like this school’s colors” along with “My dad went to that school” strategy.
The actual odds of winning are likely better than 63 coin flips, but the more sophisticated calculations rest on assumptions, many of which turn out to be wrong. To play things safe, you assume a worst-case scenario and work from there. A similar strategy works when planning small sample sizes in usability tests.
What are the chances?
So, how many brackets would need to be submitted to have a decent chance of winning Buffett’s billion? It’s not just 9.2 quintillion; you actually need more submissions than the odds alone suggest to have a good chance of the event occurring. To find out, we work backwards from the binomial probability formula using logarithms (who doesn’t love logarithms?). The formula is:
sample size = log(1 - chance of seeing the event at least once) / log(1 - probability of the event)
So to have an 85% chance of seeing a coin land tails at least once, you should plan on flipping it three times (2.74 rounded up):
log (1-.85)/log(1-.50) = 2.74
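The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above: the function name `sample_size` and its parameter names are assumptions, not anything from a standard library.

```python
import math

def sample_size(goal: float, p: float) -> float:
    """Trials needed for a `goal` chance of seeing an event of probability `p` at least once."""
    # log1p(-x) computes log(1 - x) and stays accurate even when x is tiny.
    return math.log1p(-goal) / math.log1p(-p)

n = sample_size(0.85, 0.50)
print(round(n, 2))   # raw answer from the formula
print(math.ceil(n))  # whole coin flips to plan for
```

Using `math.log1p` instead of `math.log(1 - p)` makes no difference for a coin flip, but it matters once the probability gets very small, because `1 - p` then rounds to exactly 1 in floating point.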
So using the same formula, to have an 85% chance of getting a perfect bracket, you’d need more than 17 quintillion submissions:
log (1-.85)/log(1-.5^63) = about 17.5 quintillion
That would mean more than 2.4 billion brackets for every person on earth. Buffett has limited the number of submissions to 15 million. This is his sample size. To find the odds of an event occurring given a sample size we just rework the formula again.
With just 15 million submissions, chance alone would predict a 1 in 615 billion chance of a winner:
1-((1-.5^63)^15,000,000) = about 1 in 615 billion
A typical Powerball lottery has odds of winning in the 1 in 175 million range. In other words, your odds of winning the Powerball are about 3,514 times better than your odds of getting Buffett’s billion.
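For readers who want to check the bracket numbers, here is a hedged Python sketch of both directions of the formula. It assumes pure coin-flip guessing and uses the figures quoted above (15 million submissions, Powerball odds of 1 in 175 million); the names `chance_at_least_once`, `P_PERFECT`, and so on are illustrative only.

```python
import math

P_PERFECT = 0.5 ** 63          # chance that one random bracket is perfect
SUBMISSIONS = 15_000_000       # Buffett's cap on entries
POWERBALL_ODDS = 175_000_000   # "1 in 175 million", as quoted in the text

def chance_at_least_once(p: float, n: int) -> float:
    """1 - (1 - p)**n, computed in a way that survives very small p."""
    return -math.expm1(n * math.log1p(-p))

# Sample size needed for an 85% chance of at least one perfect bracket.
needed = math.log1p(-0.85) / math.log1p(-P_PERFECT)
# Chance of at least one perfect bracket among the capped submissions.
p_winner = chance_at_least_once(P_PERFECT, SUBMISSIONS)

print(f"Brackets for an 85% chance: {needed:.3g}")
print(f"Chance of a winner: 1 in {1 / p_winner:,.0f}")
print(f"Powerball is {1 / p_winner / POWERBALL_ODDS:,.0f} times more likely")
```

The `log1p` and `expm1` calls are a design choice rather than a requirement of the formula: written naively as `1 - (1 - p) ** n`, the result is 0.0 in ordinary floating point because `1 - 0.5 ** 63` rounds to exactly 1.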
Even using the most generous 1 in 7 billion odds, we would still need about 13 billion submissions to have an 85% chance of a winner:
log (1-.85)/log(1- 1/7,000,000,000) = about 13.28 billion
Sample Sizes for Discovering Problems in Usability Tests
The same formula that tells us there’s an infinitesimal chance of getting a perfect bracket also tells us the chances we’ll observe a usability problem, given an estimate of how common the problem is. For example, given usability problems that affect 31% of users (a relatively common problem), you should plan on testing with 5 users to have an 85% chance of seeing the problem at least one time:
log (1-.85)/log(1-.31) = 5.1
To observe problems that are less frequent (affecting 10% of the entire user population), you should plan on a usability test with 18 users:
log (1-.85)/log(1- .10) = 18.0
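As a quick check on the two usability examples above, here is a small Python sketch. The function name `users_needed` and the default 85% goal are assumptions made for illustration; the result is left unrounded so you can decide how to round it.

```python
import math

def users_needed(problem_rate: float, goal: float = 0.85) -> float:
    """Users to test for a `goal` chance of seeing a problem at least once."""
    return math.log(1 - goal) / math.log(1 - problem_rate)

# The two problem frequencies used in the text.
for rate in (0.31, 0.10):
    print(f"problem affecting {rate:.0%} of users: n = {users_needed(rate):.1f}")
```

Changing `goal` or `problem_rate` shows how quickly the required sample grows as problems get rarer or the target chance gets stricter.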
Often our sample size is dictated for us. If you have two days to test before the next round of changes, it’ll be difficult to test more than 10 users total (five per day). Given a sample size of 10, the probability of observing problems that affect 10% of users is 65%:
1-((1-.1)^10) = .65
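When the sample size is fixed, the formula runs the other way. The following Python sketch is an illustration only (the function name `chance_of_seeing` is an assumption); it shows how the chance of seeing a 10% problem changes with the number of users tested.

```python
def chance_of_seeing(problem_rate: float, users: int) -> float:
    """Probability that at least one of `users` participants hits the problem."""
    return 1 - (1 - problem_rate) ** users

# Compare a few plausible sample sizes for a problem affecting 10% of users.
for users in (5, 10, 18):
    print(f"{users:>2} users: {chance_of_seeing(0.10, users):.0%}")
```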
The table below shows the comparison between the tournament calculations and sample sizes for uncovering problems in usability tests.
| |NCAA Perfect Bracket| |Usability Test| |
|Probability of Occurrence|Getting all 63 games right|1 in 9.2 quintillion|User encountering a less common interface problem|.10|
|Sample Size Needed|To have an 85% chance of having at least 1 winner|More than 17 quintillion|To have an 85% chance of seeing a problem at least once|18|
|Chance of Happening|Probability of getting a winner with 15 million submissions|1 in 615 billion|Probability of observing a problem that affects 10% of users if you only test 10 users|65%|
This year it took only 21 games to end all chances of a perfect bracket, a few games fewer than in the last couple of seasons, when it took 23 and 24 games in a similar contest. If you’re looking to win a billion dollars, trying your hand at perfect brackets is probably not the way to go.
The collective effort of finding and fixing usability problems in interfaces, however, should reduce errors, increase efficiency, and likely lead to an improvement in productivity that can be measured in the billions of dollars. But that’s a subject for another article.