Usability tests are conducted on samples of users drawn from a larger user population. In usability testing it is hard enough to recruit and test users, let alone select them randomly from that population. Samples in usability studies are almost always convenience samples. That is, we rely on volunteers to participate in our tests (a convenience to us). Volunteers, even paid volunteers, are self-selected: they decided to participate.
Why does this matter? Standard statistical procedures, including significance tests and confidence intervals, rest on the assumption that users were sampled randomly from the population. In surveys of the American electorate, a lot of money is spent reducing self-selection and other biases. If you do the right statistics on the wrong people, you’ll think someone else is going to be president. So does convenience sampling in usability tests mean we cannot use statistics to make better decisions?
It turns out a similar problem is experienced in clinical trials and other medical experiments. For example, it is usually not possible to identify everyone who has bladder cancer and select a random sample to test the effectiveness of a new drug in an experiment. Statistical findings are only valid to the extent the people in the studies match those in the larger population. The results of any single study are only tentative (despite the headlines) until the same findings are replicated by different researchers using different people over time.
Fortunately most usability issues don’t involve life or death decisions. We should nonetheless be aware of the bias in our samples. The users who participate in usability studies are those who both have time and don’t mind participating. There is almost always another group of users who either don’t have time or have time but don’t want to participate.
Figure 1: This user profile matrix shows that the bulk of usability test participants are going to have more availability and a higher inclination to participate in a test (quadrant 1). Users who don’t have an inclination to test or free time (quadrant 4) may be different than users in quadrant 1 and provide hidden opportunities for usability improvement.
Is there something about the users who aren’t participating that makes them different enough from those who are? How would you know? It’s often difficult to know how many of these users there are, who they are, and how they might use the product differently than your test participants. To find these users, try:
- Company survey data
- Customer support call logs
- Hiring a third party who won’t make the same assumptions as you and can get more honest feedback and test results
In applied settings we rarely have the luxury of suspending our judgment until another team member can replicate our findings. This does not preclude the use of statistics in our analysis. It just means we need to understand who might be underrepresented in the studies. The good news is that there are usually enough usability problems that, if addressed, would please every user. Just don’t get too confident if you’ve run out of usability issues to improve. The problems are likely out there; you’re just not talking to the right users to find them.
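To make the point about confidence intervals and underrepresented users concrete, here is a minimal sketch in Python. All of the numbers are hypothetical and invented for illustration: the share of users willing to volunteer, the task completion rates for each group, and the sample of 20 participants. The adjusted-Wald interval is one common choice for small-sample proportions; it is an assumption here, not a method this article prescribes.

```python
import math


def adjusted_wald_interval(successes, n, z=1.96):
    """Approximate 95% adjusted-Wald interval for a proportion."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical values, chosen only to illustrate the effect.
volunteer_share = 0.30      # fraction of users with time and inclination
volunteer_rate = 0.90       # task completion rate among those users
non_volunteer_rate = 0.50   # completion rate among everyone else

# Completion rate for the whole user population (weighted average).
population_rate = (volunteer_share * volunteer_rate
                   + (1 - volunteer_share) * non_volunteer_rate)

# Convenience sample: 20 volunteers, 18 of whom complete the task.
low, high = adjusted_wald_interval(successes=18, n=20)

print(f"Population completion rate: {population_rate:.2f}")
print(f"Interval from volunteers:   {low:.2f} to {high:.2f}")
print("Interval covers population rate:", low <= population_rate <= high)
```

With these made-up inputs, the interval computed from volunteers sits entirely above the population completion rate. The arithmetic is correct for the people tested; it simply describes the volunteers, not the users who never showed up.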