Are Opt-In Online Panels Too Inaccurate?

Jeff Sauro, PhD • Jim Lewis, PhD

It’s hard to conduct user research without users. And current and prospective users are often people who sign up to an online panel to get paid for taking studies.

But who are these people who spend time taking surveys, unmoderated usability studies, and product concept evaluations? Can we trust them?

We’ve written before about the quality of online panels. While they certainly have shortcomings, when used correctly, they can be an essential worldwide source of participants.

One important distinction in the published literature is between probability and non-probability panels. In a probability panel, members of a specified population are invited at random through phone or mail solicitation, so in theory everyone has an equal chance of being invited. Not everyone agrees to participate, however, so even after random invitation, there is still some self-selection.

A non-probability panel (sometimes called an opt-in panel) makes no attempt to randomize invitations. Instead, it relies on participants who are interested in opting into the panel, usually by signing up at the panel’s website.

The literature suggests that probability panels are usually better than non-probability panels in matching the percentages of study participants to actual percentages in the population.
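A toy simulation can make the mechanism behind that gap concrete. This is a minimal sketch, not a model of any real panel: the population size, the 30% trait prevalence, the acceptance rates after random invitation, and the 2× opt-in propensity are all hypothetical values chosen for illustration. With these settings, the expected trait shares are roughly 30% in the population, 35% in the randomly invited panel (self-selection after invitation still adds some bias), and 46% in the opt-in panel.

```python
import random

def simulate_panels(pop_size=100_000, prevalence=0.30, n_invited=5_000,
                    accept_with_trait=0.50, accept_without_trait=0.40,
                    opt_in_boost=2.0, seed=1):
    """Toy comparison of a trait's prevalence in a population, a randomly
    invited panel, and an opt-in panel. All parameters are hypothetical."""
    rng = random.Random(seed)
    population = [rng.random() < prevalence for _ in range(pop_size)]

    # Probability-style: random invitation, then trait-dependent acceptance
    # (the residual self-selection described above).
    invited = rng.sample(population, n_invited)
    joined = [has_trait for has_trait in invited
              if rng.random() < (accept_with_trait if has_trait else accept_without_trait)]

    # Opt-in style: no random invitation; people with the trait are
    # opt_in_boost times as likely to sign up (sampled with replacement for simplicity).
    weights = [opt_in_boost if has_trait else 1.0 for has_trait in population]
    opted_in = rng.choices(population, weights=weights, k=n_invited)

    def share(group):
        return sum(group) / len(group)

    return share(population), share(joined), share(opted_in)

pop, prob_panel, opt_in_panel = simulate_panels()
print(f"population {pop:.1%}, probability panel {prob_panel:.1%}, "
      f"opt-in panel {opt_in_panel:.1%}")
```

Both panels overstate the trait under these assumptions, but the opt-in panel does so by much more, because nothing dilutes the self-selection at the point of entry. Real opt-in bias is variable-specific and usually unknown, which is why Pew compared against external benchmarks.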

Recently, the Pew Research Center conducted an analysis of three probability and three non-probability (opt-in) panels and concluded that the opt-in panels were about half as accurate as the probability-based panels when compared to well-known benchmarks.

But we’ve seen headlines before that grab more attention than the data supports. What was inaccurate? Satisfaction scores, NPS data? How large were the errors: 5%, 20%, 30%? We dug into their data to find out.

Summary of the Pew Research

Pew’s criteria for assessing panel accuracy were based solely on matching the response data from each panel to 28 known external benchmarks, mostly from the U.S. census and other government data collected in 2021. These benchmarks included variables such as U.S. citizenship, English proficiency, marital status, number of cars owned, adults living at home, and voter turnout. Their sample sizes were large (roughly 5,000 per panel, for a total of 29,937), which allowed them to cross-tabulate variables such as age by ethnicity and compare those subgroups to census benchmarks as well.

How far off were the samples from the benchmark? Pew reported the errors were on average 6.4%, 6.1%, and 5.0% for the three non-probability opt-in samples compared to 2.3%, 3.0%, and 2.5% for the three probability samples (opt-in mean: 5.8%; probability mean: 2.6%). So that’s where the “half as accurate” headline came from. These are average absolute errors across the 28 variables, so some were smaller (e.g., presidential vote choice), and a few were much higher (percentage receiving food stamps).
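The accuracy measure here is an average absolute error. The sketch below assumes that definition (the mean of |estimate − benchmark| across items, in percentage points); the three-item panel is hypothetical, while the per-panel averages are the values Pew reported. The probability panels’ mean error works out to about 0.45 times the opt-in panels’ mean error, which is where the “about half as accurate” framing comes from.

```python
from statistics import mean

def average_absolute_error(estimates, benchmarks):
    """Mean absolute gap, in percentage points, between a panel's estimates
    and the corresponding benchmark values."""
    if len(estimates) != len(benchmarks):
        raise ValueError("estimates and benchmarks must have the same length")
    return mean(abs(est - bench) for est, bench in zip(estimates, benchmarks))

# Hypothetical panel: three items (percent of respondents) vs. their benchmarks.
print(average_absolute_error([52.0, 18.0, 71.0], [50.0, 23.0, 70.0]))  # errors 2, 5, 1

# Per-panel average errors reported by Pew (percentage points).
opt_in_panels = [6.4, 6.1, 5.0]
probability_panels = [2.3, 3.0, 2.5]

opt_in_mean = mean(opt_in_panels)
probability_mean = mean(probability_panels)
print(f"opt-in mean {opt_in_mean:.1f}, probability mean {probability_mean:.1f}, "
      f"ratio {probability_mean / opt_in_mean:.2f}")
```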

What This Means for UX Researchers Who Use Panels

At first, this sounds like bad news for opt-in panels. After all, who wants twice the error? Depending on the opt-in panel, between 11 and 17 of the benchmark estimates had errors greater than 5 percentage points.

But another perspective on the same data suggests that opt-in panels are not that bad, considering their lower cost and greater availability. Averaged across the three opt-in panels, 12 of the 28 variables had errors under 5%, which is quite remarkable given how little control there is over who joins an opt-in panel. Some opt-in errors were even smaller than the corresponding probability panel errors (for details, see the appendix of the Pew report).

Averaging across the three opt-in panels, the opt-in errors (deviations from the benchmarks, with probability errors shown for comparison) were less than 3% for eight of the 28 variables:

  • U.S. citizenship (opt-in error: 1.1%; probability error: 1.4%)
  • Presidential vote choice (opt-in error: 1.2%; probability error: 1.2%)
  • Voter turnout (opt-in error: 1.3%; probability error: 8.2%)
  • Number of children in the household (opt-in error: 1.3%; probability error: 0.9%)
  • Parental status (opt-in error: 1.8%; probability error: 1.5%)
  • English proficiency (opt-in error: 1.8%; probability error: 1.1%)
  • COVID-19 vaccination status (opt-in error: 2.3%; probability error: 4.9%)
  • Marital status (opt-in error: 2.8%; probability error: 1.3%)

Eight variables had opt-in errors greater than 3% but less than 6%:

  • Housing tenure (opt-in error: 3.8%; probability error: 1.4%)
  • Type of dwelling (opt-in error: 4.1%; probability error: 1.8%)
  • Covered by health insurance (opt-in error: 4.1%; probability error: 2.1%)
  • Has retirement account (opt-in error: 4.2%; probability error: 1.7%)
  • Military service (opt-in error: 5.5%; probability error: 1.8%)
  • Number of adults in the household (opt-in error: 5.5%; probability error: 1.4%)
  • Where one lived a year ago (opt-in error: 5.6%; probability error: 4.1%)
  • Smoking status (opt-in error: 5.6%; probability error: 0.9%)

The following eight variables had opt-in errors greater than 6% but less than 10%:

  • Food allergy (opt-in error: 6.1%; probability error: 3.3%)
  • Union membership (opt-in error: 6.1%; probability error: 3.1%)
  • Number of cars in the household (opt-in error: 6.2%; probability error: 1.8%)
  • Ever diagnosed with high blood pressure (opt-in error: 6.4%; probability error: 2.5%)
  • e-cigarette usage (opt-in error: 6.5%; probability error: 3.1%)
  • Job status last week (opt-in error: 7.6%; probability error: 2.2%)
  • Received workers’ compensation (opt-in error: 9.4%; probability error: 1.6%)
  • Received unemployment compensation (opt-in error: 9.6%; probability error: 4.9%)

Only four of the 28 variables had opt-in errors greater than 10%:

  • Worked last year (opt-in error: 10.8%; probability error: 2.0%)
  • Work affected by COVID-19 (opt-in error: 11.8%; probability error: 1.7%)
  • Received Social Security benefits (opt-in error: 15.2%; probability error: 4.6%)
  • Received food stamp benefits (opt-in error: 16.1%; probability error: 6.1%)

For these 28 items, the opt-in and probability panel errors were not significantly correlated (r(26) = .29, p = .13). The mean difference between opt-in and probability panel errors was 3.2% with a 95% confidence interval ranging from 1.7% to 4.7%. The differences ranged from −6.9% to 10.6% with a median of 2.9% and an interquartile range from 0.9% to 4.7%. The six items with the largest opt-in errors (>9%) were all on employment or financial topics.

So, for the social and political research context represented by these items, the additional error for opt-in panels relative to probability panels will usually be no more than 5%.

UX/CX Research Isn’t Focused on Estimating Population Frequencies

Pew’s research is influential, but it’s directed toward public opinion research. Pew’s analysis focused on estimating population characteristics, such as how many people in the U.S. own a car, vape, or vote. While accuracy to within 1–3% may be important for sizing a market, understanding the anticipated impact of a proposed law, or predicting an election, UX and CX researchers rarely try to estimate who will vote or who is receiving government assistance. Such variables may be used to screen and qualify participants, but they are rarely the point of the study.

For example, it’s common in UX research to find participants who own a car or use Medicare or Medicaid and then invite them to participate in a study to understand how service interfaces address their needs. UX/CX research focuses on what people think about experiences. This Pew research didn’t address sentiments or even popular product ownership (e.g., how many people own an iPad), likely because these are commercial rather than social topics and there are no census-quality benchmarks with which to assess errors.

UX Metrics Are Usually Not Affected by Common Demographic Variables

Research on UX research has shown that there is little systematic relationship between commonly collected demographics and the most used measure of perceived usability, the System Usability Scale (SUS). It’s common in market research to account for differences in gender, age, and geographic location. Of six studies that investigated the effect of gender, five found no significant effect (Bangor et al., 2008; Berkman & Karahoca, 2016; Kortum & Bangor, 2013; Kortum & Sorber, 2015; Tossell et al., 2012). Two studies examined the effect of age on SUS scores, with both reporting no significant difference (Bangor et al., 2008; Berkman & Karahoca, 2016).

Regarding the effects of geography, Kortum and Acemyan (2018) collected SUS ratings of 11 popular products from 3,168 residents of the United States recruited through Amazon Mechanical Turk. Analysis of results as a function of geographic region (based on participant self-report of state of residence paired with U.S. Census Bureau geographic and population density divisions) found little variation in SUS means. For the nine U.S. Census Bureau geographic divisions, the mean SUS ranged from 73.1 to 74.7, a nonsignificant difference of 1.6. Differences as a function of population density (rural, urban cluster, and urban) were also nonsignificant, even with the very large sample size.

This doesn’t mean these demographic variables never matter when understanding your users or prospects, but UX/CX research is almost always centered around product usage. That is, first use an opt-in panel to recruit participants who report using a product (e.g., Alexa, Roku, or a Dell laptop), and then study them to understand their attitudes toward their product experiences. Also, be sure to collect data about participants’ domain and product experience because these variables are known to influence product attitudes and behavioral intentions.

If you’re looking for precise estimates of common U.S. demographic variables, then you should use a probability panel, sample against quotas, and/or use a weighting scheme. If you can tolerate more approximate demographic estimates because they are secondary to your research focus, then an opt-in non-probability sample may be suitable.
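Quotas are usually enforced at the screener. A minimal sketch, assuming hypothetical age-band cells and targets: each incoming respondent is admitted only while their cell is still open, and respondents in untargeted cells are turned away. Quotas control the sample’s composition on the quota variables only; they don’t correct for differences between respondents within a cell.

```python
from collections import Counter

def make_quota_screener(quotas):
    """Return an admit(cell) function that accepts a respondent only while
    the quota for their cell is still open; untargeted cells are rejected."""
    filled = Counter()

    def admit(cell):
        if filled[cell] >= quotas.get(cell, 0):
            return False  # cell is full or not targeted
        filled[cell] += 1
        return True

    return admit, filled

# Hypothetical age-band quotas for a 10-person study.
admit, filled = make_quota_screener({"18-30": 3, "31-50": 4, "51+": 3})
arrivals = ["18-30", "18-30", "31-50", "18-30", "18-30", "51+", "under 18"]
print([admit(cell) for cell in arrivals])
# [True, True, True, True, False, True, False]
print(dict(filled))
```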

You Don’t Always Want Your Sample to Match the U.S. Population

In April and May 2021, 462 respondents (U.S. residents) participated in a retrospective study in which we asked people who had used at least one mass merchant website in the past year to reflect on their experiences. In that study, 69% of the respondents were women, which differs significantly from the U.S. census estimate of 50%. But that doesn’t mean we want to weight our sample to match that census benchmark, because many estimates indicate that 72% of online shoppers are women. Don’t confuse census benchmarks with product benchmarks.
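A quick way to check a comparison like this is a one-sample z-test for a proportion (normal approximation, using the reference proportion in the standard error). This sketch is our illustration using the numbers in the paragraph above; the comparison against the 72% online-shopper estimate was not part of the original study.

```python
import math

def proportion_z(p_hat, p0, n):
    """One-sample z statistic comparing an observed proportion p_hat with a
    reference proportion p0 (normal approximation)."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

n = 462
share_women = 0.69
z_census = proportion_z(share_women, 0.50, n)    # vs. U.S. census benchmark
z_shoppers = proportion_z(share_women, 0.72, n)  # vs. online-shopper estimate

# |z| > 1.96 indicates a difference at the two-sided .05 level.
print(f"vs. census: z = {z_census:.1f}; vs. shoppers: z = {z_shoppers:.1f}")
```

The sample differs sharply from the census split (z ≈ 8.2) but not detectably from the shopper estimate (z ≈ −1.4), which is the distinction between census and product benchmarks in numbers.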

Don’t Forget About Coverage and Sampling Errors

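The Pew analysis didn’t address the very real issue of sampling error, the error you get from measuring only part of the population. Even relatively large sample sizes of 100–300 have margins of error of several percentage points (at 95% confidence, roughly ±6% to ±10% for a proportion near 50%, and roughly ±3% to ±6% for a proportion near 10% or 90%), which are as large as or larger than the potential coverage error due to opt-in versus probability sampling.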
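The margin of error for a proportion can be checked with the standard Wald (normal-approximation) formula, z·√(p(1 − p)/n). This minimal sketch uses only the standard library; for small samples an adjusted-Wald interval is usually preferable, but the simple version is enough to show the scale. At 95% confidence it gives about ±10% (n = 100) and ±6% (n = 300) for p = 50%, and about ±6% and ±3% for p = 10%.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Half-width of a Wald confidence interval for a proportion p with sample size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 300):
    for p in (0.5, 0.1):
        print(f"n = {n}, p = {p:.0%}: ±{margin_of_error(p, n):.1%}")
```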

We talk about coverage and sampling errors as two of the four horsemen of survey errors in Surveying the User Experience.

Estimating Bias and Weighting

Pew’s analysis shows that accuracy varies depending on the panel you use. But it’s hard to know how accurate your results will be (Pew didn’t reveal which panels they used). If you need a precise estimate, you can gauge a panel’s accuracy by first comparing the frequency of responses to government benchmarks, such as the percentage of households with income above $100k (23%), which is the same strategy Pew used in their research. You can then weight the responses to correct for under- or overestimates. We’ll cover different weighting schemes, such as raking, in a future article.
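For a single variable, the simplest version of this correction is cell (post-stratification) weighting: each respondent’s weight is the benchmark share of their cell divided by the sample share of that cell. The sketch below assumes a hypothetical sample in which 30% report household income above $100k, against the 23% benchmark mentioned above, and a hypothetical outcome (smart speaker ownership) that differs by income. Raking extends the same idea to several variables by adjusting to one variable’s margins at a time and repeating until the weighted margins converge.

```python
def cell_weights(sample_shares, benchmark_shares):
    """Post-stratification weight for each cell: benchmark share / sample share."""
    return {cell: benchmark_shares[cell] / sample_shares[cell]
            for cell in benchmark_shares}

# Hypothetical sample composition vs. the income benchmark.
sample = {"over_100k": 0.30, "100k_or_less": 0.70}
benchmark = {"over_100k": 0.23, "100k_or_less": 0.77}
weights = cell_weights(sample, benchmark)  # over_100k ≈ 0.77, 100k_or_less ≈ 1.10

# Hypothetical outcome: share in each cell who own a smart speaker.
owns_speaker = {"over_100k": 0.60, "100k_or_less": 0.40}

unweighted = sum(sample[c] * owns_speaker[c] for c in sample)
weighted = sum(sample[c] * weights[c] * owns_speaker[c] for c in sample)
print(f"unweighted {unweighted:.1%}, weighted {weighted:.1%}")  # 46.0% vs. 44.6%
```

By construction, the weighted sample matches the benchmark on income; whether that also fixes the outcome estimate depends on income being the main source of the panel’s bias for that outcome.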

Bad Actors (Not the Ones on Daytime TV)

The Pew analysis found more evidence of yea-saying (acquiescence bias) in opt-in panels than in probability panels. This matches our experience working with many online panels: participants will over-select screening options in the hope of qualifying for a study.

If you have ever participated in a survey for an opt-in panel, you may be familiar with this experience: you are invited to participate, but after answering dozens of questions you are told you don’t qualify (and therefore receive no compensation). Rinse and repeat a few times and you get the sort of acquiescence inflation Pew reported.

For example, Pew found that respondents claiming to be Hispanic were disproportionately likely to provide what Pew deemed bogus responses. This may be an example of participants hoping to get selected for the survey.

In well-designed UX research, there are strategies for detecting bad actors, which do not appear to have been applied in the Pew study. For example:

  • Add consistency checks. One remedy is using some simple consistency checks in your data to catch potential cheaters. For example, the TAC-10 is a select-all-that-apply (SATA) item listing technical activities that range from very easy to very difficult, so some response patterns are unlikely enough to flag a potential bad actor (e.g., it’s virtually impossible for someone to know how to program in C but be unable to install a new app on their phone). The alternating items of the SUS offer another opportunity to identify unlikely patterns of response options.
  • Check survey completion times. Respondents who complete a survey in an impossibly short time should be flagged as potential bad actors.
  • Check verbatim responses to open-ended questions. When a respondent repeatedly types gibberish, uninformative terse text, or irrelevant answers to open-ended questions, consider flagging this as a potential bad actor.
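The checks above can be combined into a simple screening pass. This is a minimal sketch under stated assumptions: the skill items are hypothetical stand-ins ordered from easiest to hardest (not the actual TAC-10 items), and the speed cutoff (one-third of the median completion time), the violation tolerance, and the minimum word count are hypothetical defaults to tune per study. Flags mark respondents for review rather than automatic removal, and a word count catches only terse answers; gibberish and irrelevant text still need a human or more sophisticated check.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Response:
    respondent_id: str
    seconds: float   # total completion time
    skills: list     # hypothetical SATA items ordered easiest -> hardest (True = checked)
    sus: list        # ten SUS responses (1-5), item 1 first
    open_text: str   # answer to an open-ended question

def skill_violations(skills):
    """Count pairs where a harder activity is checked but an easier one is not."""
    return sum(1 for i, easier in enumerate(skills)
               for harder in skills[i + 1:] if harder and not easier)

def sus_contradictory(sus):
    """All ten SUS items answered at the same extreme. Odd items are positively
    worded and even items negatively worded, so this pattern contradicts itself."""
    return len(set(sus)) == 1 and sus[0] in (1, 5)

def flag_responses(responses, speed_fraction=0.33, max_violations=0, min_words=3):
    """Return {respondent_id: [reasons]} for respondents tripping any check."""
    typical = median(r.seconds for r in responses)
    flagged = {}
    for r in responses:
        reasons = []
        if r.seconds < speed_fraction * typical:
            reasons.append("speeder")
        if skill_violations(r.skills) > max_violations:
            reasons.append("inconsistent skills")
        if sus_contradictory(r.sus):
            reasons.append("contradictory SUS")
        if len(r.open_text.split()) < min_words:
            reasons.append("terse open-ended answer")
        if reasons:
            flagged[r.respondent_id] = reasons
    return flagged

responses = [
    Response("r1", 600, [True, True, False, False], [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
             "Checkout was quick but search results were cluttered"),
    Response("r2", 120, [False, False, True, True], [5] * 10, "good"),
    Response("r3", 540, [True, False, False, False], [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
             "It was fine overall for me"),
]
print(flag_responses(responses))  # only r2 is flagged, on all four checks
```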

AI note: The Pew data was collected before the proliferation of ChatGPT and other AI tools that may now be finding their way into online panels (for answering open-ended questions in particular). Even so, many of the strategies used to detect bad human actors work well (for now) to detect AI “respondents.” We continue to monitor this issue and may investigate it in future research.

Are Opt-In Online Panels Too Inaccurate for UX Research?

Opt-in panels were clearly less accurate than probability panels for matching the social and political benchmarks selected for the Pew study, but they often were only a little less accurate. Our analysis of the Pew data for the social and political research context represented by these items suggests that the additional error for opt-in panels relative to probability panels will usually be no more than 5% (based on the confidence interval and interquartile range of the differences).

This is consistent with our earlier conclusion that in practical UX research, you should buffer your estimates from all panels, especially opt-in ones, by 5–10%. For example, suppose a critical product decision depends on the percentage of product users who are 18–30 years old, and the estimate is 30%. Consider whether you would make a different decision if it were 20% or 40%. If you (or your stakeholders) would make the same decision at 20% and at 40%, then the estimate is accurate enough. If the decision would differ between 20% and 40%, then it would probably be better to use a probability panel.
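The buffering check described above can be written as a small helper. A sketch, assuming a hypothetical decision rule (build a dedicated onboarding flow for 18–30-year-olds only if they make up at least 25% of users); the rule and its threshold are illustrative, not a recommendation.

```python
from typing import Callable

def decision_is_robust(estimate: float, buffer: float,
                       decide: Callable[[float], str]) -> bool:
    """True if the decision is the same at the estimate and at both ends
    of estimate ± buffer (clipped to the 0-1 range)."""
    low, high = max(0.0, estimate - buffer), min(1.0, estimate + buffer)
    return decide(low) == decide(estimate) == decide(high)

# Hypothetical rule: build the onboarding flow only if 18-30-year-olds are >= 25% of users.
def onboarding_rule(share: float) -> str:
    return "build" if share >= 0.25 else "skip"

print(decision_is_robust(0.30, 0.10, onboarding_rule))  # False: 20% -> skip, 40% -> build
print(decision_is_robust(0.30, 0.10, lambda s: "build" if s >= 0.15 else "skip"))  # True
```

When the helper returns False, the decision hinges on precision the panel may not deliver, which is the case where a probability panel is worth the extra cost.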

It’s rare, however, for critical UX research questions to need to be matched to census-type benchmarks. Any demographic targets regarding age, gender, or the types of items included in the Pew study can be addressed with quotas. Professional UX researchers develop surveys that enable the detection of most types of bad actors.

Bottom line: Opt-in panels are accurate enough for most UX research.
