Online panel research has been a boon for market researchers and UX researchers alike.

In fact, since the late 1990s, online panels have provided a cost-effective pool of participants ready to take online studies, covering topics from soup to nuts (literally)…from apple juice to zippers.

But can you trust the data you get from these online participants? In this article, I’ll examine the accuracy of estimates made from online panels.

To understand how to assess the accuracy, it’s important to understand the differences in the types of panels and how those differences affect the accuracy of your estimates.

Non-Probability Panels vs. Probability Panels

Non-probability panels obtain their members using online ads, snowball sampling, river sampling, and direct enrollments; they also don’t sample proportionally from the general population.

Probability panels, in contrast and as their name suggests, ensure that every member of a population (often an entire country) has at least some chance of being selected to respond to a study. Probability panel companies have measures in place to ensure some level of representativeness, often for hard-to-reach populations.

The majority of UX research is conducted using non-probability panels. There are a lot of them, and they cost a fraction of what probability panels do. In our experience, the most noticeable drawback of non-probability panels is the number of poor-quality responses from cheaters or speeders. Fortunately, for the most part, poor responses can be weeded out through proper screening and cheater detection.

But even after quality controls, the major concern about using non-probability panels is one of representativeness. For example, when you want to estimate how likely consumers are to purchase a product or register for a service, or what their attitudes are toward the usability of a product, you hope that the estimates are accurate and representative.

Sampling Error vs. Unrepresentativeness

Whenever you don’t measure the entire population, you have sampling error. Consequently, your estimates differ by some amount from the true average (such as the average likelihood to recommend). Sampling error, as bad as it sounds, is measurable through the use of the central limit theorem and confidence intervals. Confidence intervals work well when you sample from the right people (this is the definition of representativeness). They aren’t accurate, for example, when making statements about the U.S. population if you only sample readers of one newspaper.
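As a rough sketch of how sampling error can be quantified, the Python snippet below computes a confidence interval around a sample proportion. The function name, the choice of the adjusted-Wald method, and the counts (90 of 200 respondents) are illustrative assumptions, not figures from any of the studies discussed in this article.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(successes, n, confidence=0.95):
    """Adjusted-Wald confidence interval for a proportion.

    Hypothetical helper for illustration only. It captures sampling
    error, not error caused by an unrepresentative sample.
    """
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Add z^2/2 successes and z^2 trials before computing the proportion.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Made-up example: 90 of 200 respondents say they would recommend a product.
low, high = adjusted_wald_interval(successes=90, n=200)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

The interval only tells you how much the estimate could move from sample to sample drawn the same way. It says nothing about whether the panel resembles the population you care about.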

Estimating the error from samples that are unrepresentative (or of unknown representativeness) is harder to do and is somewhat controversial because there’s less theoretical foundation for using inferential statistics.

Much of the research conducted using non-probability samples (including the research we conduct for clients) is concerned less with estimating a single number (called a point estimate) and more with differences in attitudes or behaviors when participants are exposed to different stimuli (such as tasks or websites). However, we often need to estimate a single number for a population (such as intent to purchase or likelihood to recommend) using a non-probability panel. In those cases, it’s good to understand how accurate or inaccurate the estimates are.

The Value of External Benchmarks

One of the best ways to assess accuracy is to compare the responses from an online panel to known external benchmarks. For example, if we ask how many people own a home in the U.S. in an online survey, we can compare that to the known percentage (which often comes from public records).

A few studies have examined how accurate estimates from non-probability samples are relative to benchmarks and many of these are summarized in a chapter in Online Panel Research. Here are some of the relevant findings.

Smokers: 18% of the U.S. population smoked (using national health data at the time of the studies), but a comparison of the estimates from 17 online panels showed they all tended to overestimate the actual percentage of smokers, with a low of 19% to a high of 33%. Probability panels did better than the non-probability panels, with an average absolute error of 2% for probability samples and 10% for non-probability panels.

Newspaper Readership: Four non-probability panels in a Canadian study from 2006 overestimated newspaper readership (online and print), but there was no pattern to convert the scores to match the benchmarks. A follow-up study in 2012 using a probability panel was closer to the benchmark figures but suggested “media-philes” were more likely to respond to online surveys. A telephone survey generated more accurate results than both the non-probability and probability samples, even when they were weighted to adjust for underrepresentation.

Demographics: Across dozens of variables, including work status, number of bedrooms in a house, number of vehicles owned, having a passport, drinking, having a landline or cellphone, and party voted for in the last election, the average error in estimates from these panels compared to benchmarks ranged from 3% to 9%. For all variables, probability panels and telephone surveys were closer to benchmarks than non-probability samples.

Another study by Pew examined differences in estimating 20 demographic benchmarks from nine non-probability panels. Items included having a driver’s license, current address tenure, marital status, and healthcare coverage. On average, the difference between the benchmarks and panel estimates varied between 6% and 10%.

U.S. Presidential Elections: Non-probability online panels actually did quite well in predicting the outcome of the 2012 presidential election, better in fact than many probability samples. Early data suggests online panels had a level of accuracy similar to that of probability-based panels and telephone surveys in estimating the 2016 popular vote.

Brand Awareness: In a 2006 Dutch study using brand awareness measures (for which external benchmarks are not available), the differences in responses varied by 12 to 18 points across panels for brands including Mazda, T-Mobile, and Volkswagen.

Likelihood to Purchase: The Advertising Research Foundation (ARF) found the likelihood to purchase a soup varied between 34% and 51% for the online sample, compared to 32% from a mail-in sample and 36% for a phone sample. For interest in purchasing a type of paint, estimates varied from 37% to 62% between panels. Neither measure had an external benchmark, but both showed the typical variation for important measures in customer and market research.
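The two kinds of comparisons in the findings above (error against a known benchmark, and variation between panels when no benchmark exists) come down to simple arithmetic. Here is a minimal Python sketch. The function names are my own, and the four panel percentages are made up for illustration; only the 18% smoking benchmark comes from the studies summarized above.

```python
def mean_absolute_error(estimates, benchmark):
    """Average absolute difference between panel estimates and a benchmark."""
    return sum(abs(e - benchmark) for e in estimates) / len(estimates)


def spread(estimates):
    """Range between the highest and lowest panel estimate."""
    return max(estimates) - min(estimates)


# Benchmark from the article: 18% of the U.S. population smoked.
benchmark_pct = 18.0
# Hypothetical estimates from four panels (NOT the actual study data).
hypothetical_panel_pcts = [19.0, 24.0, 28.0, 33.0]

mae = mean_absolute_error(hypothetical_panel_pcts, benchmark_pct)
panel_spread = spread(hypothetical_panel_pcts)

print(f"Mean absolute error vs. benchmark: {mae:.1f} points")
print(f"Spread between panels: {panel_spread:.1f} points")
```

When there’s no external benchmark (as with brand awareness or purchase intent), only the spread can be computed, which tells you how much panels disagree with each other but not which one is right.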

Themes from Online Panels

Here is a summary of the themes that emerge from the research on the accuracy of point estimates from online panels.

Estimates vary. In some cases, estimates vary quite substantially from each other and from known external benchmarks. It isn’t uncommon for point estimates for metrics like intent to purchase and brand awareness to vary by 15 to 18 percentage points.

Probability samples are more accurate. As expected, probability samples, while rare, tend to (but don’t always) perform better than non-probability samples, with estimates that are closer to external benchmarks.

Phone and mail are often more accurate. While the studies are somewhat dated now, phone and mail-in surveys perform better than non-probability and probability online panels.

Don’t interchange panels. A clear message from the authors who have studied differences between panels is not to change panels when making comparisons. When tracking data over time, such as likelihood to recommend a product or brand attitudes, don’t change panels. Differences between panels can in many cases exceed real differences in the population.

Weighting techniques help, some. Weighting schemes, including propensity scoring, tend to improve the accuracy of estimates but they require accounting for more than simple demographic weighting, and even then aren’t always more accurate than non-probability samples or other samples (such as phone and mail-in surveys).
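To make the idea of weighting concrete, here is a small Python sketch of the simplest form, post-stratification on a single demographic variable. Everything in it is an assumption for illustration: the function names, the two age groups, the 50/50 population shares, and the ten responses. As noted above, real weighting schemes (including propensity scoring) account for much more than one demographic.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each group by (population share) / (sample share)."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_proportion(responses, groups, weights):
    """Weighted share of 1 ("yes") responses."""
    total_weight = sum(weights[g] for g in groups)
    weighted_yes = sum(weights[g] * r for r, g in zip(responses, groups))
    return weighted_yes / total_weight


# Hypothetical panel sample that over-represents younger respondents.
groups = ["under_45"] * 8 + ["45_plus"] * 2
# 1 = intends to purchase, 0 = does not (made-up responses).
responses = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
# Assumed population split, for illustration only.
population_shares = {"under_45": 0.5, "45_plus": 0.5}

weights = poststratification_weights(groups, population_shares)
unweighted = sum(responses) / len(responses)
weighted = weighted_proportion(responses, groups, weights)

print(f"Unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy case the weighted estimate is lower because the under-represented group was less likely to say yes. Weighting can only correct for the variables you weight on, which is one reason it doesn’t always close the gap with benchmarks.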

Much of the research on the accuracy of point estimates from non-probability panels is on general demographics and psychographics. In an upcoming article, I’ll report the results of four years of our research into how much UX measures differ between non-probability online panels and the implications for researchers.