We’re in a new golden age of research.
It’s easier and cheaper than ever to collect a lot of data quickly, thanks to the proliferation of paid online panels and customer email lists.
But just because it’s easier to collect data doesn’t mean it’s easier to generate high-quality findings.
A lot of work goes into planning and executing user research. Once you get past the effort involved in scoping, creating, and hosting an online study, you then have to be concerned with the quality of the results. In the brave new world of paid panel research studies, participants have a monetary incentive to make it through your long and laborious surveys quickly.
A number of factors can affect the quality of the responses and the insights you derive. To avoid the many ways in which a study can go wrong, consider these 10 threats:
1. Nonresponders
In most cases, people participate in your online research voluntarily (even if they get paid). Depending on the type of study, the attitudes of the people who don’t participate can be as important as the attitudes of those who do. For example, if customers are irritated because your product broke in the first week of use, that irritation may keep them from answering your product-satisfaction survey.
This participation bias is inevitable, so you should always consider reasons why certain people do or don’t respond. Late responders can often serve as a proxy for nonresponders, because some research shows that late responders answer much as nonresponders would have.
Nonresponders are largely responsible for the next two types of threats to online research.
2. Unrepresentative Samples
You want your sample of participants to reflect the population of customers you’re interested in making decisions about, but the people you can contact and the people who agree to participate aren’t necessarily the same people to whom you’re generalizing your results. In many cases, it’s just too expensive or too difficult to collect data from your current or prospective customers. For example, physicians, attorneys, or those less technically inclined can be difficult to recruit.
You’ll want to have as representative a sample as possible (at least on key variables). But don’t let some nonrepresentativeness prevent you from drawing conclusions, especially when you’re making comparisons between products.
The behavioral and medical sciences face this threat too. It’s an ongoing joke in the research community that we know more about upper-middle class 19-year-olds than any other population, because this demographic is required to participate in the psychology-class studies that fill our peer-reviewed journals. Yet we have many important findings from an essentially non-representative population!
3. Disproportionate Samples
Inevitably, you get some of the right participants, but the proportions in your sample may differ from those in your customer base. For example, you may know that 60% of customers have used your product for at least 10 years, but only 20% of your study participants have that level of experience. Your sample includes the right kinds of people, but in the wrong proportions, and this imbalance can lead to the wrong conclusions. To adjust, you can weight your responses so that the sample better matches the population composition on key variables.
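One simple way to do this kind of weighting is to give each group a weight equal to its population share divided by its sample share. The sketch below is illustrative only: the function names, the group labels, and the satisfaction scores are made up, and it assumes you know the true population share for each group.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share."""
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights


def weighted_mean(responses, weights):
    """responses is a list of (group, score) pairs."""
    numerator = sum(weights[group] * score for group, score in responses)
    denominator = sum(weights[group] for group, _ in responses)
    return numerator / denominator


# Hypothetical study: 60% of customers are long-time users,
# but they make up only 20% of the 100 participants.
sample_counts = {"10+ years": 20, "under 10 years": 80}
population_shares = {"10+ years": 0.60, "under 10 years": 0.40}
weights = poststratification_weights(sample_counts, population_shares)

# Made-up scores: long-time users rate 6, everyone else rates 4.
responses = [("10+ years", 6.0)] * 20 + [("under 10 years", 4.0)] * 80
unweighted = sum(score for _, score in responses) / len(responses)
weighted = weighted_mean(responses, weights)
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

With these made-up numbers, long-time users get a weight of about 3 and newer users a weight of 0.5, so the weighted mean moves toward the long-time users' scores. Very large weights on a small group make estimates less stable, so treat weighting as an adjustment rather than a substitute for recruiting the right mix.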
4. Speeders
Speeders are participants who race through a study and barely consider the questions. You can usually catch speeders by establishing a minimum amount of time for each task or study and then screening out anyone who spends less than that amount of time. Don’t be too rigid in your cut-off time though, as some participants can legitimately complete tasks or questions quite quickly.
You can also identify speeders by examining open-ended questions for gibberish and for terse and otherwise unhelpful answers. Once you weed out the low-quality responses, you’ll often find that most fast participants provide results that match those of participants who take longer to complete a study.
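A time-based screen can be sketched in a few lines. In this hypothetical example the cutoff is one third of the median completion time; that fraction is an assumption chosen for illustration, not a recommended standard, and the participant IDs and durations are invented. The function only flags participants for review, in keeping with the advice not to be too rigid about the cutoff.

```python
from statistics import median


def flag_speeders(durations, fraction_of_median=0.33):
    """Return IDs whose completion time is below a fraction of the median.

    durations maps participant ID -> completion time in seconds.
    The default fraction is an illustrative assumption.
    """
    cutoff = fraction_of_median * median(durations.values())
    return sorted(pid for pid, secs in durations.items() if secs < cutoff)


# Hypothetical completion times in seconds.
durations = {"p1": 610, "p2": 540, "p3": 95, "p4": 720, "p5": 480}
flagged = flag_speeders(durations)
print(flagged)  # candidates to review by hand, not to drop automatically
```

Pairing the flag with a manual look at each flagged participant's open-ended answers helps avoid discarding people who were simply fast.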
5. Cheaters
Speeding isn’t the only way to game the system. Some participants (cheaters) respond haphazardly regardless of how long they take. While you can’t completely eliminate cheaters, you can usually detect them and remove them from your data. We almost always include a cheater question (such as “pick a 3” on the five-point scale) to catch the most egregious cheaters.
Also, examine the open-ended questions to look for marginally helpful responses or gibberish. We find that around 3% to 20% of studies contain participants who don’t provide conscientious responses.
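Screening on a cheater question amounts to a simple filter. The sketch below assumes each response is a dictionary with a hypothetical `attention_check` field holding the answer to a "pick a 3" item; the field names and data are invented for illustration.

```python
def failed_attention_check(response, check_item="attention_check", expected=3):
    """True if the participant did not give the instructed answer."""
    return response.get(check_item) != expected


# Hypothetical responses; "attention_check" is the "pick a 3" item.
responses = [
    {"id": "p1", "attention_check": 3, "satisfaction": 4},
    {"id": "p2", "attention_check": 5, "satisfaction": 5},
    {"id": "p3", "attention_check": 3, "satisfaction": 2},
]

kept = [r for r in responses if not failed_attention_check(r)]
removed_ids = [r["id"] for r in responses if failed_attention_check(r)]
print(f"kept {len(kept)} of {len(responses)}; removed: {removed_ids}")
```

A missing answer also counts as a failure here, because `response.get` returns `None` when the field is absent. Whether skipped items should be treated that way is a judgment call for each study.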
6. Satisficers
A harder type of marginal respondent to detect is the satisficer. These participants generally agree with most questions and provide overly positive feedback. While you may have an enthusiastic customer base, it is difficult to differentiate between genuinely positive responses and participants who just want to tell you what you want to hear (often to get their honorarium and be done with the study).
Be careful, though, when weeding out satisficers. One impetus behind alternating the tone of items in questionnaires from positive to negative is to detect such acquiescent responders. However, such alternating often does more harm than good, since both respondents and researchers make mistakes in responding to and recoding responses.
We find that using multiple recruiting sources helps us detect and mitigate the effects of satisficing. We often complement the responses from internal customers, for example, with a high-quality panel of anonymous participants who don’t necessarily have a positive relationship with a company.
7. Early Responders
The participants who take your study immediately may respond differently from those who require more encouragement. Although we found inconsistent and only marginal differences between early and late responders in the studies we examined, some studies have found that early responders tend to express more positive and agreeable attitudes than later responders. This can make a difference if you make decisions based on preliminary results or if you have to stop a study early, so consider whether your early responders may differ from the later ones.
8. Late Responders
Participants who need reminders to take a study or who are the laggards to respond also may respond differently from earlier responders. We didn’t see major differences among the last participants to respond to a number of survey and usability studies we examined, but differences do crop up. Other research finds that later responders are less positive or have less-positive attributes (for example, being less healthy or happy) than early responders.
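One rough way to check your own data for early or late responder differences is to compare the mean score of the first responders with that of the last. The sketch below is an assumption-laden illustration: the quarter-of-the-sample split, the function name, and the scores are all made up, and a real analysis would also need to consider sample size and statistical significance.

```python
from statistics import mean


def early_late_gap(responses, fraction=0.25):
    """Compare the earliest and latest slices of responders.

    responses is a list of (completion_order, score) pairs.
    Returns (early_mean, late_mean, late_mean - early_mean).
    """
    ordered = sorted(responses, key=lambda r: r[0])
    k = max(1, int(len(ordered) * fraction))
    early = [score for _, score in ordered[:k]]
    late = [score for _, score in ordered[-k:]]
    early_mean, late_mean = mean(early), mean(late)
    return early_mean, late_mean, late_mean - early_mean


# Hypothetical (completion order, satisfaction score) pairs.
responses = [(1, 5), (2, 4), (3, 5), (4, 4), (5, 3), (6, 4), (7, 3), (8, 2)]
early_mean, late_mean, gap = early_late_gap(responses)
print(f"early: {early_mean}, late: {late_mean}, gap: {gap}")
```

A large gap is a prompt to look closer before acting on preliminary results, not proof of a real difference.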
9. Sequence Effects
It matters not only whom you ask and how you ask; it also matters when you ask. Specifically, the order in which you ask participants to respond to items or to attempt tasks matters. A sequence effect is the unintended carryover effect that a question (or task) has on subsequent questions or tasks.
For example, you can’t accurately assess top-of-mind brand awareness after you ask participants to rate their satisfaction with certain brands you list, since now these brands are top-of-mind! Sequence effects aren’t always this obvious; they can take the form of fatigue (participants respond less conscientiously on later questions because they are tired) or priming (earlier questions and responses shape later ones). Randomizing the order of questions and response options often mitigates sequence effects.
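Most survey platforms offer randomization as a built-in option, but the idea is easy to sketch. The hypothetical helper below gives each participant their own shuffled order, seeded by a study label and participant ID (both invented for illustration) so the order can be reproduced later during analysis.

```python
import random


def randomized_order(items, participant_id, study_seed="study-1"):
    """Return a per-participant shuffled copy of items.

    Seeding with the study label and participant ID makes the order
    reproducible. The input list is left unchanged.
    """
    rng = random.Random(f"{study_seed}:{participant_id}")
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical rating items.
questions = ["ease of use", "trust", "appearance", "loyalty"]
for pid in ("p1", "p2", "p3"):
    print(pid, randomized_order(questions, pid))
```

Randomization spreads order effects across items rather than removing them, and it does not help when one question must logically come first, as with the unaided brand-awareness example above.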
10. Impact of Tasks on Attitudes
A specific type of sequence effect arises when we mix measures of attitude and behavior: participants spend time with a website and then answer attitude questions. For example, asking participants to spend 10 minutes on a website attempting tasks will likely affect their attitude toward the brand and the usability of the site.
There’s nothing wrong with mixing behavioral tasks and attitudinal questions in the same study, but be aware of the effect of one activity on the other. We find that asking attitude questions before and after task experiences produces the best results (and provides a measure of lift).
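Lift can be computed as the average change in each participant's rating from before the tasks to after. The sketch below assumes paired pre and post scores on the same scale, listed in the same participant order; the function name and the ratings are made up for illustration.

```python
from statistics import mean


def attitude_lift(pre_scores, post_scores):
    """Mean within-participant change (post minus pre)."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post scores must be paired per participant")
    differences = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(differences)


# Hypothetical brand-attitude ratings from four participants.
pre = [4, 5, 3, 4]
post = [5, 5, 4, 3]
lift = attitude_lift(pre, post)
print(f"lift: {lift:+.2f}")
```

A positive value means attitudes improved after the task experience and a negative value means they declined. With real data you would also want a confidence interval around the mean difference.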