Online surveys are a critical tool many companies use to gauge customers’ attitudes and inform decisions.

While we may categorize customers according to the timing of their purchases (identifying them, for example, as “early adopters,” “first movers,” and “laggards”), should we also categorize survey respondents in a similar way?

In other words, does how quickly a response comes in matter?

Do first responders’ survey results differ from those of late responders?

What about the last people to respond—do response laggards have a different pattern of responses from earlier responders? If you need to end a survey early, and if you have to make decisions based on partial results, would your findings likely have been different if the survey had run its course?

While it may be hard to remember, surveys predate the era of real-time, web-based data collection. Surveys used to be conducted by mail, in person, or over the phone. So, not surprisingly, the question of the effects of response timing has been around for a while, too. Fortunately, we can start with the research that has already been done. Here’s a sampling:

Slightly More Favorable Early Responses

  • A few decades ago, Kathy Green (1991) examined the differences in attitudes between early- and late-responding teachers to a mail-in survey. She found that the earliest responders had slightly more favorable attitudes than the later responders (those who responded only to follow-up reminders and interviews).
  • A mailed health survey from 1993 found that early responders tended to be healthier and visited their doctors more regularly than later responders. Late responders tended to rate their mental and physical state worse.
  • A more recent survey of patient medication and health beliefs found that late responders reported less perceived need, more medication concerns, less prescription-medication knowledge, and less trust in their prescribing physicians (Gadkari et al., 2011).
  • In an examination of 42 hospital-patient satisfaction surveys, Yessis et al. (2006) found that first responders provided more positive answers than late responders. The researchers also found that late responders differed on certain demographic variables (ethnicity, health status, and education).
  • Work by Curtin et al. (2000) found a small (almost negligible) difference between early and late responders to decades of surveys on consumer sentiment. In this case, late responders were those who required more follow-up phone calls to complete a survey.

No Difference in Responses

While this review of the literature isn’t exhaustive, it shows that, in some cases, there are modest differences between early and late responders. When differences are detected, they tend toward more favorable responses for the early responders and more negative responses for late responders.

Our Data

To see how well these findings apply to user research data, we examined seven datasets from some recent unmoderated usability studies and surveys we conducted. The unmoderated usability studies included a mix of task-based questions and study-based questions. Surveys included only study-based questions. The study questions ranged from brand favorability to satisfaction; the task-based questions were on task ease and confidence.

Participants in all studies were recruited from paid panels and included respondents of various ages and both genders. The studies focused on the websites of Internet retailers, software manufacturers, and a large industrial company. The median sample size was 403, with a range of 192 to 829. Studies took between one day and two weeks to complete.

We found no consistent definition of early and late responders in the literature, so we separated our datasets into three groups:

  1. Early responders: the first 10% of participants to complete the study.
  2. Late responders: the last 10% of participants to complete the study.
  3. Other responders: everyone outside the 10% being evaluated. To look for patterns, we compared either the first 10% to the last 90% or the last 10% to the first 90% of responders.

For example, if there were 192 participants, first responders were the first 19 participants to complete the study, the late responders were the last 19, and we compared each group of 19 with the other 173.
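The grouping scheme above can be sketched in a few lines of Python (a hypothetical helper for illustration; the article doesn’t describe its actual analysis code):

```python
def split_responders(responses):
    """Split an ordered list of responses (earliest first) into the
    first 10%, the last 10%, and the comparison group for each."""
    n = len(responses)
    k = int(n * 0.10)                  # size of each 10% group
    first = responses[:k]              # early responders
    last = responses[-k:]              # late responders
    rest_after_first = responses[k:]   # the last 90%, compared to `first`
    rest_before_last = responses[:-k]  # the first 90%, compared to `last`
    return first, last, rest_after_first, rest_before_last
```

With 192 participants, this yields 19 early and 19 late responders, each compared against the remaining 173.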

Results

We found few significant differences across study and task comparisons. Table 1 shows the number of statistically significant differences we found at the study and task level when comparing first and last responders.

                   Study             Task
                   (36 Comparisons)  (28 Comparisons)
First vs Other     1                 7*
Last vs First      1                 0
Last vs Other      3                 5*

Table 1: Number of statistically significant differences found between groups of responders at the study and task level. *Statistically significant findings. Given the large number of comparisons, these two numbers exceed what we would expect from chance alone.

In total there were 36 study-level questions and 28 task-level questions. Given the large number of comparisons, we would expect five study-level questions and four task-level questions to be statistically significant just from chance alone. (See Chapter 10 in Quantifying the User Experience for more discussion on making multiple comparisons and the issue of alpha inflation.)
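Those expected counts follow from multiplying the number of comparisons by the alpha level. A quick back-of-the-envelope check, assuming each question entered all three pairings shown in Table 1 (the article doesn’t spell out this arithmetic):

```python
# Expected false positives from chance alone at alpha = .05,
# assuming each question was tested in all three pairings
# (First vs Other, Last vs First, Last vs Other).
alpha = 0.05
study_comparisons = 36 * 3   # 108 study-level comparisons
task_comparisons = 28 * 3    # 84 task-level comparisons

expected_study = study_comparisons * alpha   # about 5
expected_task = task_comparisons * alpha     # about 4
```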

Only three of the 36 differences between late responders and the rest of the responders were statistically significant (p < .05). But because we would expect five significant differences from chance alone, these three aren’t terribly compelling. Still, it’s interesting that in all three of those comparisons, the late responders gave more favorable responses.

In the task-based comparisons, the seven and five statistically significant differences suggest these differences might mean something, because both counts exceed the roughly four we’d expect from chance. All seven comparisons of the first 10% versus the last 90% showed higher (more favorable) ratings for the first responders. Interestingly, in their significant comparisons, the last responders also rated higher than the other responders.

One reason for finding only a few statistically significant differences is small sample sizes. Even a relatively large study of 200 participants has only 20 first responders and 20 late responders.

Next, to look for more subtle patterns, we used an approach from meta-analysis: we relaxed the within-study significance requirement and looked only for patterns of higher or lower scores for early and late responders at the study and task level.

Table 2 shows the percentage of comparisons that were higher for each group of responders.

                   Study             Task
                   (36 Comparisons)  (28 Comparisons)
First vs Other     47%               64%
Last vs First      *69%              39%
Last vs Other      *64%              46%

Table 2: Patterns of higher responses at the study and task level. *Beyond chance variation (50%)

For example, we see the biggest difference when we compare the last 10% to the first 10% of responses at the study level (the Last vs First row). Late responders scored higher than the first responders in 25 of the 36 (69%) study comparisons and higher than all other responders in 64% of study questions (the Last vs Other row).

We’d expect a 50% split, so 64% and 69% are greater than what chance alone would produce for both comparisons involving the last responders. The difference is nevertheless very small, and it was statistically significant in only one study. As shown in Figure 1, later responders at the study level give slightly higher ratings.
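One simple way to check whether a 25-of-36 split exceeds chance is a binomial sign test against a 50/50 null (a sketch; the article doesn’t specify which test it used):

```python
from math import comb

def sign_test_p(k, n):
    """Two-sided binomial sign test: the probability of a split at
    least as lopsided as k out of n under a 50/50 chance model."""
    k = max(k, n - k)  # fold to the larger side
    one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

# Late responders scored higher in 25 of 36 study-level comparisons.
p = sign_test_p(25, 36)   # about .03, beyond what a 50/50 split predicts
```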


Figure 1: Average study rating, in order of response timing. Higher values indicate more agreeable or more positive sentiments. Later responders tend to rate slightly more favorably than early responders.

The story is a bit different at the task level. First responders tended to rate the tasks slightly higher than the rest of the responders in 18 of the 28 comparisons (64%) and higher than the last responders in 61% of the comparisons. This contrasts with the study-level pattern, where the later responders rated more favorably.

The task-level metrics included only post-task ease (the SEQ) and task confidence. The bulk of the differences we saw can be attributed to first responders being more confident than later responders. Again, these are small differences, as shown in Figure 2.


Figure 2: Average task rating, in order of response timing, for confidence and ease.

Conclusions

These findings are similar to the literature, which generally found modest or no differences based on response order. The differences we observed were nuanced and depended on whether they were at the task or study level. The biggest departure from the literature was that late responders in our studies actually rated slightly more favorably. The difference could be due to how we categorized late responders, the subject matter of the studies, the types of questions, and possibly differences in how potential respondents are reminded to take a survey.

While this dataset is limited, we can conclude some things about first and last responders at the study and task level.

First responders tend to respond as follows:

  • About the same as the rest of the group on overall study questions
  • More favorably than later responders on task metrics, especially for task confidence

Late responders tend to respond as follows:

  • More favorably than the rest of the group and first responders on overall study questions
  • About the same as the rest of the group on task questions, and a bit lower than the first responders

Future research can help confirm or refine these conclusions. For now, this research suggests that any differences based on response timing are likely so minor that you can safely base decisions on preliminary data and end a survey early without undermining the usefulness of its results.