We conduct a lot of surveys and unmoderated studies at MeasuringU® using our MUIQ® platform.
One of the first steps in these studies involves both screening participants (ensuring you have the right ones) and characterizing them (collecting enough information, such as prior experience, for further analysis). While there are over a dozen ways to present screening and characterizing questions, two are commonly used.
One is a grid of options in which respondents use checkboxes to indicate which options are applicable to the question asked (select-all-that-apply, or SATA). The other is to require respondents to select yes or no for each option (yes/no forced choice).
In a previous article, we critically reviewed the research literature in which there were comparisons of SATA (Figure 1) with yes/no forced choice formats. That research consistently indicated greater selection rates for forced choice.
In our review, however, we discovered that researchers investigating this effect had not accounted for a potentially important difference in the presentation format of yes/no forced choice items—in a grid (Figure 2) or as a series of individual yes/no questions (Figure 3).
For previous research that we know to have compared SATA and yes/no grids (Smyth et al., 2008), the difference in selection rates due to item format was just under 1%.
For research that used a yes/no forced choice series format, the reported differences in selection rates compared to SATA were 7.3% (Lau & Kennedy, 2019) and 24% (Nicolaas et al., 2015).
We did not turn up any previous research that directly compared SATA selection rates with those for both forced choice grids and forced choice series, so we took up the challenge of collecting new data in a UX research context to help address the gap.
We conducted two surveys to better understand selection rates and respondent preferences for SATA and the two yes/no forced choice formats.
The research participants were members of a research panel, all U.S. adults who participated for a nominal cash incentive. We conducted two surveys with n = 201 for Survey 1 and n = 260 for Survey 2.
The surveys were programmed using the MUIQ UX testing platform. We used a Greco-Latin experimental design in both surveys to have respondents make selections from a SATA grid (Figure 1) and one of the two forced choice formats (grid in Survey 1 as shown in Figure 2; series in Survey 2 as shown in Figure 3).
Greco-Latin experimental designs have the enhanced discriminative power of within-subjects designs plus appropriate counterbalancing of the order of presentation of substantive and nuisance independent variables.
There were two groups of selection items. The first had a list of six well-known mass market websites (those shown in Figure 1) and the second had a list of four well-known seller marketplaces (those shown in Figure 2).
With two formats and two website groups, the experimental design had four orders of presentation:
- Mass merchants in SATA format followed by seller marketplaces in forced choice format
- Mass merchants in forced choice format followed by seller marketplaces in SATA format
- Seller marketplaces in SATA format followed by mass merchants in forced choice format
- Seller marketplaces in forced choice format followed by mass merchants in SATA format
Each respondent was randomly assigned to one of these orders, so each respondent experienced both formats with different website groups in a fully counterbalanced experimental design. After completing the two sets of selections, respondents indicated which of the formats they preferred.
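The counterbalanced assignment described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the MUIQ platform's actual implementation; the group and format labels are hypothetical placeholders.

```python
import random

# Hypothetical sketch of the four counterbalanced orders. Each order is a
# list of (website group, question format) pairs, so every respondent sees
# both formats, each paired with a different website group.
ORDERS = [
    [("mass merchants", "SATA"), ("seller marketplaces", "forced choice")],
    [("mass merchants", "forced choice"), ("seller marketplaces", "SATA")],
    [("seller marketplaces", "SATA"), ("mass merchants", "forced choice")],
    [("seller marketplaces", "forced choice"), ("mass merchants", "SATA")],
]

def assign_order(rng=random):
    """Randomly assign a respondent to one of the four presentation orders."""
    return rng.choice(ORDERS)

order = assign_order()
```

Because each order pairs one format with one website group and the reverse pairing appears in another order, format and website group are fully counterbalanced across respondents.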
The independent variable of interest was the within-subjects comparison of SATA and forced choice formats. The other independent variables were the type of website (mass merchant vs seller marketplace, within-subjects) and the order of presentation (between-subjects).
To focus on the primary effect of interest — the difference in selection percentages for SATA and forced choice formats — data were collapsed over the nuisance variables of website type and presentation order. For each survey we analyzed mean selection percentages and respondent preferences.
Survey 1: Forced Choice Grid
In Survey 1, the mean selection percentage was nearly identical between formats. For SATA the selection rate was 54.06% and for the forced choice grid it was 54.11%, an observed difference of 0.05% that was not statistically significant (t(200) = 0.024, p = .98). The 95% confidence interval around the difference ranged from -4.5% to 4.4%.
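The paired comparison above can be sketched with a short Python function. This is a generic illustration of a paired-difference t statistic and its approximate 95% confidence interval, not the study's raw data; the input would be each respondent's difference in selection percentages between the two formats.

```python
import math

def paired_t(diffs):
    """t statistic and approximate 95% CI for the mean of paired
    difference scores (illustrative sketch, not the study's analysis code)."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                              # standard error
    t = mean / se
    # 1.96 approximates the t critical value for large samples (e.g., df = 200)
    return t, (mean - 1.96 * se, mean + 1.96 * se)
```

With 201 paired observations per respondent condition, a t of 0.024 corresponds to a mean difference far smaller than its standard error, which is why the confidence interval straddles zero.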
Despite the lack of difference in selections, participants overwhelmingly preferred the SATA format (72.1%) to the forced choice grid (13.4%), a margin of more than 5 to 1.
Figure 4 shows percentages and 95% adjusted-Wald binomial confidence intervals (Sauro & Lewis, 2016) for the format preferences in Survey 1.
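The adjusted-Wald interval used for these preference percentages can be sketched as follows: add z²/2 to the successes and z² to the trials, then compute the ordinary Wald interval on the adjusted proportion. The counts in the usage line are illustrative, chosen to roughly match a 72.1% preference out of 201 respondents, not the study's raw data.

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Approximate 95% adjusted-Wald CI for a binomial proportion.
    Adds z^2/2 successes and z^2 trials before computing the Wald interval."""
    p_adj = (successes + z**2 / 2) / (n + z**2)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + z**2))
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Illustrative counts only (roughly 72.1% of 201 respondents)
lo, hi = adjusted_wald_ci(145, 201)
```

The adjustment pulls extreme proportions toward 0.5, which keeps the interval's actual coverage close to the nominal 95% even for small samples or proportions near 0 or 1.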
Survey 2: Forced Choice Series (Individual Yes/No Questions)
Participants selected slightly more in yes/no questions (58.5%) versus the SATA option (55.4%). The 3.1 percentage point difference was not statistically significant (t(259) = 1.6, p = .11). The 95% confidence interval around the difference ranged from -0.7% to 6.9%.
As in Survey 1, respondents strongly preferred the SATA format (61.9%) to the forced choice series (16.5%), though by a somewhat smaller margin of about 3 to 1. Figure 5 shows percentages and 95% adjusted-Wald binomial confidence intervals for the format preferences in Survey 2.
There isn’t much empirical research on differences in selection rates for response options presented in SATA and forced choice question formats. As discussed in our previous article, the data that exist generally support the hypothesis that selection rates would be greater with forced choice formats.
In published research comparing forced choice grids with SATA, the magnitudes of the reported differences were small (1.6% in Rasinski et al., 1994; 0.8% in Smyth et al., 2006; 0.05% in our Survey 1).
The magnitudes of differences between forced choice series and SATA tend to be larger (24% in Nicolaas et al., 2015; 7.3% in Lau & Kennedy, 2019; 3.1% in our Survey 2).
Based on their findings, Lau and Kennedy (2019, p. 4) wrote, “Pew Research Center has adopted a policy of using a forced-choice format instead of a select-all-that-apply list in its online surveys whenever possible.” Should UX researchers adopt a similar policy?
For numerous reasons, we believe such a decision would be premature, especially if applied to UX and (likely) consumer research.
Differences in emotionally charged content: The context of much UX and consumer research is qualitatively different from the social and political research conducted by Pew Research Center. People may be much more emotionally invested in the content of social and political research (e.g., victimization, attitude toward political parties) than in most UX research (e.g., product usage).
Differences in forced choice formats: Lau and Kennedy (2019) compared SATA with a series of forced choice questions rather than presenting them in grids (see Figures 1 and 2). When we used a forced choice grid in Survey 1, we found a near-zero difference between selection rates for SATA and forced choice.
When we ran the same experiment in Survey 2 using a forced choice series of individual yes/no questions instead of a grid, our estimated difference in the selection rates was 3.1%. This was not significantly different from 0 (p = .11), so we shouldn’t make too much of it, but it is interesting that the only studies that observed differences greater than 2% are known to have compared SATA with a forced choice series.
Not clear what the “true” selection rate is: Although Lau and Kennedy (2019) presented a rational argument that forced choice selection rates should be closer to reality than SATA selection rates, there are counterarguments that reduce our confidence in that assertion.
For example, Callegaro et al. (2014) suggested that requiring a yes or no response could, through acquiescence bias, inflate selection rates. This could especially be the case with respondents who are fatigued or in a hurry to finish.
Without conclusive validation studies, which would be very difficult to conduct, it isn’t possible to determine which selection rate is closer to reality.
Rank orders for data collected with different formats tend to be the same: In many cases the research goal is to understand the rank order of choices, which tends to be very similar for SATA and forced choice formats.
Strong respondent preference for SATA: Respondents in both of our surveys indicated overwhelming preference for SATA over forced choice, likely due to the relative ease of scanning a list and clicking as needed in a single vertical column versus forcing a selection for each item, whether or not it applies.
Given the goal of keeping drop-out rates as low as possible when collecting survey data, this is not a trivial concern.
For these reasons, it seems reasonable for UX researchers and practitioners to continue using SATA rather than switching to forced choice unless there is a compelling reason to do otherwise (e.g., a need to compare new results with previous research that used a forced choice format).
Summary and Takeaways
Researchers have reasoned that if there is a difference in selection rates as a function of question format, you would expect higher rates with forced choice than with SATA. Empirical analysis of previous research generally supports this hypothesis.
In our current UX research, we found no significant difference in selection rates when both formats appeared in grids. When comparing a SATA grid with a forced choice series, the observed difference was 3.1%. Examination of 95% confidence intervals around the estimated differences in our experiments indicated that differences greater than 4.5% were unlikely when both SATA and forced choice items appear in grids, and differences greater than 7% were unlikely when forced choice items appear in a series of yes/no questions.
When deciding which format to use, researchers should not assume that SATA and forced choice formats will produce identical selection rates. On the other hand, they very likely will produce similar rank order patterns. Respondents greatly prefer SATA over both forced choice formats.
Future research on this topic should focus on investigating the relative influence of different variables on the difference in selection rates for SATA and forced choice questions, including forced choice format (e.g., grid vs. series) and research context (e.g., UX vs. education vs. social policy/politics).
Our key recommendations for UX researchers and practitioners are:
- SATA probably better: Because we found only small differences in selection rates as a function of question format in a UX research context but found respondents significantly preferred SATA over forced choice, we recommend using SATA in UX research unless there is a compelling reason not to do so.
- Forced choice grids over series of questions: If there is a need to collect these types of data with a forced choice format, we recommend using a forced choice grid rather than a series of yes/no questions.
- Be consistent with formats when comparing: Because the data reported to date generally indicate greater selection rates for the forced choice series format, researchers should assume that format matters and avoid putting themselves in the position of having to compare data collected with different formats.
- Report and randomize: When collecting this type of data, report the format used (SATA, forced choice grid, or forced choice series), and randomize the order of presentation of the items.
Note: This article was adapted from Lewis, J. R., & Sauro, J. (2021). Comparison of Select-All-That-Apply items with yes/no forced choice items. Journal of Usability Studies 17(1), 21-30.