## Sensitivity Advantage of the Subjective Mental Effort Questionnaire (SMEQ)

In 2009, Joe Dumas and Jeff published a CHI paper that compared the popular Single Ease Question (SEQ) with a specific type of visual analog/slider scale, the Subjective Mental Effort Questionnaire (SMEQ). Twenty-six participants attempted five tasks on two enterprise expense reporting applications. Each task was attempted three times on both systems (in counterbalanced order using a Greco-Latin design). Assessment of the two applications with the System Usability Scale (SUS) indicated a significant difference in their overall perceived usability (80 vs. 52).

Figure 1 shows the version of the SMEQ used in this experiment. Unlike typical sliders, which usually measure responses from 0 to 100, the SMEQ has a range from 0 to 150. Also unlike typical sliders, the SMEQ has nine labels between the endpoints to guide placement of the slider. The placement of these labels is not arbitrary and was a major focus of the research that developed the SMEQ.

For the full sample of 26 users, the difference between the applications in after-task ratings was statistically significant on four out of five tasks for both the SEQ and the SMEQ. To determine whether the SMEQ would differentiate better than the SEQ at smaller sample sizes (i.e., whether it is more sensitive), we conducted a Monte Carlo resampling exercise. We took a thousand random samples with replacement at sample sizes of 3, 5, 8, 10, 12, 15, 17, 19, and 20 and compared the means for the two products using a paired t-test. We counted the number of samples in which the means differentiated between the expense reporting applications at p < .05. The more sensitive a questionnaire type is, the more readily it can detect significant differences between products with smaller sample sizes.

The SMEQ was more sensitive, better differentiating between the apps at smaller sample sizes. Figure 2 shows that starting at a sample size of eight, the SMEQ tended to identify more samples than the SEQ as statistically significant (out of a thousand).
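
The resampling procedure described above can be sketched roughly as follows. This is a minimal illustration, not the analysis code from the study: the ratings are synthetic placeholders, the function names (`paired_t`, `detection_rate`) are hypothetical, and significance is judged against standard two-tailed critical t values for df = n - 1 rather than computed p-values. Treating a resample with zero variance in its differences as "not significant" is also an assumption of this sketch.

```python
import math
import random
import statistics

# Two-tailed critical t values at alpha = .05, keyed by sample size n (df = n - 1).
T_CRIT_05 = {3: 4.303, 5: 2.776, 8: 2.365, 10: 2.262, 12: 2.201,
             15: 2.145, 17: 2.120, 19: 2.101, 20: 2.093}


def paired_t(diffs):
    """Paired t statistic for a list of difference scores; None if their SD is 0."""
    sd = statistics.stdev(diffs)
    if sd == 0:
        return None  # t is undefined when every sampled difference is identical
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def detection_rate(product_a, product_b, n, iterations=1000, rng=None):
    """Share of resamples of size n in which the paired t-test is significant at p < .05."""
    rng = rng or random.Random()
    participants = range(len(product_a))
    hits = 0
    for _ in range(iterations):
        # Draw n participants with replacement, keeping each person's two ratings paired.
        sample = rng.choices(participants, k=n)
        diffs = [product_a[i] - product_b[i] for i in sample]
        t = paired_t(diffs)
        if t is not None and abs(t) > T_CRIT_05[n]:
            hits += 1
    return hits / iterations


# Synthetic ratings for 26 hypothetical participants (NOT the study data).
rng = random.Random(42)
ratings_a = [rng.gauss(60, 20) for _ in range(26)]
ratings_b = [rng.gauss(40, 20) for _ in range(26)]

for n in T_CRIT_05:
    print(n, detection_rate(ratings_a, ratings_b, n, rng=rng))
```

Running the same function once on SEQ ratings and once on SMEQ ratings, then comparing the two rates at each sample size, gives the kind of sensitivity comparison plotted in Figure 2.
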
For n = 8 and higher, the advantage ranged from 8–14% depending on the sample size. However, the SMEQ also had different (and calibrated) labels (Figure 1) and a range of 0 to 150, so this advantage might not generalize to standard sliders with just endpoint labeling and a range from 0 to 100. We needed more data.

## Resampling Sliders vs. Numbers

To explore the possible sensitivity advantage of the standard slider, we conducted a new Monte Carlo resampling exercise with the entertainment data described in our earlier article. Both the slider and the five-point scales had only the endpoints labeled with strongly disagree and strongly agree (Figure 3), which eliminated the confounding of the label formats. The sliders could record ratings from 0 to 100.

Figure 4 shows the resulting UMUX-Lite scores for all three services and both item formats. There were no statistically significant differences between the slider and number formats at the full sample sizes within each brand. Netflix (n = 73) received significantly higher ratings than Hulu (n = 40) and Amazon Prime (n = 38).

Consistent with a sensitivity advantage for sliders, although all were significant (p < .05), the p-values for t-tests comparing Netflix with Hulu and Prime were lower for sliders than numbers (equal variance not assumed):

- Netflix vs. Hulu | Numbers: t(65.0) = 2.94, p = .005
- Netflix vs. Hulu | Sliders: t(60.4) = 3.12, p = .003
- Netflix vs. Prime | Numbers: t(58.3) = 2.20, p = .03
- Netflix vs. Prime | Sliders: t(51.3) = 2.65, p = .01

The standard deviations of the sliders and numbers by brand are shown in Table 1. Sliders had a slightly smaller standard deviation than their corresponding numeric scale data, which may provide more precision at smaller sample sizes. The smaller standard deviation likely caused the lower p-values observed for sliders in the full sample comparisons.

Table 1: Standard deviations by brand and item format.

| | Sliders | Numbers |
|---|---|---|
| Hulu | 15.1 | 17.2 |
| Netflix | 10.5 | 13.3 |
| Prime | 17.5 | 18.1 |
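
The fractional degrees of freedom in the t-tests above (for example, t(65.0)) are what a Welch test produces when equal variance is not assumed. As a rough check, the Welch-Satterthwaite approximation can be recomputed from the Table 1 standard deviations and the reported sample sizes. The sketch below assumes the Table 1 values are the same standard deviations that went into those t-tests; the helper name `welch_df` is ours. Because the table values are rounded, the results should land close to, but not exactly on, the reported degrees of freedom.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for two independent groups."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2's mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Sample sizes and standard deviations as reported in the text and Table 1.
N = {"Netflix": 73, "Hulu": 40, "Prime": 38}
SD = {
    "Sliders": {"Hulu": 15.1, "Netflix": 10.5, "Prime": 17.5},
    "Numbers": {"Hulu": 17.2, "Netflix": 13.3, "Prime": 18.1},
}

for fmt, sds in SD.items():
    for rival in ("Hulu", "Prime"):
        df = welch_df(sds["Netflix"], N["Netflix"], sds[rival], N[rival])
        print(f"Netflix vs. {rival} | {fmt}: df = {df:.1f}")
```

The same standard errors (`v1` and `v2`) sit in the denominator of the t statistic, which is why a smaller standard deviation for sliders translates into larger t values and lower p-values for the same mean difference.
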

## Summary and Discussion

Our earlier research had demonstrated a sensitivity advantage of a specific type of slider scale (the SMEQ), but that sensitivity may have been specific to that questionnaire (because of its unique labels and numbers) and not necessarily generalizable to all sliders.

We conducted a series of Monte Carlo resampling analyses on streaming entertainment providers (Netflix, Hulu, Prime, Disney+, and HBO Now) and found that standard sliders *did* have a moderate sensitivity advantage, corroborating our earlier findings. The differences were detectable once the sample sizes were ten or more (presumably, the point at which the analyses had enough power to start discriminating between the services). At n = 10, the advantage across the three analyses was about 2%, increasing to about 8% at the maximum n for resampling in each analysis.

Researchers can use sliders to achieve higher sensitivity than they would get with five-point numeric scales, but they must consider the shortcomings of sliders (higher non-response rates and possible accessibility issues). In the future, we plan to examine more comparisons to see if a numeric scale with seven or eleven points gets closer to (or even matches) the sensitivity of a slider.