The Evolution of the Single Ease Question (SEQ)

Jeff Sauro, PhD • Jim Lewis, PhD

The primary driving forces of evolution are variation, competition, and natural selection. In the domain of rating scales, variants are developed and tested to see which variant has the best measurement properties, and the winner of that competition survives to appear in future studies.

The Single Ease Question (SEQ®) is a single seven-point question asked after participants attempt a task as part of a usability test or benchmark (Figure 1). While typically administered electronically as part of a survey, it can also be presented to participants on paper or aurally.

Figure 1: The current version of the SEQ.

The SEQ is probably the most popular measure of the perception of post-task ease. According to Google Scholar (on July 8, 2024), the paper that introduced the item now known as the SEQ, Sauro and Dumas (2009), has been cited over 500 times. Since its publication, the SEQ has become a frequently used tool in the toolbox of many UX practitioners and researchers.

It seems like the SEQ has been around for a long time, but it had not been published in anything like its current form before 2006. Where did the SEQ come from? In this article, we’ll trace the roots of the SEQ and key milestones to the present.

Early Roots: 1988–2006

The root of the evolutionary tree that led to the current SEQ is the After Scenario Questionnaire (ASQ) item that rates satisfaction with task ease. In 2006, Tedesco and Tullis conducted an experiment testing the quality of different methods for assessing task ease (including the ASQ ease item); the winner of that competition was a five-point unnumbered bipolar scale with endpoints from Very Difficult to Very Easy.

1988:
The ASQ

The ASQ was developed by Jim Lewis and colleagues in 1988 as part of an IBM Research system usability metrics project. The ASQ consists of the three items shown in Figure 2. Although the item format is completely different from the standard SEQ (a lower rating indicates a better experience, the endpoints are agreement labels, and the wording is satisfaction-based), the ASQ contains a single seven-point item measuring perceived ease.

Figure 2: The ASQ (from 1990).

2006:
First Published as a Single Five-point Item

Tedesco and Tullis (2006) investigated the properties of a version of the SEQ with the polarity of the standard format (Very Difficult on the left) but with five response options instead of seven. The response options were unnumbered radio buttons between the endpoints (Figure 3). They found that this early version of the SEQ had the best measurement properties of the items they included in their study.

Figure 3: A five-point version of the SEQ.

The Middle Years: 2009–2012

In 2009, Sauro and Dumas increased the number of response options from five to seven to improve its measurement properties and tested it against the SMEQ and UME (reversing the endpoint polarity to go from Very Easy to Very Difficult to match the polarity of the competitors). They found this version of the SEQ to be competitive with the more complex SMEQ and substantially better than the UME. Also in 2009, Sauro and Lewis demonstrated the general validity of single ease items through their correlations with other post-task metrics. In 2010, the SEQ finally got an official name and had its original endpoint polarity (from Very Difficult to Very Easy) restored to make its interpretation more intuitive (larger numbers indicate a better experience); in 2011, it got early benchmarks. In 2012, researchers at MeasuringU added numbers to the response options and changed the item stem, arriving at the current version shown in Figure 1.

2009:
First Publication of a Seven-point Version

The earlier work by Tedesco and Tullis prompted Sauro and Dumas (2009) to partially replicate that study with a different SEQ format, this one with seven unnumbered response options and reversed endpoint polarity (Very Easy on the left; Figure 4).

Figure 4: An early seven-point version of the SEQ.

The number of response options was changed from five to seven to increase scale reliability and sensitivity (ability to differentiate between designs with smaller sample sizes) and to reduce ceiling effects.

To test the sensitivity of the new seven-point version, it was compared in a within-subjects design with two other single-item scales: a digital version of the Subjective Mental Effort Questionnaire (SMEQ, Figure 5) and the Usability Magnitude Estimation (UME, Figure 6), a scale using magnitude estimation.

Figure 5: The SMEQ.

Figure 6: The UME.

In this study, the endpoint polarity of the SEQ was reversed to make it consistent with the polarities of the ASQ, UME, and SMEQ (lower numbers represented easier tasks, i.e., less effort). This relieved participants of having to be vigilant regarding item polarity.

This version of the SEQ also exhibited excellent measurement properties—but it still didn’t have its name.

2009:
Correlations with Other Metrics

In our analysis of post-task metrics from dozens of datasets and thousands of participants, we found that the single ease item (along with other post-task questions) correlated with completion rates, times, errors, and post-study attitudinal metrics, showing strong concurrent validity.

2010:
The SEQ Gets Its Name

In 2010, Sauro published the MeasuringU article, “If You Could Only Ask One Question, Use This One.” This was the first publication of the name Single Ease Question for this item. In this article, the item format continued to have seven unnumbered response options but was returned to its original polarity (Very Difficult on the left; Figure 7).

Figure 7: The first named version of the SEQ.

2011:
Benchmarks for the SEQ

MeasuringU started using the SEQ extensively in research, including predicting SEQ scores from task scenario complexity. Early benchmarks suggested the mean of the SEQ hovered around 5, and after we gathered more data, the mean settled closer to 5.5. This allowed us to start generating percentile ranks (e.g., a score around 5.5 falls at the 50th percentile).
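For readers who want to generate percentile ranks against their own historical data, the sketch below shows the basic idea. It's a minimal illustration, assuming the percentile rank is defined as the share of historical task means falling below a given score; the historical values are hypothetical stand-ins, not MeasuringU's actual normative database.

```python
# Minimal sketch of assigning a percentile rank to a mean SEQ score.
# The historical task means below are hypothetical stand-ins for a
# normative database like the one described in the article.

from bisect import bisect_left

# Hypothetical historical mean SEQ scores, one per benchmarked task.
HISTORICAL_TASK_MEANS = sorted([4.5, 4.9, 5.2, 5.4, 5.6, 5.8, 6.1, 6.5])

def percentile_rank(score: float, history: list[float]) -> float:
    """Percentage of historical task means falling below the given score."""
    return 100 * bisect_left(history, score) / len(history)

# A mean of about 5.5 lands near the 50th percentile, matching the
# benchmark noted above.
print(percentile_rank(5.5, HISTORICAL_TASK_MEANS))  # 50.0
```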

2012:
Varying the Item Format (Numbers and Wording)

The defining characteristic of the SEQ is its single seven-point item. In practice, the SEQ has had a few variations on the original format (one is shown in Figure 8). We started adding numbers to the response options (1 to 7) and slightly changed how we posed the question, from the terse "Overall, this task was:" to the more explicit "Overall, how difficult or easy did you find this task?"

Figure 8: The SEQ with numbers added to response options and a modified item stem.

Maturation: 2018–2024

Following the definition of the current version of the SEQ, the metric moved into a maturation stage. In 2018, SEQ scores were calibrated to task completion rates and times, and in 2021, it was designated a registered trademark. In 2022, the current version survived three tests against earlier variants (endpoint polarity, item stem, and response option numbering). In 2023, an adjective scale was developed as an aid to SEQ interpretation, and a variant without a neutral response option failed to beat the standard version. Investigations in 2023 and 2024 of a new type of item, the click scale, supported the continued use of the current seven-point version.

2018:
Calibrated to Task Completion and Times

With our large dataset of post-task metrics and the previously established strong correlations between the SEQ and other post-task metrics, we were able to map SEQ scores to thresholds for completion rates and times (Figure 9). For example, the average SEQ score is associated with an average task completion rate of 71%.

Figure 9: Relationship between SEQ scores, completion rates, and task times.
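The article doesn't detail the calibration procedure, but one plausible approach is a logistic regression of binary task success on per-participant SEQ ratings. The sketch below illustrates that idea; the paired data are invented, so the fitted numbers won't match Figure 9.

```python
# Hedged sketch: calibrating SEQ ratings to completion rates via logistic
# regression. The paired (rating, success) data are invented for
# illustration and do not reproduce MeasuringU's published calibration.

import numpy as np
from sklearn.linear_model import LogisticRegression

seq_ratings = np.array([2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]).reshape(-1, 1)
completed = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(seq_ratings, completed)

# Read off predicted completion rates at a few SEQ scores.
for score in (4.0, 5.5, 7.0):
    p = model.predict_proba([[score]])[0, 1]
    print(f"SEQ {score}: predicted completion rate {p:.0%}")
```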

2021:
Registered Trademark

We applied for an SEQ trademark in October 2019, and on December 14, 2021, it was designated a registered trademark (SEQ®).

2022:
Testing Item Variations

Because of the variations in format over time, we wanted to rule out (or identify) any impact these variations may have had on scoring. From our original 2009 version to the version we currently use, there have been several variations, including response option numbering, the wording of the question stem, and even the polarity of the scale endpoints. In 2022, we conducted three experiments to quantify the extent to which these variations affected the measurement properties of the SEQ.

Experiment 1:
Endpoint polarity

As shown in Figure 10, we compared the 2010 version (Figure 7), with endpoint polarity from difficult to easy, to an otherwise identical version with endpoint polarity from easy to difficult (as in Sauro & Dumas, 2009). We found no evidence of a left-side bias, no significant difference in means, and no overall significant difference in top-box scores (although an interaction between item format and task difficulty made the top-box difference statistically significant for the difficult task only). Respondents significantly preferred endpoint polarity from difficult to easy.

Figure 10: Two SEQ item formats differing in the polarity of the endpoints.
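As a concrete illustration of a top-box comparison (assuming top-box means the percentage of respondents selecting the highest response option, 7), here is one way such a comparison might be computed. The ratings are invented, and both formats are assumed to be recoded so that 7 = Very Easy.

```python
# Illustrative top-box comparison for two item formats using a
# two-proportion z-test. Ratings are invented; both formats are assumed
# to be recoded so that 7 means "Very Easy."

from statsmodels.stats.proportion import proportions_ztest

difficult_to_easy = [7, 6, 7, 5, 7, 7, 4, 6, 7, 7]
easy_to_difficult = [6, 7, 5, 6, 7, 4, 6, 5, 7, 6]

top_box_counts = [
    sum(r == 7 for r in difficult_to_easy),
    sum(r == 7 for r in easy_to_difficult),
]
sample_sizes = [len(difficult_to_easy), len(easy_to_difficult)]

z_stat, p_value = proportions_ztest(top_box_counts, sample_sizes)
print(f"top-box: {top_box_counts[0] / sample_sizes[0]:.0%} vs "
      f"{top_box_counts[1] / sample_sizes[1]:.0%} (p = {p_value:.2f})")
```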

Experiment 2:
Item stem

As shown in Figure 11, we compared the original stem (“Overall this task was:”) with our current version (“How easy or difficult was it to complete this task?”). We found no statistically significant differences (in fact, the smallest p-value across the analyses was .30). There was no significant respondent preference for either format.

Figure 11: Two SEQ item formats differing in the wording of their stems.

Experiment 3:
Response option numbering

As shown in Figure 12, we compared our current numbered version with an otherwise identical unnumbered version. We found no statistically significant differences, although some top-box differences were large enough to be concerning. The version with numbers appears to have the potential for better discrimination between easy and hard tasks. Respondents significantly preferred the numbered version.

Figure 12: Two SEQ item formats differing in the presence or absence of numeric labels on response options.

2023:
Adjective Interpretation of SEQ Scores

Numbers by themselves don’t mean much unless they can answer the question, “Compared to what?” In 2023, we developed a way to describe SEQ scores with adjectives of relative task difficulty or ease, based on data from 211 participants who completed five online tasks of varying difficulty. The tasks covered a wide range of perceived difficulties, and the correspondence between SEQ scores and responses to a concurrently collected adjective scale was strong. These adjective descriptions provide a useful way to interpret SEQ scores (Table 1 and Figure 13).

Adjective Scale              Low    Mean   High
Most difficult imaginable    1.00   1.00   1.49
Very difficult               1.50   1.93   2.69
Difficult                    2.70   3.53   4.29
Easy                         4.30   5.09   5.59
Very easy                    5.60   6.14   6.49
Easiest imaginable           6.50   6.82   7.00

Table 1: Ranges for adjective scale interpretation of SEQ scores.

Figure 13: SEQ and adjective scale correspondence.
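For teams that want to apply Table 1 programmatically, a simple lookup like the one below returns the adjective for a mean SEQ score. The ranges come from Table 1; the function name is ours.

```python
# Adjective interpretation of a mean SEQ score using the ranges in Table 1.

ADJECTIVE_RANGES = [
    (1.49, "Most difficult imaginable"),
    (2.69, "Very difficult"),
    (4.29, "Difficult"),
    (5.59, "Easy"),
    (6.49, "Very easy"),
    (7.00, "Easiest imaginable"),
]

def seq_adjective(mean_score: float) -> str:
    """Return the Table 1 adjective for a mean SEQ score (1.00-7.00)."""
    if not 1.0 <= mean_score <= 7.0:
        raise ValueError("Mean SEQ scores fall between 1 and 7.")
    for upper_bound, label in ADJECTIVE_RANGES:
        if mean_score <= upper_bound:
            return label
    return ADJECTIVE_RANGES[-1][1]

print(seq_adjective(5.5))  # "Easy" -- near the historical average SEQ
```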

2023:
Effect of Removing the Neutral Option

We were curious what would happen if we removed the neutral (center) response option from four standardized items that we often use in our research (the SEQ, the UX-Lite Ease and Useful items, and LTR), so we ran a Greco-Latin square experiment (n = 200) in which we also systematically manipulated task difficulty. All participants rated two experiences, one with the standard SEQ and one with a six-point version (Figure 14). Our analysis indicated little evidence of systematic differences in response patterns, and over 90% of respondents didn’t notice that the rating scales had different numbers of response options. Some evidence suggested that the absence of a middle option tends to slightly increase the magnitude of ratings, so we generally recommend using the standard version for easier comparison with historical SEQ data.

Figure 14: The SEQ with six response options.

2023:
Standard SEQ and Click SMEQ Sensitivity

SMEQ measurements are often collected with slider scales that, though they are more sensitive than five-point scales, can be problematic because sliding a control to a desired point is harder than clicking a button.

A potential alternative to slider scales is the click scale—an image of a rating scale on which respondents can click anywhere to indicate their rating. Based on data from 103 participants who rated five online tasks of varying difficulty with the standard SEQ and a click version of the SMEQ, we found that the click version of the SMEQ was slightly more sensitive for confidence intervals around means, but the standard SEQ was slightly more sensitive when comparing means.

2024:
Comparing Standard and Click SEQ Scores

Continuing our investigation of click scales, we conducted an unmoderated usability study (n = 200) using the same tasks of varying difficulty as those used in the 2023 comparison of standard SEQ and click-SMEQ scores. After each task, respondents rated task ease with a click version of the SEQ (shown in Figure 15), which was compared with standard SEQ scores collected in studies that had the same five tasks. The key finding was that the standard and click SEQs had about the same standard deviations (Standard: 23.9; Click: 23.4), so the click SEQ did not appear to be sufficiently more sensitive to justify replacing the standard SEQ in the UX research toolkit.

Figure 15: Example of responses on the click SEQ (rating of ease of most recent tax filing).
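To make the sensitivity comparison concrete: with nearly identical standard deviations, 95% confidence intervals around mean scores have nearly identical widths at any sample size. The sketch below uses the reported standard deviations and n = 200, assuming scores are on a common 0-100-style scale.

```python
# Margin of error for a 95% confidence interval around a mean, using the
# reported standard deviations (assumed to be on a common 0-100-style
# scale) and the study's sample size of 200.

from scipy import stats

def ci_margin(sd: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a t-based confidence interval around a mean."""
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return t_crit * sd / n ** 0.5

print(f"standard SEQ: +/-{ci_margin(23.9, 200):.2f}")  # ~3.3
print(f"click SEQ:    +/-{ci_margin(23.4, 200):.2f}")  # ~3.3
```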

The Future of the SEQ

The SEQ has been a popular perceptual UX measure of post-task ease for over 15 years and seems likely to maintain its popularity well into the future. No other metric is strongly competitive with the SEQ, and it is widely used across our study templates as the default post-task question within MUiQ. No other single-item measure of perceived task ease has a normative database sufficient for assigning percentiles to scores, has been calibrated to task completion rates and times, or can be interpreted with an adjective scale.

Its format has varied over the years, but research we’ve conducted since 2022 has shown that numerous variations (endpoint polarity, stem wording, response option numbering, presence or absence of a neutral response option, and instantiation as a click scale) have little to no effect on respondent behavior.

All this research gives us confidence in using the current version of the SEQ in our day-to-day research, now and into the future. It is a mature metric.

Figure 16: Animated GIF of the conceptual evolution of the SEQ.
