
Its original form had 16 items. That is long for a UX questionnaire (e.g., the SUS has ten and the SUPR-Q® has eight). The reason it had 16 items was that it was developed using a technique called Rasch analysis, which, among other things, enables the dynamic presentation of a subset of items from the total set of 16 items.
During the dynamic presentation, if participants strongly agreed with the first item presented (e.g., “I like to use the app frequently”), the software would then present an item that should be harder to agree with (e.g., “I would never delete the app”). Using this technique, most participants would only need to answer between four and eight items to get a final score.
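To make the idea of dynamic presentation concrete, here is a minimal Python sketch. It is an illustration only and not the algorithm used by MUiQ: it assumes a simple agree/disagree (dichotomous) Rasch model, whereas the SUPR-Qm items use rating scales, and the item difficulties, the label `hypothetical_easy_item`, and the step-halving update of the person estimate are all invented for the example.

```python
import math

# Hypothetical item difficulties in logits (higher = harder to agree with).
# These values are invented for illustration, not published SUPR-Qm estimates.
ITEM_DIFFICULTY = {
    "like_to_use_frequently": -0.2,
    "would_never_delete": 1.2,
    "hypothetical_easy_item": -1.4,
}


def rasch_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: chance that a person at theta agrees with an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def next_item(theta: float, difficulties: dict, asked: list) -> str:
    """Pick the unasked item whose difficulty is closest to the current estimate."""
    remaining = [name for name in difficulties if name not in asked]
    return min(remaining, key=lambda name: abs(difficulties[name] - theta))


def run_adaptive(responses: dict, difficulties: dict, max_items: int = 2):
    """Present items one at a time, moving the estimate up after agreement
    and down after disagreement (a crude stand-in for a proper estimator)."""
    theta, step, asked = 0.0, 1.0, []
    for _ in range(max_items):
        item = next_item(theta, difficulties, asked)
        asked.append(item)
        theta += step if responses[item] else -step
        step /= 2  # smaller adjustments as more items are answered
    return theta, asked


# A respondent who agrees with the first item is shown a harder one next.
answers = {"like_to_use_frequently": True,
           "would_never_delete": False,
           "hypothetical_easy_item": True}
print(run_adaptive(answers, ITEM_DIFFICULTY))
```

In this sketch, agreeing with the first item raises the person estimate, so the next item chosen is the harder "would never delete" item; disagreeing would instead lead to the easier hypothetical item.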
But presenting the items dynamically requires specialized software (like our MUiQ® platform), which many researchers don’t have. Instead, when the SUPR-Qm is used nondynamically, researchers present participants with all 16 items (typically in two eight-item grids). That’s not bad for measurement; it just takes longer to answer and score. So, as part of our program to streamline the SUPR-Qm, we developed a second version of the questionnaire (SUPR-Qm V2) that has five items (a carefully selected subset of the original set of 16 items).
In a previous article, we demonstrated the stability of the original 16-item version of the SUPR-Qm over an eight-year period, but until recently, we did not have the data needed to assess the stability of the new five-item SUPR-Qm V2.
In this article, we describe research conducted to verify the stability of the SUPR-Qm V2 and to reassess the stability of the original SUPR-Qm. How stable are those five items compared to the original 16?
Method
From February 2019 through May 2023, we used our MUiQ platform to collect UX data for 23 industries (like dating, pets, and office supplies) from a total of 155 websites. The primary purpose of these surveys was to refresh a normative database for the interpretation of SUPR-Q scores, but over this time, we also collected SUPR-Qm data from respondents who indicated that they used the mobile app for the company or service they were rating.
All participants were members of a professional online consumer panel, and all were from the United States. Suspicious cases were removed before analysis using standard methods (such as inspection of completion times, responses in free text fields, and person fit statistics). The total sample size was 4,149 (48% male, 50% female, 42% less than 30 years old, and 58% 30 years or older).
An advantage of Rasch scaling is the theoretical stability of scales across changes in time, with some empirical estimates of Rasch scales being stable for as long as 15 years. To investigate the stability of the original SUPR-Qm and SUPR-Qm V2 scales, we divided the data into two parts, Group A and Group B (see the Appendix).
The data in Group A were collected from February 2019 through August 2021, covering 11 industries and 58 websites with n = 2,143. Group B included data collected from February 2022 through May 2023, covering 12 industries and 97 websites with n = 2,006. The only industry included in both groups was Airlines.
Results
To check the stability of the SUPR-Qm V1 (original version with 16 items) and V2 (streamlined version with five items), we superimposed Rasch logit scales for Groups A and B for each version. As shown in Figures 1 and 2, the locations of scores on the logit scales were nearly identical for both the original SUPR-Qm and the SUPR-Qm V2, demonstrating their scale stability with varying dates and industries.
Figure 1: Stability of Rasch scale for the original SUPR-Qm, indicated by the overlap of lines for Groups A and B.
Figure 2: Stability of Rasch scale for SUPR-Qm V2, indicated by the overlap of lines for Groups A and B.
Summary and Discussion
Is the five-item version of the SUPR-Qm reliable and stable? In short, yes. Scores computed from the five-item SUPR-Qm V2 look just as stable as those computed from the 16-item original.
We split the sample into two time periods (2019–2021 vs. 2022–2023). To investigate the stability of the original SUPR-Qm and the SUPR-Qm V2, we divided the SUPR-Qm scores in our sample into two groups that shared a method (retrospective UX surveys) but differed in time period (Group A: February 2019 through August 2021; Group B: February 2022 through May 2023) and in the industries covered (for details, see the Appendix).
The results over the two time periods showed remarkable similarity. As shown in Figures 1 (SUPR-Qm V1 with 16 items) and 2 (SUPR-Qm V2 with five items), the locations of the groups’ scale scores on the underlying logit scales were almost indistinguishable. These results show that the scales for both versions of the SUPR-Qm have been stable for over four years (February 2019 through May 2023) and should remain stable for years to come.
In a future article, we’ll discuss the research we’ve conducted to establish norms and curved grading scales for the interpretation of SUPR-Qm scores.
For more details about this research, see the paper we published in the Journal of User Experience (Lewis & Sauro, 2025).
Appendix
The Appendix table provides details about the industries included in our analysis. It also shows the division of the data into the two groups (A and B), which we used to analyze the stability of the original SUPR-Qm and SUPR-Qm V2 scales over differences in time and industries.
The table shows the data collected over two time periods: February 2019 through August 2021 (n = 2,143) and February 2022 through May 2023 (n = 2,006). This grouping divides the large dataset roughly in half, allowing us to investigate the stability of Rasch measurement over differences in time and industry (the only industry in common across the time periods was Airlines). The demographics in the Total row are averages over industries, weighted by sample size.
| Group A (2/19 – 8/21) | Apps | n | Male | Female | < 30 years | ≥ 30 years |
| --- | --- | --- | --- | --- | --- | --- |
| Airlines | 5 | 105 | 54% | 44% | 52% | 48% |
| Auto | 4 | 49 | 59% | 39% | 51% | 49% |
| Dating | 7 | 277 | 46% | 52% | 43% | 57% |
| Dieting | 5 | 135 | 41% | 58% | 53% | 47% |
| Food Delivery | 4 | 159 | 47% | 53% | 49% | 51% |
| Job Search | 4 | 38 | 48% | 50% | 57% | 43% |
| Mass Merchants | 9 | 182 | 33% | 66% | 31% | 69% |
| Meeting Software | 4 | 73 | 58% | 41% | 73% | 27% |
| Music | 7 | 1058 | 49% | 50% | 50% | 50% |
| Pets | 4 | 33 | 43% | 56% | 47% | 53% |
| Outdoors Stores | 5 | 34 | 57% | 41% | 48% | 52% |
| Group B (2/22 – 5/23) | Apps | n | Male | Female | < 30 years | ≥ 30 years |
| Airlines | 12 | 242 | 47% | 51% | 61% | 39% |
| Business Information | 3 | 92 | 53% | 46% | 26% | 74% |
| Clothing | 13 | 144 | 45% | 52% | 28% | 72% |
| Electronics | 9 | 131 | 62% | 37% | 18% | 82% |
| Grocery | 8 | 251 | 40% | 59% | 31% | 69% |
| News | 14 | 133 | 41% | 57% | 30% | 70% |
| Office Supplies | 4 | 62 | 58% | 38% | 19% | 81% |
| Real Estate | 5 | 93 | 51% | 48% | 49% | 51% |
| Seller Marketplaces | 6 | 238 | 44% | 54% | 60% | 40% |
| Ticketing | 5 | 203 | 52% | 45% | 40% | 60% |
| Travel Aggregators | 8 | 133 | 51% | 48% | 48% | 52% |
| Wireless | 10 | 284 | 50% | 47% | 25% | 75% |
| Total (23 industries) | 155 | 4149 | 48% | 50% | 42% | 58% |
Appendix table: Industries, sample sizes, and gender/age demographics for retrospective UX data collected from February 2019 through May 2023.
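The sample-size weighting behind the Total row can be sketched in a few lines of Python. The sample sizes and male percentages below are copied from the Appendix table; the variable and function names are our own for illustration. Because the table reports rounded percentages, a recomputation like this can only approximate the published totals.

```python
# (industry, n, percent male), copied from the Appendix table
GROUP_A = [
    ("Airlines", 105, 54), ("Auto", 49, 59), ("Dating", 277, 46),
    ("Dieting", 135, 41), ("Food Delivery", 159, 47), ("Job Search", 38, 48),
    ("Mass Merchants", 182, 33), ("Meeting Software", 73, 58),
    ("Music", 1058, 49), ("Pets", 33, 43), ("Outdoors Stores", 34, 57),
]
GROUP_B = [
    ("Airlines", 242, 47), ("Business Information", 92, 53),
    ("Clothing", 144, 45), ("Electronics", 131, 62), ("Grocery", 251, 40),
    ("News", 133, 41), ("Office Supplies", 62, 58), ("Real Estate", 93, 51),
    ("Seller Marketplaces", 238, 44), ("Ticketing", 203, 52),
    ("Travel Aggregators", 133, 51), ("Wireless", 284, 50),
]


def total_n(rows):
    """Sum the sample sizes across industries."""
    return sum(n for _, n, _ in rows)


def weighted_percent(rows):
    """Average a percentage over industries, weighting each by its sample size."""
    return sum(n * pct for _, n, pct in rows) / total_n(rows)


all_rows = GROUP_A + GROUP_B
print("Group A n:", total_n(GROUP_A))
print("Group B n:", total_n(GROUP_B))
print("Weighted % male:", round(weighted_percent(all_rows), 1))
```

The group sizes sum to the reported 2,143 and 2,006, and the weighted male percentage lands close to the 48% shown in the Total row.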

