
But what have we created, and what have we just used or extended?
Across our combined careers, we (Jeff and Jim) have published 16 psychometrically qualified UX metrics (both original questionnaires and modifications of existing ones), plus a method for combining prototypical usability metrics. We have also made major contributions to a popular standardized UX questionnaire that we did not create, the System Usability Scale (SUS).
In this article, we briefly describe each of these metrics (presented in roughly reverse chronological order by decade) and provide key links to more information about them (so you won’t need to ask ChatGPT and risk hallucinated references).
2020–2025
From 2020 to 2025, we developed and published four standardized UX questionnaires: UX-Lite®, SUPR-Qm® V2, TAC-10™, and PWCQ.
UX-Lite®
The UX-Lite has its roots in the UMUX-LITE (more on the UMUX-LITE below). It’s a two-item questionnaire that is essentially a miniature version of the Technology Acceptance Model (TAM), assessing the perceived ease-of-use and perceived usefulness of products and services with two five-point scales. It’s becoming an increasingly popular metric in UX research and practice.
From 2020 to 2024, we published 15 articles on the UX-Lite, many of which explored different ways to phrase the “usefulness” item because its original wording was overly complex. Beyond demonstrating the reliability and validity of the UX-Lite, this research showed that it is useful in regression and structural equation modeling of higher-level outcome metrics such as ratings of overall experience, behavioral intentions (e.g., likelihood to recommend, likelihood to reuse), and actual user behaviors.
Key Characteristics
- Measures: Perceived ease of use and perceived usefulness
- Number of items: 2
- Reliability: 0.75 (coefficient alpha unless otherwise specified)
- Types of Validity: Content, construct, concurrent
- Number of subscales: 2 (single-item scales)
- Interpretative norms: Yes
- Development method: Classical test theory
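The reliabilities reported throughout this article are coefficient alpha unless otherwise specified. For readers who want to compute alpha from raw response data, here is a minimal sketch using the standard formula (not specific to the UX-Lite or any other questionnaire in this article):

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient (Cronbach's) alpha.

    `items` is a list of respondent rows, each row a list of k item ratings.
    Standard formula: alpha = k/(k-1) * (1 - sum(item variances) / variance of sums).
    """
    k = len(items[0])
    item_vars = [variance(col) for col in zip(*items)]   # sample variance per item
    total_var = variance([sum(row) for row in items])    # variance of summed scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With perfectly correlated items, alpha is 1.0; weaker inter-item correlations pull it down.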
Key Links & Publications
- Lewis, J. R., & Sauro, J. (2023). Effect of Perceived Ease of Use and Usefulness on UX and Behavioral Outcomes. International Journal of Human-Computer Interaction, 40(20), 6676–6683.
- Measuring UX: From the UMUX-LITE to the UX-Lite
- Evolution of the UX-Lite
- How to Score and Interpret the UX-Lite
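As a rough illustration of how a two-item, five-point questionnaire like the UX-Lite is typically scored, here is a sketch that assumes the common convention of rescaling each item to 0–100 and averaging; see the “How to Score and Interpret the UX-Lite” link above for the official method:

```python
def ux_lite_score(ease: int, usefulness: int) -> float:
    """Rescale two five-point items (1-5) to a combined 0-100 score.

    Assumption: each item maps linearly to 0-100 ((rating - 1) / 4 * 100),
    and the two rescaled items are averaged. Consult the published scoring
    instructions for the authoritative procedure.
    """
    for rating in (ease, usefulness):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be on a 1-5 scale")
    return ((ease - 1) + (usefulness - 1)) / 8 * 100
```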
SUPR-Qm® V2
The mobile app experience is a unique and defining aspect of our interactions with our devices. While the experience shares many characteristics with using software and websites on a traditional monitor, the mobility, screen size, and interaction style make the experience distinct. Consequently, we developed a questionnaire, the SUPR-Qm, to measure attitudes toward the mobile app user experience. In 2025, we published the second version of the SUPR-Qm, reducing the number of items from the original 16 to five.
Key Characteristics
- Measures: Intensity of the UX of mobile apps
- Number of items: 5
- Reliability: 0.83
- Types of Validity: Content, construct, concurrent
- Number of subscales: 0
- Interpretative norms: Yes
- Development method: Rasch scaling
Key Links & Publications
- Lewis, J. R., & Sauro, J. (2025). Streamlining the SUPR-Qm: The SUPR-Qm V2. Journal of User Experience, 20(2), 65–88.
- Ten Things to Know About the SUPR-Qm
- How to Score and Interpret the Five-Item SUPR-Qm V2
TAC-10™
We based the TAC-10 on research conducted at MeasuringU from 2015 through 2023 and presented it at UXPA 2024. The TAC-10 is a select-all-that-apply checklist of ten technical activities. We published six blog articles in 2023 detailing its development, including why UX research needed a measure of tech savviness (to separate the effects of interface characteristics from participant characteristics when analyzing UX data) and how to use the TAC-10 to classify participants into different levels of tech savviness.
Key Characteristics
- Measures: Level of tech savviness
- Number of items: 10
- Reliability: 0.67 (Spearman–Brown for dichotomous data)
- Types of Validity: Content, construct, concurrent
- Number of subscales: 0
- Interpretative norms: Yes
- Development method: Rasch scaling
Key Links & Publications
- 12 Things to Know About Using the TAC-10 to Measure Tech Savviness
- Classifying Tech Savviness Levels with Technical Activity Checklists
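The classification idea can be sketched as counting checked activities and binning the count. The cutoffs below are hypothetical placeholders, as are the level labels; the published classification rules are in the linked articles:

```python
def tech_savviness_level(checked: list[bool], low_cut: int = 3, high_cut: int = 7) -> str:
    """Classify a respondent from a 10-item select-all-that-apply checklist.

    HYPOTHETICAL cutoffs for illustration only: counts <= low_cut are "low",
    counts >= high_cut are "high", everything between is "medium". See the
    linked TAC-10 articles for the published classification scheme.
    """
    if len(checked) != 10:
        raise ValueError("expected 10 checklist responses")
    n_checked = sum(checked)
    if n_checked <= low_cut:
        return "low"
    if n_checked >= high_cut:
        return "high"
    return "medium"
```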
PWCQ
In our UX research practice, we frequently encounter users and designers who criticize website interfaces for being cluttered and stakeholders who worry about the experiential and business consequences of a cluttered website. But what exactly does it mean for a website to appear cluttered? To answer this question, we developed the Perceived Website Clutter Questionnaire (PWCQ), a five-item questionnaire with two subscales: Content Clutter and Design Clutter.
Key Characteristics
- Measures: The perceived clutter of websites
- Number of items: 5
- Reliability: 0.90
- Types of Validity: Content, construct, concurrent
- Number of subscales: 2
- Content Clutter: Reliability = 0.91
- Design Clutter: Reliability = 0.88
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R., & Sauro, J. (2024). Measuring the Perceived Clutter of Websites. International Journal of Human-Computer Interaction, 41(9), 5260–5273.
- Confirming the Perceived Website Clutter Questionnaire
- Incorporating Clutter in the SUPR-Q Measurement Framework
2010–2019
From 2010 through 2019, working both together and separately, we (Jeff and Jim) created and published seven standardized UX questionnaires and also published books, papers, and numerous articles on how to use and interpret the SUS.
SUPR-Q®
At MeasuringU, we originally benchmarked websites using the SUS. But we knew that the quality of the website user experience was more than just usability, so we developed the Standardized User Experience Percentile Rank Questionnaire (SUPR-Q) in 2011 and published our findings in 2015. The SUPR-Q is a short (eight-item) questionnaire that measures perceptions of Usability, Trust, Appearance, and Loyalty for websites. The combined score provides an overall measure of the quality of the website user experience. The normative percentile database contains responses from more than 10,000 participants and 150 websites and is updated roughly quarterly.
Key Characteristics
- Measures: Perceptions of the quality of UX with websites
- Number of items: 8
- Reliability: 0.90
- Types of Validity: Content, construct, concurrent
- Number of subscales: 4
- Usability: Reliability = 0.88
- Trust: Reliability = 0.87
- Appearance: Reliability = 0.80
- Loyalty: Reliability = 0.73
- Interpretative norms: Yes
- Development method: Classical test theory
Key Links & Publications
- Sauro, J. (2015). SUPR-Q: A Comprehensive Measure of the Quality of the Website User Experience. Journal of Usability Studies, 10(2), 68–86.
- SUPR-Q License & Calculator Package
- Validating the Basic SUPR-Q Measurement Model
SUPR-Qm®
Our original version of the mobile app questionnaire had 16 items selected from a larger set using Rasch scaling. We list this here for historical purposes, but our current practice is to use the SUPR-Qm V2 (see above).
Key Characteristics
- Measures: Intensity of the UX of mobile apps
- Number of items: 16
- Reliability: 0.94
- Types of Validity: Content, construct, concurrent
- Number of subscales: 0
- Interpretative norms: Yes
- Development method: Rasch scaling
Key Links & Publications
- Sauro, J., & Zarolia, P. (2017). SUPR-Qm: A Questionnaire to Measure the Mobile App User Experience. Journal of Usability Studies, 13(1), 17–37.
- Lewis, J. R., & Sauro, J. (2025). Streamlining the SUPR-Qm: The SUPR-Qm V2. Journal of User Experience, 20(2), 65–88.
- How Stable is the SUPR-Qm After 8 Years?
UMUX-LITE
The UMUX-LITE is a mini-TAM with two seven-point items, assessing perceived ease of use and perceived usefulness. It was derived from the four-item UMUX (Usability Metric for User Experience) when Jim was at IBM (in collaboration with Brian Utesch and Deb Maher) and is the predecessor to the UX-Lite. At MeasuringU, we prefer the UX-Lite (described above) due to its enhanced flexibility, but the UMUX-LITE is also used in current UX research and practice.
Key Characteristics
- Measures: Perceived ease of use and usefulness
- Number of items: 2
- Reliability: 0.83
- Types of Validity: Content, construct, concurrent
- Number of subscales: 2 (single-item scales)
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013). UMUX-LITE: When There’s No Time for the SUS. In Proceedings of CHI 2013 (pp. 2099–2102). Association for Computing Machinery.
- Measuring Usability: From the SUS to the UMUX-Lite
MOS-X2
As part of his work on speech systems at IBM, Jim and his collaborators developed variants of the Mean Opinion Scale (MOS), which had first been published by others in the 1990s. The MOS-X2 is the culmination of that research, a four-item questionnaire assessing four key characteristics of user experiences with synthetic voices: Intelligibility, Naturalness, Prosody, and Social Impression.
Key Characteristics
- Measures: The perceived intelligibility, naturalness, prosody, and social impression of synthetic voices
- Number of items: 4
- Reliability: 0.85
- Types of Validity: Content, construct, concurrent
- Number of subscales: 4 (single-item scales)
- Interpretative norms: Yes
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R. (2018). Investigating MOS-X Ratings of Synthetic and Human Voices. Voice Interaction Design, 2(2), 1–22.
- The Evolution of the Mean Opinion Scale: From MOS-R to MOS-X2
SUISQ-R
The original version of the Speech User Interface Service Quality (SUISQ) questionnaire was developed at IBM and published by Melanie Polkosky in 2008. During its development, participants rated the quality of recorded interactions rather than interactions in which they took part, leaving open the question of how well the findings would generalize from observed to personal interactions. Collaborating at State Farm, Jim and Mary Hardzinski collected SUISQ data in a large-sample usability study, (1) replicated the factor structure of the original, and (2) used item analysis to reduce the questionnaire from 25 to 14 items (producing the SUISQ-R) while still adequately measuring its four subscales: User Goal Orientation, Customer Service Behaviors, Speech Characteristics, and Verbosity.
Key Characteristics
- Measures: Service quality of speech applications
- Number of items: 14
- Reliability: 0.88
- Types of Validity: Content, construct, concurrent
- Number of subscales: 4
- User Goal Orientation: Reliability = 0.91
- Customer Service Behavior: Reliability = 0.88
- Speech Characteristics: Reliability = 0.80
- Verbosity: Reliability = 0.67
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R., & Hardzinski, M. L. (2015). Investigating the Psychometric Properties of the Speech User Interface Service Quality Questionnaire. International Journal of Speech Technology, 18(3), 479–487.
EMO
The Emotional Metric Outcomes (EMO) questionnaire was also developed while Jim was consulting at State Farm. His collaborators at State Farm wanted a standardized questionnaire for assessing the emotional consequences of interacting with a company. Together they published the EMO in long (16-item) and short (8-item) versions, measuring four subscales: Positive Relationship Affect, Negative Relationship Affect, Positive Personal Affect, and Negative Personal Affect. The key characteristics below are for the more efficient short version.
Key Characteristics
- Measures: Emotional consequence of interacting with a company
- Number of items: 8
- Reliability: 0.88
- Types of Validity: Content, construct, concurrent
- Number of subscales: 4
- Positive Relationship Affect: 0.89
- Negative Relationship Affect: 0.72
- Positive Personal Affect: 0.83
- Negative Personal Affect: 0.82
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R., & Mayes, D. K. (2014). Development and Psychometric Evaluation of the Emotional Metric Outcomes (EMO) Questionnaire. International Journal of Human-Computer Interaction, 30, 685–702.
- Lewis, J. R., Brown, J., & Mayes, D. K. (2015). Psychometric Evaluation of the EMO and the SUS in the Context of a Large-Sample Unmoderated Usability Study. International Journal of Human-Computer Interaction, 31(8), 545–553.
mTAM
The mTAM is a modified version of the TAM (Technology Acceptance Model), a questionnaire developed in the 1990s to assess the drivers of technology acceptance. In its original version, the TAM had 12 items measuring two subscales, Perceived Ease-of-Use and Perceived Usefulness, with items worded to focus on potential future use. For the mTAM, the only modification was to change the focus to ratings of actual use. Note that we do not use this as a practical UX questionnaire, but we have used it when exploring how other standardized metrics work within the Technology Acceptance Model.
Key Characteristics
- Measures: Perceived ease of use and perceived usefulness
- Number of items: 12
- Reliability: 0.95
- Types of Validity: Content, construct, concurrent
- Number of subscales: 2
- Perceived Ease of Use: Reliability = 0.95
- Perceived Usefulness: Reliability = 0.95
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lah, U., Lewis, J. R., & Šumak, B. (2020). Perceived Usability and the Modified Technology Acceptance Model. International Journal of Human-Computer Interaction, 36(13), 1216–1230.
- Lewis, J. R. (2019). Comparison of Four TAM Item Formats: Effect of Response Option Labels and Order. Journal of Usability Studies, 14(4), 224–235.
SUS
No, we didn’t develop the System Usability Scale (SUS); that honor goes to John Brooke. But between 2010 and 2019, we conducted extensive research to improve its interpretability and flexibility, with publications listed in the Key Links below.
Key Characteristics
- Measures: Perceived usability
- Number of items: 10
- Reliability: 0.91
- Types of Validity: Content, construct, concurrent
- Number of subscales: 0
- Interpretative norms: Yes
- Development method: Classical test theory
Key Links & Publications
- Sauro, J. (2011). A Practical Guide to the System Usability Scale: Background, Benchmarks & Best Practices. MeasuringU Press. Note: First book about the SUS, focusing on its measurement characteristics and practical use, including the curved grading scale developed at MeasuringU.
- Sauro, J., & Lewis, J.R. (2011). When Designing Usability Questionnaires, Does It Hurt to Be Positive? In Proceedings of CHI 2011 (pp. 2215–2223). Association for Computing Machinery. Honorable Mention for Best Paper award.
- Sauro, J., & Lewis, J. R. (2012/2016). Quantifying the User Experience: Practical Statistics for User Research. Morgan Kaufmann. Note: Extensive coverage of SUS research and use in Chapter 8.
- Lewis, J. R., & Sauro, J. (2017). Revisiting the Factor Structure of the System Usability Scale. Journal of Usability Studies, 12(4), 183–192.
- Lewis, J. R., & Sauro, J. (2017). Can I Leave This One Out? The Effect of Dropping an Item from the SUS. Journal of Usability Studies, 13(1), 38–46.
- Lewis, J. R., & Sauro, J. (2018). Item Benchmarks for the System Usability Scale. Journal of Usability Studies, 13(3), 158–167.
- Lewis, J. R. (2018). The System Usability Scale: Past, Present, and Future. International Journal of Human-Computer Interaction, 34(7), 577–590.
- Extensive publication of articles on the SUS at MeasuringU.com
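For reference, Brooke's original SUS scoring procedure is public and standard, and it can be implemented in a few lines:

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring (Brooke's procedure).

    Ten items rated 1-5. Odd-numbered items (positively worded) contribute
    (rating - 1); even-numbered items (negatively worded) contribute
    (5 - rating). The sum of contributions is multiplied by 2.5 to yield
    a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten ratings on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd-numbered)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5
```

Note that the 0–100 result is not a percentage; interpreting it requires norms such as the curved grading scale mentioned above.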
2000–2009
In this decade, Jeff introduced the Single Ease Question (SEQ) and a method for combining multiple UX metrics into a Single Usability Metric (SUM). Jim and colleagues at IBM investigated and published enhancements to the Mean Opinion Scale (MOS), most notably, the MOS-X.
SEQ®
The concepts of ease of use and usability are deeply intertwined. No one knows who the first person was to ask someone to rate the ease of completing a task in a usability study, but in 2009, Jeff and Joe Dumas were the first to publish a version of the item that is now known as the Single Ease Question (SEQ). Since its initial publication, it has undergone some cosmetic changes, and research has established good norms for its interpretation.
Key Characteristics
- Measures: Perceived ease of completing a task in a usability study
- Number of items: 1
- Reliability: 0.80 (test-retest)
- Types of Validity: Content, concurrent
- Number of subscales: 0
- Interpretative norms: Yes
- Development method: Classical test theory
Key Links & Publications
- Sauro, J., & Dumas, J. (2009). Comparison of Three One-Question, Post-Task Usability Questionnaires. In Proceedings of CHI 2009 (pp. 1599–1608). Association for Computing Machinery. Nominated for Best Paper Award.
- The Evolution of the Single Ease Question (SEQ)
SUM
The Single Usability Metric (SUM) is not a standardized questionnaire, so there is no list of key characteristics in this section. Instead, it is a standardized method for combining prototypical usability metrics such as completion rates, completion times, and subjective ratings (e.g., satisfaction or ease), an important step toward a unified measure of usability that we continue to use in benchmark studies.
Key Links & Publications
- Sauro, J., & Kindlund, E. (2005). Using a Single Usability Metric (SUM) to Compare the Usability of Competing Products. In Proceedings of HCII 2005 (pp. 235–244). Human Computer Interaction International.
- Sauro, J., & Lewis, J.R. (2009). Correlations among Prototypical Usability Metrics: Evidence for the Construct of Usability. In Proceedings of CHI 2009 (pp. 1609–1618). Association for Computing Machinery. Nominated for Best Paper Award.
- 10 Things to Know about the Single Usability Metric (SUM)
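The core idea behind SUM can be sketched as follows. This is a simplified illustration only, not the published procedure (which involves spec limits and a more careful standardization; see the links above), and all parameter names are our own:

```python
from statistics import NormalDist

def sum_score(metrics: dict[str, tuple[float, float, float, bool]]) -> float:
    """SIMPLIFIED sketch of a SUM-style combination, not the published method.

    Each metric is given as (observed, benchmark_mean, benchmark_sd,
    higher_is_better). Each is standardized against its benchmark, flipped
    so that a higher z is always better (e.g., for task time), mapped to a
    percentile with the normal CDF, and the percentiles are averaged into
    a single 0-100 score.
    """
    percentiles = []
    for observed, mean, sd, higher_is_better in metrics.values():
        z = (observed - mean) / sd
        if not higher_is_better:  # e.g., completion time: lower is better
            z = -z
        percentiles.append(NormalDist().cdf(z) * 100)
    return sum(percentiles) / len(percentiles)
```

For example, a product that exactly matches its benchmarks on completion rate, time, and satisfaction would score 50 (average performance relative to the benchmark).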
MOS-X
The Mean Opinion Scale-Expanded (MOS-X) is a 15-item questionnaire developed at IBM to obtain listeners’ subjective assessments of synthetic speech on four dimensions: Intelligibility, Naturalness, Prosody, and Social Impression. In current practice, we have replaced this questionnaire with the four-item MOS-X2 (described above).
Key Characteristics
- Measures: The perceived intelligibility, naturalness, prosody, and social impression of synthetic voices
- Number of items: 15
- Reliability: 0.93
- Types of Validity: Content, construct, concurrent
- Number of subscales: 4
- Intelligibility: 0.88
- Naturalness: 0.86
- Prosody: 0.86
- Social Impression: 0.86
- Interpretative norms: Yes
- Development method: Classical test theory
Key Links & Publications
- Polkosky, M. D., & Lewis, J. R. (2003). Expanding the MOS: Development and Psychometric Evaluation of the MOS-R and MOS-X. International Journal of Speech Technology, 6(2), 161–182.
- Lewis, J. R. (2018). Investigating MOS-X Ratings of Synthetic and Human Voices. Voice Interaction Design, 2(2), 1–22.
1990–1999
This was the decade in which Jim published his first three standardized UX questionnaires at IBM: the ASQ, PSSUQ, and CSUQ. They continue to be used in UX research and practice, but we don’t use them at MeasuringU because they’re a bit antiquated in style and content; we include them here for historical completeness.
ASQ
The After-Scenario Questionnaire (ASQ) was an early attempt at a concise but comprehensive post-task questionnaire: three items administered after each task in a usability study, rating satisfaction with ease of completion, completion time, and support information.
Key Characteristics
- Measures: Task-level satisfaction
- Number of items: 3
- Reliability: 0.93
- Types of Validity: Content, construct, concurrent
- Number of subscales: 0
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R. (1991). Psychometric Evaluation of an After-Scenario Questionnaire for Computer Usability Studies: The ASQ. SIGCHI Bulletin, 23(1), 78–81.
- Lewis, J. R. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7(1), 57–78.
PSSUQ
The Post-Study System Usability Questionnaire (PSSUQ) was an early standardized usability questionnaire administered at the end of a usability study. It contains three subscales: System Usefulness, Information Quality, and Interface Quality (most recently, slightly revised as Version 3).
Key Characteristics
- Measures: Perceived usability
- Number of items: 16
- Reliability: 0.96
- Types of Validity: Content, construct, concurrent
- Number of subscales: 3
- System Usefulness: Reliability = 0.96
- Information Quality: Reliability = 0.92
- Interface Quality: Reliability = 0.83
- Interpretative norms: No
- Development method: Classical test theory
Key Links & Publications
- Lewis, J. R. (1992). Psychometric Evaluation of the Post-Study System Usability Questionnaire: The PSSUQ. In Proceedings of the 36th Annual Meeting of the Human Factors Society (pp. 1259–1263). HFES.
- Lewis, J. R. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7(1), 57–78.
- Lewis, J. R. (2002). Psychometric Evaluation of the PSSUQ Using Data from Five Years of Usability Studies. International Journal of Human-Computer Interaction, 14(3), 463–488.
- Lewis, J. R. (2019). Using the PSSUQ and CSUQ in User Experience Research and Practice. MeasuringU Press.
CSUQ
The Computer System Usability Questionnaire (CSUQ) is a version of the PSSUQ modified for use as a general standardized UX questionnaire outside of the confines of a usability test (achieved primarily by changing references to “tasks and scenarios” to “work”). Its key characteristics are very close to those found for the PSSUQ.
Key Links & Publications
- Lewis, J. R. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7(1), 57–78.
- Lewis, J. R. (2019). Measuring Perceived Usability: SUS, UMUX, and CSUQ Ratings for Four Everyday Products. International Journal of Human-Computer Interaction, 35(15), 1404–1419.
- Lewis, J. R. (2019). Using the PSSUQ and CSUQ in User Experience Research and Practice. MeasuringU Press.
Summary
From 1990 to 2025, we developed and published 16 standardized UX questionnaires, ranging from general measures of perceived usability to specialized measures of the UX of websites and speech applications. Table 1 lists these questionnaires and their key characteristics in reverse chronological order.
| Questionnaire | Measures | Number of Items | Reliability | Number of Subscales | Interpretative Norms | Development Method |
|---|---|---|---|---|---|---|
| UX-Lite | Perceived ease and usefulness | 2 | 0.75 | 2 | Yes | Classical Test Theory |
| SUPR-Qm V2 | Intensity of UX of mobile apps | 5 | 0.83 | 0 | Yes | Rasch Scaling |
| TAC-10 | Level of tech savviness | 10 | 0.67 | 0 | Yes | Rasch Scaling |
| PWCQ | Perceived website clutter | 5 | 0.90 | 2 | No | Classical Test Theory |
| SUPR-Q | Quality of UX of websites | 8 | 0.90 | 4 | Yes | Classical Test Theory |
| SUPR-Qm | Intensity of UX of mobile apps | 16 | 0.94 | 0 | Yes | Rasch Scaling |
| UMUX-LITE | Perceived ease and usefulness | 2 | 0.83 | 2 | No | Classical Test Theory |
| MOS-X2 | UX of synthetic voices | 4 | 0.85 | 4 | Yes | Classical Test Theory |
| SUISQ-R | Service quality of speech apps | 14 | 0.88 | 4 | No | Classical Test Theory |
| EMO | Emotional interaction | 8 | 0.88 | 4 | No | Classical Test Theory |
| mTAM | Perceived ease and usefulness | 12 | 0.95 | 2 | No | Classical Test Theory |
| SEQ | Perceived task ease | 1 | 0.80 | 0 | Yes | Classical Test Theory |
| MOS-X | UX of synthetic voices | 15 | 0.93 | 4 | Yes | Classical Test Theory |
| ASQ | Task-level usability | 3 | 0.93 | 0 | No | Classical Test Theory |
| PSSUQ | Study-level usability | 16 | 0.96 | 3 | No | Classical Test Theory |
| CSUQ | Computer usability | 16 | 0.97 | 3 | No | Classical Test Theory |
Table 1: Summary of standardized questionnaires created by MeasuringU researchers (all questionnaires have published evidence of content and concurrent validity; all except the SEQ have construct validity).
In addition to these questionnaires, the SUM, a method for combining prototypical usability metrics, was created at MeasuringU.
And even though we did not create the SUS, we have published numerous studies on making it more flexible and interpretable (e.g., curved grading scale and item benchmarks).


