What Metrics Has MeasuringU Created?

Jeff Sauro, PhD • Jim Lewis, PhD

At MeasuringU®, we don’t just use UX metrics—we create them.

But what have we created, and what have we just used or extended?

Across our combined careers, we (Jeff and Jim) have published 16 psychometrically qualified UX metrics (some original questionnaires, some modifications of existing ones), plus a method for combining prototypical usability metrics, and we have made major contributions to a popular standardized UX questionnaire that we did not create, the System Usability Scale (SUS).

In this article, we briefly describe each of these metrics (presented in roughly reverse chronological order by decade) and provide key links to more information about them (so you won’t need to ask ChatGPT and risk hallucinated references).

2020–2025

From 2020 to 2025, we developed and published four standardized UX questionnaires: UX-Lite®, SUPR-Qm® V2, TAC-10™, and PWCQ.

UX-Lite®

The UX-Lite has its roots in the UMUX-LITE (more on the UMUX-LITE below). It’s a two-item questionnaire, essentially a miniature version of the Technology Acceptance Model (TAM), that assesses the perceived ease of use and perceived usefulness of products and services with two five-point scales. It is an increasingly popular metric in UX research and practice.
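In practice, responses to the two five-point items are usually rescaled to a 0–100 scale. The sketch below assumes the common approach of shifting each rating to a 0–4 range and rescaling the sum; treat it as an illustration of that rescaling rather than a definitive specification of the UX-Lite scoring rules.

```python
def ux_lite_score(ease: int, usefulness: int) -> float:
    """Rescale two five-point UX-Lite-style ratings (1-5) to 0-100."""
    for rating in (ease, usefulness):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    # Each item contributes 0-4 points after shifting, so 8 is the maximum.
    return (ease - 1 + usefulness - 1) / 8 * 100
```

For example, ratings of 4 (ease) and 3 (usefulness) rescale to 62.5 on the 0–100 scale.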

From 2020 to 2024, we published 15 articles on the UX-Lite, many of which explored different ways to phrase the “usefulness” item because its original wording was overly complex. This research demonstrated the reliability and validity of the UX-Lite and showed that it is useful in regression and structural equation modeling of higher-level outcome metrics such as ratings of overall experience, behavioral intentions (e.g., likelihood to recommend, likelihood to reuse), and actual user behaviors.

Key Characteristics

  • Measures: Perceived ease of use and perceived usefulness
  • Number of items: 2
  • Reliability: 0.75 (coefficient alpha unless otherwise specified)
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 2 (single-item scales)
  • Interpretative norms: Yes
  • Development method: Classical test theory

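The reliability values reported throughout this article are coefficient (Cronbach’s) alpha unless otherwise noted. For reference, a minimal computation of alpha from a respondents-by-items matrix looks like this (a textbook implementation, not MeasuringU’s analysis code):

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha from rows of respondent ratings.

    item_scores: one list per respondent, each containing that
    respondent's rating for every item on the questionnaire.
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item's ratings across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of respondents' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When items are perfectly correlated (every respondent gives the same rating to both items), alpha is 1; as item responses become independent, alpha falls toward 0.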
Key Links & Publications

SUPR-Qm® V2

The mobile app experience is a unique and defining aspect of our interactions with our devices. While the experience shares many characteristics with using software and websites on a traditional monitor, the mobility, screen size, and interaction style make the experience distinct. Consequently, we developed a questionnaire, the SUPR-Qm, to measure attitudes toward the mobile app user experience. In 2025, we published the second version of the SUPR-Qm, reducing the number of items from the original 16 to five.

Key Characteristics

  • Measures: Intensity of the UX of mobile apps
  • Number of items: 5
  • Reliability: 0.83
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 0
  • Interpretative norms: Yes
  • Development method: Rasch scaling

Key Links & Publications

TAC-10™

We based the TAC-10 on research conducted at MeasuringU from 2015 through 2023 and presented it at UXPA 2024. The TAC-10 is a select-all-that-apply checklist of ten different technical activities. We published six blog articles in 2023 detailing its development, including why there was a need for a measure of tech savviness in UX research (to enable discrimination of interface and participant characteristics when analyzing UX data) and how to use the TAC-10 to classify participants into different levels of tech savviness.
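Because the TAC-10 is a select-all-that-apply checklist, classification reduces to counting checked activities and bucketing the count. The cutoffs below are hypothetical placeholders for illustration only; the published TAC-10 classification uses its own empirically derived thresholds.

```python
def tech_savviness_level(checked: list) -> str:
    """Bucket a TAC-10-style checklist into a savviness level.

    checked: ten booleans, one per technical activity.
    NOTE: the cutoffs (3 and 6) are illustrative assumptions,
    not the published TAC-10 thresholds.
    """
    if len(checked) != 10:
        raise ValueError("TAC-10 has exactly 10 activities")
    score = sum(checked)  # number of activities selected
    if score <= 3:
        return "low"
    if score <= 6:
        return "medium"
    return "high"
```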

Key Characteristics

  • Measures: Level of tech savviness
  • Number of items: 10
  • Reliability: 0.67 (Spearman–Brown for dichotomous data)
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 0
  • Interpretative norms: Yes
  • Development method: Rasch scaling

Key Links & Publications

PWCQ

In our UX research practice, we frequently encounter users and designers who criticize website interfaces for being cluttered and stakeholders who worry about the experiential and business consequences of a cluttered website. But what exactly does it mean for a website to appear cluttered? To answer this question, we developed the Perceived Website Clutter Questionnaire (PWCQ), a five-item questionnaire with two subscales: Content Clutter and Design Clutter.

Key Characteristics

  • Measures: The perceived clutter of websites
  • Number of items: 5
  • Reliability: 0.90
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 2
    • Content Clutter: Reliability = 0.91
    • Design Clutter: Reliability = 0.88
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

2010–2019

From 2010 through 2019, we (Jeff and Jim) both collaborated and worked separately on the creation and publication of seven standardized UX questionnaires, plus the publication of books, papers, and numerous articles on how to use and interpret the SUS.

SUPR-Q®

At MeasuringU, we originally benchmarked websites using the SUS. But we knew that the quality of the website user experience was more than just usability, so we developed the Standardized User Experience Percentile Rank Questionnaire (SUPR-Q) in 2011 and published our findings in 2015. The SUPR-Q is a short (eight-item) questionnaire that measures perceptions of Usability, Trust, Appearance, and Loyalty for websites. The combined score provides an overall measure of the quality of the website user experience. The normative percentile database contains responses from more than 10,000 participants and 150 websites (updated on an ongoing basis, about once per quarter).

Key Characteristics

  • Measures: Perceptions of the quality of UX with websites
  • Number of items: 8
  • Reliability: 0.90
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 4
    • Usability: Reliability = 0.88
    • Trust: Reliability = 0.87
    • Appearance: Reliability = 0.80
    • Loyalty: Reliability = 0.73
  • Interpretative norms: Yes
  • Development method: Classical test theory

Key Links & Publications

SUPR-Qm®

Our original version of the mobile app questionnaire had 16 items selected from a larger set using Rasch scaling. We list this here for historical purposes, but our current practice is to use the SUPR-Qm V2 (see above).

Key Characteristics

  • Measures: Intensity of the UX of mobile apps
  • Number of items: 16
  • Reliability: 0.94
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 0
  • Interpretative norms: Yes
  • Development method: Rasch scaling

Key Links & Publications

UMUX-LITE

The UMUX-LITE is a mini-TAM with two seven-point items, assessing perceived ease of use and perceived usefulness. It was derived from the four-item UMUX (Usability Metric for User Experience) when Jim was at IBM (in collaboration with Brian Utesch and Deb Maher) and is the predecessor to the UX-Lite. At MeasuringU, we prefer the UX-Lite (described above) due to its enhanced flexibility, but the UMUX-LITE is also used in current UX research and practice.

Key Characteristics

  • Measures: Perceived ease of use and usefulness
  • Number of items: 2
  • Reliability: 0.83
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 2 (single-item scales)
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

MOS-X2

As part of his work on speech systems at IBM, Jim and his collaborators developed variants of the Mean Opinion Scale (MOS), which others had first published in the 1990s. The MOS-X2 is the culmination of that research: a four-item questionnaire that assesses four key characteristics of user experiences with synthetic voices: Intelligibility, Naturalness, Prosody, and Social Impression.

Key Characteristics

  • Measures: The perceived intelligibility, naturalness, prosody, and social impression of synthetic voices
  • Number of items: 4
  • Reliability: 0.85
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 4 (single-item scales)
  • Interpretative norms: Yes
  • Development method: Classical test theory

Key Links & Publications

SUISQ-R

The original version of the Speech User Interface Service Quality (SUISQ) questionnaire was developed at IBM and published by Melanie Polkosky in 2008. During its development, participants rated the quality of recorded interactions rather than interactions in which they participated, leaving open the question of how well the findings would generalize from observed to personal interactions. Collaborating at State Farm, Jim and Mary Hardzinski collected SUISQ data in a large-sample usability study, (1) replicated the factor structure of the original, and (2) used item analysis to reduce the questionnaire from 25 to 14 items (yielding the SUISQ-R) while still adequately measuring its four subscales: User Goal Orientation, Customer Service Behaviors, Speech Characteristics, and Verbosity.

Key Characteristics

  • Measures: Service quality of speech applications
  • Number of items: 14
  • Reliability: 0.88
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 4
    • User Goal Orientation: Reliability = 0.91
    • Customer Service Behavior: Reliability = 0.88
    • Speech Characteristics: Reliability = 0.80
    • Verbosity: Reliability = 0.67
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

EMO

The Emotional Metric Outcomes (EMO) questionnaire was also developed while Jim was consulting at State Farm. His collaborators at State Farm wanted a standardized questionnaire for assessing the emotional consequences of interaction with a company. They published the EMO in both long (16 items) and short (8 items) versions, measuring four subscales: Positive Relationship Affect, Negative Relationship Affect, Positive Personal Affect, and Negative Personal Affect. The key characteristics below are for the more efficient short version.

Key Characteristics

  • Measures: Emotional consequence of interacting with a company
  • Number of items: 8
  • Reliability: 0.88
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 4
    • Positive Relationship Affect: 0.89
    • Negative Relationship Affect: 0.72
    • Positive Personal Affect: 0.83
    • Negative Personal Affect: 0.82
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

mTAM

The mTAM is a modified version of the TAM (Technology Acceptance Model), a questionnaire developed in the 1990s to assess the drivers of technology acceptance. In its original version, the TAM had 12 items measuring two subscales, Perceived Ease-of-Use and Perceived Usefulness, with items worded to focus on potential future use. For the mTAM, the only modification was to change the focus to ratings of actual use. Note that we do not use this as a practical UX questionnaire, but we have used it when exploring how other standardized metrics work within the Technology Acceptance Model.

Key Characteristics

  • Measures: Perceived ease of use and perceived usefulness
  • Number of items: 12
  • Reliability: 0.95
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 2
    • Perceived Ease of Use: Reliability = 0.95
    • Perceived Usefulness: Reliability = 0.95
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

SUS

No, we didn’t develop the System Usability Scale (SUS)—that honor goes to John Brooke—but between 2010 and 2019, we conducted extensive research to improve its interpretability and flexibility, with publications listed in the Key Links below.
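The SUS is simple to score. Under Brooke’s standard procedure, each of the ten items is rated 1–5; odd-numbered (positive-tone) items contribute the rating minus 1, even-numbered (negative-tone) items contribute 5 minus the rating, and the sum is multiplied by 2.5 to yield a 0–100 score:

```python
def sus_score(responses) -> float:
    """Compute a SUS score (0-100) from ten 1-5 ratings, Brooke's scoring."""
    if len(responses) != 10:
        raise ValueError("The SUS has exactly 10 items")
    total = 0
    for i, rating in enumerate(responses, start=1):
        # Odd items: rating - 1; even items: 5 - rating.
        total += (rating - 1) if i % 2 == 1 else (5 - rating)
    return total * 2.5
```

A respondent who strongly agrees with every positive item and strongly disagrees with every negative item scores 100; a neutral 3 on every item scores 50.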

Key Characteristics

  • Measures: Perceived usability
  • Number of items: 10
  • Reliability: 0.91
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 0
  • Interpretative norms: Yes
  • Development method: Classical test theory

Key Links & Publications

2000–2009

In this decade, Jeff introduced the Single Ease Question (SEQ) and a method for combining multiple UX metrics into a Single Usability Metric (SUM). Jim and colleagues at IBM investigated and published enhancements to the Mean Opinion Scale (MOS), most notably the MOS-X.

SEQ®

The concepts of ease of use and usability are deeply intertwined. No one knows who the first person was to ask someone to rate the ease of completing a task in a usability study, but in 2009, Jeff and Joe Dumas were the first to publish a version of the item that is now known as the Single Ease Question (SEQ). Since its initial publication, it has undergone some cosmetic changes, and research has established good norms for its interpretation.

Key Characteristics

  • Measures: Perceived ease of completing a task in a usability study
  • Number of items: 1
  • Reliability: 0.80 (test-retest)
  • Types of Validity: Content, concurrent
  • Number of subscales: 0
  • Interpretative norms: Yes
  • Development method: Classical test theory

Key Links & Publications

SUM

The Single Usability Metric (SUM) is not a standardized questionnaire, so there is no list of key characteristics in this section. Instead, it is a standardized method for combining prototypical usability metrics such as completion rates, completion times, and subjective ratings (e.g., satisfaction or ease), an important step toward a unified measure of usability that we continue to use in benchmark studies.
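The core idea of combining metrics on different scales is to convert each one to a common unit before averaging. The sketch below is a simplified illustration of that idea under a normality assumption: continuous metrics are compared to a specification limit via a z-score and converted to an estimated proportion of users meeting the spec. The spec limits (`time_spec`, `rating_spec`) and the equal-weight average are assumptions for this sketch, not the exact published SUM procedure.

```python
from statistics import NormalDist, mean, stdev

def metric_to_percent(values, spec, higher_is_better=True):
    """Estimate the proportion of users meeting a spec limit,
    assuming the metric is roughly normally distributed."""
    z = (mean(values) - spec) / stdev(values)
    if not higher_is_better:
        z = -z  # e.g., for task times, lower is better
    return NormalDist().cdf(z)

def single_usability_metric(completion_rate, times, ratings,
                            time_spec, rating_spec):
    """Illustrative SUM-style combination: average the success rate
    with spec-based estimates for time and satisfaction."""
    time_pct = metric_to_percent(times, time_spec, higher_is_better=False)
    sat_pct = metric_to_percent(ratings, rating_spec, higher_is_better=True)
    return mean([completion_rate, time_pct, sat_pct])
```

With a 0.8 completion rate and time and satisfaction data centered exactly on their spec limits (so each contributes 0.5), the combined score is 0.6.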

Key Links & Publications

MOS-X

The Mean Opinion Scale-Expanded (MOS-X) is a 15-item questionnaire developed at IBM to obtain listeners’ subjective assessments of synthetic speech on four dimensions: Intelligibility, Naturalness, Prosody, and Social Impression. In current practice, we have replaced this questionnaire with the four-item MOS-X2 (described above).

Key Characteristics

  • Measures: The perceived intelligibility, naturalness, prosody, and social impression of synthetic voices
  • Number of items: 15
  • Reliability: 0.93
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 4
    • Intelligibility: 0.88
    • Naturalness: 0.86
    • Prosody: 0.86
    • Social Impression: 0.86
  • Interpretative norms: Yes
  • Development method: Classical test theory

Key Links & Publications

1990–1999

This was the decade in which Jim published his first three standardized UX questionnaires at IBM: ASQ, PSSUQ, and CSUQ. They continue to be used in UX research and practice, but we don’t use them at MeasuringU because they’re a bit antiquated in style and content (included in this article for historical completeness).

ASQ

The After-Scenario Questionnaire (ASQ) was an early attempt to develop a concise but comprehensive questionnaire to administer after tasks in usability studies; its three items collect ratings of satisfaction with ease of completion, completion time, and support information.

Key Characteristics

  • Measures: Task-level satisfaction
  • Number of items: 3
  • Reliability: 0.93
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 0
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

PSSUQ

The Post-Study System Usability Questionnaire (PSSUQ) was an early standardized usability questionnaire administered at the end of a usability study. It contains three subscales: System Usefulness, Information Quality, and Interface Quality (most recently revised slightly as Version 3).

Key Characteristics

  • Measures: Perceived usability
  • Number of items: 16
  • Reliability: 0.96
  • Types of Validity: Content, construct, concurrent
  • Number of subscales: 3
    • System Usefulness: Reliability = 0.96
    • Information Quality: Reliability = 0.92
    • Interface Quality: Reliability = 0.83
  • Interpretative norms: No
  • Development method: Classical test theory

Key Links & Publications

CSUQ

The Computer System Usability Questionnaire (CSUQ) is a version of the PSSUQ modified for use as a general standardized UX questionnaire outside of the confines of a usability test (achieved primarily by changing references to “tasks and scenarios” to “work”). Its key characteristics are very close to those found for the PSSUQ.

Key Links & Publications

Summary

From 1990 to 2025, we developed and published 16 standardized UX questionnaires, ranging from general measurement of perceived usability to specialized measurement of the UX of websites and speech applications. Table 1 lists those questionnaires and their key characteristics in reverse chronological order.

| Questionnaire | Measures | Number of Items | Reliability | Number of Subscales | Interpretative Norms | Development Method |
|---|---|---|---|---|---|---|
| UX-Lite | Perceived ease and usefulness | 2 | 0.75 | 2 | Yes | Classical Test Theory |
| SUPR-Qm V2 | Intensity of UX of mobile apps | 5 | 0.83 | 0 | Yes | Rasch Scaling |
| TAC-10 | Level of tech savviness | 10 | 0.67 | 0 | Yes | Rasch Scaling |
| PWCQ | Perceived website clutter | 5 | 0.90 | 2 | No | Classical Test Theory |
| SUPR-Q | Quality of UX of websites | 8 | 0.90 | 4 | Yes | Classical Test Theory |
| SUPR-Qm | Intensity of UX of mobile apps | 16 | 0.94 | 0 | Yes | Rasch Scaling |
| UMUX-LITE | Perceived ease and usefulness | 2 | 0.83 | 2 | No | Classical Test Theory |
| MOS-X2 | UX of synthetic voices | 4 | 0.85 | 4 | Yes | Classical Test Theory |
| SUISQ-R | Service quality of speech apps | 14 | 0.88 | 4 | No | Classical Test Theory |
| EMO | Emotional interaction | 8 | 0.88 | 4 | No | Classical Test Theory |
| mTAM | Perceived ease and usefulness | 12 | 0.95 | 2 | No | Classical Test Theory |
| SEQ | Perceived task ease | 1 | 0.80 | 0 | Yes | Classical Test Theory |
| MOS-X | UX of synthetic voices | 15 | 0.93 | 4 | Yes | Classical Test Theory |
| ASQ | Task-level usability | 3 | 0.93 | 0 | No | Classical Test Theory |
| PSSUQ | Study-level usability | 16 | 0.96 | 3 | No | Classical Test Theory |
| CSUQ | Computer usability | 16 | 0.97 | 3 | No | Classical Test Theory |

Table 1: Summary of standardized questionnaires created by MeasuringU researchers (all questionnaires have published evidence of content and concurrent validity; all except the SEQ have construct validity).

In addition to these questionnaires, the SUM, a method for combining prototypical usability metrics, was created at MeasuringU.

And even though we did not create the SUS, we have published numerous studies on making it more flexible and interpretable (e.g., curved grading scale and item benchmarks).
