What’s the return on investment (ROI) for UX research activities?
Do better user experiences lead to desirable business outcomes?
If a product is more useful and more usable, then people should be more likely to purchase, use, and recommend it.
But how can we quantify these links?
Understanding the ROI of UX research starts with making clear connections between UX metrics and business outcomes, including both leading indicators (behavioral intentions) and lagging indicators (purchase, usage, and recommendation).
The connection certainly seems logical, but to what extent do key UX metrics like perceived ease of use and perceived usefulness (both measured by the UX-Lite) account for variations in outcome metrics (e.g., ratings of overall experience, intention to use, likelihood to recommend)? To understand how well the UX-Lite can measure tech adoption, we first need to dig into what predicts technology adoption (often called acceptance). In this article, we’ll explore how the components of the UX-Lite (perceived ease and perceived usefulness) map to existing measures of technology adoption.
Ease and Usefulness
The constructs of perceived ease of use (PEoU) and perceived usefulness (PU) are most associated with the Technology Acceptance Model (TAM). According to the TAM, PEoU and PU are the primary factors that affect a user’s intention to use a technology. As originally written, the TAM consists of 12 items, six measuring PEoU and six measuring PU. The items were worded to measure people’s expectations about an as-yet-unused product (e.g., “Using [this product] in my job would enable me to accomplish tasks more quickly”). To enable the use of the TAM items with people who already have experience using a product, Lewis (2019) slightly modified the wording of the items (e.g., “Using [this product] enables me to accomplish tasks more quickly”) to create the modified TAM (mTAM) questionnaire.
The construct of perceived usability is part of the classical conception of usability, which is an important component of UX. In a paper we published in 2009, we demonstrated significant relationships between perceived usability and objective usability metrics (e.g., task completion times, successful task completion rates, and errors). When you examine the items of the most frequently used standardized measure of perceived usability, the System Usability Scale (SUS), one of its ten items directly references ease of use (“I thought the system was easy to use”). Research on which SUS items are the strongest drivers of its overall score has shown that responses to the ease item account for over 90% of the variation in the overall SUS. This suggests that perceived ease of use and perceived usability are probably not distinct constructs, and in fact, they might be the same construct with different names.
This brings us to the UX-Lite, a standardized UX questionnaire that has just two items (Figure 1), one directly referencing perceived ease of use and the other indirectly referring to perceived usefulness (the extent to which product features meet the user’s needs), making it essentially a mini-TAM. Originally developed as an alternative measure of perceived usability that would be shorter than the SUS, it has evolved from a four-item questionnaire (the UMUX) to its current form.
In this article, we describe a study we conducted to explore the similarities among these ways to measure perceived ease of use and perceived usefulness and their connections to key outcome metrics.
The Study Design: UX Metrics from 60 Software Products
Roughly every two years, we conduct retrospective benchmarking surveys to measure the SUS and UX-Lite, along with ratings of overall experience and likelihood-to-recommend (LTR), for about 60 software products (e.g., PowerPoint, Salesforce). For the 2020 surveys of business and consumer software, we also collected the mTAM and a three-item behavioral intention (BI) measure: the average of two items from TAM research (Venkatesh & Davis, 2000; “Assuming I had access to [Product], I intend to use it.”; “Given that I had access to [Product], I predict that I would use it.”) and a similar third item that we routinely collect (“I plan to use [Product] in the next three months”). At the beginning of the survey, participants indicated which products they had used in the past year and were randomly assigned one of those products to evaluate.
We received complete sets of responses from 2,412 participants. The participants were members of an online consumer panel, all from the United States. The percentages of males and females were about equal, with 66% below the age of 35. Respondents volunteered to participate in this research and were paid for participation by the online consumer panel.
The Results: Reliability, Validity, and Structural Equation Modeling
To assess how well the UX-Lite performs compared to the mTAM and SUS, we used three assessments. First, we analyzed the reliability of the multi-item questionnaires. Second, we assessed the validity of measures of PEoU and PU using correlations. Third, we assessed how well the UX-Lite’s ease and usefulness items predicted the outcome measures of overall experience, behavioral intention to use, and likelihood to recommend using a structural equation model (SEM) compared to the mTAM and SUS.
Assessment 1: Is the UX-Lite Reliable?
All multi-item questionnaires had coefficient alpha values (a measure of internal consistency that is a lower-bound estimate of reliability) consistent with the prior literature. A common criterion for acceptable reliability is a coefficient alpha equal to or greater than 0.70. The values of coefficient alpha computed for the questionnaires were:
- SUS: 0.89
- UX-Lite: 0.70
- mTAM: 0.96 (with 0.95 and 0.95 respectively for PU and PEoU)
- BI: 0.94
Result: The UX-Lite met the criterion for acceptable reliability even though it has only two items. This is encouraging because coefficient alpha tends to be larger when there are more items.
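For readers who want to run the same reliability check on their own survey data, coefficient alpha is straightforward to compute. The sketch below (using simulated ratings, not the study’s data) implements the standard formula in Python:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, k_items) response matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative data only: two correlated 5-point items standing in for
# the two UX-Lite ratings (Ease and Usefulness).
rng = np.random.default_rng(0)
base = rng.integers(2, 6, size=200)
noise = rng.integers(-1, 2, size=200)
items = np.column_stack([base, np.clip(base + noise, 1, 5)])
print(round(cronbach_alpha(items), 2))
```

For a two-item scale, alpha depends entirely on the correlation between the two items (per the Spearman-Brown relationship), which is why meeting the 0.70 criterion with only two items is notable.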
Assessment 2: Is the UX-Lite Valid?
Concurrent Validity
A common minimum criterion for evidence of concurrent validity is a correlation greater than 0.30 between metrics. The correlations between SUS, UX-Lite (combined and by component), mTAM (combined and by component), overall experience, LTR, and BI ranged from 0.468 to 0.800 (all p < .01), so all correlations provided evidence of concurrent validity.
Convergent and Divergent Validity
As shown by the non-overlapping 95% confidence intervals in Figure 2, the SUS had stronger correlations with the mTAM and UX-Lite measures of perceived ease (evidence of convergent validity) than with their measures of perceived usefulness (evidence of divergent validity).
Result: The results of these correlation analyses provide evidence of the concurrent, convergent, and divergent validity of the UX-Lite.
Assessment 3: Does the UX-Lite Predict Overall Experience, LTR, and Intention to Use?
Figure 3 shows three structural equation models created with AMOS. The first one (Model A) used the components of the mTAM as drivers of overall experience, LTR, and BI. In Model B, mTAM PEoU was replaced with the SUS. In Model C, the mTAM PEoU and PU components were replaced with the UX-Lite Ease and Usefulness components.
In these models, values on double-headed arrows are correlations between the primary drivers. Values on single-headed arrows (links) are standardized estimates of the strengths of relationships between variables, interpreted like beta weights in multiple regression. Values above the upper-right-hand corners of outcome metrics are squared multiple correlations (also designated as R2), interpreted like coefficients of determination in multiple regression (i.e., the percentage of variance accounted for).
For example, in Model A, the correlation between mTAM PEoU and mTAM PU is 0.72; the strength of the connection between mTAM PEoU and BI (to use) is just 0.11, while the connection between mTAM PEoU and overall experience (OverExp) is 0.42; and the percentage of variation in OverExp accounted for in the model is 59%. All correlations, standardized estimates, and squared multiple correlations in the models were statistically significant (p < .0001).
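The regression analogy can be made concrete. The sketch below (simulated data, not the study’s) regresses a standardized outcome on two standardized, correlated drivers; the resulting coefficients are read like the standardized estimates on the single-headed arrows, and R2 like the squared multiple correlations:

```python
import numpy as np

# Simulated drivers and outcome; names are illustrative stand-ins
# for ease, usefulness, and overall experience.
rng = np.random.default_rng(2)
n = 500
ease = rng.normal(size=n)
useful = 0.7 * ease + 0.71 * rng.normal(size=n)   # correlated drivers
overexp = 0.4 * ease + 0.5 * useful + rng.normal(scale=0.6, size=n)

def standardize(x):
    return (x - x.mean()) / x.std(ddof=1)

# With all variables standardized, OLS coefficients are beta weights.
X = np.column_stack([standardize(ease), standardize(useful)])
y = standardize(overexp)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - ((y - X @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(beta.round(2), round(r2, 2))
```

A full SEM additionally models latent variables and indirect paths, but the interpretation of the printed quantities is the same.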
For assessing the goodness of fit of these types of models, we followed the advice of Jackson et al. (2009), who recommended reporting fit statistics that have different measurement properties such as the comparative fit index (CFI: a score of 0.90 or higher indicates good fit), the root-mean-square error of approximation (RMSEA: values less than 0.08 indicate acceptable fit), and the Bayesian information criterion (BIC: lower values are preferred). As shown in Figure 3, all three models had acceptable fit statistics, with Model C (UX-Lite drivers) nominally the best. The squared multiple correlations in Model C were lower than those in the other two models, but in most cases only by one or two percentage points (about five percentage points lower for BI relative to Model A). There was some variation from model to model in the magnitudes of correlations, squared multiple correlations, and standardized estimates, but the relative patterns from model to model were generally consistent. For example, the strength of the connection between perceived ease/usability and the behavioral intention to use tended to be relatively weak, but the connection between perceived usefulness and the behavioral intention to use tended to be relatively strong.
An unexpected outcome was a significantly higher correlation between mTAM PEoU and mTAM PU (0.72, 95% confidence interval from 0.70 to 0.74) in Model A than for the corresponding predictors in Models B and C: respectively, a correlation of 0.55 (95% confidence interval from 0.52 to 0.58) between SUS and mTAM PU, and a correlation of 0.52 (95% confidence interval from 0.49 to 0.55) between UX-Lite Ease and UX-Lite Usefulness.
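Confidence intervals for correlations like these are typically computed with the Fisher z transformation. The sketch below applies it to the Model A correlation of 0.72 with the study’s n of 2,412, which reproduces the reported interval of 0.70 to 0.74:

```python
import math

def corr_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """95% confidence interval for a Pearson r via Fisher z."""
    z = math.atanh(r)             # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)     # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to r scale

lo, hi = corr_ci(0.72, 2412)
print(f"[{lo:.2f}, {hi:.2f}]")
```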
Result: All three structural equation models had good fit statistics, but the UX-Lite model was nominally the best.
Summary and Discussion
To understand how well the UX-Lite could predict technology adoption, we conducted three assessments using data from 2,412 respondents to see how well it matched or bested items from a modified version of the Technology Acceptance Model (mTAM) that measured perceived ease of use and perceived usefulness. Our key findings were:
Can the UX-Lite measure tech adoption? Yes, it can. To establish the UX-Lite as an adequate measure of tech adoption, we needed to show that it was reliable, valid, and could fit expected prediction models. The UX-Lite met all these criteria.
The UX metrics are reliable and valid. All metrics used in the surveys (SUS, UX-Lite, and mTAM, including its PU and PEoU subscales) had acceptably high levels of reliability (coefficient alpha from .70 to .96) and acceptably high and statistically significant levels of concurrent validity (all r > .45, p < 0.01). Analyses of correlations between the SUS and the mTAM and UX-Lite measures of perceived ease and usefulness provided evidence of convergent and divergent validity (SUS correlations with mTAM PEoU and UX-Lite Ease were significantly higher than correlations of SUS with mTAM PU and UX-Lite Usefulness).
Structural equation models strongly and consistently showed that perceived ease and perceived usefulness drive experiential and intentional outcomes. For the three structural equation models, all correlations, standardized estimates, and squared multiple correlations were statistically significant (p < .01). All models had acceptable fit statistics.
Perceived usefulness tends to be a stronger driver than perceived ease. There was some variation in the standardized estimates for the effects of perceived ease and usefulness on overall experience. Both were significant drivers, but the effect of perceived usefulness usually tended to be greater. Perceived usefulness directly affected ratings of LTR and BI (to use), and indirectly affected them through its effect on overall experience. We did not find the link between perceived ease and LTR to be significant, so it was excluded from the model, and the standardized estimate between perceived ease and BI (to use) was relatively small. Perceived ease, however, had an indirect effect on LTR and BI (to use) through its effect on overall experience.
The SUS and mTAM PEoU were essentially interchangeable in the models. The structural equation models were similar regarding the magnitudes of standardized estimates and squared multiple correlations when substituting the SUS for PEoU. Despite their historical and structural differences, both appeared to be measuring the same or almost identical underlying constructs, suggesting there might not be a substantive difference between perceived usability and perceived ease of use.
Overall, substituting the UX-Lite Ease and Usefulness items for the mTAM PEoU and PU components resulted in reasonably consistent models. Most standardized estimates (link weights) and squared multiple correlations were consistent across the two models. The connections of ease and usefulness with overall experience were statistically significant in both models, but for the mTAM the link weights were about equal, while for the UX-Lite the influence of usefulness was about twice that of ease. This may be a consequence of the UX-Lite items being more independent (significantly less correlated) than the mTAM components.
Bottom line: This research supports UX practitioners by (1) demonstrating the importance of work that improves perceptions of product ease and usefulness and (2) demonstrating that UX researchers and practitioners can use the two-item UX-Lite in their work to effectively and efficiently measure perceived ease and usefulness.
For more details about this study, see the paper we published in the International Journal of Human-Computer Interaction (Lewis & Sauro, 2023).