From email to web browsing to learning a second language, consumer software affects many aspects of modern life.
Research in information systems management has found that the fundamental UX metrics of perceived usability and perceived usefulness are key drivers of users' likelihood to adopt new technology and recommend software.
For example, our research has shown that perceived usability accounts for between 30% and 60% of the variation in Net Promoter Scores. This is consistent with other research showing that positive experiences lead to product recommendations and negative experiences lead to detraction.
We continue to benchmark the user experience of popular consumer software products. From May to July 2022, we collected our latest wave of data from 2,111 participants.
This update to our 2017 and 2020 reports covers a similar set of products and uses a similar methodology. It also complements our 2022 business-focused software article and report.
The 41 software products include a mix of popular productivity, storage, and security software:
- Adobe Illustrator
- Adobe Photoshop
- Adobe Reader
- Adobe Premiere
- Amazon Music
- Google Calendar
- Google Chrome
- Google Docs
- Google Drive
- Google G Suite
- Google Hangouts
- Google Sheets
- Google Slides
- McAfee Antivirus
- Microsoft Edge
- Microsoft Excel
- Microsoft Office 365
- Microsoft Outlook
- Microsoft PowerPoint
- Microsoft Word
- Mozilla Thunderbird
- Norton Antivirus
- Yahoo Mail
Participants in the study were from the United States (98%) and the UK (2%). A bit over half (52%) were female, and they tended to be young (56% below the age of 35) and educated (93% had at least some college). Participants were asked to reflect on their most recent experiences with the software and answer a number of items, including the System Usability Scale (SUS), the UX-Lite, the standard likelihood-to-recommend (LTR) question used to compute the Net Promoter Score (NPS), and the Product-Market Fit (PMF) item. The full details are available in the report. Here are some highlights.
Net Promoter Scores
The most popular method to measure loyalty is the Net Promoter Score (NPS). It’s calculated using an eleven-point (0 to 10) likelihood-to-recommend question, with the NPS computed by subtracting the percent of detractors (those choosing 0–6) from the percent of promoters (9–10).
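The NPS arithmetic described above can be sketched in a few lines of Python; the function name and example ratings here are illustrative, not from the study:

```python
def net_promoter_score(ratings):
    """Compute NPS (as a percentage) from 0-10 likelihood-to-recommend ratings.

    Promoters rate 9-10 and detractors rate 0-6; passives (7-8) count in the
    denominator but cancel out of the numerator.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Example: 4 promoters, 3 passives, 3 detractors out of 10 responses
print(net_promoter_score([10, 9, 9, 10, 8, 7, 7, 3, 5, 6]))  # 10.0
```

Note that because passives drop out of the numerator, two samples with very different proportions of passives can produce the same NPS.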
Across the 41 products, the average Net Promoter Score was 4%, ranging from −50% to 70%. This average is reasonably consistent with our 2020 result (1%). VLC (70%), Duolingo (54%), and Google Chrome (42%) had the highest scores; Yahoo Mail (−31%) and Microsoft Edge (−50%) had the lowest.
We have collected data from ten consumer software products every few years since 2011. From 2020 to 2022 the NPS for some products declined (TurboTax, Word, Excel, Dropbox, Mint, and Quicken) while others improved (Photoshop, PowerPoint, McAfee Antivirus, and iTunes).
One of the unexpected findings was a 13-point improvement for iTunes from 2020. After starting with an NPS of 28% in 2011, iTunes dropped consistently in our studies, reaching a low of −13% in 2020. The unexpected 13-point leap in 2022 brings it to 0% (see Figure 1). Originally, we had not planned to collect more data for iTunes because, in June 2019, Apple announced it would be replacing iTunes with three separate media applications. That happened on iPhones, but iTunes did not entirely disappear on other platforms. We suspect that, between the 2019 Apple announcement and the growing popularity of audio streaming services, the current population of iTunes users consists of those with an unusual level of loyalty to the software. This loyalty is reflected in the improved NPS (and a corresponding improvement in its SUS ratings, from 69 in 2020 to 80 in 2022).
System Usability Scale (SUS) Scores
We used the popular System Usability Scale (SUS) to compute the perceived usability of the 41 products. SUS is a ten-item questionnaire with possible scores ranging from 0 to 100. The average SUS score from over 500 products (including websites, consumer software, and business software) is 68.
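The standard SUS scoring procedure (odd-numbered items are positively worded, even-numbered items negatively worded) can be sketched as follows; the response pattern in the example is made up for illustration:

```python
def sus_score(responses):
    """Score one completed SUS questionnaire.

    `responses` is a list of ten ratings on a 1-5 agreement scale.
    Odd-numbered items are positively worded (contribution = rating - 1);
    even-numbered items are negatively worded (contribution = 5 - rating).
    The summed contributions (0-40) are multiplied by 2.5 to reach 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# A mildly favorable response pattern (agree with odd items, disagree with even)
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```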
The average SUS score from this group of consumer products was 76, with a low score of 58 and a high score of 88. A raw SUS score of 76 compared to our SUS norms translates to the 76th percentile (well above the average 50th percentile).
The product with the lowest SUS score in this dataset was Adobe Illustrator, with a score of 58 (at the 25th percentile). The top user comments regarding issues with Illustrator were its steep learning curve and that it remained difficult to use even after learning its functions.
The SUS scores for the ten products from the previous reports were generally stable, with most products shifting by no more than three points. The exceptions were Mint (six-point increase), Quicken (four-point increase), Excel (six-point decrease), and, as mentioned previously, iTunes (eleven-point increase).
Usefulness and Ease
In the report, we present findings for the UX-Lite (based on the UMUX-Lite documented in previous reports); it has become an important benchmark for many organizations because its single, succinct score quantifies software acceptance and satisfaction. It's based on earlier work harking back to the Technology Acceptance Model (TAM), but it includes only one item each for ease (how easy the software is to use) and usefulness (how well the features meet the users' needs). In aggregate, it provides a measure of acceptance (a mini-TAM) or, even more broadly, satisfaction with the product.
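A minimal sketch of UX-Lite scoring, assuming the common convention of linearly interpolating each 1-5 item to a 0-100 scale and averaging the two components (check the instrument's documentation for the exact scheme used in a given study):

```python
def ux_lite(ease, useful):
    """Average two 1-5 ratings after rescaling each to 0-100.

    Assumes the rescaling convention (rating - 1) / 4 * 100; this is an
    illustrative sketch, not the report's exact computation.
    """
    def rescale(r):
        if not 1 <= r <= 5:
            raise ValueError("ratings must be between 1 and 5")
        return (r - 1) / 4 * 100
    return (rescale(ease) + rescale(useful)) / 2

print(ux_lite(5, 4))  # ease = 100, useful = 75 -> overall 87.5
```

Keeping the two components separate (as in the scatterplot below) is often more diagnostic than the combined score, since it distinguishes ease problems from usefulness problems.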
Figure 2 shows a scatterplot of the two components of the UX-Lite scores. The dotted red lines indicate the component means from this study (77 for ease; 77 for usefulness). VLC and Firefox received the highest usefulness ratings, while Safari and Chrome received the highest ease ratings. Illustrator had the lowest scores for both components of the UX-Lite.
Perceived Usability Affects Loyalty
Figure 3 shows the SUS scores for detractors, passives, and promoters (computed at the individual level for all responses; all differences are significant with p < .05).
This figure shows that, on average, promoters had SUS scores of just over 89 (an A+ on the Sauro-Lewis curved grading scale). This pattern is consistent with previous research we conducted that found that SUS scores above 80 have a good chance of indicating promoters. The mean SUS for passives was 75.9 (a B on the Sauro-Lewis curved grading scale), and the mean for detractors was 62.3 (a grade of D).
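The grouping behind this kind of analysis (classify each respondent by LTR, then average SUS within each NPS category) can be sketched as follows; the data in the example are made up for illustration and are not the study's figures:

```python
def mean_sus_by_nps_group(records):
    """Group individual responses by NPS category and average their SUS scores.

    `records` is a list of (ltr, sus) pairs, with ltr on the 0-10 scale.
    """
    groups = {"detractor": [], "passive": [], "promoter": []}
    for ltr, sus in records:
        if ltr >= 9:
            groups["promoter"].append(sus)
        elif ltr >= 7:
            groups["passive"].append(sus)
        else:
            groups["detractor"].append(sus)
    # Drop empty groups, return per-group means
    return {g: sum(v) / len(v) for g, v in groups.items() if v}

data = [(10, 92.5), (9, 87.5), (8, 75.0), (7, 77.5), (3, 60.0), (5, 65.0)]
print(mean_sus_by_nps_group(data))
# {'detractor': 62.5, 'passive': 76.25, 'promoter': 90.0}
```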
In this report, we included a new section on the assessment of the Product-Market Fit (PMF) item, which has the following four response options to the question, “How would you feel if you could no longer use this product?”
- Very disappointed
- Somewhat disappointed
- Not disappointed (it isn’t really that useful)
- N/A—I no longer use this product
The most common way to report results for the PMF item is the top-box score (percentage of “Very disappointed” ratings), with a rule of thumb that the top-box benchmark for successful startups is 40%. (For least-squares modeling, assigning a number to each response is possible; for example, 0 = no longer used, 1 = not really that useful, 2 = somewhat disappointed, and 3 = very disappointed.)
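Both the top-box score and the 0-3 numeric coding described above can be sketched in Python; the response strings and example data here are illustrative:

```python
# Numeric coding for least-squares modeling, as described in the text
PMF_CODES = {
    "Very disappointed": 3,
    "Somewhat disappointed": 2,
    "Not disappointed": 1,
    "N/A - no longer use": 0,
}

def pmf_top_box(responses):
    """Percentage of 'Very disappointed' answers (the PMF top-box score)."""
    top = sum(1 for r in responses if r == "Very disappointed")
    return 100 * top / len(responses)

def pmf_mean(responses):
    """Mean PMF using the 0-3 numeric coding."""
    return sum(PMF_CODES[r] for r in responses) / len(responses)

answers = (["Very disappointed"] * 4 + ["Somewhat disappointed"] * 4
           + ["Not disappointed"] * 2)
print(pmf_top_box(answers))  # 40.0 -- meets the 40% rule of thumb
print(pmf_mean(answers))     # 2.2
```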
The products with the highest PMF top-box scores were Google G Suite (53%) and VLC (50%). The products with the lowest PMF top-box scores were Adobe Photoshop (9%) and Microsoft Edge (6%). In contrast to the PMF top-box scores for business software, for which none had a score greater than 26%, six of the consumer products had scores ranging from 41% to 53%. All these consumer products come from mature (rather than startup) companies with reasonably competitive markets. Four of the six were Google products (G Suite, Gmail, Docs, Chrome).
Useful + Usable = Market Fit through Recommendations
One way to better understand the relationship between the multiple metrics collected is through a path diagram using structural equation modeling (SEM). SEM can provide insights into hypothesized causal relationships between variables.
Figure 4 shows a structural equation model using the data across all the products in this study. It has good fit statistics (χ²(2) = 26.7, p < .05; CFI = 0.99; RMSEA = .077; BIC = 88.0), which means the model does a good job of explaining the data. It supports a hypothesized causal chain from the UX-Lite components (Easy, Useful) to likelihood-to-recommend (LTR) to Product-Market Fit.
Variation in the UX-Lite items accounts for 42% of the variation in LTR, and variation in LTR accounts for just over a third (37%) of the variation in mean PMF (using the 0–3-point coding of its response options). In other words, improving the ease and usefulness of products will likely improve users' recommendation behavior, which in turn is significantly related to improved ratings of product-market fit.
Retrospective UX benchmarking is an important tool for investigating attitudes toward constructs such as usability and usefulness and their relationships to behavioral intentions such as likelihood to recommend. With this information, UX researchers can estimate how different products are doing relative to their competitors, and they have some high-level diagnostic information that can guide additional research to understand the “why” behind the numbers.
The full list of benchmarks and analysis of verbatim responses are available for purchase in the 2022 Net Promoter and UX Benchmark Report for Consumer Software.