Topics

Should You Use Negative Numbers in Rating Scales?
There are a lot of opinions about the best formats for agreement scales. Sometimes those opinions are strongly held and can lead to lengthy, heated discussions within research teams. When format differences affect measurement properties, those discussions may be time well spent, but when the formats don’t matter (or matter very little), the time is …

Are Face Emoji Ratings Better than Numbered Scales?
Somewhat agree, very satisfied, extremely likely. The labels used on the points of rating scales can affect responses in often unpredictable ways. What’s more, certain terms can get lost in translation when writing surveys for international usage. Some terms may have subtly different meanings, possibly making cross-cultural comparisons problematic. While numbers are universally understood and …

What Do You Gain from Larger-Sample Usability Tests?
We typically recommend small sample sizes (5–10) for conducting iterative usability testing meant to find and fix problems (formative evaluations). For benchmark or comparative studies, where the focus is on detecting differences or estimating population parameters (summative evaluations), we recommend using larger sample sizes (20–100+). Usability testing can be used to uncover problems and assess the …
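The small-sample recommendation for formative testing is often justified with the classic problem-discovery formula, P = 1 − (1 − p)^n, where p is the proportion of users affected by a problem and n is the sample size. The sketch below illustrates that reasoning; the p = 0.31 value is a commonly cited average discovery rate used here only as an illustrative assumption, not a figure from this article.

```python
def discovery_rate(p: float, n: int) -> float:
    """Probability that a problem affecting a proportion p of users
    is observed at least once among n participants."""
    return 1 - (1 - p) ** n

# Illustrative: with p = 0.31 (an often-cited average), five users
# already catch most problems, which is why formative studies stay small.
for n in (5, 10, 20):
    print(n, round(discovery_rate(0.31, n), 2))
```

With these assumed values, discovery climbs past 80% by n = 5 and approaches certainty by n = 20, which is why adding participants to a formative study yields diminishing returns.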

How to Handle Bad Data
Decisions should be driven (or at least informed) by data. Raw data is turned into information by ensuring that it is accurate and has been put into a context that promotes good decision-making. The pandemic has brought a plethora of COVID-related data dashboards, which are meant to provide information that helps the public and public …

The Evolution of the Mean Opinion Scale: From MOS-R to MOS-X2
The Mean Opinion Scale (MOS) is a standardized questionnaire used to assess synthetic speech. The quality of synthetic speech strongly affects the user experience of working with conversational systems, with listeners making rapid and unconscious judgments of the speaker’s personality, so it’s important to have standardized methods for its assessment. In an earlier article, we …

The UX of Video Streaming Entertainment Websites & Apps
We’ve all spent a lot of time at home this year. The pandemic has made already-popular video streaming services seem essential. The popularity makes sense given the relatively inexpensive subscription fees, the lack of long-term contracts, and the many channels of access (through websites, mobile apps, smart TVs). And there is a LOT of content …

What Is the Mean Opinion Scale (MOS)?
The quality of the electronic transmission of the human voice has come a long way since Bell summoned Watson. But even with all the advancement in technology, “Can you hear me now?” is still part of our modern lexicon. Voice—both human and digital—plays an increasingly important role in interactions with our devices. But before you …

Are Within-Subjects Designs Invalid?
One of the best ways to make UX metrics more meaningful is to have a comparison. For example, when conducting a UX benchmark study we often recommend adding at least one competing product (especially if it’s the first benchmark). Comparable interfaces help stakeholders easily interpret context-sensitive task metrics, such as completion rates and task time. …

Are Star Ratings Better Than Numbered Scales?
Five-star reviews. Whether you’re rating a product on Amazon, a dining experience on Yelp, or a mobile app in the App or Play Store, you can see that the five-star rating system is ubiquitous. Does the familiarity of stars offer a better rating system than traditional numbered scales? We recently reported a comparison between standard …

Are Sliders Better Than Numbered Scales?
There are many ways to format rating scales. Recently we have explored labeling neutral points, labeling all or some response options, altering the number of response options, and comparing agreement vs. item-specific endpoint labels. Each of these formatting decisions has a variety of opinions and research, both pro and con, in the scientific literature at large. …

Converting Rating Scales to 0–100 Points
There are a lot of ways to display multipoint rating scales by varying the number of points (e.g., 5, 7, 11) and by labeling or not labeling those points. There’s variety not only in how rating scales are displayed but also in how you score the responses. Two typical scoring methods we discussed earlier are reporting …
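A common way to put scales with different numbers of points on a comparable footing is a linear rescaling to 0–100. The sketch below shows that standard interpolation approach, assuming a scale that starts at 1; it is one conventional method, not necessarily the exact scoring the article describes.

```python
def to_0_100(raw: float, n_points: int) -> float:
    """Linearly rescale a rating on a 1-to-n_points scale to 0-100.

    A 1 maps to 0, the top point maps to 100, and interior points
    are spaced evenly in between.
    """
    return (raw - 1) / (n_points - 1) * 100

# A 4 on a 5-point scale and a 5.5 on a 7-point scale land on the
# same 0-100 footing:
print(to_0_100(4, 5))    # 75.0
print(to_0_100(5.5, 7))  # 75.0
```

The benefit of the 0–100 form is that scores from 5-, 7-, and 11-point versions of an item can be averaged and compared directly.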

Sample Size Recommendations for Benchmark Studies
One of the primary goals of measuring the user experience is to see whether design efforts actually make a quantifiable difference over time. A regular benchmark study is a great way to institutionalize the idea of quantifiable differences. Benchmarks are most effective when done at regular intervals (e.g., quarterly or yearly) or after significant design …
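For benchmark studies, sample size is typically driven by how precise you need your estimates to be. The sketch below uses the standard normal-approximation formula n = (z·s/d)², where s is an estimated standard deviation and d the desired margin of error; the SUS-like sd = 20 and 5-point margin are illustrative assumptions, and the article's own recommendations may use a more exact (e.g., t-based) calculation.

```python
import math

def sample_size_for_mean(sd: float, margin: float, z: float = 1.96) -> int:
    """Participants needed so a ~95% confidence interval around a mean
    has roughly the requested half-width (normal approximation)."""
    return math.ceil((z * sd / margin) ** 2)

# Illustrative: estimating a SUS-style score (assumed sd of 20)
# to within +/- 5 points calls for about 62 participants.
print(sample_size_for_mean(sd=20, margin=5))
```

Halving the margin of error roughly quadruples the required sample, which is why precise benchmarks land in the 20–100+ range rather than the 5–10 used for formative testing.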