
Debates and discussions in UX research are common, including:
- Whether you should even use numbers in UX research
- Which is the best UX metric to use
- Whether you’re using the right statistical test on your data
- Whether surveys are ever an appropriate UX research method
- The “right” number of points in a rating scale
- And, of course, the “correct” sample size
Unfortunately, these debates and discussions can lead to discomfort and avoidance of the topics altogether. Quantitative methods and analysis can be intimidating because when math is involved, many assume there must be only one correct answer. But UX research is an applied field. Much of the work lies in understanding when to quantify, how to compute, when rules can be bent, and how much flexibility there is in interpreting the numbers. We can’t solve all your math problems, but we hope we can make you more comfortable with quantitative research. Here are five ways.
1. Use keywords to determine when quant is the better method
There are roles for both qualitative and quantitative methods in UX research. In fact, we’ve identified over forty UX research methods. While you don’t need to be an expert in all those methods, it’s helpful to know when a more quantitative approach is appropriate over a more qualitative approach. One of the best ways to do that is by looking for keywords in the research questions (Figure 1).
Figure 1: Keywords related to quant and qual research questions.
For example, if you’re trying to understand users’ design preferences, you’ll want a quantitative method. You’ll need to present options and then enumerate which choice was more frequently selected before determining whether this frequency was greater than chance. Of course, you may want to know why people prefer one design over another, which is a qualitative keyword. You can combine methods because UX research is often mixed methods.
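As a sketch of the "greater than chance" check described above, an exact binomial test compares the observed preference split against a 50/50 chance split. The counts below are hypothetical:

```python
from math import comb

def binomial_p_value(successes: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial p-value: the probability of observing
    `successes` or more preferences out of `n` if the true rate is `p`."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))

# Hypothetical data: 28 of 40 participants preferred Design A.
p_val = binomial_p_value(28, 40)  # chance would be a 50/50 split
print(f"p = {p_val:.4f}")  # a small p suggests the preference exceeds chance
```

A dedicated statistics library (e.g., a binomial test function) does the same computation; the point is that preference data reduces to counts tested against a chance proportion.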
2. There are 70+ UX metrics; get comfortable with four of them
In our MeasuringUniversity course on UX metrics, we provide a taxonomy of 70+ UX metrics. That’s a lot, but many of the metrics are variations. We recommend getting comfortable with the most common and versatile metrics, which is less daunting. UX metrics can be subdivided into task-based (those typically administered immediately after a user attempts a task) and study-level (typically administered only once, either after a series of tasks or independently in a retrospective survey).
Figure 2 shows the fundamental (Big 3) task metrics plus the study-level UX-Lite®.
Figure 2: The Big 3 task metrics plus the UX-Lite®. The shapes indicate the three quality indicators for the metrics—Triangle: Popular Usage, Circle: Ease of Collection, Square: Reference benchmarks. Shape colors from green (good) to red (poor) indicate our judgment of how well each metric does for each quality indicator.
Task completion: The gateway metric to quantification, task completion is a great place to start. It’s measured as a simple pass (1) or fail (0).
Single Ease Question (SEQ®): The SEQ is the most popular way to measure people’s attitudes toward how easy or difficult a task was. Its seven points make it an efficient measure that still discriminates between poor and good task experiences, and it has published benchmarks.
Time: When you need to measure efficiency, task time is the natural choice. There are a few nuances when working with task time, including how to handle failed tasks and the natural skewness of time data, but neither should keep you from using it.
UX-Lite: This questionnaire (Figure 3) consists of two items (how easy the product was to use and how well the product’s features met your needs). It’s beautifully compact but effective (like the SEQ). It predicts future software usage. Why? Because there’s good evidence that people use products that do what they want and are easy to use.
Figure 3: The UX-Lite (created with MUiQ®).
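As a sketch of how a compact two-item measure like this can be scored, the snippet below rescales each five-point item to 0–100 and averages them. This is a common scoring convention, not necessarily the official published one, so check the scoring instructions before relying on it:

```python
def ux_lite_score(ease: int, need: int, points: int = 5) -> float:
    """Rescale two rating-scale items to 0-100 and average them.
    Assumes the convention (response - 1) / (points - 1) * 100 per item."""
    def rescale(x: int) -> float:
        return (x - 1) / (points - 1) * 100
    return (rescale(ease) + rescale(need)) / 2

# Hypothetical responses: ease = 5 (very easy), need = 4.
print(ux_lite_score(5, 4))  # → 87.5
```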
With these four core measures, you can address many UX research questions.
3. Differentiate between data types, especially binary and continuous
A few core metrics in your toolbox will allow you to measure and address many UX research questions. But after collection, how do you analyze the data correctly? That is its own course, but the first step in analysis is understanding what types of data you have.
A broad distinction in data is between discrete (countable data) and continuous data (Figure 4).
Figure 4: Taxonomy of data types.
Completion rate data is binary data, taking only two values (0 and 1), like computer bits (binary digits). Binary data is the simplest form of discrete data (data that is countable) but also the coarsest (it’s all or nothing). The cost of this simplicity is the larger sample size you’ll need for binary data compared to continuous data.
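One way to see that cost is to put a confidence interval around a completion rate. The sketch below uses the adjusted-Wald interval, which works well for small-sample proportions; the completion counts are hypothetical:

```python
import math

def adjusted_wald_ci(successes: int, n: int, z: float = 1.96):
    """Adjusted-Wald confidence interval for a binomial proportion:
    add z^2/2 pseudo-successes and z^2 pseudo-trials, then apply Wald."""
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical: 8 of 10 users completed the task.
lo, hi = adjusted_wald_ci(8, 10)
print(f"Observed 80%, 95% CI roughly {lo:.0%} to {hi:.0%}")  # a wide interval
```

At n = 10 the interval spans tens of percentage points; reaching a usefully narrow interval with binary data takes far more participants than with continuous measures.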
Rating scales such as the SEQ and UX-Lite are fundamentally discrete (respondents select specific scale values), but they can often be analyzed as continuous because as the sample size increases, the values the mean can take become more and more continuous. Other factors that affect analysis of rating scales as continuous data are the number of scale points and scores being an average of two or more items. Considering scale reliability and sensitivity, we usually use five points for multi-item measures (e.g., UX-Lite, SUPR-Q), seven points for single attitudinal items (e.g., SEQ), and eleven points for behavioral intentions (e.g., likelihood to recommend). Despite their apparent simplicity, we strongly discourage using only three points.
Task time data is continuous as values can be subdivided into smaller units and can take any value within a range. For example, task times can take less than a second (in a click test) to more than 10 minutes (for lengthy multi-step tasks) with any value in between being possible (e.g., 30 seconds, 5.2 seconds, 8 minutes and 12 seconds).
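Because of that skewness, a common practice is to summarize task times with the geometric mean (averaging in log space) rather than the arithmetic mean. A minimal sketch with hypothetical times:

```python
import math

def geometric_mean(times):
    """Geometric mean: average the log-transformed times, then
    transform back. Less influenced by a few very slow outliers
    than the arithmetic mean."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical task times in seconds; one slow participant skews the data.
times = [32, 41, 45, 50, 210]
print(round(sum(times) / len(times)))  # arithmetic mean, pulled up by the outlier
print(round(geometric_mean(times)))    # geometric mean, closer to a typical time
```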
A clear understanding of data types will guide your choice of how to analyze data, which statistical method to use, and how to interpret results.
4. Rating scales measure attitude with more fidelity than binary metrics
We all have some experience with rating scales, and many of us have encountered poorly worded questions or scales that seemed like they would bias the results. From the number of options (is 5 or 7 better?), to the use of labels and neutral points, to left- or right-side biases—which of these format choices really matter? When we analyzed data for 21 format changes, we found that only a few have a major impact on respondent behavior (Figure 5).
Figure 5: Changes to rating scales can matter, but usually not that much (asterisks indicate statistically significant differences).
5. Use magic ranges for sample size planning (not magic numbers)
Sampling error is real but not insurmountable
There seems to be a bifurcation of thought on sample sizes in UX; both positions are extreme, but each has a kernel (or more) of truth. On one hand is the “any sample size will work” group. This side believes a few users will suffice for just about any UX research study, from problem discovery to benchmarking. The other school of thought is that to use metrics and quantification, and certainly to establish statistical significance, you need huge sample sizes (hundreds or thousands).
A more pragmatic approach avoids these extremes by focusing on the research goal.
Sample sizes depend on the goal: “No” to magic numbers but “yes” to magic ranges
Because sampling error is real but not insurmountable, you can and should have a plan for your sample sizes. Avoid looking for one magic number that always works (like 5 or 20). Instead, use the magic ranges we’ve identified for three common research questions:
- Problem discovery
- Precision estimation
- Comparison
There is no magic sample size that always works for every method, nor even a single magic number for each method. However, it can be informative to look at the most common ranges of sample sizes, as long as you keep in mind that they won’t work for all research questions but will work for the most common ones (Table 1).
| Research Goal | Sample Size Computation | Typical Ranges |
|---|---|---|
| Discovery (finding problems) | Discovery model (discovery goal, probability of detection) | 5 to 20 |
| Estimation (estimating parameters) | Confidence interval (confidence level, variability, precision) | 30 to 300 |
| Comparison (comparing parameters) | Hypothesis test (confidence level, variability, precision, power level) | 40 to 400 |
Table 1: Magic ranges for three common UX research questions.
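The discovery model in the first row can be sketched with the familiar 1 − (1 − p)^n formula: given the probability p that any one user encounters a problem, find the smallest n that reaches your discovery goal. The detection probabilities below are illustrative, not prescriptive:

```python
import math

def discovery_sample_size(p: float, goal: float = 0.85) -> int:
    """Smallest n such that 1 - (1 - p)^n >= goal, where p is the
    probability that a single user encounters the problem."""
    return math.ceil(math.log(1 - goal) / math.log(1 - p))

# With p = 0.31 (a commonly cited average detection rate),
# five users give at least an 80% chance of seeing a problem once.
print(discovery_sample_size(0.31, 0.80))  # → 5
print(discovery_sample_size(0.10, 0.80))  # → 16 (rarer problems need more users)
```

Note how quickly the required n grows as p shrinks—one reason the "5 to 20" range in Table 1 is a range, not a single magic number.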
Summary
How do you get comfortable with quantitative research? It takes patience and the right mindset. Here are our five recommendations:
- Look for keywords in research questions that suggest quant over qual methods.
- While there are 70+ UX metrics, getting to know four (task completion, task time, SEQ, and UX-Lite) will allow you to address most research questions.
- Understand binary versus continuous data types because they guide analyses and sample size computations.
- Rating scales are the gateway to measuring attitudes. There are many variations and opportunities for biases and errors, but most format differences don’t have much impact on measures of UX attitudes.
- Sampling error is real but not insurmountable. Use magic ranges instead of magic numbers for sample size planning.