You don’t need to be a data scientist, database admin, or statistical maven to conduct quantitative research.
You must, however, have a good grounding in some fundamental concepts to make the most of your efforts.
While there are a number of skills, techniques, and concepts you’ll want to be familiar with, I think it’s essential to master these five: reliability, validity, statistical significance, experimental validity, and correlations—the main factors that affect the quality of your findings.
1. Reliability
Reliability is a measure of the consistency of a metric or a method. Every metric or method we use (including methods for uncovering usability problems in an interface and expert judgments of usability problems) must be assessed for reliability.
Here are the most common ways of measuring reliability for any empirical method or metric:
- Inter-rater reliability: raters or observers respond similarly to the same phenomenon
- Test-retest reliability: customers respond consistently when given identical experiences
- Parallel forms reliability: customers respond consistently despite slight variations in the question or method
- Internal consistency reliability: customers respond consistently across the items in a questionnaire (typically assessed with Cronbach's alpha)
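As a concrete illustration of the last point, here is a minimal sketch of how Cronbach's alpha can be computed with only the Python standard library. The survey responses are hypothetical, invented for the example:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a set of survey items.

    `items` is a list of k lists, each holding one item's scores
    across the same n respondents.
    """
    k = len(items)
    # Variance of each individual item's scores across respondents
    item_vars = [pvariance(col) for col in items]
    # Variance of each respondent's total score across all items
    totals = [sum(scores) for scores in zip(*items)]
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 4 respondents answering 3 related items
q1 = [5, 4, 4, 2]
q2 = [5, 5, 4, 3]
q3 = [4, 4, 5, 2]
alpha = cronbach_alpha([q1, q2, q3])  # roughly 0.90 for this data
```

Values of alpha closer to 1 indicate that respondents answer the items consistently; a common rule of thumb treats about .70 and above as acceptable internal consistency.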
2. Validity
Validity is the degree to which what you are actually measuring corresponds to what you intend to measure. Don't confuse validity and reliability. Proficiency in quantitative research requires that you know the difference.
Your data must be reliable to be valid, but reliability alone does not guarantee validity. Your customers can respond consistently to questions about customer loyalty, for example, but if those questions don’t correspond with the level to which customers repurchase or recommend, then the items have poor validity.
The importance of validity has given rise to many theories and models for determining it. A popular one, the tripartite model, includes three ways to establish validity.
- Criterion: Predicting outcomes (like purchase rates, usability problems, or recommend rates)
- Content: Other experts concur that the items measure the intended construct
- Construct: Correlates with other measures of the same thing (e.g. SUS and SUPRQ)
3. Statistical Significance
Almost every study has sampling error. If you're fooled by randomness, you'll base your conclusions on chance findings. Statistical significance is most commonly assessed with the p-value: the probability of observing a difference at least as large as the one you found if sampling error alone were at work.
While statisticians quibble over the precise definition of statistical significance, you must at least know what the term means and how to interpret a p-value. Low p-values (typically less than .05) indicate statistical significance, meaning the difference you observe is unlikely to be due to sampling error alone. A p-value of 0.05, for example, means that if there were no real difference, sampling error alone would produce a difference this large only about 5 times in 100.
Don’t confuse statistical significance with practical significance. A statistically significant result isn’t necessarily an important one. You have to determine what, if anything, the size of the difference means.
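To make the p-value concrete, here is a sketch of a two-proportion z-test (normal approximation) in plain Python, comparing task-completion rates for two hypothetical designs. The data and design names are invented for the example:

```python
from math import sqrt, erfc

def two_proportion_p(success_a, n_a, success_b, n_b):
    """Two-sided p-value for a two-proportion z-test
    (normal approximation)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled completion rate under the null hypothesis of no difference
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the standard normal
    return erfc(abs(z) / sqrt(2))

# Hypothetical data: Design A, 40 of 50 completed; Design B, 28 of 50
p = two_proportion_p(40, 50, 28, 50)  # about .01, below the .05 cutoff
```

Here the low p-value tells you the 80% vs. 56% completion gap is unlikely to be sampling error alone; whether a gap that size matters for the business is the separate question of practical significance.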
4. Experimental Validity
Experimental validity describes how well the results from our controlled environment predict or explain what will happen outside it. It is at the heart not only of customer research but of the scientific method itself.
Experimental validity comes in two flavors.
- Internal validity: our confidence in the causal relationship between our design and our outcome variable
- External validity: how well we can extrapolate our findings to the experience of real users in real-world situations
It takes time to get acquainted with the fundamentals of good experimental design. But in general you should know how to set up a study to obtain the most convincing results. This includes:
- Understanding the importance of randomization
- Managing confounding variables
- Minimizing the effects of bias (sampling and researcher)
- Deriving multiple plausible interpretations of your data (not just what you want the results to be)
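As a quick sketch of the first point, random assignment is straightforward to implement; the participant labels and condition names below are hypothetical:

```python
import random

def randomize(participants, conditions, seed=42):
    """Randomly assign participants to conditions in roughly
    equal groups (simple randomization sketch)."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    shuffled = participants[:]
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin into the conditions
    return {cond: shuffled[i::len(conditions)]
            for i, cond in enumerate(conditions)}

# Hypothetical: 8 participants split across two design variants
groups = randomize([f"P{i}" for i in range(1, 9)], ["Design A", "Design B"])
```

Randomizing assignment (rather than letting participants self-select) spreads unmeasured differences across conditions, which is what supports a causal reading of the results.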
While there are many varieties of experimental designs, here are three of the most common, from the most internally valid to the least:
- Randomized Control: Randomly assigning participants to various designs and then seeing which has the most favorable metrics.
- Quasi-Experimental: When you can’t randomly assign participants—such as beta users vs all users—to a design but you still can control variables.
- Correlational: The relationship between variables, such as likelihood to recommend vs actually recommending. There’s no random assignment or variable manipulation here.
5. Correlation
Correlation as a metric has been around for over 100 years. It's relatively easy to compute and, because it's bounded between -1 and 1, even researchers who know nothing about the data can interpret the strength of a relationship. It's also the basis for more advanced statistical techniques, like factor analysis, and it's used when you need to establish reliability and validity.
Become familiar with how a correlation is computed for continuous data and binary data. It's easy to compute in Excel using the function =CORREL(). Know how to interpret the strength of a correlation coefficient, and remember: correlation does not mean causation.
It takes time to learn and appreciate these concepts. Often the best way to learn them is to use them: conduct more of your own research. One of the benefits of working with us at MeasuringU is that we are well trained in these concepts, we teach them at our UX boot camp, and we are happy to teach them to clients when we work on their projects.