It’s better to be approximately right than exactly wrong.
The quote is often misattributed to John Maynard Keynes, the famous economist and early statistician.
Despite the age of the quote and its misattribution, it’s sound wisdom for any researcher. No matter how precise your methods and metrics are, exactness doesn’t matter if you’re asking the wrong questions or not doing the research at all.
While perfect precision is desirable for any research effort, it’s unachievable, impractical, and unnecessary. All too often, efforts get stymied in a quest for perfect data, the perfect metric, or the perfect method, a trap many people call planning paralysis. Don’t let a quest for perfect data prevent you from collecting any data! Look for sound approximations that get you to a “good enough” place that accomplishes the job and answers your research questions.
Here are some examples of when it’s better to be approximately right than to get stymied in planning paralysis.
Holding Out for a Large Sample Size
A smaller sample size offers an approximation and is often sufficient. In my experience, people love to comment on the sample size in a research study (and for some stakeholders it’s never large enough), so it’s easy to get caught up in chasing the ideal large sample size to please them. All things being equal, larger sample sizes are better than smaller ones. They offer more precision, albeit with diminishing returns: because the margin of error shrinks with the square root of the sample size, you need roughly four times the sample to cut it in half. Sampling error is a mathematical fact you can’t avoid.
While reducing that error is laudable, it can go to an extreme. Don’t use up your entire budget and time trying to reduce your sampling error. Instead, decide how precise you need to be by choosing an appropriate margin of error. Will a different decision really be reached if the true value differs by 3%, 5%, or even 10%? Find that threshold and move on.
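To make the diminishing return concrete, here is a minimal sketch that approximates the margin of error for a proportion at several sample sizes, and the sample size needed to hit a target margin. It assumes a 95% confidence level (z = 1.96), the simple normal approximation, and the most conservative proportion of 0.5. The function names are illustrative, not from any particular library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for_margin(margin, p=0.5, z=1.96):
    """Smallest sample size whose approximate margin of error is <= margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Quadrupling the sample size only halves the margin of error.
for n in (25, 100, 400, 1600):
    print(f"n = {n:4d}  margin of error ~ +/-{margin_of_error(n):.1%}")

# Sample sizes needed for the thresholds discussed above.
for target in (0.10, 0.05, 0.03):
    print(f"target +/-{target:.0%}  needs n ~ {sample_size_for_margin(target)}")
```

Under these assumptions, tightening the margin from about 10% to about 3% takes roughly ten times as many participants, which is the kind of trade-off worth weighing against the decision you actually need to make.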
Being Overly Picky about Question Wording
How you ask a question matters. Poorly phrased questions can bias respondents to answer a certain way. This can happen in both surveys and moderated research. But don’t succumb to planning paralysis by tweaking the wording of questions or searching for the perfect wording. All too often, I’ve seen question design get hung up in committee—everyone has input and hours are spent tweaking inconsequential words to satisfy all possible concerns.
When possible, use a standardized set of questions in surveys. Doing so helps ensure the wording is good enough to generate actionable data. And while you don’t want to be sloppy and rush through question wording, even poorly worded questions often yield interesting results when used consistently over time.
Finding the Quintessential Questionnaire
Using a questionnaire that’s been psychometrically validated helps ensure your instrument is reliable and valid. But I’ve never met a questionnaire that couldn’t be improved in some way. Sometimes it’s the wording; sometimes it’s the number of response options or items (too many or too few).
The System Usability Scale (SUS) is a good example. The wording of every item isn’t always applicable, which leads some to delay research in a quest for a better instrument. But the SUS has been shown to generate reliable and comparable data across a wide variety of interfaces for decades. You don’t want irrelevant items in a questionnaire, but don’t let a few concerns about wording lead you to abandon a reliable questionnaire that has published norms in favor of an untested one without a comparison point.
The Quest for the Perfect Metric
Is it mental effort, delight, love, loyalty, future intent, or affection? Which is better: the error rate or completion rate? Should we throw out the Net Promoter Score? There are a number of things to measure and just as many ways to measure them. But don’t succumb to metric mania by looking for the perfect metric for your project. You’ll want to home in on the right construct, but many attitudinal measures correlate because they tap into similar sentiments.
This holds true at the task level as well. The common metrics of completion rates, time, and errors correlate, and together they approximate task usability. It’s usually best to use multiple metrics and then average them in some reasonable way. This is what the Single Usability Metric (SUM) does. And of course don’t get too hung up on the “right” way to aggregate metrics!
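As one illustration of averaging metrics "in some reasonable way," the sketch below puts three task metrics on a common 0-to-1 scale and takes their simple mean. It is a deliberately simplified stand-in, not the published SUM procedure: the function name, the equal weighting, and the `target_time` threshold are all assumptions made for the example, and the data are made up.

```python
from statistics import mean

def combined_task_score(completed, times, errors, target_time):
    """Average three task metrics after putting each on a 0-1 scale.

    completed   -- list of booleans, one per participant
    times       -- list of task times in seconds
    errors      -- list of error counts
    target_time -- hypothetical "acceptable" time in seconds (an assumption)
    """
    completion_rate = mean(1.0 if c else 0.0 for c in completed)
    on_time_rate = mean(1.0 if t <= target_time else 0.0 for t in times)
    error_free_rate = mean(1.0 if e == 0 else 0.0 for e in errors)
    return mean([completion_rate, on_time_rate, error_free_rate])

# Made-up data for four participants on one task.
score = combined_task_score(
    completed=[True, True, False, True],
    times=[40, 75, 90, 55],
    errors=[0, 1, 2, 0],
    target_time=60,
)
print(f"Combined score: {score:.2f}")
```

A different but still reasonable weighting or scaling would shift the number somewhat, but it would rarely change which task or design comes out ahead.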
Finding the Right Task Time
After you start collecting task times, you’ll immediately want to know what a “good” time is. But don’t get derailed trying to determine what the “right” task time is. All task times are wrong in some respect, but most are valuable. Lab-based studies are too idealized. Unmoderated remote times are easily affected by users who are distracted by other activities.
It’s usually the relative comparison using any of these methods that provides the most meaning rather than a long quest for the perfect time. In fact, the process of estimating task time for skilled users using Keystroke-Level Modeling (KLM) is a quintessential example of where an approximation is often good enough to know whether an interface will decrease or increase task times.
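A KLM estimate is just a sum of standard operator times, which the following sketch shows for two hypothetical designs of the same task. The operator durations are commonly cited approximations and should be treated as assumptions here, as should the operator sequences, which are invented for illustration.

```python
# Commonly cited approximate KLM operator times in seconds (assumed values).
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press
    "P": 1.10,  # point with the mouse to a target
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_estimate(operators):
    """Estimated skilled-user task time: the sum of the operator times."""
    return sum(OPERATOR_SECONDS[op] for op in operators)

# Hypothetical designs for the same task.
menu_design = ["M", "H", "P", "K", "P", "K"]  # reach for mouse, two clicks
shortcut_design = ["M", "K", "K"]              # two-key keyboard shortcut

menu_time = klm_estimate(menu_design)
shortcut_time = klm_estimate(shortcut_design)
print(f"Menu design:     {menu_time:.2f} s")
print(f"Shortcut design: {shortcut_time:.2f} s")
```

Neither number will match what real users do to the hundredth of a second, but the direction and rough size of the difference is usually what the design decision depends on.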
When you find yourself paralyzed during the planning stage, consider whether you’re holding out for the perfect metric or method to get the perfect result. Exactness on the wrong question is not the same as not doing the research at all, but the two are close cousins in the failed-research family. An approximate result is better than an unattainable result and certainly better than no result at all.