We measure more than just usability.

We work with clients to measure everything from delight, loyalty, brand affinity, luxury, and quality to even love.

While all of these concepts are related, they each measure slightly different aspects of the customer experience.

Before measuring anything, especially a construct that’s not well defined or used in practice, we answer these five questions.

1.    How is this being measured already?

If you’re trying to measure it, chances are someone else already has. One of the first things we do when quantifying a new concept is to review the published literature to see how other researchers measure it. We look for how items are phrased, what scales are used, and how closely our methods match published findings.

For example, if we want to measure brand luxury in cars, we’d look to the literature for keywords that worked well in discriminating between high- and low-luxury items and see whether they have been specifically tested with automobiles. We’d also look for normalized values or a reference database that can help us interpret the results once we do collect data.

2.    What will we compare this to?

Finding a method for measuring a new attribute is nice, but to make any metric more meaningful, you need to compare it to something. Ideally there is a reference database, historical data, or published industry scores. This is one of the reasons we developed and recommend the SUPR-Q for websites. We maintain a reference database, so the scores immediately tell us how well a website performs against its peers on the critical aspects of usability, appearance, credibility, and loyalty.

For software or just about any app, we recommend the System Usability Scale (SUS). Even though the SUS wasn’t developed with a reference database, enough studies have been published and enough data collected that we now have a SUS reference set. The Net Promoter Score is often criticized for the way it’s scored and overmarketed, but when you’re measuring word of mouth, the NPS has many benchmarks and reference datasets to compare your score against, which helps offset its measurement shortcomings.
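Part of why NPS benchmarks are so plentiful is that the score itself is simple to compute: respondents rate likelihood to recommend on a 0–10 scale, and the score is the percentage of promoters (9–10) minus the percentage of detractors (0–6). A minimal sketch with made-up ratings:

```python
def nps(ratings):
    """Net Promoter Score from 0-10 likelihood-to-recommend ratings.

    Promoters rate 9-10, detractors rate 0-6; the score is the percentage
    of promoters minus the percentage of detractors (range -100 to +100).
    """
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / n

# Example: 5 promoters, 3 passives, 2 detractors out of 10 responses
print(nps([10, 9, 9, 10, 9, 8, 7, 8, 5, 3]))  # 30.0
```

With a formula this simple and this widely used, published scores accumulate quickly, which is what makes the benchmarking possible.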

In the absence of any existing reference data, one alternative is to conduct a comparative study instead of a standalone one. For example, if you’re looking to understand how much customers are delighted by your product experience, don’t just have them use your product: have them use at least one alternative as well (in randomized order). When you interpret your results, you’ll at least have an immediate comparison.
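Randomizing the presentation order guards against order effects (fatigue, learning, anchoring on whichever product came first). A minimal sketch of how that assignment might be done; the product names and function are hypothetical placeholders, not part of any published protocol:

```python
import random

def assign_orders(participant_ids, products=("Your product", "Alternative")):
    """Randomly shuffle the product presentation order for each participant.

    Returns a dict mapping each participant ID to the order in which they
    should experience the products.
    """
    orders = {}
    for pid in participant_ids:
        order = list(products)
        random.shuffle(order)  # independent random order per participant
        orders[pid] = order
    return orders
```

With larger panels you might counterbalance explicitly (assign exactly half of participants to each order) rather than rely on chance to even things out.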

3.    How reliable and valid are the measures?

Reliability and validity are often used interchangeably. While I’m not bothered by their ambiguous usage in practice, it’s important to know that they do have specific meanings, both of which you should consider when measuring. Reliability refers to the repeatability of measurements over time. That is, if nothing changes, you want people to provide the same response to the same questions. It’s difficult to have much faith in a measure if there’s too much noise. If you make changes to a product design, those changes may be obscured by an unreliable measure.

There are a number of reliability measures, such as test-retest reliability, internal consistency reliability, and alternate-forms reliability, which come from early work in educational research called Classical Test Theory. This work is still relevant and applies to questionnaires, surveys, and almost anything you can measure.
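Internal consistency, for instance, is commonly summarized with Cronbach’s alpha, which compares the variance of respondents’ total scores to the summed variances of the individual items: when items measure the same construct, totals vary more than the items do individually. A minimal sketch (the example ratings are made up):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency reliability.

    `items` is one list of scores per questionnaire item, with respondents
    in the same order in every list. Values near 1 indicate the items
    consistently measure the same underlying construct.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Three hypothetical items rated by four respondents
alpha = cronbach_alpha([[4, 5, 3, 2], [4, 4, 3, 2], [5, 5, 3, 1]])
print(round(alpha, 2))  # 0.95
```

A conventional (if debated) rule of thumb is that alpha above roughly .70 is acceptable for research use.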

Reliability is necessary but not sufficient when assessing the quality of a metric. You also need to consider validity. Validity refers to how well a measure actually captures what you’re trying to measure. For example, if you’re trying to measure the visual appeal of websites, you’ll want a measure that effectively differentiates between attractive and ugly presentations with as small a sample size as possible. Educational research has also developed a number of methods for determining the validity of a measure.

4.    How precise do we need to be?

It’s almost always the case that you won’t want (or need) to measure every person in a population. Sampling is an efficient way to estimate whatever you’re trying to measure and then compute how much uncertainty remains in your estimate. Even if you have ten million customers, just 1,000 of them, sampled roughly randomly, will provide you with an estimate of whatever you’re measuring that will fluctuate by only about +/- 3%.
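That +/- 3% figure follows from the standard margin of error for a proportion. A quick sketch using the normal approximation, with p = 0.5 as the worst case:

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a random sample.

    Uses the normal (Wald) approximation z * sqrt(p(1-p)/n); p = 0.5
    maximizes the margin, so it serves as a conservative planning value.
    """
    return z * sqrt(p * (1 - p) / n)

print(round(100 * margin_of_error(1000), 1))  # 3.1 percentage points
```

Note the square-root relationship: quadrupling the sample size only halves the margin of error, which is why very large samples quickly stop paying for themselves.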

All too often we see budgets blown on unnecessarily large sample sizes. You’ll often reach the same conclusions with margins of error three to five times as wide. But even before you estimate what sample size you need, consider doing some analytic estimates based on any information you already have.

The physicist Enrico Fermi, who worked on the Manhattan Project, was famous for making rather accurate estimates with little to no information about what he was trying to measure. For example, how many piano tuners are in Chicago? How many airplanes are in the air over the US right now? What percent of your customers shop on a mobile device? With some deduction, educated guesses, and available information, you may find that in many cases your rough approximations are actually not that rough, and good enough to act on. Consider a Fermi approximation before measuring.
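The piano-tuner question can be worked as a short back-of-the-envelope calculation. Every input below is a rough guess, which is the point of the exercise; the goal is the right order of magnitude, not the exact count:

```python
# A classic Fermi estimate: piano tuners in Chicago.
# All inputs are deliberately rough assumptions.
population = 2_700_000              # city of Chicago, roughly
people_per_household = 2.5
pianos_per_household = 1 / 20       # guess: ~5% of households own a piano
tunings_per_piano_per_year = 1      # guess: tuned about once a year
tunings_per_tuner_per_year = 4 * 250  # ~4 tunings/day, ~250 workdays/year

households = population / people_per_household
pianos = households * pianos_per_household
tuners = pianos * tunings_per_piano_per_year / tunings_per_tuner_per_year
print(round(tuners))  # on the order of 50
```

Even if each guess is off by a factor of two, the answer stays within the same order of magnitude, which is often all a go/no-go decision requires.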

5.    What will we do with the results?

A clear extension of using the Fermi method is that it lets you decide what you will do when results come in, before you even sample one person. Ask yourself: what will we do if we learn that only 10% of customers would recommend our product? What if 90% would?

More information is usually better when it comes to tracking customer attitudes and behaviors, and we’re always happy to help run studies. But if the results will never prompt a response, even with extremely favorable or unfavorable responses, do you really need to measure at all? Maybe, maybe not.

Of course, realizing your study’s impact will be minimal doesn’t necessarily mean you don’t measure, but it does force you and your stakeholders to consider whether the questions you’re asking will actually accomplish your goals or just provide another PowerPoint presentation. It also helps to define “good” and “bad” results ahead of time. What will you do if the average response to an item is a 6.1, a 5.2, or a 4.1? Knowing those answers helps streamline your analysis and focus your efforts.

Regardless of what you’re planning on measuring, answering these five questions before you collect any data will help make the most of your research budget.