In most customer research, you can’t measure the attitudes or behaviors of every customer.

Instead, you take a sample of your customers and use it to make inferences about the rest of your customers.

Even if you can collect data from all current customers, you can’t collect data from future customers. So in a sense, your current customers are also a sample of your future customers.

While sampling is efficient and statistically sound, it comes with some risks. Here are five steps to help reduce those risks and make sampling your customers more effective.

1. Identify the Segments

Few customer populations are a homogeneous group of people. You can usually group customers by how they think and behave toward a product or service. Some common segments include:

  • New versus repeat
  • Renters versus owners
  • High income versus low income
  • Frequent versus infrequent purchasers
  • Men versus women
  • Domestic versus international

Identifying the segments up front sets you up to sample well. Once you’ve properly segmented your customers, you can pick a representative number of customers from each segment (step 2). When you can’t sample proportionally, you can weight the responses to offset underrepresented groups.
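One common way to do that weighting is post-stratification: each segment’s weight is its share of the customer population divided by its share of the sample, so underrepresented segments count for more and overrepresented ones for less. Below is a minimal Python sketch, assuming you know (or can estimate) each segment’s population share. The function names, segments, and scores are hypothetical.

```python
def poststratification_weights(population_share, sample_counts):
    """Weight per segment = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        seg: population_share[seg] / (sample_counts[seg] / n)
        for seg in sample_counts
    }


def weighted_mean(scores_by_segment, weights):
    """Mean score where every respondent carries their segment's weight."""
    total = 0.0
    weight_sum = 0.0
    for seg, scores in scores_by_segment.items():
        w = weights[seg]
        total += w * sum(scores)
        weight_sum += w * len(scores)
    return total / weight_sum


# Hypothetical: new and repeat customers are each half the population,
# but repeat customers make up 80% of the people who answered.
population_share = {"new": 0.5, "repeat": 0.5}
scores = {"new": [3] * 20, "repeat": [5] * 80}  # hypothetical 1-5 ratings
counts = {seg: len(s) for seg, s in scores.items()}

weights = poststratification_weights(population_share, counts)
unweighted = sum(sum(s) for s in scores.values()) / sum(counts.values())
print(unweighted)                      # 4.6
print(weighted_mean(scores, weights))  # 4.0
```

The unweighted mean is pulled toward the overrepresented repeat customers. The weighted mean reflects the assumed 50/50 population split. Weighting can’t fix a segment you have no respondents from, and very large weights make estimates noisier, so proportional sampling is still preferable when you can manage it.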

2. Verify Representativeness

Far more important than the number of customers in your sample is the type of customer in your sample. Collecting data from a lot of people about their sentiments toward the home buying experience, for example, won’t make much sense if the people you collect data from never bought a home.

While the variables and segments that matter most depend on the context, when measuring how customers use products and websites, we find that prior experience with the brand, product, and domain has a large influence and should be a factor in determining representativeness.

To establish representativeness, use multiple sources of data both to shape your sampling strategy and to verify that the sample you eventually get is representative. Data sources can include existing customer research (such as transaction records or prior surveys), third-party published industry data, and the demographics of your customers.
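One simple check is to compare each segment’s share of your sample with a benchmark share taken from those data sources, such as transaction records. The sketch below uses a one-proportion z-test (normal approximation) for each segment. It assumes reasonably large counts, and the segment names and numbers are hypothetical. A chi-square goodness-of-fit test would test all segments at once.

```python
import math


def compare_to_benchmark(sample_counts, benchmark_shares):
    """Per segment: (sample share, benchmark share, z, two-sided p-value).

    One-proportion z-test using the normal approximation, so each segment
    should have a reasonably large expected count.
    """
    n = sum(sample_counts.values())
    results = {}
    for seg, count in sample_counts.items():
        p_hat = count / n
        p0 = benchmark_shares[seg]
        se = math.sqrt(p0 * (1 - p0) / n)
        z = (p_hat - p0) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
        results[seg] = (p_hat, p0, z, p_value)
    return results


# Hypothetical: records say 40% of customers are repeat buyers,
# but 52 of the 100 people in the sample are.
sample_counts = {"repeat": 52, "new": 48}
benchmark_shares = {"repeat": 0.40, "new": 0.60}

for seg, (p_hat, p0, z, p) in compare_to_benchmark(sample_counts, benchmark_shares).items():
    print(f"{seg}: sample {p_hat:.2f} vs benchmark {p0:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A small p-value flags a segment that looks over- or underrepresented. Such a segment deserves a closer look, and possibly weighting. A large p-value doesn’t prove the sample is representative, especially on variables you didn’t check.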

3. Compute the Sample Size

Fortunately, a sample, even a small sample, is often sufficient to tell you what you need to know about your customers. Sample size questions bewilder most people, and for good reason—the computations involve several unknowns that have to be estimated. Even the best sample size estimates use assumptions, many of which don’t pan out.

The sample size you need depends primarily on whether you’re making a comparison, discovering problems or insights, or estimating the prevalence of an attribute in the customer population.
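As a rough illustration, the three goals map onto three standard textbook formulas. Estimating a proportion uses n = z²p(1 − p)/E². Discovering problems asks for the smallest n where 1 − (1 − p)ⁿ reaches a goal. Comparing two proportions uses the usual two-sample power formula. The Python sketch below assumes simple random sampling and large-sample (normal) approximations. The input values are hypothetical planning guesses, and these are exactly the assumptions that may not pan out.

```python
import math
from statistics import NormalDist


def z_for(confidence):
    """Two-sided critical z value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_to_estimate_proportion(margin_of_error, p=0.5, confidence=0.95):
    """Estimating prevalence: n = z^2 * p(1 - p) / E^2 (p = 0.5 is most conservative)."""
    z = z_for(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


def n_to_discover(problem_rate, goal=0.85):
    """Discovering problems: smallest n with 1 - (1 - p)^n >= goal."""
    return math.ceil(math.log(1 - goal) / math.log(1 - problem_rate))


def n_per_group_to_compare(p1, p2, confidence=0.95, power=0.80):
    """Comparing two proportions with a two-sided test: sample size per group."""
    z_alpha = z_for(confidence)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


print(n_to_estimate_proportion(0.05))      # +/-5% margin at 95% confidence
print(n_to_discover(0.10))                 # see an issue affecting 10% of customers, 85% of the time
print(n_per_group_to_compare(0.50, 0.60))  # detect 50% vs 60% with 80% power
```

These formulas are starting points. Unequal segment sizes, weighting, and non-response all push the sample you actually need higher.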

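If you want to draw conclusions about each of the segments, you’ll need an adequate sample size within each segment, not just overall. To keep the overall sample representative, sample proportionally from each segment or weight the results.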
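Proportional (stratified) sampling can be sketched in two parts. First, split the total sample across segments in proportion to their population sizes. Then draw a simple random sample within each segment. The sketch below uses the largest-remainder method so the rounded allocations still add up to the total. Segment names, sizes, and customer IDs are hypothetical.

```python
import math
import random


def proportional_allocation(segment_sizes, total_n):
    """Split total_n across segments in proportion to population size.

    Largest-remainder rounding keeps the allocations summing to total_n.
    """
    population = sum(segment_sizes.values())
    exact = {seg: total_n * size / population for seg, size in segment_sizes.items()}
    alloc = {seg: math.floor(x) for seg, x in exact.items()}
    leftover = total_n - sum(alloc.values())
    # give the remaining slots to the segments with the largest fractional parts
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for seg in by_remainder[:leftover]:
        alloc[seg] += 1
    return alloc


def stratified_sample(customers_by_segment, allocation, seed=None):
    """Simple random sample of the allocated size within each segment."""
    rng = random.Random(seed)
    return {seg: rng.sample(customers_by_segment[seg], k) for seg, k in allocation.items()}


# Hypothetical customer base of 1,000 split into three segments
sizes = {"new": 600, "repeat": 290, "international": 110}
customers_by_segment = {seg: [f"{seg}-{i}" for i in range(size)] for seg, size in sizes.items()}

allocation = proportional_allocation(sizes, 40)
print(allocation)  # {'new': 24, 'repeat': 12, 'international': 4}
sample = stratified_sample(customers_by_segment, allocation, seed=7)
```

In this example, the international segment gets only four customers. That’s enough to keep it represented in the total, but far too few to say much about that segment on its own. If per-segment conclusions matter, you’d raise the small segments’ counts and weight them back down when reporting totals.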

4. Randomize

When conducting research, ideally you’ll randomly select customers to participate. However, in my experience, such an arrangement is extremely rare. Customer research, like randomized clinical trials, has to rely on people volunteering to participate in studies, so the ideal of random participation is elusive.

Fortunately, as with clinical trials, all is not lost. Instead of relying on randomized participation, we use randomized assignment (for example, randomly deciding which participants see which version of a design) so that self-selection doesn’t systematically favor one condition over another. Randomize whenever possible (e.g., participants, response order, question order), but if you can’t, be aware that non-random data can produce unexpected results.
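Randomized assignment and randomized question order are both easy to automate. The Python sketch below shuffles participants and deals them round-robin into conditions, so group sizes differ by at most one. It also gives each participant a reproducible random question order seeded by their ID. The condition names, question labels, and the `"survey-v1"` seed string are hypothetical.

```python
import random


def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly assign participants to conditions in (near-)equal numbers."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # after shuffling, deal participants round-robin so group sizes differ by at most one
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


def question_order(questions, participant_id, seed="survey-v1"):
    """Reproducible random question order for one participant."""
    rng = random.Random(f"{seed}:{participant_id}")
    order = list(questions)
    rng.shuffle(order)
    return order


assignment = assign_conditions(range(1, 11), ["design A", "design B"], seed=42)
print(sorted(assignment.items()))
print(question_order(["ease", "satisfaction", "recommend", "trust"], participant_id=3))
```

Seeding by participant ID means a participant who reloads the survey sees the same order. Across participants, order effects still average out.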

Even something as banal as the order of customers on a customer list can hide nuisance variables, such as the oldest customers being listed first, and unknowingly bias your data. Instead of sampling the first 100 customers, shuffle the list, pick a random number from 1 to 10 (say 7), and then invite every seventh customer to participate. If the original order carried any meaning, its effect is now minimized or eliminated.
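The procedure above can be sketched in a few lines of Python: shuffle the list, draw a random step from 1 to 10, and take every step-th customer. The customer IDs and the `shuffled_systematic_sample` helper are hypothetical. Once the list is shuffled, taking the first n customers would already be a random sample. The random step simply mirrors the procedure as described.

```python
import random


def shuffled_systematic_sample(customers, n, max_step=10, seed=None):
    """Shuffle a customer list, then take every k-th customer (k drawn at random).

    Returns (selected customers, k). If the list is too short for n picks
    at step k, fewer than n customers are returned.
    """
    rng = random.Random(seed)
    pool = list(customers)
    rng.shuffle(pool)                # break any meaning in the original order
    step = rng.randint(1, max_step)  # e.g. 7 -> the 7th, 14th, 21st, ... customer
    selected = pool[step - 1::step]
    return selected[:n], step


# Hypothetical customer list exported oldest-first
customers = [f"cust-{i:04d}" for i in range(1, 1001)]
invitees, step = shuffled_systematic_sample(customers, n=100, seed=2024)
print(step, len(invitees), invitees[:3])
```

With 1,000 customers and a step of at most 10, there are always at least 100 customers to invite. With a shorter list, a large step can leave you short of your target.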

5. Minimize Bias

Every research effort has the potential for different types of biases to threaten the validity of the findings. While you can’t always prevent these biases, you should be aware of them and take steps to mitigate them. Three of the more common biases to look out for and minimize are:

  • Non-response: Customers we invite may not participate, or they may answer only some of the questions. We hope there isn’t a systematic difference between those who respond and those who don’t, and there are ways to check, such as comparing respondents with non-respondents on attributes already in your customer records. Shorter surveys can also help.

    A variant of this is response order: the first customers to respond may differ from later responders, so stopping a study early (say, when you reach your target sample size) may bias results toward the more eager (and possibly more favorable) respondents. Fortunately, in the studies we examined, we found only minor effects of response order.

  • Satisficing: Respondents tend to put in the least effort needed and tell you what they think you want to hear, whether about their satisfaction, their likelihood to recommend, or their stated interest in a future product. You can never eliminate this tendency, but you can minimize it with techniques such as conjoint analysis, which force respondents to choose among alternatives and prioritize.
  • Instrument bias: The instrument is the interface used to collect your data, and it encompasses everything from delivery and look and feel to question wording, question order, format, and response options. All of these can bias your data. Entire books have been written on how to ask the right questions and present more objective response options to minimize these biases. Instrument bias is also a reminder that having access to all your customers doesn’t eliminate bias.
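For non-response, one way to look for systematic differences is to compare respondents with non-respondents on something you already know about everyone you invited, such as whether they are repeat buyers. The Python sketch below runs a two-sided pooled two-proportion z-test (normal approximation). The invitation counts are hypothetical. The same comparison works for early versus late responders.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between two proportions.

    Returns (p1, p2, z, p_value); assumes reasonably large groups.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# Hypothetical: 400 customers invited, 120 responded. Customer records show
# 78 of the respondents and 140 of the 280 non-respondents are repeat buyers.
p_resp, p_non, z, p = two_proportion_z_test(78, 120, 140, 280)
print(f"repeat share: respondents {p_resp:.2f}, non-respondents {p_non:.2f}, "
      f"z = {z:.2f}, p = {p:.3f}")
```

A clear difference like this one suggests respondents skew toward repeat buyers, which you could offset by weighting. No difference on the attributes you can check doesn’t rule out differences on the attitudes you’re actually measuring.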