A standardized questionnaire is one that has gone through psychometric validation.
That means the items used in the questionnaire have been shown to:
1. Offer consistent responses (reliability)
2. Measure what they are intended to measure (validity)
3. Differentiate between good and bad qualities (sensitivity)
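As a rough illustration of the first property, reliability is commonly quantified with Cronbach's alpha, a measure of internal consistency. The sketch below is a minimal version under stated assumptions: the function name and the ratings are hypothetical, every respondent answered every item, and any negatively worded items have already been reverse-scored.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))             # regroup the ratings by item
    totals = [sum(row) for row in responses]  # each respondent's summed score
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are identical")
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical ratings: 5 respondents x 4 items on a 1-5 scale.
ratings = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(ratings), 2))
```

Values closer to 1 indicate that the items vary together across respondents, which is what "consistent responses" means here.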
The advantages of standardized questionnaires
Standardized questionnaires in UX have been shown to perform better than “homegrown” ones [pdf]. An additional advantage is that many standardized questionnaires also have a reference database of scores (norms), which allows you to convert raw scores into percentile ranks. This helps solve one of the biggest challenges researchers have: knowing how to interpret scores.
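To make the idea of converting a raw score into a percentile rank concrete, here is a minimal sketch. It assumes you have access to the individual scores in a reference database; the function name and the reference scores below are hypothetical and are not real norms for any questionnaire.

```python
def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    below = sum(1 for s in reference_scores if s < score)
    equal = sum(1 for s in reference_scores if s == score)
    return 100 * (below + 0.5 * equal) / len(reference_scores)

# Hypothetical reference database of scores from earlier studies.
reference = [42.5, 55.0, 60.0, 67.5, 70.0, 72.5, 77.5, 80.0, 85.0, 92.5]

# 6 of the 10 reference scores fall below 75, so this prints 60.0.
print(percentile_rank(75.0, reference))
```

A raw score of 75 means little by itself. Knowing that it is higher than 60% of the scores in the reference set gives you something to interpret.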
To take advantage of the favorable properties of standardized questionnaires, you have to administer the same items to participants in the same context. You risk losing the advantages if you change any of the following:
1. Item wording
2. The set of items (e.g. dropping an item)
3. Response option wording
4. Positive or negative wording of items
5. The number of response options
6. Who you administer it to (respondent characteristics)
7. The language (e.g. English to Italian)
When you’re looking to measure a construct, a good first step is to see whether someone else is already measuring it and whether a standardized questionnaire already exists. For example, if you’re looking to measure how easy software is to use, the System Usability Scale (SUS) is a good place to start. If you want to measure the UX of a website, the SUPR-Q is a great place to start.
When to customize standardized questionnaires
Even if you find a standardized instrument that matches what you’re measuring, it’s always a good idea to examine the content of the items to be sure they are germane and valid for your industry, product, or set of participants. This is essentially rechecking the content validity of the questionnaire.
Small changes are less likely to have a big impact. For example, when using the SUS, it’s been shown that changing the word “system” to the name of the product (e.g. QuickBooks) or the type of system (e.g. website) adds clarity for respondents without sacrificing its psychometric properties. Additionally, changing the word “cumbersome” in item 8 to “awkward” has been shown to help respondents who are less familiar with the original word.
The more changes you make, though, the more you need to reestablish that the questionnaire is still reliable and valid. This is what we did when Jim and I created the all-positive version of the SUS.
For example, I often get asked questions about the SUS and its relevance to certain contexts. The first item in the SUS is “I think that I’d like to use this system frequently.” But what if participants don’t have a choice in the system they are using? Is this item still valid?
For most enterprise software applications (HR, Financial, CRM) users don’t have a choice. They HAVE to use the system. Or what about certain applications that people don’t need to use a lot, like a helpdesk app or an IT app? Does it really make sense to ask participants if they’d like to use a software support website frequently? Perhaps a better item would be “when you have to fix a software issue, would you use this system frequently?”
How to change standardized questionnaires
You may decide that a questionnaire needs a change, but before you make changes consider the following steps:
1. Know what you’re measuring. One questionnaire, even a validated one, doesn’t measure everything. The SUS is a good measure of usability; it’s not a replacement for loyalty, satisfaction, or usefulness. If you’re thinking about adding or removing items, be sure you’re not trying to measure additional constructs. If you are, consider using additional standardized questionnaires instead of adding items.
2. Ensure changes are necessary. Perhaps you don’t like how an item reads or think it’s not relevant. But don’t just make a change because you or even some participants don’t like how items read. Examine the responses and be sure it needs a change. This is especially the case when multiple items are averaged together to offset potential problems with any single item (the SUS uses 10 items). For example, we looked at the responses for participants who didn’t have a choice to use enterprise HR systems and found internally consistent (reliable) results and good discrimination between good and bad systems. Keeping the original allowed us to compare the SUS scores to the industry benchmarks with more confidence.
3. Test alternates with the original. If you feel like you need to replace an item or change the questionnaire in some way, test your new item or changes along with the original with the same set of participants. For example, if you want to change the first item in the SUS, administer 11 items (the original 10 plus the new one) with the same set of participants in a study.
4. Correlate the changes. Examine the correlation between the new changes and the original. Run the correlations between the new item and the other items, and between each item and the total score (e.g. total SUS scores and the original item 1). Very high correlations (r > .8) indicate similar responses and suggest the changes are measuring a similar construct. That’s also a good sign if you’re translating the items into a new language. Verify that the difference in responses is also small.
Low correlations suggest you are measuring something different or your changes have additional problems (that’s when you call MeasuringU to help tell the difference). Depending on what you are modifying, a higher correlation also suggests that despite your changes, it’s very similar to the original and might not be worth changing at all (it’s good enough).
For example, when I worked at Intuit’s non-profit accounting software division, we began asking users the now infamous Net Promoter Score question. But it seemed quite odd to ask users if they’d recommend the product to friends, given that users were required to use the product and were unlikely to be talking about enterprise software with friends.
So we piloted a new item that was phrased something like: “If you knew of another professional looking for an accounting system, how likely would you be to recommend it to them?” What we found was a very high correlation between the original NPS and this modified one. This suggested to us that while it could be improved, it was good enough and not worth changing given the rest of the company was using the NPS.
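The correlation and difference checks from step 4 can be sketched in a few lines. This is a minimal illustration, not a full analysis: the function names and the ratings are hypothetical, and it assumes the same participants answered both the original and the alternate item, listed in the same order.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 ratings")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all ratings are identical")
    return sxy / sqrt(sxx * syy)

def mean_difference(original, alternate):
    """Average of (alternate - original) across paired participants."""
    return mean(b - a for a, b in zip(original, alternate))

# Hypothetical paired ratings from 8 participants on a 1-5 scale.
original_item = [4, 5, 3, 2, 4, 5, 1, 3]
alternate_item = [4, 4, 3, 2, 5, 5, 1, 3]

r = pearson_r(original_item, alternate_item)
diff = mean_difference(original_item, alternate_item)
print(f"r = {r:.2f}, mean difference = {diff:.2f}")
```

The same `pearson_r` function can be used for item-total correlations by passing an item's ratings and each participant's total score. A high correlation paired with a small mean difference points toward the "good enough" conclusion described above.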
A good questionnaire properly addresses your research questions. The value of a standardized questionnaire such as the SUS or SUPR-Q compared to a homegrown one is that you get more reliable and valid results. In many cases you can also compare your score to others because of a reference database.
Standardized questionnaires aren’t one-size-fits-all and sometimes need to be changed. But before you do, consider whether you are measuring more than one thing and whether the existing data you get is good enough. If you decide to make changes, test the original and new items with a set of participants and examine the correlation and size of the difference. When correlations are low and differences are high, proceed with caution.