Subtle changes to response items in surveys and questionnaires can affect responses.
Many of the techniques for item and scale construction in user research come from marketing and psychology. Some topics can be controversial, sensitive or confusing, so having the right question with the right response options is important.
Attitudes about usability aren’t typically controversial, so you’re likely to get more honest answers. Consequently, slight changes to item wording and the number of scale steps are less likely to lead to major differences in scores. Nevertheless, it’s important to understand some of those effects when creating and analyzing scales in questionnaires and surveys.
While there are many caveats and exceptions when creating response items, one effect is that respondents tend to favor the left side of a response scale. Take the following two response options:
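My College has an excellent reputation [response scale with disagree on the left and agree on the right]
My College has an excellent reputation [response scale with agree on the left and disagree on the right]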
More students agreed with the second response option than the first. The only difference is the order in which the response options are presented (agree or disagree first). If you code the values from 1 to 5 for the first scale and 5 to 1 for the second scale, so that 5 always means the strongest agreement, then you’ll have a higher average score on the second response option.
This phenomenon also held up when a general population rated the qualities of beer using opposite adjectives, when respondents rated personal distress, and when they rated preferences for products A vs. B or B vs. A. Once again, respondents have a slight bias toward items presented first (on the left side of the scale).
Examples of both scale directions can be found in usability questionnaires. Jim Lewis’s PSSUQ goes from Agree to Disagree and the System Usability Scale goes from Disagree to Agree.
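The coding described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function name `code_responses`, the label wording and the sample responses are all assumptions made up for the example.

```python
# Hypothetical 5-point labels, in the left-to-right order shown to respondents.
DISAGREE_FIRST = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
AGREE_FIRST = list(reversed(DISAGREE_FIRST))


def code_responses(responses, labels_left_to_right):
    """Map each chosen label to a score where 5 always means strongest agreement."""
    n = len(labels_left_to_right)
    agree_on_left = labels_left_to_right[0] == "Strongly agree"
    scores = []
    for label in responses:
        position = labels_left_to_right.index(label) + 1  # 1 = leftmost option
        # Agree-first scales are coded 5 to 1; disagree-first scales 1 to 5.
        scores.append(n + 1 - position if agree_on_left else position)
    return scores


def mean(values):
    return sum(values) / len(values)


# Made-up responses, only to show that both layouts end up on the same metric.
first_scale = code_responses(["Agree", "Neutral", "Disagree"], DISAGREE_FIRST)
second_scale = code_responses(["Strongly agree", "Agree", "Neutral"], AGREE_FIRST)
print(mean(first_scale), mean(second_scale))  # 3.0 4.0
```

Once both layouts are coded this way, their averages can be compared directly, and any remaining gap reflects the response order rather than the coding.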
How large is the Left-Side Bias?
It’s important to keep in mind that this effect, like many others you get from changing wording, question direction, labeling and the number of scale steps, is small. For example, a typical difference is about 0.2 to 0.3 of a point on a 5-point scale, or about 1/3 of a standard deviation.
You are unlikely to detect these differences until your sample size exceeds 100 or so. As with most effects on response scales, the bias is not universally present in all scales and appears to occur more when the item being rated is phrased positively.
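A rough power calculation shows why a difference of about a third of a standard deviation needs samples of this size. The sketch below uses the standard normal-approximation formula for comparing two independent means. The 5% two-sided significance level and 80% power are conventional assumptions rather than figures from the studies mentioned here, and `n_per_group` is a name invented for this example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_sd, alpha=0.05, power=0.80):
    """Approximate respondents needed per group to detect a mean difference
    of `effect_size_sd` standard deviations with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_sd) ** 2)


# A left-side bias of about 1/3 of a standard deviation.
print(n_per_group(1 / 3))  # 142
```

Under these assumptions the answer is well over 100 respondents per group, which is in line with the rule of thumb above.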
When measuring attitudes toward usability (which is usually not a sensitive or politically charged subject) it is usually the case that the effects of unusable interfaces outweigh nuances in questionnaire design. For example, using extremely worded items or questions will have a much larger impact on the responses.
Why the Bias?
Research suggests that something about both the participants and the items causes the left-side bias. The hypothesis is that it has to do with participant motivation, reading habits and education level, in conjunction with a primacy effect, the clarity of the items and the specificity of the situations.
- A dishonest researcher who wants responses to be slightly higher in agreement can place the favorable response options on the left.
- If you report top-box or top-two-box scores for a standalone survey (no comparisons), then putting agree on the left side will inflate the response a bit.
- If you are comparing the responses to past or future responses, don’t worry: whatever bias exists will occur in both surveys. Comparisons are always more meaningful than standalone results.
- You will likely notice a difference only if your sample size exceeds 100 responses in each group.
- Neither direction is necessarily right or wrong; if you have an existing scale, stick with it.
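For readers unfamiliar with the top-box and top-two-box scores mentioned in the list above, here is a minimal sketch of how they are usually computed. It assumes responses are already coded so that 5 means the strongest agreement. The function name `top_box` and the sample scores are invented for the example.

```python
def top_box(scores, boxes=1, scale_max=5):
    """Share of responses in the highest `boxes` points of the scale."""
    threshold = scale_max - boxes + 1
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Made-up responses on a 5-point scale (5 = strongest agreement).
scores = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
print(top_box(scores))           # 0.3
print(top_box(scores, boxes=2))  # 0.7
```

Because these scores count only the responses at the agree end, even a small shift toward agreement moves them, which is why the scale direction matters most for standalone reporting.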
- Chan, J. (1991). “Response-Order Effects in Likert-Type Scales.” Educational and Psychological Measurement, 51, 531-540.
- Holmes, C. (1974). “A Statistical Evaluation of Rating Scales.” Journal of the Market Research Society, 16 (April), 87-107.
- Friedman, H., & Amoo, T. (1999). Journal of Marketing Management, 9(3), Winter 1999, 114-123.
- Friedman, H. H., Herskovitz, P. J., & Pollack, S. (1994). “Biasing Effects of Scale-Checking Styles on Responses to a Likert Scale.” Proceedings of the American Statistical Association Annual Conference: Survey Research Methods, 792-795.
- Weng, L., & Cheng, C. (2000). “Effects of Response Order on Likert-Type Scales.” Educational and Psychological Measurement, 60, 908.