Items in questionnaires are typically worded neutrally so as not to state concepts in the extreme. They are like an even-tempered friend—they have opinions but aren’t overly optimistic or chronically pessimistic about things.

What happens when items in a questionnaire or survey are worded in the extreme?

Two years ago we tried a little experiment at the annual UPA conference to find out.

We wanted to know what would happen if we rephrased the moderately worded items of the popular System Usability Scale questionnaire (SUS). Specifically, we wanted to see the effects of using extremely worded items instead of the original neutral items.

For example, item one of the SUS questionnaire is: I think that I would like to use this system frequently.

What would happen if we made it an extreme positive statement?

I think that this is one of my all-time favorite web sites.

Or an extreme negative statement?

I think I never want to use the web site again.

Would respondents notice the difference? If so, how would it affect their scores? A group of us (Keith Karn, Alex Little, Greg Nelson, Jeff Sauro, Jurek Kirakowski, William Albert and Kent Norman) created two new versions of the SUS: an extreme positive version and an extreme negative version (shown below).

The extreme positive SUS.

  1. I think that this is one of my all-time favorite web sites.
  2. I found the web site was really straightforward.
  3. I thought the web site was amazingly easy to use.
  4. I think that technical support services are just not required for the web site.
  5. I found the various pages on the web site worked together very smoothly.
  6. I thought the web site was consistent throughout.
  7. I would imagine anybody could use the web site like a pro from day one.
  8. I found the web site was a delight to use.
  9. I felt completely confident using the web site.
  10. Everything I needed to know about using the website was there for me.

The extreme negative SUS.

  1. I think I never want to use the web site again.
  2. I found the web site to be horribly complex for no good reason.
  3. I thought the web site was very difficult to use.
  4. I think that I would need a permanent hot-line to the help desk to be able to use the web site.*
  5. I found all the pages on the web site to be an ugly mess.
  6. I thought the inconsistency in the web site would kill it.
  7. I found the web site to be completely impossible to use.
  8. I found that this web site was extremely awkward to use.
  9. I felt utterly confused by the web site.
  10. Absolutely nothing about the web site worked.

*Indicates my personal favorite.

We sought out volunteers and asked them to review the UPA website. After the review, each participant was randomly assigned one of five SUS questionnaires: the all-positive extreme version, the all-negative extreme version, one of two mixed extreme versions (half positive and half negative extreme items), or, as a baseline, the standard SUS questionnaire. Around 60 people participated in total, giving us between 10 and 14 responses per condition.

What happened?

In short, extreme wording makes a difference—a big difference in fact. The perception of usability as measured using the original SUS items was 60. The average score on the extremely worded negative questionnaire was 77, or around 25% higher.

The average score on the extremely worded positive questionnaire was 41, or around 30% lower than the original SUS score.
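As a quick check on the rounded percentages above, the relative differences can be recomputed from the three reported means. This is only a sketch of the arithmetic; the function name `percent_change` is my own, and because the means themselves are rounded, the results are approximate.

```python
def percent_change(baseline, score):
    """Relative difference of score from baseline, in percent."""
    return (score - baseline) / baseline * 100

# Mean SUS scores as reported in the text (already rounded).
standard_sus = 60
extreme_negative = 77
extreme_positive = 41

print(round(percent_change(standard_sus, extreme_negative), 1))  # 28.3
print(round(percent_change(standard_sus, extreme_positive), 1))  # -31.7
```

So the extreme negative version comes out a little under 30% higher than the baseline, and the extreme positive version a little over 30% lower.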

Both differences were statistically significant despite the relatively small sample sizes (p < .01) and are shown along with 95% confidence intervals in the graph below.


Figure 1: Mean and 95% Confidence Intervals for SUS scores by type of SUS questionnaire.

People basically reacted to these extreme items by disagreeing with them more.

Interestingly, though, users who received half extreme-positive and half extreme-negative items showed no significant difference from the standard SUS. The higher scores from the extreme negative items basically canceled out the lower scores from the extreme positive items. Item intensity and direction were confounded, so separating the effects of reversing items and making them extreme is difficult at this sample size (although I’ll cover this in a future blog).
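To see why more disagreement pushes scores in opposite directions, it helps to look at how SUS responses are scored. The sketch below uses the standard SUS rule (positive items contribute the response minus 1, negative items contribute 5 minus the response, and the sum is multiplied by 2.5). Applying that same rule to the all-positive, all-negative and mixed versions is my assumption about the scoring, as are the names `sus_score`, `ALL_POSITIVE`, `ALL_NEGATIVE` and `MIXED`, and the respondent is hypothetical.

```python
def sus_score(responses, polarity):
    """Score ten 1-5 responses; polarity marks each item '+' or '-'."""
    if len(responses) != 10 or len(polarity) != 10:
        raise ValueError("SUS has exactly ten items")
    total = 0
    for response, sign in zip(responses, polarity):
        if not 1 <= response <= 5:
            raise ValueError("responses must be between 1 and 5")
        # Agreement helps on positive items, disagreement helps on negative ones.
        total += (response - 1) if sign == "+" else (5 - response)
    return total * 2.5

STANDARD = "+-+-+-+-+-"    # standard SUS alternates positive and negative items
ALL_POSITIVE = "+" * 10    # assumed scoring for the extreme positive version
ALL_NEGATIVE = "-" * 10    # assumed scoring for the extreme negative version
MIXED = "+-+-+-+-+-"       # assumed layout for a half-and-half extreme version

# Hypothetical respondent who mildly disagrees (2) with every extreme item.
disagrees = [2] * 10

print(sus_score(disagrees, ALL_POSITIVE))  # 25.0
print(sus_score(disagrees, ALL_NEGATIVE))  # 75.0
print(sus_score(disagrees, MIXED))         # 50.0
```

The same pattern of disagreement lowers the score on the positive version, raises it on the negative version, and lands in the middle when the two kinds of items are mixed, which is the canceling-out effect described above.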

Why do people disagree with extreme items?

There are probably several reasons why users tend to disagree more with the extremely worded items, but one good explanation comes from some of the earliest research on rating scales (Thurstone 1928).

It has been noted that people tend to agree only with statements that are close to their own attitude and to disagree with all other statements. Once the items were rephrased to express the extreme version of each concept, only respondents with passionately favorable attitudes about the usability of the UPA website tended to agree with the extreme positive statements, resulting in a significantly lower average score. Likewise, only respondents who passionately disliked the usability agreed with the extreme negative statements, resulting in a significantly higher average score.

While I don’t recommend that anyone use the above questions in their next usability evaluation, it should be clear from these data that extremely worded items make a major difference in scores. In fact, compared to other changes you can make, such as the number of scale points or the alignment of the response options, these effects are huge: about 3 times as large.

When you’re creating your next survey or questionnaire, keep in mind that questions and items interpreted to be extreme will likely result in fewer people agreeing with them. In most cases you’ll probably want items that have a more neutral wording. Of course, what makes an item “extreme” can be highly contextual and controversial.

In hindsight it seems obvious that reactions to extremely worded questions would be like our reactions to extremely opinionated people: we tend to disagree with them more. The good news is that despite the many ways you can mess up a questionnaire, as long as you use the same questionnaire when you make comparisons between designs, you’ll probably still have meaningful results.