In a usability evaluation it’s good practice to measure both how users perform on realistic tasks and what they think about the usability of the interface.
But what exactly DO you ask the users?
“Is this usable?” … “Is the interface easy to use?” … “Did you like using the app?”
While you can cobble together a few questions yourself or with your product team, good questions already exist in the form of standardized usability questionnaires.
A standardized questionnaire has gone through the process of psychometric validation. While that sounds like a psycho with a yard stick, it actually means someone spent a lot of time going through dozens or hundreds of possible questions and winnowed the set down to the ones that are most reliable, valid, and sensitive. Usually this is done by having a representative group of participants answer the candidate questions in response to a diverse set of interfaces.
Standardized questionnaires provide these advantages:
- Reliability is basically the repeatability of the questionnaire. It refers to how consistent responses are to the questions. We’d expect the same or similar users to respond about the same when evaluating the same interface. Standardized questionnaires have been shown to be more reliable than similar homegrown ones[PDF]. Reliability is most commonly measured using Cronbach’s alpha, a measure of internal consistency that can range from 0 (poor reliability) to 1 (perfect reliability). In usability practice, anything above .70 is considered sufficiently reliable. For tests that assess individuals and can affect their lives, such as college placement or employment decisions, the criterion is higher, targeting .95.
- Validity refers to how well a questionnaire can measure what it is intended to measure. That is, if we hear users complaining about a website or software product as being unusable, the questionnaire should be able to distinguish them from websites or software that are acclaimed for their usability. Validity is usually measured by correlating the scores of one questionnaire to other established questionnaires or other outcome measures such as task time or completion rates.
- Sensitivity refers to how well the questionnaire can discriminate between good interfaces and bad ones. Even very poorly worded questions with a huge sample size can detect differences between horrible and excellent experiences. Usually, though, we’re dealing with more modest differences and more modest sample sizes, so you want as sensitive an instrument as you can get. Sensitivity is often measured using resampling procedures to see how well the questionnaire can differentiate at a fraction of the sample size. The ability to detect differences at even small sample sizes (<10) was one of the major reasons why the System Usability Scale (SUS) and the Single Ease Question (SEQ) are recommended.
- Objectivity: Standardized questionnaires allow usability practitioners to independently verify the measurement statements of other practitioners. It’s harder to stack the deck in favor of or against your application if you use an independent instrument that others have found success with.
- Quantification: Standardization allows for a finer grain of reporting and statistical analysis than personal judgment. And in case you were wondering, it’s perfectly legitimate to use statistics with ordinal measures like those used in questionnaires. Just be careful not to make interval statements (e.g., “This interface is twice as usable as the other”) unless you really have an interval scale.
- Economy: It takes a lot of time to sift through the bad questions to find the good ones. Once this process is complete, we can all take advantage of the results.
- Communication: It’s easier to communicate results when we have a common standardized questionnaire. If you’ve used the System Usability Scale (SUS), then just knowing that an interface’s raw score is an 80 can be meaningful (that’s a good score).
- Norms: Some standardized questionnaires have normative reference databases, which allow you to convert raw scores into percentile ranks. Instead of working with just an average, you can actually see how well your interface stacks up against a larger database of comparables. For websites, the SUPR-Q and WAMMI are both standardized questionnaires that measure usability and other factors (such as trust and loyalty). Scores are compared to a database of hundreds of websites. There is, of course, a cost to maintaining these databases, which is why they aren’t free.
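Cronbach’s alpha, mentioned above under reliability, is straightforward to compute from raw item responses. Here’s a minimal sketch in Python using the standard formula (alpha = k/(k−1) × (1 − sum of item variances / variance of total scores)); the response data are made up purely for illustration:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix of responses.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_vars = scores.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of each person's total
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical data: 5 participants answering 4 items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(responses), 2))  # prints 0.93 -- above the .70 bar
```

A questionnaire whose items all tap the same construct will have respondents answering consistently across items, which drives the total-score variance up relative to the item variances and pushes alpha toward 1.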
Want to know more? Chapter 8 in our book Quantifying the User Experience provides the most comprehensive source on standardized usability questionnaires to date. Jim Lewis and I have both put in the hours creating standardized usability questionnaires (the PSSUQ [PDF] and SUPR-Q), and we put a lot of that knowledge into the chapter (it has 100 works cited!).
Oh and if you’re looking for a recommendation on which standardized usability questionnaire to use:
- Website usability: use the SUPR-Q, which allows you to actually see 100 of the websites in the database to directly compare against.
- Software or any interface: consider the System Usability Scale (SUS), the most widely used questionnaire, which allows us to compile an open-source set of norms.
- Post-task difficulty: use the Single Ease Question (SEQ), which performed about as well as an interval-scaled questionnaire[PDF].
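To make an SUS score like the “80” mentioned earlier concrete, here is the standard SUS scoring rule sketched in Python (the participant’s responses below are hypothetical): odd-numbered items are positively worded and contribute (response − 1), even-numbered items are negatively worded and contribute (5 − response), and the summed contributions are multiplied by 2.5 to land on a 0–100 scale.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire (10 items on a 1-5 scale)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for item, r in enumerate(responses, start=1):
        if item % 2 == 1:
            total += r - 1   # odd items: positively worded
        else:
            total += 5 - r   # even items: negatively worded
    return total * 2.5       # rescale 0-40 sum to 0-100

# Hypothetical participant with mostly favorable responses
print(sus_score([5, 2, 4, 1, 4, 2, 5, 1, 4, 2]))  # prints 85.0
```

Note that despite the 0–100 range, a SUS score is not a percentage; converting it to a percentile rank requires comparing it against a normative database.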