97 Things To Know About Usability

Jeff Sauro, PhD

September 7, 2010

You are ultimately measuring an interface, not users.
Tell the users you are measuring the interface, not them.
Usability testing is not QA testing. Usability testing is finding problems with user interactions. QA testing is identifying problems with code that also impact the user.
Usability is a combination of user attitudes toward an interface and performance with it.
There is no usability thermometer or survey. Usability is an abstract construct that can be measured indirectly through performance and attitudinal measures.
The ISO standard of usability (ISO 9241 pt. 11) defines usability as the intersection between effectiveness, efficiency and satisfaction in a context of use.
Have users attempt realistic tasks during a usability test; don’t just have them check things out.
Tasks should use realistic information and representative scenarios.
Use a mix of both qualitative and quantitative measures.
Nine steps to conducting a usability test:
1. Determine what you want to test
2. Identify your test goals
3. Write a minimum of 3-5 tasks
4. Recruit a set of users
5. Test your users
6. Collect as many metrics as possible
7. Code your data
8. Generate confidence intervals
9. Report
Use questionnaires to ask users how usable they found each task (after each task) and the interface as a whole (at the end of the usability test).
You can still measure task time when users think aloud.
For post-task satisfaction data you can ask just a single question (SEQ).
For post-test satisfaction questionnaires use a standardized instrument like the SUS, SUMI, PSSUQ or QUIS.
Testing with five users will only uncover most of the obvious usability problems—not most of ALL problems.
It is unlikely that severe problems affect more users than trivial ones.
Performance data like time, completion rates and errors correlate with perception measures like post-task and post-test satisfaction. The correlation is about as strong as the one between high-school grades and first-year college grades.
Users typically prefer the systems that are more usable, but not always, so it’s important to measure both preference and performance.
While ideal, it is not essential to have a randomly selected sample of users for your usability test. Even clinical trials have problems selecting people at random. It is more important to select representative users and understand how the users you aren’t talking to might be different from the ones you are testing.
Heuristic evaluations should be done prior to and in addition to usability testing, not instead of it.
If you conduct a Heuristic Evaluation you should aim for between 3 and 5 experts to review the interface since different evaluators find different problems.
Independent teams evaluating the same interface tend to find different usability problems in both user-testing and Heuristic Evaluations.
Even professions with highly skilled evaluators, such as radiology and psychiatry, share this problem with usability evaluation: different experts tend to find different problems.
Task scenarios should have specific information with objective success criteria.
Identify success criteria prior to testing so it is clear when a user completes or fails to complete a task successfully.
Only assist users during a task if they absolutely get stuck.
If you assist a user during a task, code the task as a failed task attempt.
For tasks where users are expected to have already received training, offer some simulated training prior to testing. Just know you are less likely to uncover problems with learnability.
Test at least two tasks per user. Ideally you should test five or more.
Remote-unmoderated usability tests tend to generate completion rates and SUS scores similar to those from lab-based moderated tests.
Task time data tends to differ by around 30% between remote-unmoderated usability tests and lab-based tests. Task time data from remote tests is often an unreliable measure of actual user task time.
There is probably a small advantage to using 7-point scales when designing new questionnaires with few items (five or fewer). The more items you have in your questionnaire, the less the number of scale points matters.
For single questions, having more scale points matters, so plan for between 7 and 11 points.
If you have an existing questionnaire, don’t change your scale—the biggest value you get with a questionnaire is comparing it to meaningful data (previous tests or external benchmarks).
Provide an odd-number of scale-points in your questionnaires so users have a neutral point. Forcing users into responses will increase the error in your measurement as having a neutral response toward usability is legitimate.
Mercenary usability testers hired online tend to generate data similar to lab-based studies and remote-unmoderated studies with known sets of volunteer users.
Randomize or counterbalance the task order in your usability test because users are still typically getting used to being “tested” and the first task gets overly penalized.
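A minimal sketch of two ways to vary task order, assuming you have a short list of task names. The function names are made up for illustration and don't come from any particular tool. Rotating the list gives every task a turn in every position; shuffling per user is the simpler alternative.

```python
import random


def rotated_orders(tasks):
    """Return one rotation of the task list per task (a cyclic Latin square).

    Across the returned orders, each task appears exactly once in each position.
    """
    return [tasks[i:] + tasks[:i] for i in range(len(tasks))]


def assign_orders(tasks, n_participants):
    """Cycle through the rotated orders so positions stay balanced across users."""
    orders = rotated_orders(tasks)
    return [orders[p % len(orders)] for p in range(n_participants)]


def random_order(tasks, rng=None):
    """Return a shuffled copy of the task list for a single user."""
    rng = rng or random.Random()
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    for user, order in enumerate(assign_orders(["search", "checkout", "returns"], 6), 1):
        print(user, order)
```

Note that a simple rotation only balances position. It does not balance which task comes right before which, so a random shuffle may be the safer choice if you suspect one task teaches users something about the next.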
It is easy to show that an application is unusable with small sample sizes but much harder to show that it is usable at small sample sizes.
Web-analytics don’t replace small sample lab-based studies since most analytic data can’t tell you why users are doing what they’re doing.
At sample sizes of 20, the typical margin of error is plus or minus 20% for completion rates, task times and satisfaction scores.
If you want to cut the margin of error in half you need to quadruple your sample size. If the margin of error is 20% at a sample size of 20, you should plan on testing 80 users to have a margin of error of plus or minus 10%.
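The arithmetic behind this rule: the margin of error shrinks with the square root of the sample size, so the sample you need grows with the square of the improvement you want. A rough sketch follows. The function name is hypothetical, and it assumes the variability in your data stays about the same as you add users.

```python
import math


def required_sample_size(current_n, current_margin, target_margin):
    """Estimate the sample size needed to reach a smaller margin of error.

    Assumes margin of error is proportional to 1 / sqrt(n), so
    n_new = n_old * (current_margin / target_margin) ** 2.
    """
    if current_n <= 0 or current_margin <= 0 or target_margin <= 0:
        raise ValueError("sample size and margins must be positive")
    ratio = current_margin / target_margin
    # Round before taking the ceiling to avoid floating-point noise
    # pushing an exact answer up by one.
    return math.ceil(round(current_n * ratio ** 2, 9))


if __name__ == "__main__":
    # Margins given in percentage points: 20% at n = 20, target of 10%.
    print(required_sample_size(20, 20, 10))  # 80
```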
Beware of using too much jargon and BS in your reports and presentations to marketing, management and development.
Usability explains around 30% of changes in customer loyalty; improving the usability of your website or software will likely move your loyalty needles.
Don’t pursue a full-time PhD in a usability field just because you think it will pay-off financially—it probably won’t offset the time out of the workforce and cost of pursuing the degree.
When users fail to complete a task successfully, around 14% still rate the task as being super-easy.
When reporting the average task time for small samples (fewer than 25 users) use the geometric mean instead of the mean or median.
For large sample studies (above 25 users) the median is the best measure of the typical task time.
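A small sketch of how you might pick the "typical" time under these two rules. The geometric mean is the exponent of the average of the logged times. The cutoff constant and function names are assumptions for illustration; the tips above don't say what to do at exactly 25 users, so this sketch treats 25 as a large sample.

```python
import math
import statistics

SMALL_SAMPLE_CUTOFF = 25  # assumption: 25 or more users counts as a large sample


def geometric_mean(times):
    """Geometric mean: exp of the arithmetic mean of the log-transformed times."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def typical_task_time(times):
    """Geometric mean for small samples, median for large ones."""
    if not times:
        raise ValueError("need at least one task time")
    if len(times) < SMALL_SAMPLE_CUTOFF:
        return geometric_mean(times)
    return statistics.median(times)


if __name__ == "__main__":
    # Five hypothetical task times in seconds, with one slow user.
    print(round(typical_task_time([40, 36, 53, 56, 110]), 1))
```

With skewed data like the example, the geometric mean lands below the arithmetic mean because the one slow user pulls it up less.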
If you’re going to survey users, worry first about what you’ll do with an average response of 4.12 rather than worrying about the number of scale points or sample size.
You can predict rather accurately how long it will take experienced users to complete repetitive tasks using keystroke-level modeling.
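As a rough illustration, a keystroke-level estimate adds up a standard time for each small step in a task. The operator times below are the commonly cited values from Card, Moran and Newell's original work and are assumptions here: published values vary with typing skill and device, and the names in this sketch are made up for the example.

```python
# Commonly cited operator times in seconds (assumed values, not measured ones).
OPERATOR_SECONDS = {
    "K": 0.20,  # press a key (average skilled typist)
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mentally prepare for the next step
}


def klm_estimate(operators):
    """Sum the operator times for a sequence such as ["M", "H", "P", "K"]."""
    unknown = [op for op in operators if op not in OPERATOR_SECONDS]
    if unknown:
        raise ValueError(f"unknown operators: {unknown}")
    return sum(OPERATOR_SECONDS[op] for op in operators)


if __name__ == "__main__":
    # Hypothetical step: think, reach for the mouse, point at a field, type two characters.
    print(round(klm_estimate(["M", "H", "P", "K", "K"]), 2))  # 3.25
```

The estimate only applies to error-free performance by practiced users, which is why it suits repetitive tasks and not first-time use.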
Have users repeat the task at least once to get a measure of learnability.
Make sure users are really done with a task before you force them on to the next one. In long usability test sessions it gets easy to want to move things along prematurely.
If you don’t want to interrupt users during a task, use retrospective probing, where you ask users to recount decisions they made after the task is over.
Users take about the same amount of time to complete tasks when they think aloud versus when they don’t.
When testing, be discreet when typing or taking notes: users will hear you typing or notice you writing and become more aware of being observed. Some will even ask if they did something “wrong.”
Attempt the tasks yourself in the usability lab on the computer you’ll be testing to find obvious problems in your tasks or bugs in the software.
Pilot test your tasks and the session with a couple of users to get the kinks out.
For testing websites you can use the System Usability Scale to measure the perceived usability, although a few of the original questions are a bit redundant and don’t add much information to the average. For software applications it is the most popular choice.
When taking task-time measures, always try to come up with an idea of how long a task should take. Use benchmarks, previous versions or some reasonable criteria to bring meaning to average times.
There are typically two classes of usability tests: finding and fixing usability problems (formative tests) and describing the usability of an application from average task times, completion rates and other metrics (summative tests). In practice your usability tests should be a mix of finding and fixing and benchmarking.
There is more than a 2 to 1 ratio between the frequency of formative and summative tests. On average, respondents reported conducting 13.8 formative usability tests over a 2-year period and 6.1 summative tests during the same time (source Measuring Usability.com subscribers).
When you see a problem in a small sample usability test it is much more likely that the problem affects a lot of users than a small percent.
Code task completion rates as pass (1) and fail (0). Partial task completion is harder to analyze and is often subjective. Instead code errors and only code success when the task success criteria have been met.
Compute an Adjusted Wald-binomial confidence interval around your task completion rates.
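A minimal sketch of the Adjusted Wald interval, assuming you've coded each attempt as 1 (pass) or 0 (fail). The adjustment adds half of z squared to the successes and z squared to the sample size before computing an ordinary Wald interval. The function name is illustrative.

```python
import math
from statistics import NormalDist


def adjusted_wald_interval(successes, n, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) confidence interval for a completion rate."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


if __name__ == "__main__":
    results = [1, 1, 0, 1, 1]  # hypothetical: 4 of 5 users completed the task
    low, high = adjusted_wald_interval(sum(results), len(results))
    print(f"{low:.0%} to {high:.0%}")
```

For 4 out of 5 users passing, the 95% interval runs from roughly 36% to 98%, a reminder of how wide intervals are at small sample sizes.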
Start recording task times after the user reads and understands the task scenario.
By standardizing raw usability metrics you can average them together and report a Single Usability Measure.
Task time data is positively skewed so you should log-transform the raw values prior to performing many statistical computations.
Compute a t-confidence interval around the average task time.
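One way to combine the log transform with a t-confidence interval is sketched below: log the times, build the interval on the logged values, then convert back to seconds. The function name is hypothetical, and the t critical value is passed in by hand (look it up for n - 1 degrees of freedom; for 95% confidence and five users it is 2.776). Note that the back-transformed center is the geometric mean, not the arithmetic mean.

```python
import math
import statistics


def task_time_confidence_interval(times, t_critical):
    """t-interval on log-transformed task times, converted back to seconds.

    Returns (center, low, high), where center is the geometric mean.
    t_critical is the two-sided t value for len(times) - 1 degrees of freedom.
    """
    if len(times) < 2:
        raise ValueError("need at least two task times")
    logs = [math.log(t) for t in times]
    mean_log = statistics.mean(logs)
    standard_error = statistics.stdev(logs) / math.sqrt(len(logs))
    margin = t_critical * standard_error
    return math.exp(mean_log), math.exp(mean_log - margin), math.exp(mean_log + margin)


if __name__ == "__main__":
    times = [40, 36, 53, 56, 110]  # hypothetical task times in seconds
    center, low, high = task_time_confidence_interval(times, t_critical=2.776)
    print(f"{center:.0f}s (95% CI {low:.0f}s to {high:.0f}s)")
```

Because the interval is built on the log scale, it comes out asymmetric in seconds, with a longer upper arm, which fits the positive skew in task times.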
Your data do not need to be normally distributed to use statistics: task time data can be transformed to make it normal, satisfaction scores have sampling distributions that are normal even after just five users, and completion rates use the binomial distribution, which doesn’t assume normality.
When reporting average task times you can report: average task completion time (no failed tasks), average time on task (all-task times) and average time to failure (only failed tasks). Never throw away times from failed task attempts.
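A short sketch of the three averages, assuming each attempt is recorded as a (seconds, completed) pair. The names are illustrative. It uses the plain arithmetic mean to keep the example simple; for small samples you may prefer the geometric mean, as recommended elsewhere in this list.

```python
from statistics import mean


def task_time_summary(attempts):
    """Summarize task times from a list of (seconds, completed) pairs.

    Keeps failed attempts so time on task and time to failure can be reported.
    """
    all_times = [seconds for seconds, _ in attempts]
    completed = [seconds for seconds, ok in attempts if ok]
    failed = [seconds for seconds, ok in attempts if not ok]
    return {
        "avg_completion_time": mean(completed) if completed else None,
        "avg_time_on_task": mean(all_times) if all_times else None,
        "avg_time_to_failure": mean(failed) if failed else None,
    }


if __name__ == "__main__":
    attempts = [(30, True), (50, True), (90, False)]  # hypothetical data
    print(task_time_summary(attempts))
```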
Keep track of which users encounter which problem. Use this information to understand how likely a user is to encounter a problem before and after user-testing.
If you know how likely a user is to encounter a problem then you can determine what sample size will likely uncover 85% or more problems.
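A hedged sketch of the calculation, using the standard binomial model: if each user has probability p of hitting a given problem, the chance that at least one of n users hits it is 1 - (1 - p)^n. This assumes every problem is equally likely and users are independent, which real tests only approximate. The function names and example data are made up for illustration.

```python
def average_problem_frequency(problem_users, n_users):
    """Average share of users who hit each problem.

    problem_users maps a problem label to the set of users who encountered it.
    With small samples this simple average may overstate the true frequency.
    """
    if not problem_users or n_users <= 0:
        raise ValueError("need at least one problem and one user")
    hits = sum(len(users) for users in problem_users.values())
    return hits / (len(problem_users) * n_users)


def proportion_of_problems_found(p, n_users):
    """Expected share of problems seen at least once with n_users."""
    return 1 - (1 - p) ** n_users


def users_needed(p, target=0.85):
    """Smallest number of users expected to uncover the target share of problems."""
    if not 0 < p <= 1 or not 0 < target < 1:
        raise ValueError("p must be in (0, 1] and target in (0, 1)")
    n = 1
    while proportion_of_problems_found(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical problem-by-user record from a test with 4 users.
    observed = {"hidden save button": {1, 2, 4}, "confusing label": {2}}
    p = average_problem_frequency(observed, n_users=4)
    print(p, users_needed(p))
```

The lower the chance of hitting a problem, the faster the required sample grows: a problem that affects half of users is very likely to show up within three sessions, while one that affects a quarter of users needs seven to reach the same 85% target.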
Archive your usability test data and mine the type and frequency of problems—you may find the same type of problems are being encountered all over your application—fixing one problem may have large benefits.
Reductions in user task times are an excellent way to show improvements in productivity, which in turn is a good way to show return on investment (ROI).
When justifying usability activities focus on the outputs (time saved, reduced calls to support) rather than the inputs (number of tests conducted, users tested and designs created).
You can identify more problems in an interface or from watching users in a usability test if you use a double expert—someone who is both skilled in usability principles and the domain you’re testing (e.g. accounting or stock-trading).
Assume that problem frequency and severity are independent and report them separately.
Use a rating scale with at least 5 points when rating problem severity.
Different usability experts will likely disagree on the severity of a problem so, if possible, have 2-3 people independently rate problem severity.
Eye-tracking data can be difficult to interpret and should be used with caution. Just because an element is in a user’s visual field doesn’t mean they perceive it.
When measuring task efficiency focus on outcomes not inputs: use time to complete a task instead of clicks, gazes, fixations or saccades. Does it matter that users click or gaze more at some objects if they can complete a task in less time?
When defining tasks, pick a combination of core tasks (tasks that are done frequently and are important) and edge-case tasks (tasks that aren’t done that frequently but are important when they are done).
If you have time and money to test 12 users during the design and development phase of an application it is better to test 3 rounds of 4 users than use up all 12 in one test.
If you have data that runs contrary to a guideline, don’t be afraid to use it. Guidelines are a good place to start in the absence of data, but software is only usable if it takes users less time to complete more tasks and they think the application is more usable.
Be wary of folk usability wisdom like “users don’t scroll,” “place the most important information in the upper left” and “pages should contain 35% whitespace.” Such guidance, while rooted in data, can be highly context-sensitive, so don’t be afraid to test.
Don’t guess, test. Wherever possible subject your design to actual usage by users.
For a good place to start understanding some of the most important issues in usability read five influential papers.
Make usability reports helpful by including a mix of both positive and negative comments. There’s no need to sugar-coat problems but an all negative report can alienate your audience (which usually includes people who spent a lot of time building the interface you just tore apart).
In a usability test report, try to include some design solutions or suggested fixes to problems rather than just lists of problems.
Use a screen recorder like Camtasia to have a record of each test session. It is not only essential for double-checking your data and comments but also becomes an excellent archive to mine for more usability data and for making highlight videos of the tests.
Record task times, completion rates, errors, satisfaction scores and problems found while you’re testing users (if possible). Waiting to code this data after the test can double or triple the time it takes to generate a report and makes a preliminary report difficult to produce quickly.
Shortly after usability testing concludes, report some preliminary results—some high level findings and the descriptive statistics such as average completion rate, times and satisfaction scores by task and test. Don’t keep your audience waiting too long!
If someone tells you that you can’t use statistics with small samples then they know enough about statistics to be dangerous…you can and should use statistical calculations for sample sizes as small as 2 (medical professions do it, the military does it and so does the usability profession).
Don’t get obsessed with finding the right number of scale points, labels and the perfect question wording in a questionnaire—usability is typically not a sensitive issue so users are likely to be honest in their responses. The impact of unusable software generally overcomes most problems with questionnaire design.
You can apply a Six-Sigma methodology to usability by leveraging the framework of defining user goals, measuring how well the current interface meets those goals, analyzing root causes of usability problems and quantifying your improved solution.
Miller’s 7 +/- 2 rule of short-term memory is interesting, but trying to apply it to menu lengths and the number of links and options is almost always inappropriate.
It never hurts to learn more about usability by reading the publications or hearing talks from some of the living legends in the field.
Usability is all about people (interacting with interfaces) but you can measure and manage the impacts of usability improvements just like marketing, sales and development do.