It’s not terribly complicated, yet it’s not universally applied.
When designing an application, website or product, three things help generate a more usable experience: an early focus on the users and tasks, empirical measurement, and iterative design.
These three key principles were articulated by John Gould and Clayton Lewis almost 30 years ago in the seminal 1985 paper, Designing for Usability: Key Principles and What Designers Think.
The same obstacles to designing for usability exist now, in the mobile age, as they did at the dawn of the Graphical User Interface age!
Here are many of the points they made along with some examples of how we’ve applied these key principles.
Early Focus on Users and Tasks
Bring the design team in direct contact with potential users, not through intermediaries or an examination of profiles. Especially in the first phases, collect information about what users are trying to accomplish, what’s not making sense, and how well the architecture and system navigation map to user expectations. A Top Task Analysis is an efficient method for understanding the vital few tasks users want to accomplish in software or websites.
Iterative Design
Careful design pays off, and having multiple phases is not a license to be sloppy or careless. Getting it right the first time is a good goal, but experience shows that it’s not as easy as it sounds. If you have the budget for testing 15 users, it’s best to split that sample up into three groups of five users. Test the first round of five users, fix the problems that aren’t controversial or won’t introduce new issues, then test again. In fact, it’s often overlooked, but Nielsen recommends testing with five users per round, not five users total, in his famous article on sample sizes in usability tests.
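One way to see why three rounds of five can beat one round of 15 is the problem-discovery model usually cited alongside Nielsen’s recommendation: the share of problems seen at least once by n users is 1 - (1 - p)^n. The sketch below assumes p = 0.31, the average per-user detection rate Nielsen reports; that value is an assumption brought in for illustration, not a number from our studies, and your own p may differ a lot.

```python
def share_of_problems_found(n_users, p=0.31):
    """Expected share of problems seen at least once by n_users.

    p is the ASSUMED chance that a single user hits a given problem
    (0.31 is the average Nielsen reports; it is not from this article).
    """
    return 1 - (1 - p) ** n_users


# Compare one user, one round of five, and all 15 users on one design.
for n in (1, 5, 15):
    print(f"{n:>2} users: {share_of_problems_found(n):.1%} of problems")
```

Under that assumption, the first five users already surface most of the problems in a design, so the remaining ten add little unless the design has changed in between. That is the argument for spending them on the revised versions.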
Empirical Measurement
Many development teams may get the first two key principles but are hesitant to measure. Picking a few key usability metrics and tracking them from each round of testing provides an easy objective check on your design decisions. A low-fidelity prototype or changing tasks are not excuses for not measuring. In addition to completion rates, here are some other examples of showing measured improvements from iterative testing.
Task Difficulty
We tested an iPad application in three rounds of testing over a three-month period. In each round, we had five or six participants attempt a number of tasks. After each task, we asked participants to rate how difficult they found the task using the 7-point Single Ease Question (we administered the scale verbally). In the first round, the prototype was marginally functioning, but we were still able to uncover problems with the navigation and labels. The graph below shows the average scores along with 85% confidence intervals for each round.
Despite changing some tasks we used in each round, there were still three tasks that were common in all three rounds and five tasks that were common in two rounds.
You can see in the graph above that the perception of task difficulty improved statistically from Round 1 to Round 3 for three tasks (notice the error bars on the green bars mostly don’t overlap with the error bars on the blue bars). It was generally consistent for most other common tasks and generally at a high level of ease (around the 90th percentile). This empirical measurement provided clear evidence for a quantifiably better user experience.
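For readers who want to reproduce this kind of comparison, here is a minimal sketch of a t-based confidence interval around a mean rating, which is a reasonable choice for samples of five or six. The ratings are hypothetical, not our study data, and the helper names (t_pdf, t_cdf, t_quantile, mean_ci) are made up for this example. The t critical value is computed numerically so the sketch needs only the standard library.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob, df):
    """Invert t_cdf by bisection; prob must lie in (0.5, 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(scores, confidence=0.85):
    """Return (mean, lower, upper) for a two-sided t interval."""
    n = len(scores)
    m = mean(scores)
    t_crit = t_quantile(1 - (1 - confidence) / 2, n - 1)
    margin = t_crit * stdev(scores) / math.sqrt(n)
    return m, m - margin, m + margin


# HYPOTHETICAL ratings for one task (1 = very difficult, 7 = very easy).
round_1 = [3, 4, 2, 5, 4]
round_3 = [6, 7, 6, 7, 6, 7]

m1, lo1, hi1 = mean_ci(round_1)
m3, lo3, hi3 = mean_ci(round_3)
print(f"Round 1: {m1:.2f} [{lo1:.2f}, {hi1:.2f}]")
print(f"Round 3: {m3:.2f} [{lo3:.2f}, {hi3:.2f}]")
print("Intervals overlap:", lo3 <= hi1)
```

Checking for non-overlapping intervals is a quick visual test and a conservative one: intervals that do not overlap indicate a difference, but intervals that overlap slightly can still differ statistically.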
Overall Perceptions of Usability
In addition to task-level metrics, you can also measure perceptions of the entire app experience across sessions. Bangor et al. described an iterative design of an installation kit in which the System Usability Scale (SUS) was tracked after each session. The graph below shows their data across five rounds, along with the historical “average” benchmark of 68 for SUS scores.
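If you collect raw SUS responses yourself, scoring follows the standard rule: subtract 1 from each odd-numbered item, subtract each even-numbered item from 5, sum the results, and multiply by 2.5 to get a 0 to 100 score. The sketch below applies that rule and compares a round’s mean to the benchmark of 68. The responses are hypothetical and are not the Bangor et al. data.

```python
def sus_score(responses):
    """Score one 10-item SUS questionnaire (each response is 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 else (5 - r)
    return total * 2.5


BENCHMARK = 68  # historical average SUS score cited in the text

# HYPOTHETICAL responses from three participants in one round.
round_responses = [
    [4, 2, 4, 1, 5, 2, 4, 2, 4, 2],
    [5, 1, 5, 2, 4, 1, 5, 1, 4, 2],
    [3, 3, 4, 2, 3, 3, 4, 2, 3, 3],
]

scores = [sus_score(r) for r in round_responses]
round_mean = sum(scores) / len(scores)
print("Scores:", scores)
print(f"Round mean: {round_mean:.1f} (above benchmark: {round_mean > BENCHMARK})")
```

Repeating this after every round gives you one number per round to plot against the benchmark line.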
Percent of Critical/Severe Problems Uncovered
If you track nothing else, track the frequency and severity of the usability issues uncovered in each round of testing. Both can be followed over time to get some idea of whether the design is improving. We focus more on the ratio of critical problems to total problems uncovered than on the raw number of total problems.
The reason is that we often uncover about the same number of total issues in each round of testing, but as the interface improves, the issues tend to become more minor. In some cases, resolving critical issues lets users progress further through the tasks, which reveals additional problems.
For example, in the same three-round iPad app study described above, we saw the percent of critical issues go from 27% in Round 1 to 17% in Round 2 and finally 8% in Round 3, or about one-third the share we observed in the first round.
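The ratio is simple to compute from an issue log. In this sketch the logs are hypothetical (one severity label per observed problem, with made-up counts), while the final comparison uses the percentages reported above.

```python
def critical_share(issues):
    """Fraction of logged issues whose severity label is 'critical'."""
    if not issues:
        return 0.0
    return sum(1 for severity in issues if severity == "critical") / len(issues)


# HYPOTHETICAL issue logs: similar totals, fewer critical issues over time.
round_logs = {
    "Round 1": ["critical"] * 4 + ["minor"] * 11,
    "Round 3": ["critical"] * 1 + ["minor"] * 12,
}
for name, issues in round_logs.items():
    print(f"{name}: {critical_share(issues):.0%} critical of {len(issues)} issues")

# Percentages reported in the text for the iPad study.
reported = {"Round 1": 0.27, "Round 2": 0.17, "Round 3": 0.08}
relative = reported["Round 3"] / reported["Round 1"]
print(f"Round 3 share relative to Round 1: {relative:.2f}")
```

Tracking the share rather than the count keeps the metric meaningful even when the total number of issues per round stays roughly flat.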
Some recommendations survive the test of time and different technologies—using these three key principles is one of them.