How the PURE Method Builds on 100 Years of Human Factors Research

Jeff Sauro, PhD

Methods evolve and adapt.

The same is true of UX methods, many of which evolved from other methods that often come from disparate fields and date back decades.

The usability profession itself can trace its roots to the industrial revolution.

The think aloud protocol, one of the signature methods of usability testing, can trace its roots to psychoanalysis, with influence from Freud, Wundt, and Skinner dating back over 100 years.

Recently, Christian Rohrer and I developed a new method called PURE (Practical Usability Rating by Experts), which builds on established methods, most notably the cognitive walkthrough, heuristic evaluation, and GOMS/KLM.

PURE is an analytic (as opposed to empirical) method, meaning that it doesn’t use data collected directly from observing participants (not to be confused with analytics). Instead, evaluators (ideally, experts in HCI principles and the product domain) decompose the tasks users perform through an interface and rate the difficulty of each step on a three-point scale to generate task-level and product-level PURE scores. The output is both an executive-friendly dashboard and a diagnostic set of issues uncovered as part of the task review (as shown in Figure 1).
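To make the roll-up concrete, here is a minimal Python sketch of how step ratings could become task and product scores. It assumes each step is rated 1 (easy) to 3 (hard), that a task score is the sum of its step ratings, and that the product score is the sum of its task scores. The task names, ratings, and function names are hypothetical illustrations, not part of any published PURE tooling.

```python
from typing import Dict, List

# Hypothetical ratings: each task maps to the difficulty of its steps,
# assumed to be rated 1 (easy) to 3 (hard) by the evaluator(s).
ratings: Dict[str, List[int]] = {
    "Find a product": [1, 1, 2],
    "Add to cart": [1, 2],
    "Check out": [2, 3, 1, 1],
}

def task_score(steps: List[int]) -> int:
    """Sum the step ratings for one task (assumed aggregation rule)."""
    if any(r not in (1, 2, 3) for r in steps):
        raise ValueError("each step must be rated 1, 2, or 3")
    return sum(steps)

def product_score(tasks: Dict[str, List[int]]) -> int:
    """Sum the task scores to get a product-level score (assumed)."""
    return sum(task_score(steps) for steps in tasks.values())

for name, steps in ratings.items():
    print(f"{name}: {task_score(steps)}")
print("Product:", product_score(ratings))  # Product: 14
```

Under this scheme, lower scores mean an easier experience, and a task with many steps accumulates a higher score even if no single step is hard.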

Figure 1: An example PURE scorecard for a product experience.

We’ve found that PURE scores correlate highly with task- and study-level metrics collected from users across several websites and apps. Since we introduced the method, it has quickly become one of our more popular training courses. As part of our training (and upcoming book), we discuss the origins of the method: from the factory floor to Xerox PARC and CU Boulder.

1900s: Time and Motion Studies

At the end of the 1800s, Western economies were being transformed by the industrial revolution as workers left farms to work in increasingly mechanized factories.

Frederick Winslow Taylor began applying scientific management to improve work efficiency and output. Many of his methods were precursors to what we’d call industrial engineering today. Taylor’s work was combined with and improved by Frank and Lillian Gilbreth’s work (another evolution of a method), becoming what are now known as time and motion studies.

A time and motion study essentially involves these steps:

  • Divide a task into smaller elements.
  • Time how long each element takes (averaged across people under normal working conditions).
  • Create a standard set of times for common steps, such as
      o Time to lift a part from the floor to the table
      o Time to put on a screw and bolt
      o Time to remove a part to the floor
  • Add the times for an estimate of the total task.
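The final step is simple addition over a table of standard times. A minimal Python sketch, assuming hypothetical element times in seconds (real values would come from averaging observed workers under normal conditions):

```python
# Hypothetical standard times (seconds) for common elements; in a real
# study these would be averaged across workers under normal conditions.
STANDARD_TIMES = {
    "lift part from floor to table": 4.0,
    "put on screw and bolt": 6.5,
    "remove part to floor": 3.5,
}

def estimate_task_time(elements, standard_times=STANDARD_TIMES):
    """Add the standard time for each element to estimate the whole task."""
    return sum(standard_times[e] for e in elements)

# A hypothetical job that fastens two screws before removing the part.
job = [
    "lift part from floor to table",
    "put on screw and bolt",
    "put on screw and bolt",
    "remove part to floor",
]
print(estimate_task_time(job))  # 20.5
```

Because each element is timed once and reused, the same table can estimate any new job built from those elements, and an unknown element raises a `KeyError` rather than being silently skipped.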

Despite some concerns that time and motion studies would be used to punish workers for slow work, the method ultimately proved a successful tool for improving the efficiency (and safety) of work by removing unnecessary steps (wasted time and motion).

Figure 2: Filming a time and motion study.

1970s: Keystroke Level Modeling

In the 1970s, researchers at Xerox PARC and Carnegie Mellon extended the idea of decomposing tasks and applying time and motion studies to human-computer interaction. They developed a method called GOMS (Goals, Operators, Methods, and Selection Rules), a technique likewise meant to eliminate unnecessary actions and make software more efficient for the burgeoning computer-using workforce.

GOMS was described in the still highly referenced (but dense) text The Psychology of Human-Computer Interaction by Card, Moran, and Newell. GOMS itself represents a family of techniques, the most familiar and accessible of which is Keystroke Level Modeling (KLM). Card, Moran, and Newell conducted their own time and motion studies to build a standardized set of times for how long it takes a typical user to perform actions on a computer (without errors). For example, they found the average time to press a key on a keyboard was 230 milliseconds (about a quarter of a second), and they applied Fitts’s law to predict the time it takes to point with a mouse.
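Fitts’s law predicts pointing time from the distance to a target and the target’s width. The sketch below uses the widely used Shannon formulation, T = a + b·log2(D/W + 1). The intercept `a` and slope `b` are hypothetical placeholders that would normally be fit to data for a specific device and population; they are not the constants Card, Moran, and Newell reported.

```python
import math

def fitts_time(distance: float, width: float,
               a: float = 0.1, b: float = 0.15) -> float:
    """Predicted pointing time in seconds (Shannon formulation of Fitts's law).

    a (seconds) and b (seconds per bit) are hypothetical device constants.
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# A target 310 px away and 10 px wide: ID = log2(32) = 5 bits.
print(round(fitts_time(310, 10), 2))  # 0.85
```

Farther or smaller targets raise the index of difficulty and therefore the predicted time, which is why Fitts’s law is useful for comparing layouts without running users.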

With KLM, an evaluator can estimate how long it will take a skilled user to complete the steps of a task using only a few standard operators (pointing, clicking, typing, and thinking). For a simple introduction to using KLM, see Jef Raskin’s The Humane Interface, p. 72.
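A KLM estimate is a sum of operator times along the sequence of actions. This Python sketch uses the 230 ms keystroke figure cited above for K (treating a mouse-button press as a keystroke) and, as assumptions, values commonly cited from the KLM literature for pointing (P ≈ 1.1 s), homing between keyboard and mouse (H ≈ 0.4 s), and mental preparation (M ≈ 1.35 s). The example task and its operator sequence are hypothetical.

```python
# Operator times in seconds. K uses the 230 ms keystroke figure cited above;
# P, H, and M are commonly cited KLM values, treated here as assumptions.
OPERATORS = {
    "K": 0.23,  # press a key or mouse button
    "P": 1.10,  # point with the mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mentally prepare for the next action
}

def klm_estimate(sequence: str) -> float:
    """Sum operator times for a sequence like 'MPK' (skilled, error-free user)."""
    unknown = set(sequence) - set(OPERATORS)
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    return sum(OPERATORS[op] for op in sequence)

# Hypothetical step: think, point at a search box, click, move hands to the
# keyboard, think, then type a five-letter word.
sequence = "MPKH" + "M" + "K" * 5
print(round(klm_estimate(sequence), 2))  # 5.58
```

Where to place the M (thinking) operators is the main judgment call in KLM; the sketch leaves that decision to whoever writes the sequence.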

KLM has been shown to estimate error-free task time to within 10% to 30% of actual times. These estimates can be made from working products, prototypes, or screenshots without collecting data directly from users, which is useful when it’s difficult to test with users. KLM has been tested on many interfaces and domains, such as websites, maps, PDAs, and database applications, and was recently updated for mobile interactions.

Figure 3: The Model Human Processor from Card, Moran, and Newell (1983), The Psychology of Human-Computer Interaction.

1990s: Heuristic Evaluations and Cognitive Walkthroughs

The 1990s gave rise to two other analytic techniques: the heuristic evaluation and the cognitive walkthrough. The heuristic evaluation is still one of the methods UX researchers use most often, although in practice, most people perform a more generic expert review.

In a heuristic evaluation, an expert in usability principles reviews an interface against a set of broad principles called heuristics. These heuristics were derived by analyzing the root causes of problems uncovered in usability tests. Evaluators inspect the interface to determine how well it conforms to these heuristics and to identify shortcomings. The best-known set of heuristics was developed by Nielsen and Molich, but other sets exist.

The cognitive walkthrough is a usability inspection method similar to the heuristic evaluation and developed around the same time, but it places more emphasis on task scenarios. It was developed by Wharton et al. at the University of Colorado in 1990. Whereas KLM predicts experienced (error-free) task time, the cognitive walkthrough emphasizes learnability for first-time or occasional users.

As part of conducting a cognitive walkthrough, an evaluator must first identify the users’ goals and how they would attempt to accomplish them in the interface. An expert in usability principles then meticulously goes through each step, identifying problems users might encounter as they learn to use the interface.

For each action a user has to take, a reviewer needs to describe the user’s immediate goal and then work through eight questions and prompts:

  1. First/next atomic action user should take.
  2. How will user access description of action?
  3. How will user associate description with action?
  4. All other available actions less appropriate?
  5. How will user execute the action?
  6. If timeouts, time for user to decide before timeout?
  7. Execute the action. Describe system response.
  8. Describe appropriate modified goal, if any.

It may come as no surprise that one of the biggest complaints about the cognitive walkthrough (CW) method is how long it takes to answer each question. Wharton et al. later refined the questions to four:

  1. Will the user try to achieve the effect that the subtask has?
  2. Will the user notice that the correct action is available?
  3. Will the user understand that the wanted subtask can be achieved by the action?
  4. Does the user get appropriate feedback?


Spencer (2000) further reduced the number of questions in his Streamlined Cognitive Walkthrough technique, in which you ask only two questions at each user action:

  1. Will the user know what to do at this step?
  2. If the user does the right thing, will they know that they did the right thing, and are making progress towards their goal?
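A streamlined walkthrough is essentially a checklist applied at every action. This Python sketch records a yes/no answer to each of Spencer’s two questions per action and lists every action that fails. The `StepReview` record, the example actions, and the notes are hypothetical; swapping in the four Wharton et al. questions would only change the `QUESTIONS` tuple (and the number of answers per action).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Spencer's two streamlined questions, asked at every user action.
QUESTIONS: Tuple[str, ...] = (
    "Will the user know what to do at this step?",
    "If the user does the right thing, will they know that they did the "
    "right thing, and are making progress towards their goal?",
)

@dataclass
class StepReview:
    """One reviewed user action (hypothetical record format)."""
    action: str
    answers: List[bool]  # one yes/no answer per question, in order
    note: str = ""

def failed_steps(reviews: List[StepReview]) -> List[Tuple[str, int, str]]:
    """List (action, question number, note) for every 'no' answer."""
    failures = []
    for review in reviews:
        if len(review.answers) != len(QUESTIONS):
            raise ValueError(f"{review.action!r}: expected {len(QUESTIONS)} answers")
        for number, ok in enumerate(review.answers, start=1):
            if not ok:
                failures.append((review.action, number, review.note))
    return failures

reviews = [
    StepReview("Open the settings menu", [True, True]),
    StepReview("Choose Export", [False, True], "Export is hidden under More"),
]
for action, number, note in failed_steps(reviews):
    print(f"{action}: Q{number} failed ({note})")
```

Keeping the answers as plain yes/no values mirrors the method’s intent: the review stays fast, and the notes carry the diagnostic detail.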

Spencer found that by reducing the number of questions and setting ground rules for the review team, he was able to make CW work at Microsoft.

For more information on how inspection methods compare, see the paper by Hollingsed and Novick (2007), which also contains one of the largest collections of references on usability inspection methods.


2016: The PURE Evolution

PURE both shares and extends the methods of KLM, cognitive walkthroughs, and heuristic evaluations.

Some common elements shared across these methods and adapted for PURE include the following:

Analytic methods: The PURE method, like the others described here, is analytic, not empirical like usability testing or card sorting. However, even though users aren’t directly observed, these methods are derived from observations of user behavior, such as common mistakes or the time it takes skilled users to perform actions.

Balance of new and existing users: Whereas KLM focuses on experienced users and the cognitive walkthrough focuses on new users, the PURE method is more flexible in that you can apply it to users across the spectrum from new to experienced. As part of applying the PURE method, you can calibrate your evaluation based on the technical skill and product/domain knowledge of the user.

A focus on tasks: PURE, like KLM and cognitive walkthroughs, focuses on the key tasks users are trying to accomplish in an interface. Even the heuristic evaluation method has morphed to include a focus on tasks. With software, apps, and websites getting increasingly complicated, it’s essential to understand the critical tasks users are trying to perform, using methods like a top-tasks analysis.

Multiple evaluators: Where there is judgment, there is disagreement. But different perspectives can be an advantage, because combining several qualified perspectives on an interface exploits the wisdom of the crowd. Using at least two (and usually no more than five) evaluators enhances the efficacy of both heuristic evaluation and PURE.
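One way to picture the benefit of multiple evaluators: pooling their findings surfaces more unique issues, and comparing their independent ratings shows where discussion is needed. The Python sketch below assumes each evaluator rates the same steps of one task on the 1 to 3 scale; the evaluators, ratings, and issues are hypothetical, and the sketch only flags disagreements rather than prescribing how a team should reconcile them.

```python
from typing import Dict, List, Set

# Hypothetical independent ratings (1-3) from three evaluators for one task;
# each list must cover the same steps in the same order.
ratings: Dict[str, List[int]] = {
    "Evaluator A": [1, 2, 3],
    "Evaluator B": [1, 3, 3],
    "Evaluator C": [1, 2, 2],
}

# Hypothetical issues each evaluator logged; overlap is common.
issues: Dict[str, Set[str]] = {
    "Evaluator A": {"unclear label", "no confirmation"},
    "Evaluator B": {"unclear label", "hidden filter"},
    "Evaluator C": {"no confirmation"},
}

def disputed_steps(by_evaluator: Dict[str, List[int]]) -> List[int]:
    """Return 1-based step numbers where evaluators gave different ratings."""
    per_step = zip(*by_evaluator.values())  # regroup ratings by step
    return [i for i, step in enumerate(per_step, start=1) if len(set(step)) > 1]

def pooled_issues(by_evaluator: Dict[str, Set[str]]) -> Set[str]:
    """Union of all evaluators' issues: each one may catch what others miss."""
    return set().union(*by_evaluator.values())

print(disputed_steps(ratings))        # [2, 3]
print(sorted(pooled_issues(issues)))  # ['hidden filter', 'no confirmation', 'unclear label']
```

No single evaluator here found all three issues, which is the wisdom-of-the-crowd effect in miniature.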

Double experts help: Evaluators who know something about human-computer interaction and design best practices AND the domain being studied will often be the best evaluators. These “double experts” will likely know better which terms are familiar to the audience and be better acquainted with the tasks and user goals, which is an advantage in heuristic evaluations. For example, an evaluator with experience in HCI principles AND accounting software and finance will likely be better at identifying problems and generating accurate PURE scores.
