Understanding Expert Reviews and Inspection Methods

Jeff Sauro, PhD

In user research, there’s more than one kind of expert, and there’s also more than one kind of expert review.

The 1990s were the golden age of what are loosely referred to as expert reviews or, more generally, inspection methods.

Unlike usability testing, which relies on observing users interact with a product or website, inspection methods—as the name suggests—are based on evaluators examining an interface for problems.

The idea behind inspection methods can be traced most directly to the work of Michael Fagan. In Fagan inspections, software code, requirements, and designs are inspected for flaws by people knowledgeable about the product and domain.

In my experience, most researchers now see the expert review as a solo (ad hoc) activity and just call it a Heuristic Evaluation (to the chagrin of Rolf Molich). But there isn’t a single inspection method. Inspections often involve multiple roles and can be much more structured (including procedures and guidelines).

There are different variations and adaptations of inspection methods, but they generally share the same core idea: examination by people who are “experts” in usability, design, and/or the product. This is similar to usability testing, which itself has many variations built on the core idea of observing users (e.g., formative think-aloud testing, benchmarking, and competitive benchmarking).

Like usability testing, inspection methods have evolved and have benefited from years of use. They have their own rich history, and I feel they are undervalued as an effective way of identifying problems in an interface. They’re more than a second-rate usability test.

One of the primary sources for understanding inspection methods is a 1994 book by Nielsen and Mack. A more recent contribution comes from Chauncey Wilson, who put together an accessible and comprehensive review of inspection methods in his 2014 book. It covers a lot of updated research and describes how the methods have evolved in the twenty years since the Nielsen and Mack book. If you’re interested in learning more and are looking for an accessible guide, this is the place to start.

With the popularity of PURE as a new analytic method, I wanted to revisit in detail the roots and variety of inspection methods, drawing on our experience at MeasuringU and summarizing key parts of the works of Wilson and of Nielsen and Mack.

Heuristic Evaluation

A heuristic evaluation involves multiple evaluators examining an interface against a set of principles or design rules called heuristics. We’ve written about heuristic evaluations a lot, and it’s probably the method most familiar to UX researchers. The most confusing part for people (Chauncey and Christian Rohrer both agree) is the term heuristic. Heuristics can be thought of as simple and efficient rules that evaluators use as reminders of potential problem areas. They are typically derived by examining the problems uncovered in usability tests and generalizing them into overall principles. The expert then inspects the website to determine how well it conforms to these heuristics. Nielsen and Molich have the best-known set of ten heuristics, although there can be more, such as the twenty from Barker and Weinschenk. Having only a few heuristics makes them easier to apply, but too few can leave each one vague and hard to interpret. For example, how do you determine whether a system term really matches the real world when you’re dealing with abstract concepts and functions? And what about problems that fall outside the prescribed heuristics? Are the heuristics really necessary?

Expert Review (Solo or in Teams)

While most researchers report performing heuristic evaluations, we’ve found they’re in fact performing an expert review. Expert reviews vary in thoroughness and often involve only a single person inspecting an interface for potential problems, sometimes with tasks and guidelines, but often informally, as a way to catch any obvious issues based on personal experience or knowledge of the product. We’ve found expert reviews can be effective, but usually when they use some set of guidelines or a checklist (even just a list of functional areas) and especially when they include multiple evaluators. Expert reviews can get a bad rap because they can be seen as ad hoc, subjective reviews based on the opinions of one person. The best “experts” are those who have knowledge of the product, the users, the context, and the tasks, have experience with HCI principles, and ideally have observed hundreds of users on similar interfaces.

Guideline Review

A guideline review involves having an evaluator compare an interface against a detailed set of guidelines. Guidelines can be used for creating an interface (typically by designers and developers) or for evaluating it for compliance (typically by usability evaluators). Guideline reviews predate the web but became more popular with the rise of graphical user interfaces (GUIs). One of the best-known and most comprehensive sets of guidelines was sponsored by the U.S. Air Force and MITRE Corporation. Published in 1986, Guidelines for Designing User Interface Software contains 944 mostly usability-related guidelines (Smith & Mosier, 1986). Apple released its Human Interface Guidelines one year later, followed by Microsoft in 1995.

Guidelines are more granular than heuristics and work more like a checklist than like shortcut principles. We’ve developed our own set of guidelines, called the Calibrated Evaluator’s Guide (CEG), that uses 106 guidelines for evaluating websites. The guidelines were created by combining other lists of guidelines and refining them based on how well they differentiate good sites from bad sites and how reliably evaluators can respond to them.

Perspective-based UI Inspection

The perspective-based UI inspection is similar to an expert review but focuses on one user/persona or task perspective at a time. Evaluators go through the interface considering each perspective in turn, such as a power user, a new user, or an elderly user. There is some evidence that assigning evaluators (in either an expert review or a heuristic evaluation) a perspective may find more problems, although the total number of problems found may itself be a problematic metric, as famously described by Gray and Salzman. And clearly defining the perspectives can be difficult in the absence of data. The PURE method extends the idea of perspectives by including a user type that clearly identifies the assumptions about the user, specifically their technical, product, and domain knowledge.

Cognitive Walkthrough

When you learn to use a new product, do you sign up for a training course or walk through the “what’s new” wizard? Like most people, you probably skip the prescribed paths and learn by exploring. The idea behind the cognitive walkthrough is to understand and evaluate this process of learning by exploring, using cognitive science principles. A cognitive walkthrough is similar to a heuristic evaluation but emphasizes task scenarios that new users would likely perform with the software or website. Before conducting a cognitive walkthrough, the evaluator must first identify the users’ goals and how they would attempt to accomplish those goals in the interface. An expert in usability principles then meticulously goes through each step, identifying problems users might encounter as they learn to use the interface. The cognitive walkthrough has evolved since its introduction in 1990, most notably by reducing the number of questions a reviewer has to answer at each step from eight to four and then to two.

Pluralistic Walkthrough

A pluralistic walkthrough is a sort of hybrid of expert review and usability evaluation (and is the exception to the “no users” distinction of inspection methods). It was developed by Randolph Bias in 1994. A facilitator walks a few (4–10) representative users through screenshots or prototypes of an interface for each step of a task (similar to the cognitive walkthrough) and asks the users what step they would take next.

In addition to the facilitator and representative users, a separate notetaker records the steps, a product expert answers questions and serves as a sort of live “help document,” and three to four evaluators (as in the heuristic evaluation) observe and record the problems. Including users directly alongside product evaluators can take some of the guesswork out of predicting what users might do (guesswork that the cognitive walkthrough relies on), but it’s unclear whether those same users would be better used in an actual usability test instead.

Formal Usability Inspection

The idea of using more roles is extended in the formal usability inspection. It essentially combines aspects of the heuristic evaluation and cognitive walkthrough (user profiles, task scenarios, and guidelines) into a more formal process with specialized roles and many more people. The roles are:

Moderator: A sort of lead inspector who calls the shots (like the Principal Investigator).

Author: Responsible for the deliverable and not an evaluator.

Recorder: The notetaker.

Inspectors: Multiple evaluators (4–8) who have qualifications similar to those of the evaluators in heuristic evaluations and cognitive walkthroughs (knowledge of the product, domain, and HCI principles).

Observers: People who contribute little but act as witnesses to the inspection process.

And if that isn’t enough, there are also two optional roles:

Reader: A person who presents the material in small chunks, a sort of objective proctor. Readers are probably most useful for dense material such as functional specifications or requirements documents.

Standards bearer: Someone who ensures any standards/guidelines are followed and can act as a consistency check on the inspectors.

The formal usability inspection may seem like a relic of big businesses with big budgets, and it seems less suited to agile development and to smaller dev teams, where there may not even be enough people to fill all the roles. It may be better suited to mission-critical software. This resource intensiveness may be one of the reasons little has been written about it since its publication in 1994. However, all the defined roles do have value and may be applied more efficiently by having fewer people cover them. Maybe a semi-formal inspection method will emerge (no jackets required).

Thanks to Chauncey Wilson for providing comments on this article.
