While we often talk about usability tests as if there were a single type, there are actually several varieties of usability tests.

Each type addresses different research goals.

Don’t confuse the five usability testing types with interface types or testing modes.

Interface types are mobile (website or apps), desktop (software or website), or a physical device (like a thermostat).

Testing modes or methods are the ways to conduct a usability study on any interface type. There are three main testing methods; each has its strengths and weaknesses:

  • Moderated in-person: A facilitator is co-located with the participant (often in a lab).
  • Moderated remote: The participant and facilitator are in different locations. Screen-sharing software (like GoToMeeting or WebEx) allows the facilitator to remotely watch the participant attempt tasks with software or a website and allows for probing on problems.
  • Unmoderated remote: Software from UserZoom or Loop11 administers tasks automatically to participants around the world. You can collect a lot of data quickly and for a fraction of the cost of in-person testing. In many cases you have a recording of the participant’s screen and webcam, but there’s no way to simultaneously interact with all participants.

With the type of interface and the different testing methods in mind, here are the five types of usability tests, each addressing a different research goal.

Problem Discovery

A problem discovery usability test is the most common type of usability study. The goal is to uncover (and fix) as many usability problems as possible. When you have a handful of participants attempt a few realistic tasks, you can uncover the most common issues.

Problem discovery studies are often called formative studies. The term formative comes from educational research where it’s used to describe testing as a method for diagnosing problem areas in a student’s learning. The same idea applies to an interface: From the evaluation, what problems can be found and corrected to make the interface easier to use?

Problem discovery studies are usually conducted using a moderated approach, where a facilitator can really dig into the actions and utterances of the participant to uncover problems. You can still conduct a formative study using the unmoderated remote method if you have participants think aloud and have their screens and webcams recorded.

One of the most important concepts in all of UX research is the idea of using problem discovery studies iteratively throughout product development.


Benchmark

Problems were found and new designs were created. But did those design changes actually make the interface easier to use? The goal of a benchmark study is to answer the question: How usable is the interface? Measuring the usability of an interface before design changes are made allows you to set a benchmark to compare future designs against.

Benchmark studies are often called summative studies, which also comes from educational research. A summative evaluation is the more familiar (and often maligned) method of testing students to assess how much a learner knows (usually via a standardized test).

We work with many companies that perform benchmark studies annually. This type of historical data really provides context to assess the impact of design changes. Quantifying changes is also one of the essential pillars of UX research.

Benchmark studies require a larger sample size to get tighter confidence intervals. Consequently, they are usually conducted in the unmoderated remote mode when the benchmark is for a website.
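The link between sample size and confidence-interval width can be sketched with a quick calculation. This example uses the adjusted-Wald interval for a binary completion rate (one common choice for small-sample usability data); the numbers are hypothetical:

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """95% adjusted-Wald confidence interval for a completion rate.

    Adds z^2 pseudo-observations (half successes, half failures) before
    computing a standard Wald interval, which behaves better at small n.
    """
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# The same 80% observed completion rate at increasing sample sizes:
for n in (10, 50, 200):
    lo, hi = adjusted_wald_ci(int(0.8 * n), n)
    print(f"n={n:3d}: {lo:.2f} to {hi:.2f}")
```

Quadrupling the sample size roughly halves the interval width, which is why benchmark studies lean on the larger samples that unmoderated remote testing makes affordable.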


Comparison

Collecting benchmark data tells you how usable your website is and how users are performing on key tasks. A stand-alone benchmark generates a lot of data, but without a comparison to an earlier benchmark you’re often left wondering “are these results good or bad?” To provide a meaningful comparison, have another set of participants attempt the same set of tasks on competing websites or products.

There are usually complaints about how realistic the task scenarios are, or that some tasks may be “unfair.” Every usability study comes with some level of artificiality that can threaten the validity of the findings. One of the best ways to put your task scenarios and metrics into context is to see how you stack up against the competition (or previous designs).

Because you’re comparing, you’ll need a larger sample size to detect differences; for that reason, many comparative studies are done using an unmoderated remote approach. You can conduct comparative moderated remote or in-person studies; just plan on dedicating a lot of time for moderation and consider a within-subjects approach (where the same participants attempt the same tasks on all competitors).


Eye-Tracking

Where people look and where they click are similar, but not always the same. When you need to understand both where participants’ eyes are drawn in designs and the sequences of gaze paths, an eye-tracking study is the way to go. The current technology only supports in-person testing.

Eye-tracking studies are also time-intensive. In addition to the usual recruitment and facilitator time, you should plan 10 minutes of analysis time for every 1 minute of eye-tracking data collected with participants. You can conduct eye-tracking on just about any device, although we mostly limit it to desktop and tablet websites and apps.


Learnability

Most usability studies assess first-time use. Even when participants have experience with an interface, the task details are almost always unfamiliar in some way. Consequently, the findings often describe initial use rather than usage over time. This was a common complaint from enterprise software developers, who knew users would be both trained and using a system daily for years.

While training should not be a crutch for bad interface design, you can simulate both the effects of training and repeated use in a usability study.

By having participants attempt the same tasks repeatedly in a study, you can quantify the learning curve. In such studies you’re often more interested in performance data like task time than in initial impressions and problem discovery. However, if participants still encounter problems after repeated trials of the same task (especially after some training), you have compelling evidence for fixing problems that don’t affect only first-time users.
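One simple way to quantify a learning curve is to fit the power law of practice, T(n) = T₁ · n⁻ᵇ, to task times across trials. The sketch below does a least-squares fit in log-log space; the trial times are hypothetical:

```python
import math

# Hypothetical median task times (seconds) over five repeated trials.
trials = [1, 2, 3, 4, 5]
times  = [120, 95, 82, 74, 69]

# Power law of practice: T(n) = T1 * n**(-b). Taking logs gives a line,
# log T = log T1 - b * log n, so fit the slope with least squares.
xs = [math.log(n) for n in trials]
ys = [math.log(t) for t in times]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
b = -slope  # larger b means participants speed up faster with practice
print(f"learning-curve exponent b = {b:.2f}")
```

Comparing the fitted exponent (or simply the time at the final trial) across designs gives a concrete learnability metric rather than a first-impression snapshot.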

Learnability studies are split between a moderated in-person approach, where you can more easily control the testing environment and extraneous variables, and an unmoderated remote approach, where you can collect a larger sample size to obtain more precise estimates of your metrics.


Despite the differences in usability testing modes and flavors, all generally have the following in common:

  • They use a representative set of users.
  • Participants attempt a realistic set of task scenarios.
  • Data is collected about what users do and say (behavioral and attitudinal data).

The “right” method depends on your research goals, and many studies involve a combination of the usability test types (e.g., eye-tracking with problem discovery). We conduct every type of usability test at MeasuringU and would love to help you conduct the right ones for your goals.