Where Do UX Research Methods Come From?

Jeff Sauro, PhD

UX professionals use many methods to help understand and improve the user experience.

Among the most popular are usability testing, expert reviews, surveys, and card sorting.

But where did these methods come from?

The field of UX research is relatively new, but its methods are not.

And while UX methods may have new names, many are specialized adaptations of methods with roots in other fields, some reaching far back into history.

When you understand the fields these methods originally came from and how they’ve been adapted, you can use them more effectively in UX research.

Here are twelve UX research methods and some of their interesting roots.

Think Aloud Usability Testing and Freud’s Couch: The signature method of having users think aloud while using an interface traces its roots to psychoanalysis and to work by Freud, Wundt, and Skinner. Usability testing shifted the focus from “fixing” people to using people to fix an interface. There is a rich history of having people speak their thoughts as a means of understanding problems, including the foundational work of Ericsson and Simon.

Surveys and Mazda (not the car): A census is a form of survey that dates back to 3800 BC; one was even written into the U.S. Constitution in 1787. Not surprisingly, then, some of the first surveys were used to learn whom people intended to vote for in the 1824 presidential election.

A more modern form of the market research survey dates to the early 1900s. R.O. Eastman (not to be confused with George Eastman of Kodak fame) conducted surveys through his own research firm, including work for Cosmopolitan magazine. Another of his surveys, conducted for General Electric, sought to determine the significance of the term “Mazda,” which GE used as the name of a lightbulb.


Benchmark Studies and Land Surveyors: To know whether you’ve improved an interface, you need a reference point. UX measures become more meaningful when compared to an earlier version, a competitor, or an external reference. This practice of using an external reference, or benchmark, comes from land surveyors, who would cut a mark into stone to secure an angle-iron bracket, called a “bench,” that supported their leveling rods. A more contemporary use is benchmarking computer performance.

Quantitative Studies and Farming: Often loosely referred to as large-scale or quantitative studies, these include design comparison studies (which design improved brand lift?), A/B tests, and multivariate tests on live websites or apps.

These types of studies can be seen as experimental designs, which have a strong history in the behavioral sciences (see, for example, Cook and Campbell’s classifications, including quasi-experimental designs). Many methods used to analyze experimental data, such as the analysis of variance (ANOVA), actually have roots (pun intended) in agriculture, where Ronald Fisher used techniques such as splitting plots of farmland to improve crop yields.
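
To make the ANOVA connection concrete, here is a minimal sketch of a one-way ANOVA F statistic in plain Python, the kind of calculation you might run on task times from three design variants. It shows the simple one-way case rather than Fisher’s split-plot designs, the function name `one_way_anova_f` is hypothetical, and it stops at the F ratio; turning F into a p-value requires the F distribution from a statistics library.

```python
from statistics import mean


def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (not real study results): each list is one group's scores
f, df1, df2 = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df1}, {df2}) = {f:.1f}")  # prints "F(2, 6) = 27.0"
```

A large F means the group means differ by more than the within-group spread alone would lead you to expect; whether it is large enough to be significant still depends on the two degrees of freedom.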

Formative/Summative Tests and Educational Testing: The terms “formative” and “summative” loosely differentiate types of usability studies. Formative studies typically focus on finding and fixing problems, whereas summative studies (often incorrectly called quant studies) are used to understand how usable an interface is. These terms can be traced to Michael Scriven, who wrote about them in educational measurement. Again, the focus shifted from diagnosing problems in people and measuring their performance to doing the same for an interface.

SUS, SUPR-Q, and SAT: The SUS and SUPR-Q are standardized questionnaires developed with what’s called classical test theory from the field of psychometrics. As the name suggests, classical test theory was originally developed for creating tests like the Scholastic Aptitude Test (SAT). Recent versions of the SAT, GRE, and GMAT use a more modern approach called item response theory (IRT), which we also used to develop the SUPR-Qm.
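
Classical test theory supplies the statistics used to judge whether a questionnaire’s items hang together. One of the most widely used is Cronbach’s alpha, an estimate of internal-consistency reliability: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below computes it from a respondents-by-items matrix. It illustrates one common CTT statistic and is not a description of exactly how the SUS or SUPR-Q were built; the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha, where responses[i][j] is respondent i's score on item j."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Toy data: three respondents answering two perfectly consistent items
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # prints 1.0
```

Reverse-worded items, such as the SUS’s alternating negatively worded statements, need to be recoded before computing alpha so that higher scores always point in the same direction.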

Card Sorting and Cognitive Impairment: Card sorting is an empirical method (included in MUIQ) in which participants sort “cards,” which typically represent navigation elements. Some original uses of card sorting came from measuring cognitive ability (or impairment). And like usability testing, card sorting shifted its focus from measuring people to using people to measure the interface.

The current use of card sorting is most similar to the Q-sort technique Stephenson described in 1953, in which a single subject or several small samples of subjects sort items. Canter later described the approach in 1985, and it was even applied to map making.


Heuristic Evaluations and Hardware Development: What many UX practitioners loosely call a heuristic evaluation is actually a form of expert review, also called an inspection method. The heuristic evaluation proper is a specialized inspection method created by Nielsen and Molich (1990), in which multiple evaluators familiar with UI best practices (and ideally with the product as well) review an interface against a set of principles called heuristics. The idea of examining an interface for problems can be seen as a natural extension of software inspection, often called a Fagan inspection after its creator, Michael Fagan. Fagan describes the process of inspecting software requirements and code as a natural evolution of his experience in hardware engineering and manufacturing, where inspections caught many mistakes even after thorough testing had been conducted.

PURE and the Industrial Revolution: The Practical Usability Rating by Experts (PURE) is an analytic method that approximates UX metrics such as task time and perceived ease. It builds on the time and motion studies of the industrial revolution, which were adapted to computers at Xerox PARC in the 1970s (and further developed into GOMS and the Keystroke-Level Model, or KLM). While the setting has changed from assembly lines and garment workers to websites and mobile apps, the underlying idea of estimating skilled user performance to improve efficiency and reduce effort has not.
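
PURE’s own expert ratings are not reproduced here, but the Keystroke-Level Model it draws on is simple to sketch: a skilled user’s error-free task time is estimated by summing standard times for each low-level operator. The operator times below are commonly cited values from Card, Moran, and Newell’s KLM work and should be treated as assumptions (the keystroke time in particular varies with typing skill); the names `OPERATOR_SECONDS` and `klm_estimate` are hypothetical.

```python
# Commonly cited KLM operator times in seconds (assumed values; K varies with typing skill)
OPERATOR_SECONDS = {
    "K": 0.20,  # press a key or mouse button (an often-used value for a skilled typist)
    "P": 1.10,  # point at a target with a mouse
    "H": 0.40,  # move a hand between keyboard and mouse ("homing")
    "M": 1.35,  # mentally prepare for the next step
}


def klm_estimate(operators: str) -> float:
    """Estimated skilled, error-free time in seconds for a string of KLM operators."""
    unknown = set(operators) - set(OPERATOR_SECONDS)
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical task: reach for the mouse, click a search box, return to the keyboard,
# then type a five-letter query and press Enter (six keystrokes)
task = "HMPK" + "HM" + "K" * 6
print(f"{klm_estimate(task):.2f} s")  # prints "6.00 s"
```

Where to place the mental-preparation (M) operators follows rules of thumb in the KLM literature; the placement in this example is a simplification.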

Testing with Five Users and Gambling with Dice: The formula for deriving the sample size needed to discover problems in an interface is based on the binomial probability formula. Although the underlying mathematics had been known for centuries, its use in probability derives from the French mathematician Blaise Pascal, who applied it to the chances of winning dice games.
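
In problem-discovery form, the binomial calculation asks how likely a problem that affects a proportion p of users is to appear at least once among n test users: 1 − (1 − p)^n. The sketch below computes that probability and the smallest n that reaches a discovery goal. The value p = 0.31 in the example is an assumption (a figure often used in discussions of the five-user rule), not a property of any particular interface, and the function names are hypothetical.

```python
def prob_discovered(p: float, n: int) -> float:
    """Probability that a problem affecting proportion p of users is seen at least once in n users."""
    return 1 - (1 - p) ** n


def users_needed(p: float, goal: float) -> int:
    """Smallest n for which prob_discovered(p, n) reaches the goal."""
    if not 0 < p <= 1 or not 0 <= goal < 1:
        raise ValueError("need 0 < p <= 1 and 0 <= goal < 1")
    n = 0
    while prob_discovered(p, n) < goal:
        n += 1
    return n


# Assumed problem frequency of 31% (illustrative, not measured)
print(round(prob_discovered(0.31, 5), 2))  # prints 0.84
print(users_needed(0.31, 0.80))            # prints 5
# A rarer problem, affecting an assumed 5% of users
print(users_needed(0.05, 0.80))            # prints 32
```

The last line shows the catch in the five-user rule: the less common a problem, the more users it takes to have a good chance of seeing it even once.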

Confidence Intervals and WWII: The confidence interval itself is a relatively modern creation, attributed to Jerzy Neyman in the 1930s, but its components date back a century earlier to work by Laplace in 1812, and were later applied by Abraham Wald in his work for the US military in the 1940s. Wald’s work famously helped make planes safer by mathematically predicting their weak points: he looked not at where the bullet holes were on returning planes, but at where they weren’t. The binomial confidence interval we currently recommend for small-sample UX research is named after him: the adjusted Wald interval.
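
The sketch below contrasts the plain Wald interval, p̂ ± z·√(p̂(1 − p̂)/n), with the adjusted Wald (Agresti-Coull) interval, which adds z²/2 successes and z² trials before applying the same formula. With small samples, and especially when every participant succeeds or every participant fails, the plain version breaks down. The default z = 1.96 (95% confidence) and the function names are assumptions for illustration.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Plain Wald interval for a binomial proportion, clipped to [0, 1]."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald interval: add z^2/2 successes and z^2 trials, then apply the Wald formula."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1], since a proportion cannot fall outside that range
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical result: all 5 of 5 participants completed a task
print(wald_interval(5, 5))  # prints (1.0, 1.0): a zero-width interval, which is implausible
lo, hi = adjusted_wald_interval(5, 5)
print(f"{lo:.2f} to {hi:.2f}")  # prints 0.51 to 1.00
```

With only five users, the adjusted interval says the true completion rate could plausibly be anywhere from about 51% to 100%, which is a far more honest summary than the plain Wald interval’s single point.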

Eye Tracking and Reading: Eye tracking is a specialized technique that shows where people’s eyes fall, such as on advertisements or website product pages. It was originally used to understand how people read.


