What Are the Different Types of Synthetic Users?

Jim Lewis, PhD and Jeff Sauro, PhD

Recruiting participants for research is expensive. It’s also rife with problems: Are these people really who they say they are? Are they actually paying attention? Or is the data from some survey farm where people click through and make money?

AI is disrupting UX research. But the disruption is leading to more software, not less. The need for insights into how people will use that software isn’t going away.

But can AI help? Can we use AI to synthesize people’s attitudes, beliefs, and behaviors? Instead of trying to find the right people to take surveys, could these synthetic users generate insights faster and at almost no cost? Claims like that would certainly make headlines. And they do.

But what exactly is a synthetic user? Is that the same as a digital twin? Or a synthetic persona?

To properly assess the effectiveness of AI tools, we think it’s important to have a good understanding of the terms and how they fit together.

In this article, we propose a preliminary taxonomy of five distinct types of synthetic users, organized by how grounded they are in real human data. Before we get to the taxonomy, though, it helps to ask a question that sounds simpler than it is.

What Birds Can Teach Us about Synthetic Users

How do we know that a bird is a bird?

Is it because a bird can fly? Well, bats are mammals that can fly, while penguins are birds that can’t fly.

Is it because they lay eggs? Platypuses are mammals that lay eggs (as do most reptiles, amphibians, fish, and arthropods).

Maybe it’s because birds have feathers rather than scales or fur? That holds for living animals, but many dinosaurs outside the lineage leading to birds are now known to have had feathers.

The answer, as Linnaeus worked out in the 1700s, is that no single feature is sufficient. A bird is defined by a constellation of characteristics (feathers, beak, two wings, two feet, warm blood, hard-shelled eggs) organized within a hierarchy of kingdom, class, genus, and species. Figure 1 shows how that plays out from the animal kingdom down to a single species of stork. This combination of multiple features and hierarchical levels is our guide for classifying and understanding synthetic users.

Figure 1: Example of classification from the animal kingdom to species of stork.

Synthetic Users: More of a Genus than a Species

The topic of classifying different types of synthetic users is in a state of flux (lots of labels, overlapping meanings, vendor-specific definitions). Despite this, in Figure 2, we attempt a preliminary classification scheme similar to Figure 1 for five types (species) under the genus of Synthetic User.

Figure 2: Classification of different types (species) of synthetic users.

Table 1 lists the identifying characteristics for each of these types of synthetic users, primarily focusing on the type of data used to create the synthetic user and how grounded the synthetic user is in actual user data.

Synthetic User Type | Identifying Characteristics/Descriptions
--- | ---
AI Proto Persona | The weakest (least grounded) type of synthetic user, generated with simple role-playing prompts (e.g., “You are a world-class Python programmer”). This method produces preliminary user profiles based on broad assumptions rather than research.
Demographic Based | Prompts specify age, gender, occupation, region, etc., to approximate group-level tendencies. This method is somewhat more grounded than a proto persona but is still limited in the quality of its output, especially when demographics have only weak relationships with the research topic (as is the case for much UX research).
Persona Based | Prompts use richer persona paragraphs (e.g., “Bill G. is a 27-year-old male graphic designer who always has his sketchbook at hand, has a track record of being creative and innovative, and is up-to-date on current design trends. How would he complete the following questionnaire?”). Because these synthetic users are still weakly grounded, they are limited to approximating group-level tendencies.
Research Grounded | Prompts refer to actual research artifacts with traceable sources but do not attempt to model individual human responses. These are based on actual interviews, survey results, analytics, customer-support logs, or other user data that are typically not available to LLMs trained on public data.
Digital Twins | Prompts refer to rich individual-level data for the purpose of modeling each individual in a dataset. This approach has the strongest grounding in actual user data, but its accuracy in real-world deployments is still an open research question.

Table 1: Brief descriptions of types of synthetic users.
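To make the grounding differences in Table 1 more concrete, here is a minimal Python sketch of how a prompt for each type might be assembled. All function names, fields, and prompt wording are hypothetical illustrations, not the method of any particular tool or study, and the sketch only builds prompt strings; it does not call an LLM.

```python
from dataclasses import dataclass, field

# Hypothetical prompt builders: each type draws on progressively more grounded data.
# Wording is illustrative only.

def proto_persona_prompt(role: str) -> str:
    # AI proto persona: a bare role, no research data behind it.
    return f"You are {role}."

def demographic_prompt(attrs: dict[str, str]) -> str:
    # Demographic based: group-level attributes only.
    described = ", ".join(f"{k}: {v}" for k, v in attrs.items())
    return f"You are a person with these characteristics ({described})."

def persona_prompt(persona_paragraph: str, task: str) -> str:
    # Persona based: a richer narrative, still not tied to real respondents.
    return f"{persona_paragraph} How would this person {task}?"

@dataclass
class Source:
    source_id: str  # hypothetical identifier for an interview, survey, or support log
    excerpt: str

def research_grounded_prompt(sources: list[Source], task: str) -> str:
    # Research grounded: cites traceable artifacts but models no single individual.
    cited = "\n".join(f"[{s.source_id}] {s.excerpt}" for s in sources)
    return f"Using only the research findings below, {task}\n{cited}"

@dataclass
class Respondent:
    respondent_id: str
    prior_answers: dict[str, str] = field(default_factory=dict)

def digital_twin_prompt(person: Respondent, task: str) -> str:
    # Digital twin: conditions on one real individual's own data.
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in person.prior_answers.items())
    return (
        f"You are respondent {person.respondent_id}. "
        f"Here are this person's previous answers:\n{history}\n"
        f"Answering as this person, {task}"
    )

print(proto_persona_prompt("a world-class Python programmer"))
```

The only thing that changes from one builder to the next is the data passed in: a role string, a set of demographic attributes, a persona narrative, cited research excerpts, or one person's own prior answers. That mirrors the grounding dimension the table is organized around.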

Discussion

In this preliminary taxonomy, we’ve defined five types of synthetic users: AI proto persona, demographic based, persona based, research grounded, and digital twins, based on differences in the types of data (e.g., demographic, persona) and the strength of the relationship between the synthetic user and human user data.

Preliminary taxonomies change over time. In this article, we’ve used the levels originally defined by Linnaeus because they were adequate for our purpose. Modern biological taxonomies have eight levels (Domain, Kingdom, Phylum, Class, Order, Family, Genus, Species), and the number of kingdoms has increased to six (Bacteria, Archaea, Protista, Fungi, Plants, Animals).

We fully expect changes to our classification scheme over time, but it’s a start.

For example, we have not included generative agents in this taxonomy because they are qualitatively different from synthetic users that simulate responses. Generative agents are more like simulated actors, modeling what people might do over time. They may eventually become their own branch of the synthetic user genus, separate from synthetic respondents. Time will tell.

Just as there are hybrids in the animal kingdom (e.g., mules, ligers), in practice there may be hybrids of different types of synthetic users. For example, in “Synthetic Replacements for Human Survey Data? The Perils of Large Language Models” (2024), Bisbee et al. used the following prompt to elicit 30 synthetic responses for each of the 7,350 human respondents in the 2016 and 2020 waves of the American National Election Studies (ANES), producing a final dataset with 3,614,400 responses:

It is [YEAR]. You are a [AGE] year-old, [MARST], [RACETH] [GENDER] with [EDUCATION] making [INCOME] per year, living in the United States. You are [IDEO], [REGIS] [PID] who [INTEREST] pays attention to what’s going on in government and politics. Provide responses from this person’s perspective. Use only knowledge about politics that they would have.

Each bracketed item is a variable with values corresponding to a real respondent in the two waves of the ANES. For example, [YEAR] was 2016 or 2020, [AGE] matched the selected respondent’s age, [MARST] was marital status (e.g., married, divorced, single), [IDEO] was political ideology from extremely liberal to extremely conservative, [REGIS] was voter registration status, and [PID] was party membership (Democrat, Independent, Republican).
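To show how a template like this is conditioned on each respondent, here is a minimal Python sketch that fills the bracketed variables from one respondent’s record and repeats the resulting prompt for a set number of replicates. The respondent values, the `fill_template` and `build_requests` helpers, and the default of 30 replicates (taken from the description above) are illustrative assumptions, not Bisbee et al.’s code. The sketch only builds prompts and does not call an LLM.

```python
import re

# The ANES-style template shown above (straight apostrophes used for simplicity).
TEMPLATE = (
    "It is [YEAR]. You are a [AGE] year-old, [MARST], [RACETH] [GENDER] with "
    "[EDUCATION] making [INCOME] per year, living in the United States. "
    "You are [IDEO], [REGIS] [PID] who [INTEREST] pays attention to what's "
    "going on in government and politics. Provide responses from this "
    "person's perspective. Use only knowledge about politics that they would have."
)

PLACEHOLDER = re.compile(r"\[([A-Z]+)\]")

def fill_template(template: str, respondent: dict[str, str]) -> str:
    """Replace each [VARIABLE] with the matching value from one respondent's record."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in respondent:
            raise KeyError(f"respondent record is missing {name}")
        return respondent[name]
    return PLACEHOLDER.sub(lookup, template)

def build_requests(respondents: dict[str, dict[str, str]], n_replicates: int = 30):
    """Return (respondent_id, replicate_index, prompt) tuples; each would be a separate LLM call."""
    requests = []
    for rid, record in respondents.items():
        prompt = fill_template(TEMPLATE, record)  # same prompt for every replicate
        for rep in range(n_replicates):
            requests.append((rid, rep, prompt))
    return requests

# Hypothetical respondent record (values invented for illustration).
respondents = {
    "R0001": {
        "YEAR": "2020", "AGE": "45", "MARST": "married", "RACETH": "white",
        "GENDER": "woman", "EDUCATION": "a bachelor's degree", "INCOME": "$60,000",
        "IDEO": "moderate", "REGIS": "a registered", "PID": "Independent",
        "INTEREST": "sometimes",
    },
}

requests = build_requests(respondents)
print(len(requests), requests[0][2][:60])
```

Because every replicate for a respondent shares the same prompt in this sketch, any variation across that respondent’s synthetic responses comes from the LLM’s sampling rather than from differences in the conditioning attributes.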

Thus, this is a hybrid between demographic- and persona-based types with a light sprinkle of digital twinning. It’s less than a fully research-grounded respondent or digital twin because the prompt doesn’t include access to the respondent’s prior answers, interview transcript, open-ended comments, voting history, occupation, or religion. It uses selected ANES variables as conditioning attributes and asks the LLM to answer from that perspective (multiple times for each human respondent).

Summary

Our key conclusions from this exercise are:

“Synthetic users” is more of an umbrella term (like a genus) than a type (like a species).

All five types we’ve described can fall under the umbrella of synthetic users. In practice, that means when we talk about synthetic users, it’s like talking about storks. There are a variety of storks, so knowing which bird we’re talking about helps move the conversation forward.

Key criteria for discriminating among types of synthetic users include data type and grounding.

The types we’ve defined differ in the kind of data used to model responses (e.g., demographic, persona) and the extent to which they are grounded in real user data.

Taxonomies change over time.

We consider this article a necessary first step toward a taxonomy of synthetic users but fully expect it to evolve, perhaps quickly, given rapid changes in these technologies.
