Many researchers are familiar with the Hawthorne Effect, in which people act differently when observed.

It was named when researchers found workers at the Hawthorne Works factory performed better not because of increased lighting but because they were being watched.

This observer effect happens not only with people but also with particles. In physics, the mere act of observing a phenomenon (like subatomic particle movement) changes the outcome.

The Observer Effect

There has been some question about the actual details of the now-infamous Hawthorne experiment, and people are not subatomic particles. Still, there is strong evidence from other sources for what is variously called social facilitation, the audience effect, or, more generally, the observer effect: people tend to act differently when being observed.

The “white-coat” response is one example: a patient’s blood pressure rises from the psychological effect of the office visit (who likes going to the doctor?). The effect also differs depending on the gender of the observer and participant. Research has found, for example, that women appear more physiologically affected by social rejection, whereas men react more to achievement challenges.

Interestingly, the behavior isn’t always consistent. There’s some evidence that when people perform rote and simple tasks, performance tends to improve when they are being observed. Conversely, performance tends to degrade when tasks are complex and less familiar.

For example, the performance of skilled pool players improved when they were being watched, whereas the performance of less skilled players worsened. This effect is also mirrored in some of our favorite insects. Cockroaches were able to complete a simple maze faster when in the presence of other cockroaches, but had a hard time with more difficult mazes.

The Observer Effect in Usability Testing

While most of us are less concerned with playing pool or organizing bug races, usability testing at its core is all about observing behavior from participants. We use both monitoring equipment (screen, face, and audio recordings) and physical observation, with a facilitator and often other observers (either co-located or remote). Does all this observing affect the results? If so, how much? A few studies in the literature shed light on this question.

Observing Affects the Error Rates

In 2005, Harris et al. had 100 student participants attempt easy and hard tasks in Microsoft Word. Participants were assigned one of four conditions:

  1. Facilitator Group: Facilitator seated behind participant.
  2. No Facilitator Group: No facilitator in room (participant still told they were being recorded).
  3. Facilitator & Reminding Group: Facilitator seated with participant and with a constant reminder of being watched and electronically recorded.
  4. Facilitator & Design Team: Participant introduced to the design team watching in the observation room (and shown the TV feed).

These researchers did indeed find differences in error rates between groups, but not in a clear pattern. In somewhat puzzling results, they found the No Facilitator Group (2) and Facilitator & Reminding Group (3) had significantly fewer errors than the other two groups (1 and 4).

Monitoring Equipment Increases Stress

In another study, Grubaugh et al. (2005) examined the effects of monitoring equipment. They had 150 university student participants assigned to different lab setup conditions to evaluate the Microsoft OneNote program. Each setup altered the amount of monitoring equipment:

  1. Extra Monitoring: A usability lab with cameras on tripods and audio recording all around the participant.
  2. Classic Usability Lab: One-way mirror and no visible cameras or audio recording around participant.
  3. Reduced Lab: No one-way mirrors, windows, or any visible recording devices.

The researchers found higher error rates in the condition with the most intrusive monitoring equipment (Group 1) compared to the least (Group 3).

Observers Affect Heart Rate & Metrics (Sometimes)

In 2009, Sonderegger & Sauer had 60 participants attempt tasks on a mobile phone prototype in one of three conditions:

  1. Facilitator with two observers in the same room
  2. Single observer in the room (classic usability test)
  3. Electronic setup (no facilitator or observer in the room)

In something that sounds like it’s from a 1950s psychology lab experiment, they wired participants to track heart rate variability (a measure of stress) and also tracked several usability metrics (perceived ease, task completion, time, interface attractiveness, and participant’s emotions).

They found that the presence of the two additional observers (a mix of men and women) had a negative impact on participants. The non-interactive additional observers led to:

  • More stress (decreased heart rate variability)
  • Lower task completion rates
  • Longer task times
  • Less positive emotional affect

There was no difference in perceived usability (using the PSSUQ) or attractiveness.
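
For readers unfamiliar with heart rate variability, here is a minimal Python sketch of one common way to quantify it: RMSSD, the root mean square of successive differences between heartbeat (RR) intervals. The studies summarized here do not state which HRV metric they used, so the choice of RMSSD, the function names, and the RR interval values below are all assumptions made purely for illustration.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms).

    RMSSD is one common time-domain HRV metric. It is an assumption here,
    not necessarily the metric used in the studies discussed.
    """
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate_bpm(rr_intervals_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    return 60000.0 / mean_rr


# Made-up RR intervals (ms) for illustration only, not data from any study.
resting_rr = [820, 860, 790, 870, 810, 850]
observed_rr = [700, 705, 698, 702, 699, 704]

for label, rr in (("resting", resting_rr), ("observed", observed_rr)):
    print(f"{label}: RMSSD = {rmssd(rr):.1f} ms, "
          f"mean HR = {mean_heart_rate_bpm(rr):.1f} bpm")
```

In this invented example the "observed" intervals are shorter and more uniform, so the mean heart rate is higher and RMSSD is lower, which is the pattern the article describes as a sign of more stress.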

In 2014, Uebelbacher had 80 participants attempt easy and difficult tasks on both a prototype and a fully finished tour-guide app on an iPhone 3. Participants were assigned one of two conditions:

  1. No observers or facilitators in the room
  2. Facilitator plus two observers (one man and one woman)

No differences in task completion, task time, perceived ease (from the PSSUQ), or emotional ratings were found. The study did, however, find a difference in physiological measures (heart rate variability). So while it did not replicate the performance findings from Sonderegger & Sauer, it did show that heart rate variability was a sensitive measure of stress. Uebelbacher noted:

“When observers were present, there was a much stronger increase in mean heart rate from resting phase to task completion phase (+5.4bpm) than when participants were working on their own.”

Uebelbacher additionally noted that participants with observers present reported feeling more disturbed by observation. This was less pronounced for older participants, who indicated that they generally felt less observed than younger participants (a potential age effect).
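
If you want to check for an observer effect in your own tests, one simple approach is to compare task completion rates between an observed and an unobserved condition. The Python sketch below uses a pooled two-proportion z-test. This is not the analysis used in any of the studies above; the function name and the completion counts are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Illustrative helper only, not the method used in the cited studies.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 34 of 40 unobserved participants completed the task
# versus 30 of 40 observed participants. These are NOT data from any study.
z, p = two_proportion_z_test(34, 40, 30, 40)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With these made-up counts, a 10-point gap in completion rates is not statistically significant at the conventional 0.05 level, which shows how a real but modest observer effect could go undetected at sample sizes like these. For small samples, an exact test may be a better choice than this normal approximation.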

Summary

Here are some findings and takeaways from the research on observers in usability tests.

  1. The type and number of people observing a usability test likely have some effect on the physiology of the participant (e.g. heart rate).
  2. While the heart rate may increase as a sign of increased stress, it’s unclear to what extent heart rate affects usability data.
  3. Participants may actually perform better on some tasks and worse on others, but more research is needed to confirm this effect.
  4. Additional research is needed to understand how age and gender may moderate observation effects (e.g. female versus male observers and older versus younger participants).
  5. The more intrusive the monitoring equipment, the more likely it is to increase stress levels and potentially affect performance (e.g. more errors).
  6. When setting up a usability lab, minimize the conspicuousness and intrusiveness of the monitoring equipment as much as possible.
  7. Have observers who aren’t interacting with the participant watch in another room or remotely. In our usability tests, clients watch from an observation room or off site using GoTo Meeting and a live YouTube stream.
  8. Be sure your facilitators and observers aren’t wearing white lab coats.