A 6-Hour Usability Test in an Agile Environment

Jeff Sauro, PhD

December 2, 2014

In the fast-paced world of Agile development, where it’s difficult to find time to get data from users, unmoderated remote testing gives us a way to quickly collect feedback on interface design.

For example, I recently worked with a web-app product team to determine whether users find their new file manager easier to use than the previous one.

We started at 10 a.m. and wanted an answer by the end of the workday.

Here’s how the day went:

Planning

Determine tools

We started with tool selection. We decided to evaluate our designs using UserZoom, the popular unmoderated remote usability-testing tool.

Determine what to test

With only a few hours to conduct our study, we boiled it down to the essentials. We narrowed our focus from several pieces of functionality and diverse user groups to two key design changes.

Select Users

The typical user for this web application is a student enrolled in an online course, and the tasks require no training. The app is like Dropbox, meant for the general public, so we felt that a test group from the general Internet population would be representative.

Select Tasks

We chose two tasks: uploading a file and downloading a file. Because we were running a remote unmoderated study, where verifying task completion can be difficult, we had participants locate text in the files and then answer a verification question in UserZoom.

Select the Study Type (Between Subjects or Within Subjects)

With little time to collect a large sample, we chose a within-subjects study, which can detect statistical differences with a smaller sample than a between-subjects study. (If you’re not making a comparison, you can skip this step.) Participants would attempt two tasks on the old interface and two on the new one, and the tool would randomize the interface order.
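The testing tool handles the order randomization, but the idea can be sketched in a few lines of Python. The function name `assign_orders` and the labels "old" and "new" are hypothetical. This sketch also goes one step beyond plain randomization by counterbalancing, so that half the participants start on each interface. It is an illustration only, not a description of how UserZoom works internally.

```python
import random

# Hypothetical labels for the two designs being compared.
INTERFACES = ("old", "new")


def assign_orders(participant_ids, seed=None):
    """Return {participant_id: (first_interface, second_interface)}.

    Participants are shuffled, then alternated between the two possible
    orders so each order is used (almost) equally often.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)

    orders = {}
    for position, pid in enumerate(ids):
        first = INTERFACES[position % 2]
        second = INTERFACES[(position + 1) % 2]
        orders[pid] = (first, second)
    return orders


if __name__ == "__main__":
    for pid, order in assign_orders(["p1", "p2", "p3", "p4"], seed=1).items():
        print(pid, "->", " then ".join(order))
```

Balancing the order this way keeps learning effects (participants getting faster on whichever interface they see second) from favoring one design.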

Define Metrics

For each task, we knew that we wanted to collect core usability metrics for effectiveness, efficiency and satisfaction:

Task Completion: Task completion would be assessed based on the number of correct responses to the verification question. We’d also verify task completion by watching session replay videos.
Task Time: UserZoom automatically collects time data.
Satisfaction: After each task, we wanted participants to answer the seven-point Single Ease Question (SEQ).
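To make the three metrics concrete, here is a minimal sketch of how one participant's attempt at one task might be recorded and summarized. The `TaskResult` fields and the `summarize` helper are hypothetical names, not a UserZoom export format, and the choice of the median for task time is an assumption made here because time data tend to be skewed.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TaskResult:
    participant: str
    interface: str       # "old" or "new" (hypothetical labels)
    task: str            # "upload" or "download"
    completed: bool      # verification question answered correctly
    time_seconds: float  # task time as logged by the testing tool
    seq: int             # Single Ease Question rating, 1 (hard) to 7 (easy)


def summarize(results):
    """Summarize effectiveness, efficiency, and satisfaction for one condition."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "completion_rate": mean(1.0 if r.completed else 0.0 for r in results),
        "median_time": median(r.time_seconds for r in results),
        "mean_seq": mean(r.seq for r in results),
    }
```

Calling `summarize` once per interface and task gives the numbers to compare side by side.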

Develop Pretest Questions

We developed a few demographic questions to get some idea about the participants. (Because this was a general-purpose task with loose recruitment criteria, the team didn’t need to know much about demographics.)

Develop Posttest Questions

We knew that we wanted to ask participants to select which of the two file managers they preferred. This is one of the advantages of a within-subjects approach: you can’t get this direct comparison of designs with a between-subjects approach.

We didn’t collect an overall measure of usability, as we could have with SUS or SUPR-Q, because our study would examine only a narrow slice of functionality and the post-task ease question would be sufficient.

Study Programming

With the users, tasks, questions, and metrics defined, we programmed these details into UserZoom.

Pretest

Next, we had someone in our office—someone unfamiliar with the tasks and the study—run through the study and think aloud. This led us to reword a few tasks and questions. We were ready to launch the test.

Data Collection

We asked our favorite panel, OP4G, to send participants to take the study. We also brought five participants from Usertesting.com to simultaneously do the study in UserZoom. These participants are trained to think aloud while taking studies, which gives us insights into mental models (similar to what we get using moderated testing).

Over lunch, as the participants completed the study, we discussed the merits of testing and the challenge of balancing feature additions against the effort needed to be sure that the core functionality works.

Analysis

We came back to data from 34 study participants. For the next 90 minutes, we pored over the data and watched the videos from Usertesting.com and UserZoom.

Determine Statistical Significance

We compared the task metrics and then looked at answers to the preference question. We found a statistically significant higher completion rate, faster task time, and higher task-ease rating for one task (downloading a file). The difference we observed on the upload task was not statistically significant.

We also found that participants slightly preferred the new version, but this preference was not statistically significant.
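The article does not say which statistical tests were used. As one possible approach for a within-subjects design, the sketch below applies an exact two-sided binomial test in two ways: as McNemar's exact test on paired pass/fail completion data, and as a sign test on the preference question. All function names are hypothetical, and no real study data are shown.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n trials when p = 0.5."""
    if n == 0:
        return 1.0  # no informative observations
    low = min(k, n - k)
    tail = sum(comb(n, i) for i in range(low + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_exact(old_pass, new_pass):
    """McNemar's exact test for paired completion outcomes.

    old_pass[i] and new_pass[i] are the same participant's pass/fail
    results on the old and new interface. Only participants whose
    outcome differs between interfaces carry information.
    """
    if len(old_pass) != len(new_pass):
        raise ValueError("paired lists must have the same length")
    old_only = sum(1 for o, n in zip(old_pass, new_pass) if o and not n)
    new_only = sum(1 for o, n in zip(old_pass, new_pass) if n and not o)
    return two_sided_binomial_p(new_only, old_only + new_only)


def preference_p(prefer_new, prefer_old):
    """Sign test: is the preference split different from 50/50?"""
    return two_sided_binomial_p(prefer_new, prefer_new + prefer_old)
```

A small p-value (conventionally below 0.05) suggests the difference is unlikely to be chance alone. Task times and ease ratings are continuous or ordinal, so they would need a different paired test, such as a paired t-test.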

Watch Videos

The videos indicated which interface elements were working and where users were getting lost. They also gave us ideas for the next tests.

Happy Hour

While not every metric showed a statistically significant difference, we had enough evidence that the design was moving in the right direction, and we could see how it could be further improved. Given these results, limited development resources should probably be applied elsewhere, but that was a question for another day. Off to get a drink and talk about the next test!

Conclusion

Agile development often leaves little time for usability testing. By narrowly focusing your research questions and tasks and using unmoderated testing tools, you can, in a short time, answer design questions with data from users. When working under time constraints, break large studies into smaller ones, and quickly find out what doesn’t work—aligning with the spirit of Agile and Lean methodologies.
