Most usability testing involves finding and fixing problems as part of an iterative design process to make an interface more usable. It is typically called a Formative Usability Evaluation. In contrast, a Summative Usability Evaluation describes the current usability of an interface—as measured by things like task times, completion rates and satisfaction scores. Summative tells you how usable an interface is and formative tells you what isn’t usable.
The terms come from Educational Theory, where they are used the same way to describe student learning (formative: providing immediate feedback to improve learning; summative: evaluating what was learned).
In general I think the terms are helpful, and they are now part of the usability vernacular. However, I have a major problem with one association: at some point practitioners began equating quantitative metrics with summative evaluations and using the two terms interchangeably. Since most usability activities are formative, this provides a convenient excuse for not using metrics or quantifying usability.
That’s a bad thing. Metrics are not exclusive to summative evaluations or benchmarking. Here are two examples:
- Completion rates and task times can be valuable diagnostic measures during formative-type tests even when participants are thinking-aloud while they attempt tasks.
- You can estimate how much of the user population is likely to be affected by the problems you observed by using confidence intervals and the binomial probability formula.
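To make the second point concrete, here is a minimal sketch of that kind of estimate. The source doesn’t specify a formula, so this assumes the adjusted-Wald (Agresti-Coull) interval, a common choice for the small samples typical of formative tests; the function name and the example numbers are illustrative, not from the original.

```python
import math

def adjusted_wald_ci(affected, n, confidence=0.95):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a proportion.

    Estimates the share of the user population likely to be affected by
    a problem observed with `affected` of `n` test participants.
    """
    # Two-sided z critical values for common confidence levels
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    # Adjust the observed proportion: add z^2/2 successes and z^2 trials
    n_adj = n + z ** 2
    p_adj = (affected + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical example: 3 of 5 participants hit a problem
low, high = adjusted_wald_ci(3, 5)
print(f"{low:.0%} to {high:.0%} of users likely affected")
```

Even with five participants, the interval shows the observed problem plausibly affects a large share of users, which is exactly the kind of quantitative statement a formative test can support.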
I’ve heard practitioners say they don’t have time for summative/quantitative evaluations since they’re too busy testing designs. I’ve never done a summative evaluation and not collected UI problem information. If we found problems that needed to be fixed, we fed them back to the development team to address. In fact, I’d argue there really shouldn’t be a pure summative assessment. It’s just bad practice to ignore problems users have even if your main goal is to gather a usability benchmark.
[Photo by sheilaellen: A Formative Usability Test?]
It would be one thing if purely qualitative methods generated consistent and reliable results across practitioners. The CUE studies show that results from independent teams using such methods are highly variable. While I think almost any usability activity will generate a more usable interface (different ways up the same mountain), there are good and bad practices. The labels formative and qualitative should not be used to disguise haphazard efforts, opinions, and anecdotes as a legitimate method.
So I’ve come to think of formative and summative as a simplified way to characterize usability activities, even though in practice there isn’t as sharp a bifurcation. I see the terms as harmful in that they perpetuate the misconception that quantitative data should only be collected during summative evaluations, which is to say rarely, if ever. I advocate disassociating metrics from summative evaluations: metrics should be collected during any usability evaluation.