How To Measure Learnability

Jeff Sauro, PhD

Learnability is often used interchangeably with usability.

While they are similar concepts, learnability is actually something a bit different.

Part of the confusion is that there are two common uses of the term learnability.

The first use of learnability describes the ability of an interface to allow users to accomplish tasks on the first attempt.

We often refer to this as usability for first-time use. Nielsen also defines learnability as easy first-time use, but lists it as a subcomponent of the broader construct of usability.

Measuring learnability under this definition means using our classic usability metrics and measuring task performance for users who have never been exposed to a system, or who at least have very little exposure to the tasks and interface, even if they've used it before. Most usability testing falls into this category.

A second definition of learnability is usability over time. Task performance, again measured using the classic usability metrics, improves over repeated "trials": more practice means less time needed to complete tasks. Typically, the improvement isn't linear but logarithmic.
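If you want to quantify that curve, the usual model is the power law of practice, T(n) = T1 × n^(−b), which becomes a straight line in log-log space. Here's a minimal Python sketch of fitting that model; the task times are made up for illustration, not data from any study:

```python
import numpy as np

# Hypothetical mean task times (seconds) over three practice trials;
# illustrative numbers only, not data from the study.
trials = np.array([1, 2, 3])
times = np.array([120.0, 85.0, 70.0])

# Power law of practice: T(n) = T1 * n**(-b). Taking logs gives
# log T = log T1 - b * log n, so an ordinary least-squares line fit recovers b.
slope, intercept = np.polyfit(np.log(trials), np.log(times), 1)
b = -slope              # learning rate: a bigger b means a steeper learning curve
t1 = np.exp(intercept)  # the model's estimate of the first-trial time

print(f"T(n) ~ {t1:.1f} * n^-{b:.2f}")
print(f"Predicted trial-4 time: {t1 * 4 ** (-b):.1f} seconds")
```

The fitted exponent b gives you a single number to compare how quickly two systems are learned.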

A more learnable system is one in which task times fall faster with practice than they do in other systems. This can be especially important when a certain amount of training is expected or required with an application. For example, enterprise accounting systems come with some expectation of training to learn an organization's bookkeeping rules.

One criticism of usability testing is that it can be an unfair assessment of actual usage if users don't have a chance to get acquainted with the interface. This is especially understandable when specialized training is required. In my experience, many applications and websites fall somewhere between the extremes of walk-up-and-use museum kiosks and highly specialized manufacturing order entry systems. Collecting usability metrics over multiple trials helps settle disputes about usability and provides data on both first-time use and use with practice.

Learnability of Expense Reporting Applications

For example, a few years ago we were testing two expense reporting web applications. If you work at a large company or a consultancy that tracks expenses and hours, you're probably familiar with expense reporting systems. While the basic process of submitting expenses for reimbursement seems like it should be walk-up-and-use, many companies have specific rules and idiosyncrasies that require some getting used to.

The two expense reporting systems we tested supported the same functionality but had rather different interfaces. We expected that most employees using the system would get some introduction to it, as well as a few discussions with a manager about where and how to submit expenses. We wanted to know: given a brief introduction to the systems, which one would be more learnable? That is, after repeated use (trials), which application would enable users to be more efficient?

To test the systems, we had 26 users, all with experience submitting expense reports in various applications, attempt the same set of five core expense reporting tasks on both systems. We provided the users with a short video introduction on how to submit expense reports using both systems prior to their initial attempt on each system. They received this training only once.

Each user repeated the five tasks three times. The application and task order were both counterbalanced to minimize sequence effects. The tasks included submitting an expense report, updating a report, and verifying that expenses were paid.
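There are many ways to set up counterbalancing; one rough sketch is to alternate which product comes first across participants and rotate the task order with Latin-square style rotations. The code below is illustrative only (the task names are placeholders, not our exact protocol):

```python
from itertools import cycle

# Placeholder task names standing in for the five study tasks.
TASKS = ["submit report", "update report", "verify payment", "task 4", "task 5"]
PRODUCT_ORDERS = [("O", "P"), ("P", "O")]

def rotated_orders(items):
    """Latin-square style rotations: each task appears in each position equally often."""
    n = len(items)
    return [[items[(start + i) % n] for i in range(n)] for start in range(n)]

# Alternate which product comes first and rotate the task order per participant.
task_rotation = cycle(rotated_orders(TASKS))
for participant in range(1, 7):  # first six participants, for illustration
    products = PRODUCT_ORDERS[(participant - 1) % len(PRODUCT_ORDERS)]
    print(participant, products, next(task_rotation))
```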

At the end of the tasks, we also administered the System Usability Scale (SUS) for each application in order to get an overall sense of perceived ease of use.
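The SUS is ten items rated on a five-point scale. Standard scoring converts each item to a 0–4 contribution (the response minus 1 for odd-numbered items, 5 minus the response for even-numbered items) and multiplies the sum by 2.5 to yield a 0–100 score. A minimal Python sketch (the sus_score helper name and example responses are mine, not from the study):

```python
def sus_score(responses):
    """Score a single SUS questionnaire.

    responses: list of 10 ratings on a 1-5 scale (strongly disagree .. strongly agree).
    Odd-numbered items are positively worded; even-numbered items are negatively worded.
    Returns a score on the standard 0-100 scale.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings, each between 1 and 5")
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

# Example: a fairly positive response pattern.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 3]))  # -> 82.5
```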

Results

The graphs below show the mean time to complete three of the five tasks attempted. Mean time is on the vertical (y) axis, and each of the three attempts (called trials) is on the horizontal (x) axis.

Figure 1: Mean time to complete tasks as a function of trial for three of five tasks in two comparable expense reporting systems. Product O had generally faster performance.

You can see the downward slope of the lines in all three tasks. This indicates that users perform the tasks faster as they get more practice. You can also see where the term "learning curve" comes from. When graphed this way, a steeper learning curve represents faster learning, contrary to the more common use of the term to indicate a harder-to-learn task.
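To draw curves like these yourself, plot the mean time per trial for each product. A minimal matplotlib sketch with hypothetical means (not the study's data):

```python
import matplotlib.pyplot as plt

trials = [1, 2, 3]
# Hypothetical mean task times (seconds); the actual study data isn't reproduced here.
product_o = [95, 70, 58]
product_p = [120, 98, 88]

plt.plot(trials, product_o, "o-", color="tab:blue", label="Product O")
plt.plot(trials, product_p, "s-", color="tab:red", label="Product P")
plt.xticks(trials)
plt.xlabel("Trial")
plt.ylabel("Mean task time (seconds)")
plt.title("Learning curves: mean task time by trial")
plt.legend()
plt.show()
```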

You can also see that users can generally complete tasks faster on Product O (the blue lines). The faster performance was consistent with the perceptions of usability. The SUS score for Product O was 82, and for Product P the score was 53 (See Page 65 in Quantifying the User Experience for the raw scores).

We also found that task-difficulty ratings closely mirrored the task times. There wasn't much difference in the completion rates, largely because we provided some clues as to how to complete the tasks (which isn't always the case, and is a subject for another blog).

One thing we looked for when measuring the repeated trials was whether users of the slower product would ever "catch up" to users of the faster product. That is, we were looking for converging or crossing learning curves.

The closest we came was on the second task. The graph below shows the same task as the middle graph above, but with the y-axis rescaled to emphasize the change.


Figure 2: Mean task time for task 2 by trial, with a rescaled y-axis. Product P has a faster initial task time (usability) but falls behind on subsequent trials. At this sample size, the differences in mean times are not statistically significant.

Product P actually had a faster task-completion time on the initial trial, although the difference wasn't statistically significant. On the two subsequent trials, Product O showed better performance, although these differences were also not statistically significant.
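If you want to run this kind of comparison on your own data: the same users tried both products, so a paired test is appropriate, and because task times are typically right-skewed, it's better to compare log-transformed times. A sketch with simulated data standing in for real observations:

```python
import numpy as np
from scipy import stats

# Simulated per-user task times (seconds) for one task on one trial;
# the same 26 users tried both products, so a paired test is appropriate.
rng = np.random.default_rng(1)
times_o = rng.lognormal(mean=4.0, sigma=0.4, size=26)
times_p = rng.lognormal(mean=4.2, sigma=0.4, size=26)

# Task times are typically right-skewed, so compare log-transformed times.
t, p = stats.ttest_rel(np.log(times_o), np.log(times_p))
print(f"paired t = {t:.2f}, p = {p:.3f}")
```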

This study illustrates how to measure learnability in a lab-based setting. The tasks took between one and three minutes to complete, so we were limited in how many trials we could run. The sessions were already running between two and two and a half hours, so our learning curves were almost as short as they get (two being the absolute minimum number of trials). When testing reaction times to menus [PDF] or other quick decision-making tasks, the learning curves become more pronounced.

The study did provide us with sufficient data on first-time use and repeated use, and it allowed us to see how much improvement in time, errors, and perceived difficulty we could expect after a few months of usage. In most cases, the third trial had a statistically faster task-completion time than the first attempt. In a few cases there were dramatic reductions in task time (often a 50% reduction).
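One convenient way to express that kind of reduction is the geometric mean of each user's trial-3 to trial-1 time ratio, which handles the skew in task times better than a simple average of ratios. A sketch with made-up times:

```python
import numpy as np

# Hypothetical per-user times (seconds) for one task on trial 1 and trial 3.
trial1 = np.array([160, 140, 190, 150, 170, 130, 180, 145])
trial3 = np.array([80, 75, 95, 70, 90, 65, 85, 78])

# Summarize the within-user change as the geometric mean of the
# trial3/trial1 ratios (the mean of the log ratios, exponentiated).
ratio = np.exp(np.mean(np.log(trial3 / trial1)))
print(f"Trial 3 takes about {ratio:.0%} of the trial-1 time "
      f"(a {1 - ratio:.0%} reduction)")
```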

These data allowed us to discuss the performance of initial use (the common usability test) as well as repeated use, and to identify interface problems that persisted even after users were familiar with the interface.

The next time you find yourself in a discussion about the biases of testing initial use as a measure of usability, consider including at least one repeated task to get an estimate of learnability. In the same study, you can collect data to understand how the application supports both initial use and usage over time. Both your novice and experienced users will learn to thank you for it.
