Task completion is one of the fundamental usability metrics.
It’s the most common way to quantify the effectiveness of an interface. If users can’t do what they intend to accomplish, not much else matters.
While the concept may seem straightforward, determining whether users actually complete a task often isn’t easy.
The ways to determine task completion will vary based on the mode of evaluation and whether the data is collected from actual use (things users have done or self-reported) or simulated use (typically a usability test). Here are different ways you can determine whether users are completing tasks successfully.
Task Completion From Actual Use
In actual use, as the name suggests, you determine whether users completed their goals from a record of what they’ve done (actions) or from self-reported data on what they recall doing.
Recording or Observing Actual Behavior
Actions users take—a purchase made, a button clicked, a URL visited, or a file accessed—leave a traceable record and are ideal for measuring effectiveness. This data can be collected electronically from log files, Google Analytics, or other website-tracking software, such as Clicktale.
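As a toy illustration of mining an event log for task completion, the sketch below counts the share of users whose recorded actions include the event that defines success. The log format and the `purchase` completion event are assumptions for the example, not a real analytics schema:

```python
import csv
from io import StringIO

# Hypothetical event log: one row per user action (user_id, event).
# Which event counts as "completion" is something you define per task.
LOG = """user_id,event
u1,view_product
u1,add_to_cart
u1,purchase
u2,view_product
u2,add_to_cart
u3,view_product
"""

def completion_rate(log_text, completion_event):
    """Share of users whose recorded actions include the completion event."""
    users, completers = set(), set()
    for row in csv.DictReader(StringIO(log_text)):
        users.add(row["user_id"])
        if row["event"] == completion_event:
            completers.add(row["user_id"])
    return len(completers) / len(users)

print(completion_rate(LOG, "purchase"))  # 1 of 3 users completed a purchase
```

In practice the same logic runs as a query against your analytics platform; the point is that "completion" is defined by a concrete, recorded action.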
A second alternative is to observe users (either in person or remotely) attempting tasks. For example, a researcher can observe a small-business owner using QuickBooks to see whether the quarterly reporting function is being used correctly (task completion). Unfortunately, in my experience, the logistics of gaining access to enough users in context make this method the exception rather than the rule, so you’ll need to consider alternatives.
Recalling Actual Behavior
A surprisingly simple way to assess task completion from actual use is to ask users. Have participants in a survey, interview, intercept, top-task analysis, or diary study tell you whether they were able to complete an action on their most recent visit to a website or use of an app.
We find this method is particularly effective in a top-task survey. Participants first indicate which tasks are most important and then recall if they were able to accomplish those tasks on their most recent use of the software or visit to a website.
While more research is needed to determine how accurate this self-reported task completion data is, data on self-reported usability problems suggests it’s a good place to start in the absence of other data. We have, however, found self-reported task completion to be highly inflated in usability studies but still informative relative to other websites or products being tested. You’re likely to have more accurate data when the tasks are specific and attempted recently.
Task Completion From Simulated Use
Given the complexities and restrictions in measuring actual usage, the more common way to determine task completion rates is through simulating what users do in a controlled environment. This is done by creating a scenario and having users attempt it in a usability test, either moderated or unmoderated.
If you are able to observe users attempt tasks in a usability test, a facilitator or other observers can see whether users complete the required steps. Tasks should have pre-defined success criteria (e.g. a user files an invoice, locates the correct product SKU, or adds a contact to a database). But even with the most detailed success criteria, it’s hard to know all the ways participants may attempt and successfully complete a task.
Observing users allows a facilitator to judge whether unanticipated paths or actions constitute success or failure. It also allows for open-ended, less restrictive tasks that don’t require an artifact of task success (like a visited URL). For example, if you want to know whether participants can locate a TV on a retail website, participants can select any type of TV and a facilitator can determine through the participants’ comments whether the TV they selected met their needs.
Having a recording of these sessions also allows others to verify and reconsider task completion, making observation an ideal way to determine task completion from simulated use. The problem, of course, is that moderating sessions takes a lot of time and money, which limits the sample size and geographic reach.
Unmoderated usability testing platforms, like our MUIQ, make it possible to collect effectiveness data from many users quickly. But this comes at a price: it’s harder to determine task completion, and it usually requires narrowly tailored tasks and success criteria. Writing task scenarios for unmoderated studies is a topic in itself; see Beyond the Usability Lab for more discussion. Here are ways we’ve found to assess task completion in unmoderated studies.
Validate by Question
Ask participants in a study to provide information they could only get if they completed a task successfully. For example, if you ask participants to find a 4-star rated blender for under $40 on a retail website, have them provide the brand name of the blender they located. For such scenarios, there can only be a limited number of correct answers, usually one or two that meet the criteria. Task success is then determined from participants’ responses to a multiple-choice question with an “other” option. Validating by question is the most common method for assessing task completion in remote unmoderated studies and the one we recommend.
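A minimal sketch of scoring such a validation question, assuming a hypothetical retail task where only two brand names (placeholders here) meet the 4-star, under-$40 criteria:

```python
# Hypothetical answer key for the validation question: the brand names of
# the only blenders that actually meet the task criteria on the live site.
CORRECT_ANSWERS = {"BrandA", "BrandB"}  # assumed; verify against the site

# Multiple-choice responses collected from participants ("Other" included).
responses = ["BrandA", "Other", "BrandC", "BrandB", "BrandA"]

# A response counts as a task success only if it matches the answer key.
successes = [r in CORRECT_ANSWERS for r in responses]
completion_rate = sum(successes) / len(responses)
print(f"{completion_rate:.0%}")  # 3 of 5 correct -> 60%
```

Because scoring is a simple set lookup, this method scales to hundreds of unmoderated participants with no manual review.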
Validate by URL
A particular URL visited can serve as another indicator of task success. Software (again, a shameless plug for our MUIQ platform) can determine task completion by matching full or partial URLs. For example, if only one product or page meets the task success criteria and a participant reaches that page, the attempt is considered successful. This approach works well when assessing prototypes with a limited number of pages or success URLs (one or a few).
However, on most large-scale websites, there are often multiple paths users can take and many variations of product or information pages. Consequently, URLs can differ and you may end up with a higher rate of false failures using this method. For that reason, we use this as a secondary method (or in conjunction with validation by question) to minimize the false failures.
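Here is a rough sketch of partial URL matching, assuming hypothetical success-path prefixes (MUIQ’s actual matching rules may differ). Matching on path prefixes rather than exact URLs tolerates query strings and minor variations, which helps reduce the false failures described above:

```python
from urllib.parse import urlparse

# Hypothetical success criteria: any visited URL whose path starts with
# one of these prefixes counts as reaching a success page.
SUCCESS_PATH_PREFIXES = ("/products/blenders/bl-400", "/checkout/confirm")

def url_success(visited_urls, prefixes=SUCCESS_PATH_PREFIXES):
    """True if any visited URL partially matches a success-path prefix."""
    return any(
        urlparse(url).path.startswith(prefix)
        for url in visited_urls
        for prefix in prefixes
    )

# Query strings and tracking parameters don't break the match.
session = [
    "https://example.com/",
    "https://example.com/products/blenders/bl-400?ref=nav",
]
print(url_success(session))  # True
```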
Validate by Self-Report
Just as with recalling actual use, in an unmoderated study you can ask participants whether they completed a task successfully (a yes-or-no question). It’s very simple but, as we’ve found, not very accurate: participants tend to be overconfident or unaware of their failure. If logistics leave this as the only method available, it’s better than nothing, but at best it’s a crude indicator of relative completion rates in a competitive study.
Validate by Session Recording
If you have a recording of the participant’s screen from an unmoderated study, you can largely emulate the benefits of a moderated study by watching what users do and where they go on a website. Our MUIQ platform records events and screens for desktop, mobile web, and native app users. While we like this feature, we don’t rely on it as the primary method because reviewing many videos takes time.
Other Validation Methods
Sometimes you can’t validate by question, URL, or session recording but want more than self-reported completion rates. For example, tasks that involve logging in to a bank account, accessing a mobile app, or using a web app may prohibit recording, and the customized information on screen makes validation by question unreliable.
Some alternatives we’ve found that can provide an indication of task success are having participants take screenshots, upload files (or screenshots), or even send emails with screenshots or other files. Each of these presents its own technical challenges, as well as privacy concerns for participants, so they are methods of last resort. They may, however, be your only way to obtain an objective measure of task completion.
Summary
While task completion is a simple metric (usually a binary pass or fail), determining task success is usually less simple. How you assess task completion will vary based on whether the data comes from actual or simulated use, and, in simulated use, on the mode of evaluation (moderated or unmoderated usability testing). When an observation can be made (from actual or simulated use), researchers have more freedom in writing task scenarios and determining task success.
In unmoderated testing, validation by question and URL are the primary methods. When all else fails, self-reported task completion is better than nothing, but highly inflated. As with most UX methods, use a combination of approaches to measure effectiveness. For example, ask participants a validation question, view a record of the URLs they visited, and review the session recordings when needed.
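The combination of approaches above can be sketched as a simple triage (field names and data are hypothetical): count a task as complete if either the validation question or the URL check indicates success, and flag only the unresolved cases for manual video review:

```python
# Hypothetical per-participant validation signals from an unmoderated study.
participants = [
    {"id": "p1", "question_ok": True,  "url_ok": False},
    {"id": "p2", "question_ok": False, "url_ok": True},
    {"id": "p3", "question_ok": False, "url_ok": False},
]

# OR the two signals to reduce false failures from URL-only matching.
completed = [p for p in participants if p["question_ok"] or p["url_ok"]]

# Only unresolved cases go to the time-consuming step: video review.
needs_review = [
    p["id"] for p in participants if not (p["question_ok"] or p["url_ok"])
]

print(len(completed) / len(participants))  # 2 of 3 pass on combined signals
print(needs_review)                        # only p3's recording needs review
```

This keeps the expensive manual step (watching recordings) limited to the small set of sessions the automated signals couldn’t resolve.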