In a usability test you typically collect some type of performance data: task times, completion rates, and perhaps errors or conversion rates. It is also a good idea to use a questionnaire that measures the perceived ease of use of an interface. This can be done immediately after a task using a few questions (post-task questionnaires), or after the usability testing session is complete using a standardized instrument such as the System Usability Scale (SUS) (post-test questionnaires).

It turns out that post-task questions tend to have a strong correlation with the task-performance metrics of completion rates, time, and errors (r's between .4 and .5). To interpret the correlation coefficient r, square it: r-squared tells you how much the changes in one measure "explain" the changes in the other. For example, data from around 40 usability tests show that the average correlation between completion rates and post-task satisfaction scores was .51. In other words, whether a user passes or fails a task explains about 26% (.51 squared) of the scores on a satisfaction questionnaire administered immediately after the task attempt. In general, when users fail a task, they tend to rate it as less easy to use.

A correlation of .51 is considered a strong association in the behavioral sciences. One question I've been asked is: if there is such a strong correlation between usability metrics, do we need to collect both of them? That is, can we just estimate satisfaction scores from completion-rate data and vice versa? The answer is that while we could estimate one from the other, we would still lose around three-quarters of the information (100% minus 26%). When measuring abstract things like usability, redundancy is a good thing. If the correlation were very high, say r = .95, then one metric would explain around 90% of the other. With that much redundancy we could safely drop one metric and still have a good idea about the other. Unfortunately, such high correlations are unusual both in behavioral science and in usability measurement.
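The arithmetic behind these percentages is just the coefficient of determination, r squared. A minimal sketch of the two cases discussed above:

```python
# Coefficient of determination: r^2 is the fraction of variance in one
# metric "explained" by the other; 1 - r^2 is what we lose if we drop
# one metric and estimate it from the other.
def shared_variance(r: float) -> float:
    """Fraction of variance explained by a correlation of r."""
    return r ** 2

for r in (0.51, 0.95):
    explained = shared_variance(r)
    lost = 1 - explained
    print(f"r = {r}: explains {explained:.0%}, loses {lost:.0%}")
# r = 0.51 explains about 26% and loses about 74%;
# r = 0.95 explains about 90% and loses about 10%.
```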

I recommend a triangulation approach of gathering multiple usability metrics. No single usability metric fully describes the user experience; instead, each paints a partial picture. However, once you gather multiple metrics you can combine them into a composite score, which provides a Single Usability Metric (SUM).
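To illustrate the idea of a composite score, here is a simplified sketch: standardize each metric to z-scores and average them per task. (The published SUM method is more involved than this, and the metric values below are hypothetical, purely for illustration.)

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw values to mean 0, standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical metrics for four tasks. Task time is negated so that,
# like the other metrics, higher values mean better usability.
completion = [0.9, 0.7, 1.0, 0.5]       # completion rates
satisfaction = [4.2, 3.1, 4.8, 2.9]     # post-task ease ratings (1-5)
speed = [-62, -95, -48, -120]           # negated task times (seconds)

# Composite: average the standardized metrics for each task.
standardized = [z_scores(m) for m in (completion, satisfaction, speed)]
composite = [mean(vals) for vals in zip(*standardized)]
print([round(c, 2) for c in composite])
```

Because each metric is standardized before averaging, no single metric dominates the composite simply because it is measured on a larger scale.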