Don’t wait till you’ve built the entire app, website, or product before having people actually use it.
Build a prototype and test it.
But building even modest functionality and aesthetics into a prototype takes time and money. Is it necessary to build an interactive high-fidelity prototype, or will a low-fidelity paper prototype suffice? Or, as some suggest, do low-fidelity prototypes lead to misleading results[pdf], or are they even a waste of time?
Paper or Pixel?
Paper prototypes are more than just the ugly stepchild of full-fidelity computer prototypes. Paper prototypes, often just simple black lines and sketches, convey to users AND remind developers and stakeholders that the system is clearly under development and that many changes are still expected.
Participants in a study with paper prototypes are more likely to assume anything can be changed and consequently may offer more suggestions. In contrast, the higher the fidelity of the prototype, the more finished the product or website appears, and participants may be less inclined to suggest changes: there’s a strong tendency not to want to hurt people’s feelings and to tell people only what they want to hear.
So while paper and low-fidelity prototypes may be faster and easier to create and change, and convey a message of “change” to participants, are they reliable for testing, and even for collecting UX metrics? Or does testing with low-fidelity prototypes lead to misleading results?
Research History On Prototypes
As with many UX issues that seem contemporary due to rapidly changing technology, there’s actually a good research history on the differences between prototype fidelities that goes back more than 25 years.
The research is nuanced with studies generally showing paper and low-fidelity prototypes offering similar findings to full-fidelity prototypes and actual products, but with some caveats and limitations. Some studies even examine the effect of the medium (paper vs. computer) versus the fidelity (low vs. high). Here are some findings from published studies I’ve found in the literature.
- Nielsen (1990): Found that heuristic evaluators (as opposed to users) were more likely to find major usability problems using a high-fidelity computer mockup versus a paper prototype; however, the paper prototype still identified many usability issues.
- Wiklund, Thurrott, & Dumas (1992): Found no difference in the number of errors or subjective usability ratings between four paper prototypes of varying fidelity and the actual product (an electronic dictionary). They did report significant differences in task times across the prototypes and the actual product. They also reported that making prototypes look more realistic did not increase initial or post-task ratings of ease, but did increase aesthetic ratings.
- Virzi, Sokolov, & Karis (1996): Conducted two studies on an electronic book and an interactive voice response (IVR) system and found NO substantial difference in the number of usability problems between a paper and a high-fidelity prototype in the later stages of product design.
- Catani & Biers (1998): Found no difference in the number and severity of usability problems among a paper (low-fidelity) prototype, screenshots (medium fidelity), and a fully functioning Visual Basic prototype when students conducted library search tasks.
- Uceta, Dixon, & Resnick (1998): Found no difference in the number of usability issues between paper and computer prototypes. However, they reported testing with the paper prototypes took about 30% longer than the computer-based prototypes.
- Walker, Takayama, & Landay (2002): Found no difference in the number of usability issues by prototype fidelity (paper versus computer) or by how the prototype was delivered (via paper versus computer). They did find differences in the type of issues by fidelity but not by medium, and participants made more comments about computer prototypes than paper ones.
- Sefelin, et al. (2003): Found participants are just as likely to offer critiques and suggestions on paper as on high-fidelity computer prototypes. There were no differences in SUS scores or post-task scores; however, 92% of participants preferred the computer prototype, as they felt less “observed” and it required less work from the facilitator.
- MacKenzie & Read (2007): Tested the typing speed on different keyboard layouts using paper prototypes and found the conclusions of efficiency were similar (although not identical) to another study’s results that used an actual product.
- Sauer & Sonderegger (2009): Compared paper prototypes, high-fidelity prototypes, and actual products (mobile phones), and found both paper and computer prototypes predicted ease-of-use ratings[pdf] on the actual product, but not task times or aesthetic ratings. They also found that interaction efficiency (ideal steps/total steps) was more predictive than task time.
- Sauer, Seibel, & Rüttinger (2010): Compared low fidelity, high fidelity, and an actual product (a floor sweeper) using novice and expert users and found no difference in the number of usability problems or difference in subjective ratings by prototype fidelity.
- Uebelbacher, Sonderegger, & Sauer (2013): Examined how participants’ perception of the developmental stage of a prototype would affect data. They found no differences between an iPhone 3 electronic city guide prototype described as “early stage” compared to one in its final stage, weeks before release.
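Several of these studies (e.g., Sefelin et al.) compare SUS scores across fidelities. For readers unfamiliar with the System Usability Scale, its standard scoring can be sketched as follows (the function name is illustrative):

```python
def sus_score(responses):
    """Compute a System Usability Scale (SUS) score from ten 1-5 responses.

    Odd-numbered items (positively worded) contribute (response - 1);
    even-numbered items (negatively worded) contribute (5 - response).
    The summed contributions are multiplied by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item, r in enumerate(responses, start=1):
        if not 1 <= r <= 5:
            raise ValueError("responses must be on a 1-5 scale")
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a fairly positive response pattern
print(sus_score([4, 2, 4, 1, 4, 2, 5, 1, 4, 2]))  # 82.5
```

Because the score is a linear transformation of the raw responses, it can be compared across prototype fidelities the same way the studies above do.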
Summary & Discussion
So does the fidelity of a prototype affect results? The short answer is yes, but not as much as you’d think. Here are the nuances you should consider when testing with prototypes:
- Usability problem prediction is good. If you’re looking to uncover usability problems and errors, low-fidelity prototypes (even paper ones) seem sufficient for finding major issues, but they aren’t a perfect substitute in either usability testing or heuristic evaluations, with some differences noted across the studies.
- You won’t uncover all the same issues. While you’ll likely uncover many issues, paper prototypes and even high-fidelity prototypes usually lack the actual interactivity of the finished product or website, meaning you’re likely missing some issues.
- Ease of use ratings are surprisingly consistent across fidelity. For the most part, participants’ subjective judgment of the ease of use isn’t affected much by the fidelity. It seems that participants can correct for the fidelity of the interface when making judgments of ease.
- Aesthetic or emotional ratings tend to be less predictive than usability ratings. There’s some evidence users will overestimate the aesthetic appeal of the product from low-fidelity prototypes.
- Efficiency metric accuracy is mixed. Task time from low-fidelity prototypes doesn’t seem to be a good predictor, but there’s some evidence an interaction efficiency ratio (ideal steps/total steps) is predictive; more research is needed to confirm this finding. This doesn’t mean you can’t collect time, but focus on relative changes in time (for example, comparing time on one version versus another) rather than on estimating the actual time users will take with the finished interface.
- Users prefer high fidelity. There’s evidence participants prefer using high-fidelity prototypes to paper prototypes. This is likely due to the interactivity and aesthetic appeal the high-fidelity prototypes provide.
- You don’t need to start with paper, but you can. With the proliferation of prototyping software, if it’s just as quick to create a high-fidelity prototype in a prototyping tool, there’s not much advantage to starting with paper (again, unless you specifically want to convey that you’re using an early-stage design that can be changed). Conversely, don’t feel you can’t start with paper.
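The interaction efficiency metric from Sauer & Sonderegger is simple to compute if you log the steps participants take. A minimal sketch, with illustrative names:

```python
def interaction_efficiency(ideal_steps, observed_steps):
    """Ratio of the minimum number of steps needed to complete a task
    to the number of steps a participant actually took (1.0 = perfect)."""
    if ideal_steps <= 0 or observed_steps < ideal_steps:
        raise ValueError("need observed steps >= ideal steps > 0")
    return ideal_steps / observed_steps

# A task with a 4-step ideal path, completed in 5 observed steps:
print(interaction_efficiency(4, 5))  # 0.8
```

Because the ratio is normalized against the ideal path, it can be compared across prototypes of different fidelity in a way raw task time cannot.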
The articles I reviewed above describe prototyping a variety of interfaces at different stages of design, including an electronic dictionary, an eBook, a floor sweeper, an IVR system, keyboard layouts, a mobile phone (non-smartphone from 2005), Windows software, and websites. The range of products makes the conclusions across the studies more generalizable. In short, low-fidelity prototypes are reasonable, although not perfect, substitutes for high-fidelity prototypes or full products when identifying usability problems; they generate reliable estimates of subjective ease ratings but are less accurate for task times and aesthetic measures.