{"id":562,"date":"2019-11-13T04:45:08","date_gmt":"2019-11-13T04:45:08","guid":{"rendered":"http:\/\/measuringu.com\/will-would\/"},"modified":"2022-03-21T17:12:25","modified_gmt":"2022-03-21T23:12:25","slug":"will-would","status":"publish","type":"post","link":"https:\/\/measuringu.com\/will-would\/","title":{"rendered":"Will You Recommend or Would You Recommend?"},"content":{"rendered":"
Small changes can have big impacts on rating scales. But to really know what the effects are, you need to test.<\/p>\n
Labels<\/a>, number of points<\/a>, colors<\/a>, and item wording differences can often have unexpected effects on survey responses.<\/p>\n In an earlier analysis<\/a>, we compared the effects of a three-point recommend item to those of an eleven-point recommend item.<\/p>\n In that study, we showed how a three-point scale stifles respondents and loses information about extreme respondents (who are usually better predictors of future behavior). Those results are consistent with the published literature, which mostly shows that three-point scales have inadequate reliability and poor validity.<\/p>\n In our study, we used a three-point scale that asked respondents whether they will<\/strong> recommend a business to a friend. The usual phrasing, \u201chow likely are you to recommend,\u201d didn\u2019t map well to a yes\/no\/maybe scale, so we reworded it to \u201cwill you recommend,\u201d as shown in Figure 1.<\/p>\n <\/a><\/p>\n Adam Ramshaw of Genroe wondered whether our use of will<\/strong> affected the results compared to asking whether people would<\/strong> recommend.<\/p>\n There is a subtle difference in meaning between \u201cwill\u201d (which implies a definite future action) and \u201cwould\u201d (which implies a potential future action, or even something that was in the future but isn\u2019t anymore<\/a>). Small differences in a question\u2019s wording or scale format can often have important effects. Is that the case here? 
Does changing the wording of the recommend intention question from \u201cWould you recommend?\u201d to \u201cWill you recommend?\u201d affect results?<\/p>\n We\u2019ve seen many slight wording differences in the Likelihood to Recommend item, including adding qualifiers such as \u201cto a friend or colleague or family member,\u201d or \u201cif you were asked.\u201d To find out what effect changing \u201cwill\u201d to \u201cwould\u201d may have on the results, we conducted two studies.<\/p>\n In September 2019, we asked 320 U.S.-based online panel participants to respond to one of two versions of the 11-point Likelihood to Recommend item for a selected list of nine retailers and airlines they reported having recently purchased from (Amazon, Target, Walmart, Home Depot, Lowes, United Airlines, Southwest, Delta, IKEA).<\/p>\n Participants were randomly assigned to one of the response scales that varied only in whether they used \u201cwill\u201d (148 participants; shown in Figure 2a) or \u201cwould\u201d (172 participants; shown in Figure 2b). The items were shown as part of a larger survey that included additional questions about satisfaction (and the color variations shown in our earlier article<\/a>).<\/p>\n <\/a><\/p>\n <\/a><\/p>\n Learning from the findings of our study on color changes, we first aggregated all responses (so if one person reflected on five brands, their scores are included five times). The aggregated results across participants and brands are shown in Figure 3. The \u201cwould\u201d condition had a slightly lower percentage of detractors than the \u201cwill\u201d condition (26% vs. 32%).<\/p>\n Note: We did not conduct a statistical test on the aggregated data because the same person\u2019s responses are used multiple times within each category, violating the independence assumption of most statistical tests. 
We conducted statistical tests on the per-brand analysis.<\/em><\/p>\n <\/a><\/p>\n In looking at the differences within the nine brands (see Figure 4), there\u2019s less of a pattern. The \u201cwill\u201d variation had lower mean scores for six brands (e.g., Amazon, Target) and higher for three others (e.g., Delta, Southwest).<\/p>\n <\/a><\/p>\n The results of Study 1 suggest that the \u201cwould\u201d wording only slightly reduces detractor responses, and only when the results are aggregated across participants and across brands. One shortcoming of this study is that the sample sizes weren\u2019t evenly balanced. For example, about twice as many people rated their Likelihood to Recommend for United in the \u201cwould\u201d condition (17) as in the \u201cwill\u201d condition (8). While that\u2019s not a problem when comparing for each brand (the confidence intervals can handle uneven sample sizes), it\u2019s possible that much of the difference we observed in the aggregated data comes from this imbalance in sample sizes. To help offset this potential confounding effect, we conducted a second within-subjects study.<\/p>\n In October 2019, we asked 213 participants to reflect on a subset of the brands from the first study (Target, Lowes, Home Depot, and IKEA) and to respond to the 11-point Likelihood to Recommend (LTR) item. Participants were shown only companies that they reported having made a purchase from in the last year.<\/p>\n Because this was a within-subjects study, participants saw both the \u201cwill\u201d and \u201cwould\u201d versions of the LTR item. These questions were part of a larger survey, and the two variants were randomized to be shown either at the beginning of the survey or at the end. Between the two LTR questions were other unrelated questions regarding attitudes toward design elements and other measures of brand attitude and intent to recommend. 
Thus, roughly half the participants saw the \u201cwould\u201d variant first and the other half saw the \u201cwill\u201d variant first.<\/p>\n The aggregated results across participants and brands are shown in Figure 5 for the LTR. While Study 1 (between subjects) resulted in fewer detractors in the \u201cwould\u201d condition, this effect went away after controlling for the variability between subjects and sample sizes.<\/p>\n In Study 2 the wording of the LTR item had little to no effect on the number of promoters and detractors. The \u201cwill\u201d condition had slightly more promoters than the \u201cwould\u201d condition (45% vs. 44%) and slightly fewer passives (32% vs. 33%).<\/p>\nStudy 1: Will vs. Would Between Subjects<\/h2>\n
Study 1 Results<\/h3>\n
Study 2: Will vs. Would Within Subjects<\/h2>\n
Study 2 Results<\/h3>\n
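The promoter, passive, and detractor percentages reported above come from the standard NPS segmentation of the 0\u201310 LTR scale (0\u20136 detractors, 7\u20138 passives, 9\u201310 promoters). A minimal sketch of that computation, assuming pooled responses across participants and brands (function name and sample data are illustrative, not from our studies):

```python
# Classify 0-10 Likelihood to Recommend (LTR) responses into the standard
# NPS categories (detractor 0-6, passive 7-8, promoter 9-10) and report
# each category's share of responses. Names and data are illustrative.

def nps_breakdown(ratings):
    """Return (pct_detractors, pct_passives, pct_promoters) for 0-10 ratings."""
    n = len(ratings)
    detractors = sum(1 for r in ratings if r <= 6)
    promoters = sum(1 for r in ratings if r >= 9)
    passives = n - detractors - promoters
    return (100 * detractors / n, 100 * passives / n, 100 * promoters / n)

# Hypothetical aggregated responses (one entry per participant-brand rating)
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 8, 5]
det, pas, pro = nps_breakdown(ratings)
print(f"detractors {det:.0f}%, passives {pas:.0f}%, promoters {pro:.0f}%")
# → detractors 30%, passives 30%, promoters 40%
```

Note that with pooled data like this, the percentages describe the sample but, as discussed above, repeated responses from the same person violate the independence assumption of most statistical tests.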