{"id":562,"date":"2019-11-13T04:45:08","date_gmt":"2019-11-13T04:45:08","guid":{"rendered":"http:\/\/measuringu.com\/will-would\/"},"modified":"2022-03-21T17:12:25","modified_gmt":"2022-03-21T23:12:25","slug":"will-would","status":"publish","type":"post","link":"https:\/\/measuringu.com\/will-would\/","title":{"rendered":"Will You Recommend or Would You Recommend?"},"content":{"rendered":"

\"\"Small changes can have big impacts on rating scales. But to really know what the effects are, you need to test.<\/p>\n

Labels, number of points, colors, and item wording differences can often have unexpected effects on survey responses.

In an earlier analysis, we compared the effects of a three-point recommend item to an eleven-point recommend item.

In that study, we showed how a three-point scale stifles respondents and loses information about extreme respondents (who are usually better predictors of future behavior). Those results are consistent with the published literature, which mostly shows that three-point scales have inadequate reliability and poor validity.

In our study, we used a three-point scale that asked respondents whether they will recommend a business to a friend. The usual phrasing, "how likely are you to recommend," didn't map well to a yes/no/maybe scale, so we reworded it as "will you recommend," as shown in Figure 1.

\"\"<\/a><\/p>\n

Figure 1: An example of a "will recommend" item used in our earlier study.

Adam Ramshaw of Genroe wondered whether our use of "will" affected the results compared to asking whether people "would" recommend.

There is a subtle difference in meaning between "will" (which implies a definite future action) and "would" (which implies a potential future action, or even something that was in the future but isn't anymore). Small differences in a question's wording or scale format can often have important effects. Is that the case here? Does changing the wording of the recommend intention question from "Would you recommend?" to "Will you recommend?" affect results?

We've seen many slight wording differences in the Likelihood to Recommend item, including added qualifiers such as "to a friend or colleague or family member" or "if you were asked." To find out what effect changing "will" to "would" may have on the results, we conducted two studies.

Study 1: Will vs. Would Between Subjects

In September 2019, we asked 320 U.S.-based online panel participants to respond to one of two versions of the 11-point Likelihood to Recommend item for a selected list of nine retailers and airlines they reported having recently purchased from (Amazon, Target, Walmart, Home Depot, Lowes, United Airlines, Southwest, Delta, IKEA).

Participants were randomly assigned to one of the response scales that varied only in whether they used "will" (148 participants; shown in Figure 2a) or "would" (172 participants; shown in Figure 2b). The items were shown as part of a larger survey that included additional questions about satisfaction (and the color variations shown in our earlier article).

\"\"<\/a><\/p>\n

Figure 2a: The "will recommend" variant shown to 148 participants.

\"\"<\/a><\/p>\n

Figure 2b: The "would recommend" variant shown to 172 participants.

Study 1 Results

Learning from the findings of our study on color changes, we first aggregated all responses (so if one person rated five brands, their scores are included five times). The aggregated results across participants and brands are shown in Figure 3. The "would" condition had a slightly lower percentage of detractors than the "will" condition (26% vs. 32%).

Note: We did not conduct a statistical test on the aggregated data because the same person's responses are used multiple times within each category, violating the independence assumption of most statistical tests. We conducted statistical tests on the per-brand analysis.
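As a minimal sketch of the aggregation step (using hypothetical responses, not our study data, and assuming the standard NPS cutoffs of 0–6 detractor, 7–8 passive, 9–10 promoter), this is one way to bucket 0–10 ratings into categories and compute the percentage of each category per wording condition:

```python
import pandas as pd

# Hypothetical long-format data: one row per participant x brand rating (0-10).
responses = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3],
    "brand": ["Amazon", "Target", "Amazon", "Delta", "Target"],
    "condition": ["will", "will", "would", "would", "will"],
    "ltr": [9, 7, 10, 5, 8],
})

def nps_category(score: int) -> str:
    """Standard NPS buckets: 0-6 detractor, 7-8 passive, 9-10 promoter."""
    if score <= 6:
        return "Detractor"
    if score <= 8:
        return "Passive"
    return "Promoter"

responses["category"] = responses["ltr"].apply(nps_category)

# Aggregate across participants and brands: percentage of each category per condition.
aggregated = (
    responses.groupby("condition")["category"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(aggregated)
```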

\"\"<\/a><\/p>\n

Figure 3: Difference in response patterns for aggregated responses for the "will" and "would" wording variations of the Likelihood to Recommend item.

In looking at the differences within the nine brands (see Figure 4), there's less of a pattern. The "will" variation had lower mean scores for six brands (e.g., Amazon, Target) and higher mean scores for three others (e.g., Delta, Southwest).

\"\"<\/a><\/p>\n

Figure 4: Difference in response patterns for each of the nine brands for the "will" and "would" variations of the Likelihood to Recommend item.

The results of Study 1 suggest that the "would" wording has only a small effect of reducing detractor responses, and only when the results are aggregated across participants and brands. One shortcoming of this study is that the sample sizes weren't evenly balanced. For example, about twice as many people rated their Likelihood to Recommend United in the "would" condition (17) as in the "will" condition (8). While that's not a problem when comparing within each brand (the confidence intervals can handle uneven sample sizes), it's possible that much of the difference we observed in the aggregated data comes from this imbalance in sample sizes. To help offset this potential confounding effect, we conducted a second, within-subjects study.
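As an illustration of a per-brand comparison between the two independent groups with unequal sample sizes, a Welch t-test (which does not assume equal variances or equal group sizes) is one reasonable option; this is a sketch with hypothetical ratings, not necessarily the exact procedure used in the article:

```python
from scipy import stats

# Hypothetical LTR ratings (0-10) for one brand in each between-subjects condition.
will_ratings = [6, 8, 9, 4, 7, 10, 5, 8]                                  # n = 8
would_ratings = [7, 9, 8, 6, 10, 9, 5, 8, 7, 9, 6, 8, 10, 7, 9, 8, 6]     # n = 17

# Welch's t-test tolerates the uneven sample sizes across conditions.
t_stat, p_value = stats.ttest_ind(will_ratings, would_ratings, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```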

Study 2: Will vs. Would Within Subjects

In October 2019, we asked 213 participants to reflect on a subset of the brands from the first study (Target, Lowes, Home Depot, and IKEA) and to respond to the 11-point Likelihood to Recommend (LTR) item. Participants were shown only companies they reported having made a purchase from in the last year.

Because this was a within-subjects study, participants saw both the "will" and "would" versions of the LTR item. These questions were part of a larger survey, and the two variants were randomized to appear either at the beginning or at the end of the survey. Between the two LTR questions were unrelated questions about attitudes toward design elements and other measures of brand attitude and intent to recommend. Thus, roughly half the participants saw the "would" variant first and the other half saw the "will" variant first.

Study 2 Results

The aggregated results for the LTR item across participants and brands are shown in Figure 5. While Study 1 (between subjects) resulted in fewer detractors in the "would" condition, this effect went away after controlling for the variability between subjects and the differences in sample sizes.

In Study 2, the wording of the LTR item had little to no effect on the number of promoters and detractors. The "will" condition had slightly more promoters than the "would" condition (45% vs. 44%) and slightly fewer passives (32% vs. 33%).

\"\"<\/a><\/p>\n

Figure 5: Difference in response patterns for aggregated responses for the "will" and "would" wording variations of the Likelihood to Recommend item.

Figure 6 shows an overall pattern of LTR scores being slightly (0.9%) higher for the "will" version across companies. These differences in scores were not large, ranging from 0.03 (0.4%) for Home Depot to 0.12 (1.5%) for Target. Only the difference for Target was statistically significant (using a paired t-test; p = .01).

\"\"<\/a><\/p>\n

Figure 6: Differences in mean Likelihood to Recommend scores between the "will" and "would" versions. The difference for Target was statistically significant (p < .05).
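A minimal sketch of that per-brand paired comparison, using hypothetical paired ratings (each participant rated the same brand under both wordings), might look like this:

```python
from scipy import stats

# Hypothetical paired ratings for one brand: same participants, both wordings.
will_ratings = [9, 7, 10, 6, 8, 9, 7, 10, 8, 9]
would_ratings = [9, 6, 10, 6, 8, 8, 7, 9, 8, 9]

# A paired t-test respects the within-subjects design by testing per-person differences.
t_stat, p_value = stats.ttest_rel(will_ratings, would_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```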

"Would" Results in Slightly More Negative Scores

To better understand what is causing the slight shift in mean scores, we looked at the number of participants who moved categories when the "would" version of the Likelihood to Recommend item was used instead of the "will" version. In total, there were 520 responses from the 213 participants across the four brands and two scale variants.

Table 1 shows that when the "would" version is shown rather than the "will" version, the largest movement is from positive to negative (a 6.2% shift to a less positive category): promoters become passives (22) and passives become detractors (10). This is offset by 4.8% moving from negative to positive: passives become promoters (14) and detractors become passives (11). Overall, the "would" version resulted in slightly more participants (1.4%) moving from positive to negative ratings.
From \"Will\"<\/th>To \"Would\"<\/th>Number<\/th>%<\/th>\n<\/tr>\n<\/thead>\n
Promoter<\/td>Passive<\/td>22<\/td>4.2%<\/td>\n<\/tr>\n
Passive<\/td>Promoter<\/td>14<\/td>2.7%<\/td>\n<\/tr>\n
Detractor<\/td>Passive<\/td>11<\/td>2.1%<\/td>\n<\/tr>\n
Passive<\/td>Detractor<\/td>10<\/td>1.9%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n

Table 1: Shift of the 520 responses across four brands from the "will" to the "would" version of the LTR item.
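A sketch of how this kind of category-movement tabulation can be computed (again with hypothetical paired categories, not our study data):

```python
import pandas as pd

# Hypothetical NPS categories for the same participant-brand pair under each wording.
shifts = pd.DataFrame({
    "will_category":  ["Promoter", "Promoter", "Passive",   "Detractor", "Passive"],
    "would_category": ["Passive",  "Promoter", "Detractor", "Passive",   "Promoter"],
})

# Cross-tabulate movement between categories; off-diagonal cells are the shifts in Table 1.
movement = pd.crosstab(shifts["will_category"], shifts["would_category"])
print(movement)

# Percentage of all responses that moved to a less positive vs. a more positive category.
order = {"Detractor": 0, "Passive": 1, "Promoter": 2}
rank_will = shifts["will_category"].map(order)
rank_would = shifts["would_category"].map(order)
print("moved down:", (rank_would < rank_will).mean() * 100, "%")
print("moved up:  ", (rank_would > rank_will).mean() * 100, "%")
```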

Summary and Takeaways

Across two studies with 533 participants responding to a "will" or "would" version of the 11-point Likelihood to Recommend item, we found:

Using "will" instead of "would" recommend slightly increases scores. Using "will recommend" instead of "would recommend" produced a small, statistically significant increase of 1.5% for only one brand (Target) in our second study. Across all brands, the average mean difference was less than 1% higher for the "will" condition. The larger differences we observed in our first study were confounded by the different sample sizes for some brands in each condition. We saw larger differences from showing or not showing the neutral label and even from the effects of color.

While "will" implies a more definite action, the results show the opposite. We had suspected that using the word "will" might imply a more definite future action to respondents and reduce scores, as in "I definitely WILL do something," compared to the less definite "I would do something." We therefore expected the mean scores for "will" to be slightly lower (as people might hedge their intentions). However, the results show the opposite: the means were nominally higher for the "will" groups (again, only slightly, and statistically significant in only one case).

It may matter in other contexts. While we didn't see major differences in this study, it could be that people don't respond differently enough to the subtle difference in meaning between the words to have a meaningful impact. But it could also be due to at least three other mitigating factors: the brands we asked participants to reflect on, how we administered the items within the survey, and respondent inattention. Other brands or experiences may be more sensitive to changing "will" to "would." It's also possible that because our intent-to-recommend items were presented within a larger survey, respondents paid less attention and didn't notice the change in wording. And finally, it's also likely that participants don't carefully read items and instead scan them for keywords and use shortcuts (heuristics) to interpret and respond to them (regardless of how long the study is). A future study could examine the effects of changing "will" to "would" in isolation and see what percentage of participants even notice the difference.
