{"id":526,"date":"2019-03-20T03:45:01","date_gmt":"2019-03-20T03:45:01","guid":{"rendered":"http:\/\/measuringu.com\/grids-responses\/"},"modified":"2022-03-21T17:28:51","modified_gmt":"2022-03-21T23:28:51","slug":"grids-responses","status":"publish","type":"post","link":"https:\/\/measuringu.com\/grids-responses\/","title":{"rendered":"Do Survey Grids Affect Responses?"},"content":{"rendered":"

\"\"You\u2019ve probably taken a survey or two in your life, maybe even this week.<\/p>\n

Which means you've probably answered a few types of survey questions, including rating scale questions.

Earlier, I outlined 15 common rating scale questions, with the linear numeric scale being one of the most used.

Examples of linear numeric scales include the Single Ease Question (SEQ) and the Likelihood to Recommend (LTR) item (the latter is shown in Figure 1).

\"\"<\/a>Figure 1: Example of a linear numeric scale: the common Likelihood to Recommend item used to compute the NPS.<\/p>\n

When asking respondents a lot of linear numeric questions, you can save space by combining them into a multiple rating matrix, or "grid," such as the one shown in Figure 2.

\"\"<\/a><\/p>\n

Figure 2: Example of a grid that combines four linear numeric scales.<\/p>\n

While using a grid allows for a more compact presentation, does combining the items into a grid of rating scales, versus asking them in isolation, affect responses?

Research on Grids vs. One at a Time

As is often the case with questions about survey items and response options (for example, labeling, question order, and number of response options), it can be difficult to find general rules. A good place to start is the published literature, to see what's already been researched.

When it comes to this topic, a lot has already been done. The main focus of research has been on differences in reliability, straightlining (participants answering many questions with the same response option), response/drop-out rates, and the impact of mobile screens. (A short sketch of how straightlining can be detected follows the study summaries below.) Here's a summary of several articles:

Couper, Traugott, and Lamias (2001), in a study of 1,602 students, found increased correlations between items (higher reliability) when they were placed in a grid versus alone, and also found grid responses were completed slightly faster.

Tourangeau, Couper, and Conrad (2004) had 2,568 participants from an online panel answer eight questions about diet and eating habits in three different presentation styles: all on one page in a grid, on two pages (one grid per page), or one at a time on separate pages.

Internal reliability was highest when items were presented together in one grid (alpha = .62) and lowest when they were on separate pages (alpha = .51). However, when presented separately, the items loaded higher on their expected factors. There was also more straightlining in grids. Participants took about 50% longer to complete the questions when they were presented separately. The authors suspected a "near means related" heuristic (items close to each other ask about similar things) may cause respondents to answer more similarly when items appear on the same screen.
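A quick aside on those alpha values: coefficient (Cronbach's) alpha is the usual internal-consistency statistic, computed for k items as k/(k−1) × (1 − the sum of the item variances divided by the variance of the summed score). Here's a minimal sketch, assuming responses arrive as a respondents-by-items matrix (the function is illustrative, not from the study):

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for an (n_respondents x k_items) rating matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # one variance per item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of row sums
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```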

Yan (2005) [PDF] also found an increase in internal reliability when presenting items in a grid versus on separate pages to 2,587 online panel participants.

Even the U.S. Census Bureau (Chesnut, 2008) tested old-school paper-and-pencil forms and found that demographic information presented separately versus in a grid resulted in slightly higher (1.5%) response rates.

Toepoel, Das, and Van Soest (2009) measured arousal-seeking tendencies in 2,565 Dutch respondents. They found reliability was slightly higher when the 40 items were presented in one grid on a single screen compared to separately or broken up across ten screens. However, they also found higher non-response when items were placed in a grid.

Thorndike et al. (2009) found that 710 Swedish respondents preferred the one-at-a-time format over a grid for quality-of-life questions, even though it took more time to complete.

Garland (2009), as reported in Callegaro, had U.S. panel participants rate which of three forms they preferred: a grid, multiple items per screen, or one item per screen. He found no difference in reported satisfaction between versions, but, contrary to other studies, the one-per-page version had the highest reliability, the highest variance explained, and the factor structure that best matched the published factor structure of the questionnaire. Differences in means were also reported, but the original article is no longer available to discern the pattern.

Bell, Mangione, and Kahn (2001), with 4,876 respondents, found no difference in reliability for a grid compared to one at a time, and slightly faster completion times for the grid.

Iglesias, Birks, and Torgerson (2001) found that older UK respondents (over age 70) missed or skipped significantly more items (27% vs. 9%) when items were arrayed in a "stem and leaf" grid versus presented one at a time. They also found slightly better reliability when items were displayed separately.

Callegaro et al. (2009) had 2,500 U.S. panel participants answer nine items about mental health (some requiring reverse scoring) in one of five randomly assigned conditions, ranging from all items in one grid to one item per page. They found the one-per-page presentation had slightly higher internal reliability than the grid but took more than 50% longer to complete.

Grandmont, Goetzinger, Graff, and Dorbecker (2010) also found that 7-point Likert items generated higher drop-out rates when presented in a grid. Even though respondents took longer to complete the one-at-a-time version than the grid (19 vs. 15 minutes), there was no difference in how long respondents thought the surveys took (15 minutes for both). Interestingly, they found straightlining was about the same for the grid and the one-per-page version, but highest when the grid was split across multiple pages. The authors suspected respondents were consciously trying not to look like they were straightlining in a big grid.

Respondents reported disliking long grids the most. This study also asked respondents how they would want to rate 25 product characteristics. Their responses seem to support the "near means related" idea, while also suggesting respondents don't want all items in one grid:

"State up front that there will be 25 questions, then divide them into thematic groups, no more than 3–5 per screen."

"Don't just throw a list of 25 characteristics up on the same page."

Mavletova, Couper, and Lebedev (2017) also reported higher measurement error (e.g., straightlining) and lower concurrent validity for grids when testing on mobile screens with a Russian panel.

Liu and Cernat (2016) examined responses from 5,644 SurveyMonkey panel participants and found higher straightlining in grids but similar response times (for a short, under-two-minute survey). They also found higher non-response for grid formats with seven or fewer response options compared to one-at-a-time presentations (especially for mobile respondents). Finally, grids with 9 or 11 response options led to substantial differences compared to item-by-item questions, and they posit that as the number of columns in a grid increases, data quality may deteriorate.
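As promised above, here's a minimal sketch of how straightlining can be flagged, using the strictest definition (a respondent gives the identical rating to every item in the grid). The function and data are illustrative; published studies often use softer measures, such as low within-respondent variance:

```python
import numpy as np

def straightlining_rate(responses):
    """Proportion of respondents who give the identical rating to every
    item in a grid; responses is an (n_respondents x k_items) matrix."""
    responses = np.asarray(responses)
    identical = (responses == responses[:, [0]]).all(axis=1)
    return identical.mean()

# Two of four respondents answer all four items identically -> 0.5
grid = [[5, 5, 5, 5],
        [3, 4, 3, 5],
        [7, 7, 7, 7],
        [2, 5, 4, 4]]
print(straightlining_rate(grid))
```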

I've summarized the findings across these studies in Table 1 (having to infer some conclusions from some papers):

| Finding | Grid | Alone | No Difference |
|---|---|---|---|
| More Straightlining | 4 | | 1 |
| Increases Reliability/Variance Explained | 4 | 4 | 1 |
| Scores/Distributions Differ | 1 | | 3 |
| Higher Non-Response | 6 | | |
| Loading on Expected Factor/Higher Validity | | 3 | |
| Preference | | 4 | 2 |
| Takes Longer | | 5 | 2 |

Table 1: Summary of studies comparing grid vs. standalone displays. Numbers represent the number of studies I uncovered that share a finding (e.g., six studies found grid displays increased non-response rates).

While some results are mixed because other factors moderate the effects, we can conclude that grids seem to increase non-response and probably increase straightlining in many cases. When items are presented alone, they tend to take longer to complete (although participants may not notice the difference), and they better match the intended factor structure, but there isn't much difference in scores. Not all grids are created equal: as some studies explored, massive grids (many rows and many columns) were the least preferred and potentially affect response quality.


NPS Grid Study

To contribute to the extensive literature on grid versus separate-page presentation, we conducted our own study of the popular Likelihood to Recommend (LTR) item used to compute the Net Promoter Score, using an online U.S. panel in February 2019. We asked participants how likely they would be to recommend the following common brands: