Scatterplot Jitter—Why and How?

Jim Lewis, PhD • Jeff Sauro, PhD

Scatterplots are powerful tools for visualizing data, especially when the data are continuous and unbounded (or nearly so). For example, Figure 1 shows the relationship between concurrently collected System Usability Scale (SUS) and UX-Lite® data for 40 consumer software products.

Figure 1: Example of scatterplot of concurrently collected SUS and UX-Lite data.

The scatterplot shows a strong relationship between the SUS and UX-Lite (r = .94). Although the points cluster closely together, each one is visually distinct because SUS and UX-Lite scores range from 0 to 100 and no two products had the same pair of scores (there were no ties).

But what happens when plotting responses to multipoint rating scales?

Ties Are Likely with Multipoint Rating Scales

In August 2024, we collected data from 298 participants on their experience with one of the social media apps they had used in the past year (Facebook, Instagram, LinkedIn, Snapchat, TikTok, or X). Figure 2 shows examples of two multipoint rating scales that we used in that survey to investigate the relationship between likelihood to recommend and likelihood to discourage.

Figure 2: Example of two multipoint rating scales.

The first item has 11 response options and the second has 10, so there are 110 possible pairs of values. With more than 110 respondents, ties are guaranteed. Because correlated responses concentrate in fewer combinations, ties are also likely in much smaller samples. Ties become even more likely with fewer response options, such as with standard five- or seven-point scales.
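To make the counting argument concrete, here is a minimal Python sketch (the function names are ours, for illustration only). The number of distinct dots an unjittered scatterplot can show is capped by the product of the two scales' response options, so by the pigeonhole principle any sample larger than that product must contain ties.

```python
def possible_pairs(options_x: int, options_y: int) -> int:
    """Number of distinct (x, y) combinations two rating scales can produce."""
    return options_x * options_y


def max_visible_dots(n: int, options_x: int, options_y: int) -> int:
    """Upper bound on distinct dots in an unjittered plot of n responses."""
    return min(n, possible_pairs(options_x, options_y))


def minimum_hidden(n: int, options_x: int, options_y: int) -> int:
    """Fewest responses that must share a dot with another (pigeonhole)."""
    return n - max_visible_dots(n, options_x, options_y)


# 11-option recommend item x 10-option discourage item, 298 respondents
print(possible_pairs(11, 10))          # 110
print(max_visible_dots(298, 11, 10))   # 110
print(minimum_hidden(298, 11, 10))     # 188

# Two five-point scales
print(possible_pairs(5, 5))            # 25
```

Even if every possible combination were used, at least 188 of the 298 responses would have to share a dot with another response; correlated responses, which pile up in fewer cells, hide even more.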

Figure 3 shows the scatterplot for the data we collected with these items from 298 respondents.

Figure 3: Scatterplot for ratings of likelihood to recommend and likelihood to discourage.

The correlation between these variables is statistically significant (r(296) = −.57, p < .0001).

But there’s a problem with this visualization of the relationship. The points in the graph look too scattered for the correlation to be that strong. Without ties, the plot would show 298 dots, but it shows only 77. This means that almost 75% of the responses (221 of 298) were tied with other responses and therefore hidden in this representation. Consequently, rather than showing a downward trend from left to right across the scatterplot (which you would expect for a strongly significant negative correlation), the points seem to be scattered haphazardly. In fact, the correlation computed on only the 77 distinct points displayed in Figure 3 is just −.19.
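The sketch below uses a small hypothetical dataset (not the survey data) to show how this happens. An unjittered plot draws each distinct (x, y) pair once, so heavily tied pairs that follow the trend count only once, while scattered one-off responses each get their own dot. The helper names `pearson` and `tie_summary` are illustrative, not from any library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def tie_summary(xs, ys):
    """Compare all responses with the distinct dots an unjittered plot draws."""
    pairs = list(zip(xs, ys))
    distinct = sorted(set(pairs))  # tied pairs collapse onto a single dot
    dx, dy = zip(*distinct)
    return {
        "n": len(pairs),
        "visible_dots": len(distinct),
        "hidden_share": 1 - len(distinct) / len(pairs),
        "r_all": pearson(xs, ys),
        "r_visible": pearson(dx, dy),
    }


# Hypothetical toy ratings (NOT the survey data): many respondents tie at
# (10, 1) and (0, 10), while a few scattered responses are unique.
recommend = [10] * 6 + [0] * 4 + [10, 0, 5, 8, 2]
discourage = [1] * 6 + [10] * 4 + [9, 2, 5, 4, 3]

summary = tie_summary(recommend, discourage)
print(summary)
```

On this toy data, the correlation across all 15 responses is about −0.67, but across the 7 visible dots it is only about −0.08, the same kind of weakening seen in Figure 3.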

You might wonder if this happens because it’s inappropriate to compute standard (Pearson) correlations with rating scale data, which, for individual ratings, are discontinuous and ordinal. We do not hold this view because, in aggregate, the means of rating scales become increasingly continuous as the sample size increases, and we’ve found that parametric statistical analyses work well with this type of data. Also, a scatterplot is used to visualize data—it’s not a method for statistical analysis. Finally, if we compute Spearman rank correlations (which do not assume continuity in the raw data), we find they closely match the Pearson correlations (for all the data Pearson = −.57, Spearman = −.52; for the partial data in Figure 3 Pearson = −.19, Spearman = −.21).
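Spearman's coefficient is the Pearson correlation computed on ranks instead of raw values, with tied values given the average of the ranks they span. Here is a minimal, standard-library-only Python sketch; the helper names are ours, and the ratings are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n, giving each group of ties the mean of its ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values equal to values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical tied ratings, for illustration only
recommend = [0, 3, 5, 5, 8, 10, 10]
discourage = [10, 8, 5, 6, 2, 1, 1]
print(f"Pearson  = {pearson(recommend, discourage):.2f}")
print(f"Spearman = {spearman(recommend, discourage):.2f}")
```

Because ranking discards the spacing between values, the two coefficients diverge mainly when the relationship is monotonic but not linear; for rating data like these, they tend to land close together.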

Use Jitter to Reveal More Points

There are different strategies to indicate the presence of ties in these types of scatterplots (e.g., color, size, or shape coding). One of the most effective methods is to “jitter” the data by randomly increasing or decreasing each raw value by a small amount. The effect is something like you’d get if you had tried to plot dots on a graph after drinking too much coffee.

Jittering has the effect of keeping tied scores close together but spreading them out just enough to reveal more points in the scatterplot and provide a better visualization of the trend, as shown in Figure 4.

Figure 4: Scatterplot for ratings of likelihood to recommend and likelihood to discourage (jittered).

With the jittered version, it’s easier to see the trend from the upper left corner to the lower right. Some points are still obscured; we count about 186 points in the jittered version. Even so, 62% of the 298 points are revealed, compared with about 26% (77 of 298) in the unjittered version.

How to Jitter in Excel and Google Sheets

Some statistical packages include a jitter setting when creating scatterplots. Notable exceptions include SPSS and spreadsheets like Excel and Google Sheets.

To create a jittered scatterplot in Excel or Google Sheets, the first step is to create jittered versions of the values of the two variables being plotted. Use the following function to create them:

=value + (RAND() - 0.5) / 3

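Replace value with a reference to the cell holding the original rating (e.g., A2). The formula uses RAND() to generate a random number from 0 up to (but not including) 1. Subtracting 0.5 shifts that number to the range −0.5 to +0.5, so the jitter spreads the new values to both sides of the original values. The divisor (3 in the example above) controls the magnitude of the jitter: with 3, each value moves by at most about ±0.17 (one-sixth of a scale point), so every jittered value stays closer to its own response option than to any neighboring one. Use a larger divisor for less jitter or a smaller one for more, as needed for your data.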
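If you work outside a spreadsheet, the same formula translates directly to Python. In this sketch (the function name `jitter` and its parameters are illustrative, not from any library), `random.random()` plays the role of RAND(), since both return a uniform value in [0, 1), and the `divisor` argument corresponds to the constant 3.

```python
import random


def jitter(values, divisor=3.0, rng=None):
    """Return each value plus (U - 0.5) / divisor, with U uniform on [0, 1).

    Mirrors the spreadsheet formula =value + (RAND() - 0.5) / 3:
    random.random(), like RAND(), returns a float in [0, 1), so each
    value moves by at most 0.5 / divisor in either direction.
    """
    if rng is None:
        rng = random.Random()  # unseeded: new values on every call, like RAND()
    return [v + (rng.random() - 0.5) / divisor for v in values]


ratings = [10, 10, 10, 9, 0]

# A fixed seed makes the jitter reproducible, much like pasting values.
jittered = jitter(ratings, rng=random.Random(42))
print([round(j, 3) for j in jittered])

# A larger divisor means less jitter (here at most +/-0.1 per value).
subtle = jitter(ratings, divisor=5, rng=random.Random(42))
```

Passing a seeded `random.Random` gives the same jittered values on every run, which serves the same purpose as pasting spreadsheet results as static values.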

Note that whenever you make a change in the spreadsheet, the random numbers will be recalculated. So, at some point, you’ll probably want to copy and paste the jittered data as values to lock them down.

Summary and Discussion

Scatterplots are powerful data visualization tools, but they do not work well when the values being plotted can easily tie (e.g., rating scale data). In this article, we described a popular strategy for breaking ties to reveal otherwise hidden relationships.

Ties are likely in scatterplots of rating scale data. Because rating scales have a fixed number of points, usually 5–11, if two rating scales are correlated, there is a good chance that ties will show up for any reasonably large sample size.

Ties in scatterplots obscure the visualization of structure. When graphing a scatterplot with rating scale data, the maximum number of dots is the product of the number of response options in the two scales. For two five-point scales, there are only 25 possible pairs of values. For the example presented in this article, there were 110 possible pairs. Even with this number of possible pairs, almost 75% of the data were tied and thus invisible in the unjittered visualization.

Use jitter to reveal more points. Jittering the data, that is, randomly increasing or decreasing the raw values using a function like =value + (RAND() - 0.5) / 3, reveals more dots in the plot, improving the visualization of the relationship between the two variables. And drink as much or as little coffee as you’d like!
