People are wary of statistics.
And if people think you can show anything you want with statistics, then this cynicism certainly applies to statistical graphs, too.
For example, a few years ago the following graphic made its way around the Internet as an example of graphic abuse.
Readers balked at what they saw as a misleading graph. It visually depicts a large gap between the current top tax rate of 35% and the proposed rate of 39.6%.
If you look closely, the vertical axis (y-axis) doesn’t start at zero. Some felt this was a clear example of bias and the pushing of an agenda.
As another example, the following graph depicts the amount of carbon dioxide concentration in the atmosphere in parts per million over the last 100 years. It’s similar to many you can find on the Internet.
It also doesn’t start with a zero and also depicts a large visual increase in carbon dioxide, which coincides with the industrial revolution. Is this graph also misleading and hiding an agenda because it doesn’t start at 0?
To show or not show 0
Is it deceptive or even dishonest to not show the zero on a graph? The famous book “How to Lie with Statistics” has for generations (it was originally published in 1954) helped people detect deception with statistics and graphics. The book by Darrell Huff is full of great advice for making sense of the constant barrage of data and graphs we see every day.
One thing it mentions is to be wary of graphs that don’t start the y-axis at 0, advice you’ll quickly find echoed by other online sources.
What’s interesting about both graphics above is that they display roughly similar increases. The tax graph shows an increase in the top tax rate of about 13% [(39.6-35)/35 = .131] and the carbon dioxide graph shows an increase of roughly 16% from 1900 to 2000 [(335-290)/290 = .155].
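To make the arithmetic concrete, here is a minimal Python sketch of the percent-change calculation, using the two pairs of figures quoted above (35% to 39.6% for the tax rate, roughly 290 to 335 ppm for carbon dioxide). The function name `percent_change` is my own label for illustration, not something taken from either graph's source.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (new - old) / old * 100.0


# Figures quoted in the article (the CO2 values are approximate readings).
tax_change = percent_change(35.0, 39.6)    # top tax rate, in percent
co2_change = percent_change(290.0, 335.0)  # CO2 concentration, in ppm

print(f"Top tax rate: {tax_change:.1f}% increase")
print(f"CO2:          {co2_change:.1f}% increase")
```

Note that the tax figure is a relative change in the rate itself (about 13%), not the 4.6 percentage-point difference between the two rates.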
The website from which I pulled the graphic states that a better graph would have shown something like the following:
But is that a panacea? Should all graphs start at 0?
Not everyone thinks so, including the famous data visualization maven Edward Tufte, who said the advice from Huff is “wrong” and the need to show all that empty vertical space for the sake of including a zero is unnecessary.
Here is the average temperature difference in degrees Celsius, similar to what you might see on the Internet (usually as anomalies). It shows an increase in average global temperature and a fair amount of variability per year.
And here’s the exact same data (from Berkeley Earth’s dataset) with the y-axis scaled to start at 0. Both show the same ~6% increase in average temperature since 1900.
The line in the first graph travels vertically about 64% of the way up the vertical axis. Conversely, the line in the second graph moves up about 7%.
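One way to put a number on this visual effect is to compute what share of the axis height a change occupies. The sketch below is only an illustration: `axis_share` is a hypothetical helper, and the values are made-up round numbers (a series rising from 100 to 106, a 6% increase), not the Berkeley Earth data or the axis limits of the graphs above.

```python
def axis_share(start, end, axis_min, axis_max):
    """Fraction of the y-axis height covered by the move from start to end."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(end - start) / span


# Hypothetical series: a value rising from 100 to 106 (a 6% increase).
truncated = axis_share(100, 106, axis_min=98, axis_max=108)  # axis cut near the data
zero_based = axis_share(100, 106, axis_min=0, axis_max=108)  # axis starts at zero

print(f"Truncated axis:  line covers {truncated:.0%} of the height")
print(f"Zero-based axis: line covers {zero_based:.0%} of the height")
```

The underlying change is identical in both cases; only the axis limits differ, which is why the same data can look dramatic on one graph and nearly flat on the other.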
Here’s another graph showing a reduction in cable subscribers. The authors of the graph clearly want to convey the idea that people are “cutting cords”, and the visual drop from 2011 definitely shows this. The visual reduction from 2011 to 2015 is about 68%. However, the actual reduction of subscribers is only around 3%.
But is this really a better graph because it includes the zero?
With this new graph, both the reduction and any sense of trend are lost, just as they were in the zero-based graph of global temperature.
Zero is not the hero
While it’s a good idea to have best practices for displaying data in graphs, “show the zero” is a rule that clearly can be broken. Showing or not showing the zero is not, on its own, sufficient to declare a graph objective or, conversely, “deceptive.”
There is no objective graph and you should assume there’s an agenda behind each graph (some more benign than others). The context matters and the reader should understand the context and consequences of the data with or without the zero. For example:
- What are the impacts on the economy of a 13% increase in tax rates for top earners and small businesses?
- What effect does a 16% increase in carbon dioxide concentration have (or will have) on global temperatures?
- How does or will a 6% increase in average global temperatures affect life on the planet?
- What is the impact on the cable industry and consolidation of companies with a 3% loss of cable subscribers?
One way to help detect deception (or misperception) in graphs is to examine the raw data in a table and convert the raw data into percent changes. While this doesn’t avoid the need for context, it can help standardize often unfamiliar measurements into the more familiar percentages as I’ve done in the graphs above.
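As a rough sketch of that conversion, the snippet below takes a small table of yearly values and expresses each one as a percent change from the first year. The subscriber counts are hypothetical placeholders chosen for illustration, not the figures behind the cable graph, and `percent_change_from_baseline` is a name I made up for the example.

```python
def percent_change_from_baseline(values):
    """Percent change of each value relative to the first value in the list."""
    baseline = values[0]
    if baseline == 0:
        raise ValueError("baseline value must be non-zero")
    return [(v - baseline) / baseline * 100.0 for v in values]


# Hypothetical subscriber counts in millions (illustrative only).
years = [2011, 2012, 2013, 2014, 2015]
subscribers = [100.0, 99.5, 98.8, 97.9, 97.0]

changes = percent_change_from_baseline(subscribers)

# Print the raw table next to the standardized percent changes.
for year, count, change in zip(years, subscribers, changes):
    print(f"{year}  {count:6.1f}  {change:+5.1f}%")
```

Reading the last column next to the graph makes it easier to see whether the visual drop and the actual drop tell the same story.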
Providing context to your reader is helpful as it’s answering the question, “compared to what?” But following an arbitrary rule that all graphs should start with 0 doesn’t make sense. In fact, it may introduce more confusion than it is purportedly solving.
There isn’t an objective graph. If people want to deceive, they will, and the unfocused reader may certainly be swayed. But often, displaying the data with or without the zero can provide a different perspective that prompts readers to reconsider the context and the implications.