One of the many lessons learned from the COVID-19 pandemic is that sourcing data and creating amazing visualisations is the (relatively) easy part of data science. Ensuring you're comparing apples with apples and telling the story is a different kettle of fish.

Comparing Apples with Pangolins: The Case of the Missing Deaths and the Perils of Big Data

As the COVID-19 story continues, we have been subjected to a crash course in data literacy, bombarded with facts and predictions, and introduced to a wide range of new data visualisation graphics, and we are learning to love exponential growth curves on a semi-log scale. Publishers try to present data in as simple and compelling a way as possible.

Unfortunately, COVID-19 is not simple. “Which country has suffered the most deaths from COVID-19?” Surely a simple question to answer, with many readily available data sources, from government agencies to research and media organisations. However, it is far from straightforward. Some countries record only deaths where the individual died in hospital after testing positive for COVID-19, which may exclude those who died in nursing homes and were never tested. The dates used may also be inconsistent: some countries record deaths as at the date of death, and others as at the date the death was registered.
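To see how much the choice of date can matter, here is a minimal Python sketch. The records, the field names `died` and `registered`, and the helper `weekly_counts` are all hypothetical, invented purely for illustration; they do not come from any real data source.

```python
from collections import Counter
from datetime import date

# Hypothetical records (not real data): each death has the date it
# occurred and the date it was officially registered.
deaths = [
    {"died": date(2020, 4, 3), "registered": date(2020, 4, 6)},
    {"died": date(2020, 4, 4), "registered": date(2020, 4, 6)},
    {"died": date(2020, 4, 5), "registered": date(2020, 4, 14)},
    {"died": date(2020, 4, 8), "registered": date(2020, 4, 15)},
]


def weekly_counts(records, field):
    """Count records per ISO week number using the chosen date field."""
    return Counter(r[field].isocalendar()[1] for r in records)


by_death = weekly_counts(deaths, "died")
by_registration = weekly_counts(deaths, "registered")

print("By date of death:", dict(by_death))
print("By registration date:", dict(by_registration))
```

The same four deaths produce two different weekly curves depending on which date is used, so two countries using different conventions are not directly comparable week by week.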

In an age of big data, sourcing the data and creating the chart is no longer the issue, but interpreting it correctly is fraught. This has led to increased attention on unexplained deaths: deaths over and above the number a country would normally expect. Assuming no other extenuating circumstances, such as war or famine, a country’s mortality rate is relatively predictable, which should leave little room for misinterpretation. But even this is not necessarily accurate. For example, it is not unreasonable to expect transport-related deaths to have fallen substantially during this period, as countries have restricted all but essential travel.
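The comparison of recorded deaths against an expected level can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official method: the function name `excess_deaths`, the simple average-of-prior-years baseline, and every figure below are made up for the example.

```python
from statistics import mean


def excess_deaths(observed, prior_years):
    """Return observed deaths minus a simple expected baseline.

    The baseline here is just the average of the same period in prior
    years, which is an assumption for illustration; real estimates
    typically also adjust for trends, population, and seasonality.
    """
    expected = mean(prior_years)
    return observed - expected


# Hypothetical weekly death counts for one country (not real data).
same_week_prior_years = [10_000, 10_200, 9_800, 10_100, 9_900]
observed_this_week = 13_500
reported_covid_deaths = 2_000

excess = excess_deaths(observed_this_week, same_week_prior_years)
unexplained = excess - reported_covid_deaths

print(f"Expected deaths: {mean(same_week_prior_years):.0f}")
print(f"Excess deaths: {excess:.0f}")
print(f"Excess not attributed to COVID-19: {unexplained:.0f}")
```

Note that the baseline itself is a modelling choice. If transport-related deaths have fallen, the true expected figure is lower than the historical average, and the unexplained gap is larger than this simple subtraction suggests.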

It is hard to tell a complex story in one or two charts within the constraints of shortened attention spans, and there are so many variables. What is the source of the data and how is it calculated? Why has the author chosen a particular technique to visualise the data? What do they want you to see and, perhaps more importantly, what don’t they want you to see?

Given how time-poor we are, digging through mountains of data to answer these questions is not always practical. In some ways the simplest answer is to find a source you trust, adjust for any biases (e.g. political predisposition), and use this as a base. You still need to ask these questions:

  1. What is the scale? Is it continuous or broken? Does it start at 0? If not, why not? Are right-hand and left-hand scales clearly marked?
  2. What is missing? Is there a data element that should be there but is not, e.g. a regional map showing North, South, and East but no West?
  3. Is data obscured? Is there a piece of commentary or some other graphic hiding part of the data?
  4. Is it clear? Is the data clearly labelled? Is it uncluttered?
  5. Is it current? What period does it represent? Are the time intervals appropriate? For example, showing annual data for a trend that is moving daily will not help.
  6. Is it relevant? Does the graphic reflect the narrative?

You may not find the answer you are looking for, but you are less likely to be hoodwinked by eye-catching graphics with spurious data sources.

If you want to see some excellent examples of making big problems accessible and understandable through data, I’d suggest ourworldindata.org. If you want to improve your data visualisation design skills, I’d suggest Cole Nussbaumer Knaflic’s excellent book Storytelling with Data.

