Visualising data

Data visualisation is an essential tool for a statistician or, for that matter, anyone who uses statistics. It can be used to explore the data before more rigorous analysis techniques are applied or, once the analysis is complete, as a medium for relaying the results. Visual aids are powerful not only for communicating information but also for teasing out details from the data.

I learnt a lot about good graphics from one of my favourite lecturers at ANU. Another great source for understanding the power of graphics has been the books written and published by Edward Tufte. Anyone in the field of data visualisation will have heard of Tufte. His 1983 book, The Visual Display of Quantitative Information, which has been reprinted many times, has been an invaluable source for my learning.

A good graphic will portray relevant information and be self-explanatory. If you feel that you need to provide a lot of explanation for your graphic, then you might need to reconsider the graphic you have chosen. I think that it is always better to have a simple graph. It is tempting to make an elaborate-looking graph, which may even be complicated to produce but, more often than not, fails to communicate the information.

If you are ever thinking of producing a picture to portray information, then I suggest you follow Tufte’s principles of graphical excellence and integrity as outlined in his book mentioned above. To begin with, a graphical display should:

  1. Show the data.
  2. Allow the viewer to make inferences about the data.
  3. Avoid distorting the data.
  4. Make comparisons of different pieces of data easy.
  5. Reveal several levels of detail.
  6. Serve a reasonably clear purpose: description, exploration, tabulation, or decoration.
  7. Be closely integrated with the statistical and verbal descriptions of a data set.

Tufte’s Principles of Graphical Excellence

  • Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design.
  • Graphical excellence consists of complex ideas communicated with clarity, precision and efficiency.
  • Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
  • Graphical excellence is nearly always multivariate.
  • And graphical excellence requires telling the truth.

Principles of Graphical Integrity

  • The representation of numbers as physically measured on the surface of the graphic itself should be directly proportional to the numerical quantities represented.
  • Clear, detailed and thorough labelling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
  • Show data variation, not design variation.
  • In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
  • The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
  • Graphics must not quote data out of context.
  • Be aware of the Lie Factor, which is the ratio (size of the effect shown in the graphic)/(size of the effect in the data). A Lie Factor close to 1 means the graphic represents the data accurately.
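
To make the Lie Factor concrete, here is a minimal sketch in Python. It assumes that the "size of the effect" is measured as the relative change between two values, which is one common way of computing it. The function names and the numbers in the example are made up for illustration and do not come from a real graphic.

```python
def relative_change(first: float, second: float) -> float:
    """Size of an effect, taken here as the relative change from first to second."""
    if first == 0:
        raise ValueError("relative change is undefined when the first value is 0")
    return abs(second - first) / abs(first)


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    effect_in_data = relative_change(data_first, data_second)
    if effect_in_data == 0:
        raise ValueError("Lie Factor is undefined when the data do not change")
    effect_in_graphic = relative_change(graphic_first, graphic_second)
    return effect_in_graphic / effect_in_data


# Hypothetical example: the data rise from 100 to 150 (a 50% increase),
# but the bars drawn for them grow from 2.0 cm to 5.0 cm (a 150% increase).
print(lie_factor(100, 150, 2.0, 5.0))  # prints 3.0
```

In this made-up case the graphic overstates the change in the data by a factor of three. Values well above or well below 1 are a sign that the graphic is distorting the data.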