
Data Visualization

Examples of good and bad data viz, and how you can make it effective

What makes an effective data visualization?

Data visualizations are meant to make a dataset easier to understand. Unfortunately, some data visualizations make it more difficult by obscuring the overall view of the dataset. Generally, the more visual elements you add to a data visualization (different colors, 3D views, a third axis), the more difficult the visualization is to understand. Simple visualizations are easier to understand for a general audience. Be intentional about what visual elements you choose to include in your graph.

Keep the following points in mind while you're creating your visualization. 

  • Clearly label all elements of your visualization, including the title, axes, legend, and any trend lines.
  • In a bar chart, your numeric axis should ALWAYS start at 0. Starting at a higher number distorts the visualization by making differences look much larger than they are (see this example). 
  • Colors evoke feelings, so they can make your data visualization more or less effective. Check out this guide on using color in data visualization for examples. The colors you choose will depend on what type of data you’re using, the tone you’re going for, and how your colors fit with the rest of your design choices. 
  • When using color, make sure your colors are accessible to people with low vision or colorblindness. Use this contrast checker to check that your text and colors are easily readable.
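Contrast checkers like the one linked above typically compute the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2.x. The Python sketch below shows that calculation for colors written as hex strings such as #RRGGBB. The function names are our own, and the 4.5:1 threshold is WCAG's level AA minimum for normal-size text. Treat it as an illustration of what a checker measures, not a replacement for one.

```python
def _linearize(channel: int) -> float:
    """Convert one 0-255 sRGB channel value to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance (0 = black, 1 = white) of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a color like '#1A2B3C'")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    AA_NORMAL_TEXT = 4.5  # WCAG level AA minimum for normal-size text
    for fg, bg in [("#000000", "#FFFFFF"), ("#777777", "#FFFFFF")]:
        ratio = contrast_ratio(fg, bg)
        verdict = "passes" if ratio >= AA_NORMAL_TEXT else "fails"
        print(f"{fg} on {bg}: {ratio:.2f}:1, {verdict} AA for normal text")
```

Mid-gray #777777 on white comes out just under 4.5:1, so text that looks readable to many people can still fail the AA guideline.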

 

An example bar chart with a clearly labeled title, axes, and legend. Always start your numeric axis at 0.

Bar and Pie Charts

Bar and pie charts are useful for showing how much of each category exists in your data. An example might be a class survey of which students have siblings and how many they have. The bar chart will show the differences between these categories, while the pie chart will show what percentage of the whole class has 0 siblings, 1 sibling, and so on. 

Bar chart showing how many students have each number of siblings. X-axis: number of siblings. Y-axis: number of students. Title: How many siblings do our students have?

Pie chart showing how many siblings students have: 41.7% have 1 sibling, 27.8% have 2, 16.7% have 3, 5.6% have 4, and 8.3% have 0.

For bar charts:

  • Always start the numeric axis at zero. If your chart doesn't show differences clearly enough without starting the axis at a higher number, consider a different type of chart. 
  • If you use color, consider accessibility (covered above) and make sure you include the meaning of the colors in a legend on the chart. Color should not be your only way of distinguishing categories. 
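To see why the zero baseline matters, compare how long two bars look with and without it. The sketch below uses two hypothetical values, 50 and 55, which differ by only 10%. Only the part of each bar above the axis start is drawn, so with the axis starting at 45 the visible lengths are 5 and 10, and the larger bar looks twice as long. The function name is illustrative.

```python
def apparent_ratio(small: float, large: float, axis_start: float = 0.0) -> float:
    """How many times longer the larger bar *looks* than the smaller one.

    Only the part of each bar above axis_start is drawn, so the visible
    lengths are (value - axis_start).
    """
    if not axis_start < small <= large:
        raise ValueError("expected axis_start < small <= large")
    return (large - axis_start) / (small - axis_start)


if __name__ == "__main__":
    small, large = 50, 55  # hypothetical values, a 10% difference
    print(f"Axis at 0:  larger bar looks {apparent_ratio(small, large):.1f}x as long")
    print(f"Axis at 45: larger bar looks {apparent_ratio(small, large, 45):.1f}x as long")
```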

For pie charts:

  • Only use a pie chart when you want to show parts of a whole. In our example, the pie chart would show the entire class with segments representing the size of each group.  
  • Because pie charts represent a whole, they are best used for percentages. The slices should total 100% (allowing for small rounding differences) and never more! 
  • Start with your largest slice at 12 o'clock and continue from largest to smallest. In the US, people intuitively read these charts left to right, or clockwise. 
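Turning survey counts into pie-chart percentages is simple arithmetic: divide each category's count by the total and multiply by 100. The sketch below also sorts the slices from largest to smallest so the biggest one can start at 12 o'clock. The counts are hypothetical (a class of 36), chosen so the rounded percentages match the sample pie chart above. Those rounded labels add up to 100.1%, a rounding artifact rather than a real excess.

```python
from __future__ import annotations


def pie_slices(counts: dict[str, int]) -> list[tuple[str, float]]:
    """Turn raw category counts into (label, percent) pairs, largest first.

    Ordering largest-to-smallest lets the biggest slice start at 12 o'clock
    with the rest following clockwise.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    slices = [(label, 100 * n / total) for label, n in counts.items()]
    return sorted(slices, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical class of 36 students, keyed by number of siblings.
    siblings = {"0": 3, "1": 15, "2": 10, "3": 6, "4": 2}
    slices = pie_slices(siblings)
    for label, pct in slices:
        print(f"{label} sibling(s): {pct:.1f}%")
    # Unrounded shares total 100%; rounded labels may drift slightly.
    print(f"Total of rounded labels: {sum(round(p, 1) for _, p in slices):.1f}%")
```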

Line Charts

Line charts are an excellent choice for showing changes in data over time. Consider how your data points are connected by the line: by default, data visualization software draws a straight line between consecutive data points, which can create confusion if data is missing. 
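One way to keep software from bridging missing data is to split the series into separate runs wherever a value is missing and draw each run as its own line. The sketch below marks missing values as None; the function name, years, and values are hypothetical. Some plotting libraries, such as matplotlib, produce the same break automatically when missing values are stored as NaN.

```python
from __future__ import annotations

from typing import Optional, Sequence


def split_at_gaps(
    xs: Sequence[int], ys: Sequence[Optional[float]]
) -> list[list[tuple[int, float]]]:
    """Split a series into separate runs wherever a value is missing (None).

    Drawing each run as its own line leaves a visible break for the missing
    period instead of a straight line bridging it.
    """
    segments: list[list[tuple[int, float]]] = []
    current: list[tuple[int, float]] = []
    for x, y in zip(xs, ys):
        if y is None:
            if current:  # close the run that ended just before the gap
                segments.append(current)
            current = []
        else:
            current.append((x, y))
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    years = [2015, 2016, 2017, 2018, 2019, 2020]
    values = [4.0, 5.5, None, None, 6.0, 6.5]  # hypothetical; 2017-2018 missing
    for segment in split_at_gaps(years, values):
        print(segment)  # draw each segment as a separate line
```

A run containing a single point has nothing to connect, so it is usually shown as a marker rather than a line.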

Things to consider:

  • If your data fluctuates widely (e.g., very high some years and very low in others), consider adding a trend line to show the general change in the data over time. Programs like Excel and Tableau have trend lines built in, but other programs may require you to do the calculation yourself. 
  • You can add additional lines to show other categories. Use color and labels to distinguish them. 
  • If data for some time periods is missing, your chart should show that with a break in the line. Otherwise, the line will connect the data points on either side of the gap, which makes it look as if there is data for the missing period. 
  • To add greater distinction to your data points, consider creating a stepped line chart.
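For programs without a built-in trend line, a straight-line trend is commonly computed as an ordinary least-squares fit: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)², and the line passes through the point (mean x, mean y). The sketch below assumes a linear trend suits your data (other trend models exist); the function name and yearly values are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def linear_trend(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept


if __name__ == "__main__":
    years = [2016, 2017, 2018, 2019, 2020, 2021]
    values = [12, 30, 9, 28, 15, 36]  # hypothetical, deliberately noisy
    slope, intercept = linear_trend(years, values)
    print(f"Trend: about {slope:+.2f} per year")
    # Endpoints for drawing the trend line from the first year to the last.
    for year in (years[0], years[-1]):
        print(year, round(slope * year + intercept, 1))
```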

Scatter Plots

Scatter plots show the correlation between two variables: one variable is mapped to the x-axis, the other to the y-axis, and each dot marks the pair of values for a single observation. See the example below showing the correlation between the duration of Old Faithful eruptions and the time between eruptions. 

Scatter plot showing correlation between duration of Old Faithful Eruption and waiting time between eruptions
"Old Faithful Eruptions" is by Maksim and is licensed under CC-BY 2.0. 

The plot suggests that longer eruptions need more "warm-up time": the longer an eruption lasts, the longer the wait between eruptions. In addition, the data points fall into two clusters, short waits with short eruptions and long waits with long eruptions, which suggests there are generally two types of eruptions. 

When making a scatter plot, consider:

  • The size of your dots. Each dot represents one data point, so don't make them too large. Scatter plots are useful because they reveal clusters of data points, and oversized dots overlap and hide those clusters. 
  • Understand the difference between correlation and causation. Just because two variables correlate does not mean one causes the other! See Spurious Correlations for some funny examples of this phenomenon.  
  • Scatter plots are best for a "medium" sized dataset. If you have too few points, the visual might mislead your viewer by making it appear as if things correlate when they really don't. Similarly, too many data points will overlap and be difficult to read.
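The strength of a straight-line relationship in a scatter plot is often summarized with the Pearson correlation coefficient r, which runs from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below computes it from scratch; the function name and the (duration, wait) pairs are hypothetical and are not the Old Faithful data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (duration in minutes, wait in minutes) pairs.
    durations = [1.8, 2.0, 2.3, 4.1, 4.4, 4.6]
    waits = [54, 52, 58, 80, 78, 84]
    print(f"r = {pearson_r(durations, waits):.2f}")
```

A single r value cannot reveal clusters like the two groups in the Old Faithful plot, which is one reason to look at the scatter plot itself, and a high r still says nothing about cause and effect. Python 3.10 and later also include statistics.correlation, which computes the same coefficient.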