Visualizing Amounts: Dot plots and their derivatives

One of the simplest ways to visualize a single numerical variable with a modest number of labeled observations is the dot plot, also called a Cleveland dot plot.


In this post, we will use the gapminder dataset (available in the gapminder package). The gapminder dataset is explored in depth in STAT 545, the data science course by Professor Jenny Bryan at The University of British Columbia (https://stat545.com/).

library(tidyverse)   # also loads dplyr and ggplot2
library(gapminder)

gapminder %>% 
  filter(year == 2007, continent == "Americas") %>% 
  select(country, lifeExp, pop) %>% 
  ggplot() +
  # one point per country, sorted by life expectancy
  geom_point(aes(x = lifeExp,
                 y = reorder(country, lifeExp)),
             color = "blue", 
             size = 5) +
  labs(title = "Life expectancy in the Americas (2007)",
       x = "Life expectancy (years)",
       y = NULL) +
  # apply the complete theme first, then tweak it:
  # a theme_minimal() added last would overwrite these settings
  theme_minimal() +
  theme(text = element_text(size = 16), 
        panel.border = element_rect(fill = NA, colour = "grey20"))
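Sorting is what makes a dot plot readable. reorder(country, lifeExp) turns country into a factor whose levels are ordered by the mean of lifeExp within each country, and ggplot2 draws the first level at the bottom of a discrete y axis, so the country with the highest life expectancy ends up on top. The small base R check below uses made-up labels and values; the second part shows that when a country has several rows (for example two years), the mean of those rows decides its position.

```r
# Made-up labels and values, for illustration only
labels <- factor(c("A", "B", "C"))
values <- c(3, 1, 2)

# reorder() sorts the factor levels by the mean of `values` within each level
sorted <- reorder(labels, values)
levels(sorted)  # "B" "C" "A": lowest first, so "A" is drawn at the top

# With two rows per label (e.g. two years), the mean of the rows decides the order
two_rows <- factor(c("A", "A", "B", "B"))
two_vals <- c(70, 74, 73, 73)  # means: A = 72, B = 73
two_sorted <- reorder(two_rows, two_vals)
levels(two_sorted)  # "A" "B"
```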




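To highlight the countries with the highest life expectancy, we can color each point by whether life expectancy is at least 75 years.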

gapminder %>% 
  filter(year == 2007, continent == "Americas") %>% 
  select(country, lifeExp, pop) %>% 
  # flag countries with a life expectancy of at least 75 years
  mutate(highlight = factor(if_else(lifeExp >= 75, ">=75", "<75"))) %>% 
  ggplot(aes(x = lifeExp, 
             y = reorder(country, lifeExp),
             color = highlight)) + 
  geom_point(size = 5) +
  # named colours: each group gets its colour by name, not by level position
  scale_color_manual(values = c("<75" = "red", ">=75" = "blue")) +
  labs(x = "Life expectancy (years)",
       y = NULL,
       title = "Life expectancy in the Americas (2007)") +
  theme(axis.text = element_text(size = 14, color = "black"),
        axis.title = element_text(size = 14, color = "black", face = "bold"),
        plot.title = element_text(color = "black", size = 14, face = "bold"),
        plot.caption = element_text(color = "black", size = 14, face = "italic"),
        legend.position = "none")
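An unnamed values vector in scale_color_manual() is matched to the factor levels by position, and the default level order of factor() comes from sorting the strings, which can depend on the locale's collation rules for "<" and ">". The base R sketch below, with made-up values, shows two safeguards: setting the levels explicitly, and naming the colours so each one is looked up by level name.

```r
# Made-up life expectancies, for illustration only
life_exp <- c(72.6, 80.7, 65.6)

# Base R equivalent of the if_else() flag used in the plot
condition <- ifelse(life_exp >= 75, ">=75", "<75")

# Fixing the level order explicitly avoids relying on how the locale sorts strings
highlight <- factor(condition, levels = c("<75", ">=75"))

# A named colour vector ties each colour to a level by name, not by position
colours <- c("<75" = "red", ">=75" = "blue")
point_colours <- unname(colours[as.character(highlight)])
point_colours  # "red" "blue" "red"
```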



The size of the dots can encode an additional numeric variable. Here, point area encodes population size (this kind of chart is sometimes called a bubble chart).


gapminder %>% 
  filter(year == 2007, continent == "Americas") %>% 
  select(country, lifeExp, pop) %>% 
  ggplot() +
  geom_point(aes(y = reorder(country, lifeExp),
                 x = lifeExp,
                 size = pop / 1e6),   # population in millions
             color = "blue") +
  labs(x = "Life expectancy (years)", y = NULL) +
  # area proportional to the value, with 0 mapped to size 0
  scale_size_area("Population\n(millions)", max_size = 8) +
  theme_minimal() +
  theme(text = element_text(size = 16), 
        panel.border = element_rect(fill = NA, colour = "grey20"))
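Because the eye judges bubbles by their area, the size of each point should grow with the square root of the value. The base R sketch below uses made-up populations and is a simplified illustration of area-proportional sizing, not ggplot2's internal code for scale_size_area(): quadrupling the population doubles the size, so the area quadruples.

```r
# Made-up populations (millions), for illustration only
pop_millions <- c(1, 4, 16)
max_size <- 8

# Size grows with the square root of the value, scaled so the largest value gets max_size
# (a simplified sketch of area-proportional sizing, not ggplot2's internal code)
size <- max_size * sqrt(pop_millions) / sqrt(max(pop_millions))
size  # 2 4 8

# Area is proportional to size^2, so areas keep the 1 : 4 : 16 ratio of the data
area_ratio <- size^2 / size[1]^2
area_ratio  # 1 4 16
```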


Dot plots also make comparisons easy.

Example: compare life expectancy in the Americas in 2002 and 2007 by coloring the points by year.


gapminder %>%
  filter(year %in% c(2002, 2007), continent == "Americas") %>% 
  mutate(Year = factor(year)) %>% 
  ggplot() +
  # size is a fixed setting, not a mapping, so it belongs outside aes()
  geom_point(aes(y = reorder(country, lifeExp),
                 x = lifeExp,
                 color = Year),
             size = 2) +
  scale_color_manual(values = c("#E69F00", "#56B4E9")) +
  labs(x = "Life expectancy (years)", y = NULL) +
  theme_minimal() +
  theme(text = element_text(size = 16), 
        panel.border = element_rect(fill = NA, colour = "grey20"))



All countries show some improvement in life expectancy between 2002 and 2007, although the improvement for Jamaica is small.
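A statement like this can also be checked numerically rather than by eye. The sketch below is a hypothetical helper written in base R (the name life_exp_change and the toy values are made up for illustration, not part of the gapminder package); it assumes long-format data with country, year and lifeExp columns, as in gapminder, and lists the smallest improvements first.

```r
# Hypothetical helper (not part of the original post or the gapminder package):
# change in life expectancy between two years, per country, using base R only
life_exp_change <- function(df, from = 2002, to = 2007) {
  start_yr <- df[df$year == from, c("country", "lifeExp")]
  end_yr   <- df[df$year == to,   c("country", "lifeExp")]
  # Put the two years side by side, matched on country
  wide <- merge(start_yr, end_yr, by = "country", suffixes = c("_from", "_to"))
  wide$change <- wide$lifeExp_to - wide$lifeExp_from
  # Smallest improvements first
  wide <- wide[order(wide$change), c("country", "change")]
  rownames(wide) <- NULL
  wide
}

# Toy long-format data with made-up values
toy <- data.frame(
  country = rep(c("A", "B"), each = 2),
  year    = rep(c(2002, 2007), times = 2),
  lifeExp = c(70, 71, 60, 64)
)
res <- life_exp_change(toy)
res  # A changes by 1, B by 4

# With the real data (requires the gapminder package):
# life_exp_change(subset(gapminder::gapminder, continent == "Americas"))
```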


Connecting each country's two points with a line (a layout often called a dumbbell plot) makes the size of each change easier to compare.


gapminder %>%
  filter(year %in% c(2002, 2007), continent == "Americas") %>% 
  mutate(Year = factor(year)) %>% 
  ggplot(aes(y = reorder(country, lifeExp),
             x = lifeExp)) +
  # grey segment joining each country's 2002 and 2007 values
  # (linewidth needs ggplot2 >= 3.4.0; older versions use size instead)
  geom_line(aes(group = country), 
            linewidth = 1.5, 
            color = "grey") +
  geom_point(aes(color = Year),
             size = 4) +
  scale_color_manual(values = c("red", "blue")) +
  labs(x = "Life expectancy (years)", y = NULL) +
  theme_minimal() +
  theme(text = element_text(size = 16), 
        panel.border = element_rect(fill = NA, colour = "grey20"))










