This is a continuation of the previous tutorial. In the previous post, we made a histogram and added annotations inside the graph showing the length at first maturity as well as the sections for juveniles, mature fish, and mega-spawners. In this tutorial, I am going to show how to use facets to display subsets of the data in separate panels, also known as small multiples. This type of graph is especially useful for comparing data across groups, for example, the length frequency distribution of a species per fishing gear.

Extracted from the official reference of ggplot2:

Facetting generates small multiples, each displaying a different subset of the data. Facets are an alternative to aesthetics for displaying additional discrete variables.

There are two faceting functions: facet_wrap() and facet_grid(). In this tutorial, we will only deal with facet_wrap().
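To make the difference between the two functions concrete, here is a minimal sketch. It uses the mpg dataset that ships with ggplot2 rather than the cisco data, so the variable names (hwy, class, drv, cyl) are only illustrative.

```r
library(ggplot2)

# Base plot on ggplot2's built-in mpg data (illustrative only, not the cisco data)
base_plot <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "black", fill = "gray")

# facet_wrap(): one panel per level of `class`, wrapped into rows and columns
wrap_plot <- base_plot + facet_wrap(~ class)

# facet_grid(): rows defined by `drv`, columns defined by `cyl`
grid_plot <- base_plot + facet_grid(drv ~ cyl)

print(wrap_plot)
print(grid_plot)
```

facet_wrap() suits a single grouping variable with many levels, which is the situation in the rest of this tutorial.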

Preliminaries

To begin, let us load the required packages.

library(ggplot2)
library(dplyr)
library(magrittr)
library(ggthemes)

We will again use the data for Coregonus artedi.

head(cisco_data)
##   X lakeid year4 sampledate gearid spname length weight sex
## 1 1     TR  1981  8/11/1981 VGN032  CISCO    140   21.4   F
## 2 2     TR  1981  8/10/1981 VGN032  CISCO    146   22.3   F
## 3 3     TR  1981  8/11/1981 VGN032  CISCO    147   23.3   F
## 4 4     TR  1981  8/19/1981 VGN032  CISCO    153   23.5   F
## 5 5     TR  1981  8/19/1981 VGN032  CISCO    150   24.0   F
## 6 6     TR  1981  8/19/1981 VGN032  CISCO    152   24.0   F

Furthermore, the objects (variables) from the previous tutorial will be used, as well as the custom theme. I modified the theme by changing axis.text.x = element_text(size = 10, angle = 25) to axis.text.x = element_text(size = 10).
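If you would rather not edit the theme definition itself, the angle can also be reset on top of an existing theme. The sketch below is an assumption-laden stand-in: theme_bw() replaces the custom theme_pub() from the previous tutorial, and the mpg data replaces the cisco data. When themes are combined with +, properties left unspecified are kept from the existing element, so angle = 0 has to be stated explicitly.

```r
library(ggplot2)

# Stand-in for the previous version of the custom theme (theme_pub() is not defined here)
angled_theme <- theme_bw() +
  theme(axis.text.x = element_text(size = 10, angle = 25))

# Unspecified properties carry over when themes are added, so reset the angle explicitly
flat_theme <- angled_theme +
  theme(axis.text.x = element_text(angle = 0))

demo_plot <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "black", fill = "gray") +
  flat_theme

print(demo_plot)
```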

cisco_data_range <- max(cisco_data$length) - min(cisco_data$length)
class_size <- 20
class_interval <- cisco_data_range / class_size
cisco_lm_mm <- 171
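As a quick check of the arithmetic, the same calculation can be run on the six lengths shown in the head() output above. This is only a sketch: the full dataset will have a wider range and therefore a different class interval.

```r
# First six lengths (mm) from the head() output; the full data will give a different range
sample_lengths <- c(140, 146, 147, 153, 150, 152)

sample_range <- max(sample_lengths) - min(sample_lengths)  # 153 - 140 = 13
n_classes <- 20                                            # same value as class_size
sample_interval <- sample_range / n_classes                # 13 / 20 = 0.65 mm per class

print(sample_interval)
```

Note that class_size holds the number of classes, while class_interval is the width of each class and is what gets passed to binwidth.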

Plotting

Length frequency distribution by fishing gear

We will begin by displaying the length frequency distribution by gear ID. In our data, 9 fishing gears were used to catch this species.
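The number of gears, and so the number of panels to expect, can be checked before plotting. Here is a small sketch on a hypothetical stand-in for cisco_data: only VGN032 appears in the output above, and the second gear ID and all lengths beyond the first two are invented.

```r
# Hypothetical stand-in for cisco_data; GEAR_B and its lengths are invented
toy_data <- data.frame(
  gearid = c("VGN032", "VGN032", "GEAR_B", "GEAR_B", "GEAR_B"),
  length = c(140, 146, 210, 225, 198)
)

# Number of distinct gears, i.e. the number of panels facet_wrap(~ gearid) would draw
n_gears <- length(unique(toy_data$gearid))
print(n_gears)

# Observations per gear, useful for spotting sparsely sampled panels
gear_counts <- table(toy_data$gearid)
print(gear_counts)
```

On the real data, replace toy_data with cisco_data.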

First, we will make an object containing the title for our graph, and then build the plot:

my_title_gear <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per gear"))
gear_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_gear,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)", 
       y = "Frequency") +
  geom_vline(aes(xintercept = cisco_lm_mm), color = "red",
             linetype = "dotted", size = 0.5) +
  facet_wrap(~ gearid) +
  theme_pub()

print(gear_multiples)
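The title is built with expression() rather than a plain string so that italic() can be rendered as plotmath and the scientific name appears in italics. A minimal sketch of what that object is, previewed with base graphics:

```r
# expression() stores the title unevaluated so the graphics device can apply italic()
demo_title <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per gear"))

is.expression(demo_title)  # TRUE: a plotmath expression, not a character string

# Base graphics accept the same object, which is a quick way to preview the title
plot(1, type = "n", xlab = "", ylab = "", main = demo_title)
```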

According to the book R for Data Science by Wickham and Grolemund, in the section on facets:

To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete.

In our case, gearid is the variable supplied to facet_wrap(), which creates one length frequency panel for each fishing gear used.
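To see what "formula" means here, the object created by ~ can be inspected directly. The sketch below also shows vars(), which ggplot2 3.0.0 and later accept as an alternative to the formula syntax.

```r
library(ggplot2)

facet_formula <- ~ gearid

class(facet_formula)     # "formula"
all.vars(facet_formula)  # "gearid"

# Two equivalent ways to specify the faceting variable (vars() needs ggplot2 >= 3.0.0)
wrap_formula <- facet_wrap(~ gearid)
wrap_vars <- facet_wrap(vars(gearid))
```

Neither call touches the data yet; gearid is only looked up once the facet is added to a plot.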

Length frequency distribution by year

Next, we will look at the length frequencies per year.

my_title_year <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per year"))
year_multiples <- ggplot(data = cisco_data, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_year,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)", 
       y = "Frequency") +
  geom_vline(aes(xintercept = cisco_lm_mm), color = "red",
             linetype = "dotted", size = 0.5) +
  facet_wrap(~ year4, nrow = 5) +
  theme_pub()

print(year_multiples)
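The nrow argument fixes the number of rows, and the number of columns follows from the number of panels. The sketch below uses invented data (ten hypothetical years of random lengths) to show the layout arithmetic, and adds scales = "free_y", an optional argument not used in the plot above, which gives each panel its own count axis.

```r
library(ggplot2)

# Hypothetical data: 10 invented years with 30 random lengths each
set.seed(1)
toy_years <- data.frame(
  year4 = rep(1981:1990, each = 30),
  length = rnorm(300, mean = 180, sd = 25)
)

n_panels <- length(unique(toy_years$year4))  # 10 panels
n_cols <- ceiling(n_panels / 5)              # nrow = 5 leaves 2 columns

year_demo <- ggplot(data = toy_years, aes(x = length)) +
  geom_histogram(binwidth = 10, color = "black", fill = "gray") +
  facet_wrap(~ year4, nrow = 5, scales = "free_y")  # free_y: each panel gets its own count axis

print(year_demo)
```

Free y scales make sparsely sampled years easier to read, at the cost of no longer being able to compare bar heights directly across panels.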

Length frequency distribution by sex

Lastly, we will compare the length frequencies of this species per sex. This is tricky because the sex column contains NAs. In this tutorial, we will only use the length frequencies of female and male Coregonus artedi, indicated in the data by F and M, respectively.
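Before filtering, it helps to see how many values are missing. A minimal sketch on a made-up vector (the real values would come from cisco_data$sex):

```r
# Hypothetical sex column; the real values come from cisco_data$sex
toy_sex <- c("F", "M", NA, "F", NA, "M", "F")

sum(is.na(toy_sex))  # number of missing values: 2

# useNA = "ifany" adds an NA column to the table when missing values are present
sex_counts <- table(toy_sex, useNA = "ifany")
print(sex_counts)
```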

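We will make a new object containing only the rows recorded as female or male:

cisco_sex <- cisco_data %>% 
  filter(sex %in% c("F", "M"))

The filter() function is part of the dplyr package included in the tidyverse. As you might guess, it filters the rows of the data frame, keeping only those whose value in the sex column is one of the values in c("F", "M"), that is, female or male. The %in% operator tests whether each value belongs to a set. The relational operator == (equal to) is not suitable here: it compares element by element and recycles c("F", "M") down the column, which silently drops rows that should be kept. Rows where sex is NA are excluded as well, because %in% returns FALSE for NA.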
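When a column is matched against several values, == and %in% give different answers. A small sketch with made-up values shows why %in% is the safer choice for this kind of filter:

```r
# Made-up sex column for illustration
toy_sex <- c("F", "F", "M", "M", NA, "F")

# == recycles c("F", "M") along the column: row 1 vs "F", row 2 vs "M", row 3 vs "F", ...
matched_eq <- toy_sex == c("F", "M")
sum(matched_eq, na.rm = TRUE)  # 2: only rows 1 and 4 line up with the recycled pattern

# %in% tests each row against the whole set and returns FALSE for NA
matched_in <- toy_sex %in% c("F", "M")
sum(matched_in)  # 5: every non-missing row
```

Because the column length here is a multiple of 2, R does not even warn about the recycling, which is what makes the == version easy to miss.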

As above, we will make a separate title for the graph:

my_title_sex <- expression(paste("Length frequency of Cisco (", italic("Coregonus artedi"), ") per sex"))
sex_multiples <- ggplot(data = cisco_sex, aes(x = length)) +
  geom_histogram(binwidth = class_interval, color = "black", fill = "gray") +
  labs(title = my_title_sex,
       subtitle = "The dotted red line represents the length at first maturity (171 mm)",
       x = "Total Length (mm)", 
       y = "Frequency") +
  geom_vline(aes(xintercept = cisco_lm_mm), color = "red",
             linetype = "dotted", size = 0.5) +
  facet_wrap(~ sex) +
  theme_pub()

print(sex_multiples)

Conclusion

Hooray! We made it. As you can see, it is very easy to make small multiples using facet_wrap() when faceting by a single variable. It is also easy to reproduce the graph when the original data are updated: you will not need to make a series of manual adjustments whenever your data change.

I hope you enjoyed this short tutorial.
