Update: I noticed that in the rendered page, the numbers with decimals are rounded, some are not, so I used as.data.frame() to convert from the tibble format.


Last May 24-25, 2018, I attended the 9th West Pacific East Asia - National Stock Assessment Program (WPEA-NSAP) Tuna Catch Estimates Review Workshop at Ocean 101, Catangnan, General Luna (Siargao), Surigao Del Norte. The said activity aimed to review the NSAP port sampling data collected in each region and what information the BFAR regional offices had compiled that could be used in the annual catch estimates of tuna species. One of the presentations from another region interests me - they showed a map of the fishing grounds in their region and overlaid a pie chart showing the landed catch composition of tuna. When I returned to the office, I tried to make the same graph but using the R language.

As I was browsing for an inspiration, I stumbled at this website in which the author, Guanglai Li, showed the racial composition in largest metro areas in the United States using the package scatterpie by Guangchuang YU. I never heard or read about this package, so I visited the official documentation and found a working example.

Note 1: The code below was adapted from both sites and I modified it to suit my needs. In addition, the code undergone several trials and errors before generating the final code.

Note 2: Since we cannot share the NSAP data without the permission of the higher officials, I used the data from the Philippine Statistics Authority (PSA) and is freely accessible here. I pre-formatted the .csv file and you can download it here.

The data

As noted above, the data I used was extracted from PSA and I only selected the 2015 to 2017 data of three tuna species - bigeye tuna, skipjack, and yellowfin tuna. Furthermore, I only processed the tuna production of marine municipal fisheries. Lastly, only the data from the six provinces of Bicol Region was used - Albay, Camarines Norte, Camarines Sur, Catanduanes, Masbate, and Sorsogon.

The process

To begin, let us load the needed packages.

library(magrittr)
library(tidyverse)
library(scatterpie)
library(maps)
library(mapdata)

I will not discuss here about the scatterpie package but I will extract the phrase from Guanglai Li’s post:

scatterpie … specializes in making pie charts at multiple locations. This package is an extension of ggplot2 so it will be easy for ggplot2 users.

Data manipulation

Before going to making the graph, let us first look at our data.

municipal <- readr::read_csv("data/raw-data/municipal.csv") %>% 
  as.data.frame()
tibble::glimpse(municipal)
## Observations: 18
## Variables: 5
## $ province <chr> "Albay", "Albay", "Albay", "Camarines Norte", "Camari...
## $ species  <chr> "Bigeye tuna", "Skipjack", "Yellowfin tuna", "Bigeye ...
## $ `2015`   <dbl> 285.74, 3.24, 262.57, 448.39, 1347.72, 713.83, 17.14,...
## $ `2016`   <dbl> 281.94, 54.83, 262.66, 627.51, 1381.78, 474.89, 65.85...
## $ `2017`   <dbl> 334.97, 114.55, 289.43, 620.60, 1215.91, 382.25, 314....
head(municipal)
##          province        species    2015    2016    2017
## 1           Albay    Bigeye tuna  285.74  281.94  334.97
## 2           Albay       Skipjack    3.24   54.83  114.55
## 3           Albay Yellowfin tuna  262.57  262.66  289.43
## 4 Camarines Norte    Bigeye tuna  448.39  627.51  620.60
## 5 Camarines Norte       Skipjack 1347.72 1381.78 1215.91
## 6 Camarines Norte Yellowfin tuna  713.83  474.89  382.25

Now we will add a new column containing the sum of columns 2015, 2016, and 2017. We will be going to sum it in a row-wise fashion.

municipal %<>%
  dplyr::rowwise() %>% 
  dplyr::mutate(total = sum(`2015`, `2016`, `2017`)) %>% 
  as.data.frame()

head(municipal)
##          province        species    2015    2016    2017   total
## 1           Albay    Bigeye tuna  285.74  281.94  334.97  902.65
## 2           Albay       Skipjack    3.24   54.83  114.55  172.62
## 3           Albay Yellowfin tuna  262.57  262.66  289.43  814.66
## 4 Camarines Norte    Bigeye tuna  448.39  627.51  620.60 1696.50
## 5 Camarines Norte       Skipjack 1347.72 1381.78 1215.91 3945.41
## 6 Camarines Norte Yellowfin tuna  713.83  474.89  382.25 1570.97

Now, we will only select the province, species, and the total columns.

municipal %<>%
  dplyr::select(province, species, total) %>% 
  as.data.frame()

As I read the documentation, the format of the data frame is in wide format, so we will convert our data frame into that format.

municipal %<>%
  tidyr::spread(key = species, value = total) %>% 
  as.data.frame()

head(municipal)
##          province Bigeye tuna Skipjack Yellowfin tuna
## 1           Albay      902.65   172.62         814.66
## 2 Camarines Norte     1696.50  3945.41        1570.97
## 3   Camarines Sur      397.27  1183.59        1487.27
## 4     Catanduanes      716.94   818.15        1358.92
## 5         Masbate     1314.28    15.76          38.01
## 6        Sorsogon      362.89   320.31         218.74

Since we will plot a pie chart on a map, we need the latitude and longitude of the provinces of Bicol Region.

region5 <- data.frame(
  lon = c(123.632377, 122.763304, 123.348615, 124.242160, 123.558856, 123.930399),
  lat = c(13.171977, 14.139026, 13.525020, 13.708868, 12.306024, 12.759986),
  province = c("Albay", "Camarines Norte", "Camarines Sur", "Catanduanes", "Masbate", "Sorsogon"),
  stringsAsFactors = FALSE)

In addition, we will add the latitude and longitude of the provinces where we put the labels.

provinces <- data.frame(
  lon = c(123.37022, 122.99866, 123.53755, 124.24892, 123.79019, 124.11295),
  lat = c(13.11693, 13.99271, 13.80341, 14.00603, 12.09707, 12.65910),
  province = c("Albay", "Camarines Norte", "Camarines Sur", "Catanduanes", "Masbate", "Sorsogon"),
  stringsAsFactors = FALSE)

Let us now join the municipal and region5 data frames.

municipal <- dplyr::right_join(municipal, region5) %>% 
  as.data.frame()

head(municipal)
##          province Bigeye tuna Skipjack Yellowfin tuna      lon      lat
## 1           Albay      902.65   172.62         814.66 123.6324 13.17198
## 2 Camarines Norte     1696.50  3945.41        1570.97 122.7633 14.13903
## 3   Camarines Sur      397.27  1183.59        1487.27 123.3486 13.52502
## 4     Catanduanes      716.94   818.15        1358.92 124.2422 13.70887
## 5         Masbate     1314.28    15.76          38.01 123.5589 12.30602
## 6        Sorsogon      362.89   320.31         218.74 123.9304 12.75999

Final manipulation of the data frame:

municipal %<>%
  # add new column containing sum of each row
  dplyr::rowwise() %>% 
  dplyr::mutate(total = sum(`Bigeye tuna`, `Skipjack`, `Yellowfin tuna`)) %>% 
  # rename the column names
  dplyr::rename("bigeye tuna" = "Bigeye tuna",
                "skipjack" = "Skipjack",
                "yellowfin tuna" = "Yellowfin tuna") %>% 
  # re-arrange the columns
  dplyr::select(province, lon, lat, "bigeye tuna", 
                "skipjack", "yellowfin tuna", total) %>% 
  as.data.frame()

head(municipal)
##          province      lon      lat bigeye tuna skipjack yellowfin tuna
## 1           Albay 123.6324 13.17198      902.65   172.62         814.66
## 2 Camarines Norte 122.7633 14.13903     1696.50  3945.41        1570.97
## 3   Camarines Sur 123.3486 13.52502      397.27  1183.59        1487.27
## 4     Catanduanes 124.2422 13.70887      716.94   818.15        1358.92
## 5         Masbate 123.5589 12.30602     1314.28    15.76          38.01
## 6        Sorsogon 123.9304 12.75999      362.89   320.31         218.74
##     total
## 1 1889.93
## 2 7212.88
## 3 3068.13
## 4 2894.01
## 5 1368.05
## 6  901.94

The map

We will import the data of the world map.

w2hr <- map_data("world2Hires")

head(w2hr)
##       long      lat group order region subregion
## 1 226.6336 58.42416     1     1 Canada      <NA>
## 2 226.6314 58.42336     1     2 Canada      <NA>
## 3 226.6122 58.41196     1     3 Canada      <NA>
## 4 226.5911 58.40027     1     4 Canada      <NA>
## 5 226.5719 58.38864     1     5 Canada      <NA>
## 6 226.5528 58.37724     1     6 Canada      <NA>
tail(w2hr)
##             long      lat group   order      region subregion
## 2276817 125.0258 11.18471  2284 2276817 Philippines     Leyte
## 2276818 125.0172 11.17142  2284 2276818 Philippines     Leyte
## 2276819 125.0114 11.16110  2284 2276819 Philippines     Leyte
## 2276820 125.0100 11.15555  2284 2276820 Philippines     Leyte
## 2276821 125.0111 11.14861  2284 2276821 Philippines     Leyte
## 2276822 125.0155 11.13887  2284 2276822 Philippines     Leyte

And then we will select the Philippine map:

phil <- w2hr %>% 
  dplyr::filter(region == "Philippines") %>% 
  as.data.frame()

Let us plot the map and zoom-in in Bicol Region.

p1 <- ggplot(phil, aes(long, lat)) +
  geom_map(map = phil, aes(map_id = region), fill = NA, color = "black") +
  coord_quickmap() +
  # zoom in the Bicol Region
  xlim(122, 125) + ylim(11.5, 14.5)

print(p1)

And now we will add the pie charts.

p2 <- p1 +
  # I only copied from Guanglai Li's post most of the code here
  # especially for the values of the radius (r) of the pie chart.
  # The original value is r = sqrt(total)/2000, but the pie charts are smaller,
  # the lower number of the denominator, the bigger the pie chart
  geom_scatterpie(aes(x = lon, y = lat, group = province, r = sqrt(total)/500), 
                  data = municipal,
                  cols = c("bigeye tuna", "skipjack", "yellowfin tuna"),
                  color = "#252728",
                  size = 0.25) +
  geom_text(data = provinces, 
            aes(lon, lat, label = province, fontface = 2),
            nudge_x = 0.05,
            size = 2,
            color = "#FF3333") +
  scale_fill_manual(
    breaks = c("bigeye tuna", "skipjack", "yellowfin tuna"),
    labels = c("Bigeye tuna", "Skipjack", "Yellowfin tuna"),
    values = c("bigeye tuna" = "#003366",
               "skipjack" = "#CCCC66",
               "yellowfin tuna" = "#CC3366")) +
  labs(title = "Distribution of three major species of tuna in Bicol Region",
       caption = "Data Source: Philippine Statistics Authority",
       fill = NULL) +
  coord_fixed() +
  theme_bw() +
  theme(legend.position = c(0.96, 0.02),
        legend.justification = c(1, 0),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.caption = element_text(face = "italic"))

print(p2)

As we can see, more tuna are landed in Camarines Norte compared to other provinces and the dominant species is the skipjack (Katsuwonus pelamis). In Camarines Sur and Catanduanes, the dominant species is the yellowfin tuna (Thunnus albacares), whereas bigeye tuna (Thunnus obesus) dominates the landed catch in Masbate.

That’s it.