I. Introduction


Where we’ve been –

The midterm assessment was designed to evaluate your ‘fundamental’ skills as a data scientist, corresponding to Course Objectives (1)-(4,5) in the syllabus. Those skills break down into –

  1. Workflow
    • R Projects (.Rproj) + GitHub, R Markdown, R best practices, reproducibility
  2. Data Wrangling
    • dplyr, tidyr, forcats, stringr, lubridate
    • plus iteration with purrr::map family
  3. Data Visualization
    • ggplot2, gt
  4. Exploratory Data Analysis
    • Use the above to read, explore, clean/tidy, prepare for analysis and visualize new data


Where we’re going –

As we move forward this semester (today and after Spring Break, 3/5 - 3/13), we will continue to use and extend these ‘fundamental’ skills to get the most out of R and RStudio for data science and analysis. With these fundamentals in hand, we are going to learn how to use the following ‘advanced’ communication and analysis tools –

  1. Enhanced Visualization
    • Interactivity w/ ggplotly, reactable
    • Enhance gt with gtExtras (and flextable, ftExtra)
    • Intro to flexdashboard
  2. Websites in R + Github
    • Basics, personal webpage
    • Websites as analytical display tools
      • Embedding flexdashboard
  3. Spatial Visualization (today)
    • ‘Tidy Maps’ w/ sf or sp + tidyverse + ggmap and ggplot2
    • Introduction to interactive maps with ggplotly and/or leaflet
  4. Shiny Apps
    • Basics
    • Publish in an R hosted website
      • Embedding interactive flexdashboards
      • Embedding shiny apps
  5. Miscellaneous
    • *Working with big data in R: dtplyr, collapse, h2o, sparklyr
    • *Working with databases in R: dbplyr
    • *How to develop a package with RStudio & GitHub
    • *Web scraping with R


So what exactly are we going to do today?

First, we’re going to briefly cover joining multiple data sources with dplyr, using spatial examples. Then we’re going to build on last week’s lecture and expand our spatial data toolbox in R with more advanced use of sf, ggmap, and tidycensus, plus an introduction to fully interactive maps with leaflet. Finally, we will apply these tools in an activity where we create and edit a more advanced spatial dashboard together.

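#Install the packages for today if you don't already have them
pkgs     <- c("sf", "ggmap", "tmap", "tidycensus", "leaflet", "osmdata", "tigris")
new_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(new_pkgs) > 0) install.packages(new_pkgs)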

II. Joining data in R

Joining data from multiple sources is another aspect of data wrangling. It was covered in PUBH 7461, but it is an important part of working with real-world data, so we should make sure we’re on the same page heading into the final project.

Laura Le’s wonderful lecture on joining data in R, as well as an example/activity with the NYC flights data, can be found on Canvas.
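As a quick refresher before we get to spatial data, here is a minimal sketch of the three joins we’ll lean on most, run on two tiny toy tables keyed by county FIPS code. The value column and its numbers are made up for illustration; only the dplyr functions are real.

```r
library(dplyr)

# Hypothetical toy tables (made-up values, not real data)
counties <- tibble(
  fips   = c("27053", "27123", "27037"),
  county = c("Hennepin", "Ramsey", "Dakota")
)
measures <- tibble(
  fips  = c("27053", "27123", "27139"),
  value = c(10, 20, 30)
)

# left_join(): keep every county; value is NA where there is no match
joined_left <- left_join(counties, measures, by = "fips")

# inner_join(): keep only keys present in both tables
joined_inner <- inner_join(counties, measures, by = "fips")

# anti_join(): counties with no match in `measures` (a handy join check)
unmatched <- anti_join(counties, measures, by = "fips")
```

When one side is an sf object (e.g. county geometries), put it on the left, as in left_join(sf_counties, attributes, by = "fips") with placeholder names, so the result keeps its sf class and geometry column.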

III. Advanced sf, ggmap, tidycensus, ggplot, plotly

Simple Features sf Resources

Like many things in the R universe, the sf package has wonderful documentation and examples. Please spend some time reviewing these on your own.

  1. Introduction
  2. Reading, writing, converting Simple Features
  3. Manipulating Simple Features
  4. Plotting Simple Features
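Converting a plain table of coordinates into Simple Features is a step we’ll need for point data such as Airbnb listings. A minimal sketch, assuming a data frame with longitude/latitude columns (the two points are approximate downtown Minneapolis and St. Paul locations, for illustration only): st_as_sf() builds the sf object, and st_transform() reprojects it to NAD83 / UTM zone 15N (EPSG:26915) so distances come out in metres.

```r
library(sf)

# Illustrative points (approximate coordinates, for demonstration only)
cities <- data.frame(
  city = c("Minneapolis", "St. Paul"),
  lon  = c(-93.27, -93.09),
  lat  = c(44.98, 44.95)
)

# Convert to sf: the coordinate columns become a POINT geometry in WGS84 (EPSG:4326)
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Reproject to NAD83 / UTM zone 15N, a projected CRS in metres covering most of MN
cities_utm <- st_transform(cities_sf, crs = 26915)

# Pairwise distance matrix (units: metres)
city_dist <- st_distance(cities_utm)
city_dist[1, 2]
```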

Example 1. (MN) More advanced sf + ggplotly

First, let’s install the ggthemes package (if needed) for a few more theme options in our ggplots.

#Install ggthemes if necessary
if (!requireNamespace("ggthemes", quietly = TRUE)) {
  install.packages("ggthemes", quiet = TRUE)
}

#Call the library
library(ggthemes, quietly = TRUE)

Next, let’s read in our MN .shp file (from last week’s lecture).

library(sf)
library(tidyverse)

#Read the shape file with st_read() (read_sf() would return a tibble) and keep MN counties
mn.df <- st_read("./data/USA_Counties/USA_Counties.shp", quiet = TRUE) %>%
  janitor::clean_names() %>%
  filter(state_name == "Minnesota")

Next, let’s build our ggplot, adding a little more information with our usual data wrangling skills and using a cleaner theme from ggthemes (theme_map()).

mn_pop.gg <- mn.df %>%
  dplyr::select(name, white:other, renter_occ, owner_occ, geometry) %>%
  rename(county = name) %>%
  pivot_longer(
    cols      = white:other, #tidy long data by category
    names_to  = "race_category",
    values_to = "race_pop"
  ) %>%
  mutate(
    race_category = str_replace_all(race_category, "_", " ") %>%
                    str_to_title() %>%
                    as_factor()
  ) %>%
  group_by(county) %>% #County level population
  mutate(county_pop = sum(race_pop)) %>%
  group_by(county, race_category) %>%
  summarise(
    perc_race = race_pop / county_pop,
    perc_rent = renter_occ / (renter_occ + owner_occ),
    geometry   = geometry
  ) %>%
  ungroup() %>%
  nest(data = c("race_category", "perc_race", "geometry")) %>%
  mutate(
    text_label = map_chr(.x = data, 
                     ~str_c(
                       "\n",
                       .x$race_category,
                       ": ", 
                       scales::percent(.x$perc_race, accuracy = 0.0001),
                       collapse = ""
                      )
                 ),
    text_label = str_c(county, "\nDemographics", text_label, "\nRental Percentage: ", scales::percent(perc_rent, accuracy = 0.01))
  ) %>%
  unnest(data) %>%
  st_as_sf() %>%
  ggplot() +
  geom_sf(aes(fill = perc_rent, text = text_label),
          colour = "black", size = 0.8, alpha = 0.6) +
  labs(
    title = "2017 MN ACS Rent vs. Own % by County" 
  ) +
  scale_fill_viridis_c("Percent Rental", labels = scales::percent) +
  theme_map() +
  theme(
    plot.title   = element_text(size  = 24,
                                hjust = 0.5),
    legend.text  = element_text(size = 20),
    legend.title = element_text(size = 20),
    legend.position = "right"
  )


#Plotly
ggplotly(mn_pop.gg, 
         tooltip = "text",
         height  = 600,
         width   = 800) %>%
  style(hoveron = "fills")
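The tooltip-building step above (nest(), then map_chr() with str_c(..., collapse = "")) is the least obvious part of the pipeline. Stripped of the spatial pieces, the same pattern looks like this on a hypothetical two-county table with made-up percentages:

```r
library(tidyverse)

# Hypothetical toy data: two counties, two categories each (made-up values)
toy <- tibble(
  county        = c("A", "A", "B", "B"),
  race_category = c("White", "Black", "White", "Black"),
  perc_race     = c(0.8, 0.2, 0.6, 0.4)
)

county_labels <- toy %>%
  nest(data = c(race_category, perc_race)) %>%  # one row per county, small tibble in `data`
  mutate(
    # Paste one "\nCategory: xx.x%" line per row of each county's tibble, then collapse
    text_label = map_chr(
      data,
      ~ str_c("\n", .x$race_category, ": ",
              scales::percent(.x$perc_race, accuracy = 0.1),
              collapse = "")
    ),
    text_label = str_c(county, "\nDemographics", text_label)
  )

cat(county_labels$text_label[1])
```

Because nest() collapses each county’s rows into a list-column, map_chr() sees one small tibble per county and returns exactly one string per county; in the full pipeline, unnest() then spreads that label back onto every row.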

Advanced tidycensus

Working with plotly

Example 2. (MN) More advanced tidycensus + ggplotly

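Request the 5-year ACS estimates of median household income for Hennepin County, MN, at the census tract level from the US Census Bureau API via tidycensus. (tidycensus needs a free Census API key, which you can register once with census_api_key().)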

library(tidycensus)

#Cache the tigris shapefiles so the tract geometries aren't re-downloaded on every run
options(tigris_use_cache = TRUE)

#Call the ACS API; geometry = TRUE returns an sf object
mn_income.df <- get_acs(
  geography = "tract",
  variables = "B19013_001", #Median household income
  state     = "MN",
  county    = "Hennepin",
  year      = 2020,
  geometry  = TRUE
)
Getting data from the 2016-2020 5-year ACS
Using FIPS code '27' for state 'MN'
Using FIPS code '053' for 'Hennepin County'

Now let’s add a hover label, plot it with a clean theme, and turn it into an interactive plotly graphic.

#Add a text label to mn_income.df
mn_income_plotly.df <- mn_income.df %>%
  mutate(
    tract      = str_split(NAME, ",") %>%
                 map_chr(1) %>%
                 str_remove("Census Tract "),
    text_label = str_c(
                  "Tract: ",
                  tract,
                  "\nMedian Income: ",
                  scales::dollar(estimate)
                 )
  )

#Generate the ggplot
income.gg <- ggplot() + 
  geom_sf(data = mn_income_plotly.df, 
          aes(fill = estimate, text = text_label),
          colour = "black", size = 0.1) + 
  labs(title = "Hennepin County, MN 2020 ACS Median Income") + 
  scale_fill_viridis_c("Median Income", labels = scales::dollar) +
  theme_map() +
  theme(
    plot.title   = element_text(size  = 24,
                                hjust = 0.5),
    legend.text  = element_text(size = 20),
    legend.title = element_text(size = 20),
    legend.position = "right"
  )


#Display
ggplotly(income.gg,
         tooltip = "text",
         height  = 600,
         width   = 800) %>%
    style(hoveron = "fills")

Combining with ggmap

Example 3. (MN) More advanced ggmap + tidycensus + ggplotly

Now, let’s take the same plot as above but overlay it onto a ‘nice’ map of MN with ggmap.

For reference, the complete ggmap documentation lists the options and types of maps available. Alternatively, a handy cheat sheet for ggmap is also available.

1. Create the base map

2. Add the income layer(s) from before and transform into ggplotly
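The code for these two steps is not shown here. A minimal sketch follows, assuming you have a Google Maps API key registered with ggmap::register_google() (the key string below is a placeholder) and that mn_income_plotly.df from Example 2 is in your session. The zoom level and the names mn.base, mn_income_ll.df, and tract_bbox are illustrative choices, not the lecture’s; only mn_income.ggmap is named to match the display code that follows.

```r
library(tidyverse)
library(sf)
library(ggmap)
library(ggthemes)

# ASSUMPTION: you have a Google Maps Static API key; the string below is a placeholder
# register_google(key = "YOUR_API_KEY")

# ggmap tiles use longitude/latitude, so put the tracts (from Example 2) in EPSG:4326
mn_income_ll.df <- st_transform(mn_income_plotly.df, crs = 4326)

# Step 1: base map centred on the tracts' bounding box (zoom = 10 is an illustrative choice)
tract_bbox <- st_bbox(mn_income_ll.df)
mn.base <- get_googlemap(
  center  = c(lon = (tract_bbox[["xmin"]] + tract_bbox[["xmax"]]) / 2,
              lat = (tract_bbox[["ymin"]] + tract_bbox[["ymax"]]) / 2),
  zoom    = 10,
  maptype = "roadmap"
)

# Step 2: overlay the income polygons; inherit.aes = FALSE stops the sf layer
# from inheriting ggmap's own x/y (lon/lat) aesthetics
mn_income.ggmap <- ggmap(mn.base) +
  geom_sf(data = mn_income_ll.df,
          aes(fill = estimate, text = text_label),
          colour = "black", alpha = 0.5,
          inherit.aes = FALSE) +
  labs(title = "Hennepin County, MN 2020 ACS Median Income") +
  scale_fill_viridis_c("Median Income", labels = scales::dollar) +
  theme_map()
```

ggplot2 will report that geom_sf() replaces the coordinate system set by ggmap(); the overlay is approximate, since the Google tiles are rendered in Web Mercator while the polygons are drawn in longitude/latitude.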

#Display plotly
ggplotly(mn_income.ggmap,
         tooltip = "text",
         height  = 600,
         width   = 800) %>%
    style(hoveron = "fills")

Joining other spatial data (osmdata)

Advanced ggmap

Working with plotly

Working with tidycensus


IV. Leaflet

Introduction to leaflet

Example 1. NYC Airbnb
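The example’s code is not reproduced here. A minimal leaflet sketch in the same spirit follows, using three made-up listings at approximate Manhattan coordinates; the column names lat, long, and price are assumptions, and the real NYC Airbnb data may name them differently.

```r
library(leaflet)

# Hypothetical listings (made-up names and prices; approximate Manhattan coordinates)
listings <- data.frame(
  name  = c("Listing A", "Listing B", "Listing C"),
  price = c(120, 250, 90),
  lat   = c(40.758, 40.785, 40.708),
  long  = c(-73.985, -73.968, -74.011)
)

# Continuous colour palette mapped to price
price_pal <- colorNumeric(palette = "viridis", domain = listings$price)

listing_map <- leaflet(listings) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # light basemap tiles
  addCircleMarkers(
    lng         = ~long,
    lat         = ~lat,
    radius      = 5,
    color       = ~price_pal(price),
    stroke      = FALSE,
    fillOpacity = 0.8,
    popup       = ~paste0(name, "<br>Price: $", price)
  ) %>%
  addLegend("bottomright", pal = price_pal, values = ~price, title = "Price")

listing_map
```

Unlike ggplotly, leaflet builds the interactive map directly, so panning, zooming, and popups work without first making a static ggplot.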

V. Activity

Activity 1 (together). Creating a spatial dashboard of NYC Airbnb listings in Manhattan

Today’s NYC spatial dashboard can be accessed on Canvas under Week 11 Lecture Materials.

Activity 2 (your turn). Edit the dashboard

For today’s activity, please –

  1. Set up a Google Maps API key (directions above) for ggmap
  2. Edit the dashboard as follows –
    • Pick another variable from the ACS Codebook and replace the Median Income plot with this new variable
    • Colour the map’s points by rating instead of price
    • Show the distribution of ratings rather than price (boxplot)
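Finding a replacement variable code can be easier from R than by scrolling the codebook. A minimal sketch, assuming the 2020 5-year ACS as in Example 2: find_acs_vars() is a hypothetical helper name, while load_variables() is the tidycensus function that returns the variable list (name, label, concept). Whichever code you pick replaces "B19013_001" in the get_acs() call.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: search the ACS 5-year variable list for a keyword.
# Calling it needs an internet connection, since load_variables() downloads the list.
find_acs_vars <- function(pattern, year = 2020) {
  tidycensus::load_variables(year = year, dataset = "acs5", cache = TRUE) %>%
    filter(str_detect(label, regex(pattern, ignore_case = TRUE))) %>%
    select(name, label, concept)
}

# Example usage (not run here): variables whose label mentions "median"
# find_acs_vars("median")
```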
