---
title: "Week 11: Spatial Mapping II"
subtitle: "sf, ggplot, ggmaps, tidycensus, and leaflet"
author: "Quinton Neville"
date: "April 1st, 2022"
output:
  html_document:
    df_print: paged
  html_notebook:
    highlight: tango
    theme: readable
header-includes: 
  \usepackage{graphicx}
  \usepackage{float}
  \usepackage{amsmath}
---

```{r setup, include = FALSE, echo = FALSE}
#Load the good stuff
library(tidyverse)
library(lubridate)
library(gt)
library(paletteer)
library(ggridges)
library(plotly)
library(gtExtras)
library(sf)
library(ggmap)
library(tidycensus)
library(leaflet)
library(osmdata)
library(tigris)
library(ggsflabel)
library(ggthemes)

#Working directory for .RMD
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())

#Controlling figure output in markdown
knitr::opts_chunk$set(
#  fig.height =   
  fig.width = 6,
#  fig.asp = .5,
  out.width = "90%",
#  out.height = 
  fig.align  = "center",
  cache = TRUE,
  eval  = TRUE,
  echo  = TRUE,
  warning = FALSE
)

#My Colours (from viridis)
my_purple <- "#440154FF"
my_yellow <- "#FDE725FF"

#Set Theme for ggplot2
theme_set(theme_bw() + theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom"))

#Set Scientific notation output and decimal places for knitr
options(scipen = 999)
options(digits = 4)
options(dplyr.summarise.inform = FALSE)
```

# I. Introduction {.jumbotron}  

$~$
$~$

## Where we've been --

The midterm assessment was designed to evaluate your 'fundamental' skills as a data scientist, corresponding to _Course Objectives_ (1)-(4,5) in the [syllabus](https://canvas.umn.edu/courses/293049/files?preview=26516582). Those skills break down into -- 

1. **Workflow** 
    - .Rprojects + Github, Rmarkdown, best `R` practices, reproducibility
2. **Data Wrangling**
    - `dplyr`, `tidyr`, `forcats`, `stringr`, `lubridate`
    - plus iteration with `purrr::map` family
3. **Data Visualization** 
    - `ggplot2`, `gt`
4. **Exploratory Data Analysis**
    - Use the above to read, explore, clean/tidy, prepare for analysis and visualize _new_ data   

$~$
$~$

## Where we're going --    

As we move forward this semester (today and after Spring Break, 3/5 - 3/13), we will continue to use and extend these 'fundamental' skills to get the most out of `R` and RStudio for data science and analysis. With these fundamentals in hand, we are going to learn how to use the following 'advanced' communication and analysis tools --

5. **Enhanced Visualization** 
    - Interactivity w/ `ggplotly`, `reactable`
    - Enhance `gt` with `gtExtras` (and `flextable`, `ftExtra`)
    - Intro to `flexdashboard`
6. **Websites in R + Github**
    - Basics, personal webpage 
    - Websites as analytical display tools
        - Embedding `flexdashboard`
7. **Spatial Visualization** **(today)**
    - 'Tidy Maps' w/ `sf` or `sp` + `tidyverse` + `ggmap` and `ggplot2`
    - Introduction to interactive maps with `ggplotly` and/or `leaflet`
8. **Shiny Apps**
    - Basics
    - Publish in an `R` hosted website
        - Embedding interactive `flexdashboards`
        - Embedding `shiny` apps
9. **Miscellaneous**
    - *Working with big data in `R`: `dtplyr`, `collapse`, `h2o`, `sparklyr`
    - *Working with databases in `R`: `dbplyr`
    - *How to develop a package with Rstudio & Github
    - *Webscraping with R

$~$
$~$

# {.panel .panel-primary}
## {.panel-heading}
### So what exactly are we going to do today? {.panel-title}
## {.panel-body}


First, we're going to briefly cover joining multiple data sources with `dplyr`, using spatial examples. Then we'll build on last week's lecture and expand our spatial data toolbox in `R` with more advanced use of `sf`, `ggmap`, and `tidycensus`, as well as an introduction to fully interactive maps with `leaflet`. Finally, we will apply these tools in an activity where we create and edit a more advanced spatial dashboard together.  

```{r eval = FALSE}
#Install the packages for today if you don't already have them
install.packages(c("sf", "ggmap", "tmap", "tidycensus", "leaflet", "osmdata", "tigris"))
```

# II. Joining data in `R` {.jumbotron}  

Joining data from multiple sources is another aspect of *data wrangling*. It was covered in PUBH 7461, but it is such an important part of working with real-world data that we should make sure we're all on the same page heading into the final project.  

Laura Le's wonderful lecture regarding joining data in R, as well as an example/activity with NYC flight data can be found on Canvas [here](https://canvas.umn.edu/courses/293049/pages/joining-data-with-dplyr).

![](./images/pubh_7462_midterm_rubric.png)

# III. Advanced `sf`, `ggmap`, `tidycensus`, `ggplot`, `plotly` {.jumbotron}  

# Simple Features `sf` Resources   

Like many things in the `R` universe, the `sf` package has wonderful documentation and examples. Please spend some time reviewing these on your own.  

1. [Introduction](https://r-spatial.github.io/sf/articles/sf1.html)
2. [Reading, writing, converting Simple Features](https://r-spatial.github.io/sf/articles/sf2.html)
3. [Manipulating Simple Features](https://r-spatial.github.io/sf/articles/sf3.html)
4. [Plotting Simple Features](https://r-spatial.github.io/sf/articles/sf5.html)
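
To make those vignettes concrete, here is a minimal sketch of the core read / inspect / reproject / manipulate loop. It uses the North Carolina counties shapefile that ships with `sf` (so nothing needs to be downloaded) instead of our MN data, and the object names are illustrative. EPSG:32119 (NC State Plane, meters) is chosen here only so that areas come out in square meters.

```r
library(sf)
library(dplyr)

# Example shapefile bundled with sf (a stand-in for our own .shp files)
nc_path <- system.file("shape/nc.shp", package = "sf")

# st_read() returns an sf data.frame: attribute columns plus a geometry list-column
nc <- st_read(nc_path, quiet = TRUE)

# Inspect the coordinate reference system (geographic lon/lat, NAD27 for this file)
st_crs(nc)$epsg

# Reproject to a planar CRS in meters so that areas are meaningful
nc_proj <- st_transform(nc, crs = 32119)

# dplyr verbs work on sf objects; the geometry column is "sticky" through select()
big_counties <- nc_proj %>%
  mutate(area_km2 = as.numeric(st_area(geometry)) / 1e6) %>%
  filter(area_km2 > 2000) %>%
  select(NAME, area_km2)
```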

# {.panel .panel-primary}
## {.panel-heading}
### Example 1. (MN) More advanced `sf` + `ggplotly`  {.panel-title}
## {.panel-body}  

First, let's install the `ggthemes` package (if needed) for a few more thematic choices in our `ggplot`s.  

```{r message = FALSE, warning = FALSE}
#Install ggthemes if necessary
if (!requireNamespace("ggthemes", quietly = TRUE)) {
  install.packages("ggthemes", quiet = TRUE)
}

#Call the library
library(ggthemes, quietly = TRUE)
```


Next, let's read in our MN .shp file (from last week's lecture).  

```{r mn_read}
#Read in the shape file (don't make a tibble)
mn.df <- st_read("./data/USA_Counties/USA_Counties.shp", quiet = TRUE) %>%
  janitor::clean_names() %>%
  filter(state_name %in% "Minnesota")
```
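
The "don't make a tibble" comment refers to the two readers in `sf`: `st_read()` returns an `sf` object built on a plain `data.frame`, while `read_sf()` returns an `sf` tibble. A quick sketch on the bundled North Carolina file (again a stand-in for the MN shapefile) shows the difference:

```r
library(sf)

# Bundled example shapefile (stand-in for USA_Counties.shp)
nc_path <- system.file("shape/nc.shp", package = "sf")

# st_read(): sf object on top of a base data.frame
nc_df <- st_read(nc_path, quiet = TRUE)

# read_sf(): same reader, tibble output, and quiet by default
nc_tbl <- read_sf(nc_path)

class(nc_df)   # "sf" "data.frame"
class(nc_tbl)  # "sf" "tbl_df" "tbl" "data.frame"
```

Both are `sf` objects, so the `dplyr`/`tidyr` verbs used in this lecture work on either; the most visible difference is how they print.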

Next, let's build our `ggplot`, adding a little more information with our usual data wrangling skills and using a cleaner map theme from `ggthemes`.  

```{r mn_plot_1}
mn_pop.gg <- mn.df %>%
  dplyr::select(name, white:other, renter_occ, owner_occ, geometry) %>%
  rename(county = name) %>%
  pivot_longer(
    cols      = white:other, #tidy long data by category
    names_to  = "race_category",
    values_to = "race_pop"
  ) %>%
  mutate(
    race_category = str_replace_all(race_category, "_", " ") %>%
                    str_to_title() %>%
                    as_factor()
  ) %>%
  group_by(county) %>% #County level population
  mutate(county_pop = sum(race_pop)) %>%
  group_by(county, race_category) %>%
  summarise(
    perc_race = race_pop / county_pop,
    perc_rent = renter_occ / (renter_occ + owner_occ),
    geometry   = geometry
  ) %>%
  ungroup() %>%
  nest(data = c("race_category", "perc_race", "geometry")) %>%
  mutate(
    text_label = map_chr(.x = data, 
                     ~str_c(
                       "\n",
                       .x$race_category,
                       ": ", 
                       scales::percent(.x$perc_race, accuracy = 0.0001),
                       collapse = ""
                      )
                 ),
    text_label = str_c(county, "\nDemographics", text_label, "\nAvg. Rental Percentage: ", scales::percent(perc_rent, accuracy = 0.01))
  ) %>%
  unnest(data) %>%
  st_as_sf() %>%
  ggplot() +
  geom_sf(aes(fill = perc_rent, text = text_label),
          colour = "black", size = 0.8, alpha = 0.6) +
  labs(
    title = "2017 MN ACS Rent vs. Own % by County" 
  ) +
  scale_fill_viridis_c("Percent Rental", labels = scales::percent) +
  theme_map() +
  theme(
    plot.title   = element_text(size  = 24,
                                hjust = 0.5),
    legend.text  = element_text(size = 20),
    legend.title = element_text(size = 20),
    legend.position = "right"
  )


#Plotly
ggplotly(mn_pop.gg, 
         tooltip = "text",
         height  = 600,
         width   = 800) %>%
  style(hoveron = "fills")
```
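
The least obvious step above is building one multi-line tooltip per county out of several rows. Here is a stripped-down sketch of the same `nest()` + `map_chr()` + `str_c(collapse = "")` pattern on a small tibble, without geometry; the counties and percentages are made up purely for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

# Hypothetical long-format data: one row per county x category
toy <- tibble(
  county        = c("A", "A", "B", "B"),
  race_category = c("White", "Black", "White", "Black"),
  perc_race     = c(0.8, 0.2, 0.6, 0.4)
)

toy_labels <- toy %>%
  # one row per county, with the category rows tucked into a list-column
  nest(data = c(race_category, perc_race)) %>%
  mutate(
    # collapse each county's category rows into a single string
    text_label = map_chr(data, ~ str_c(
      "\n", .x$race_category, ": ",
      scales::percent(.x$perc_race, accuracy = 0.1),
      collapse = ""
    )),
    text_label = str_c(county, "\nDemographics", text_label)
  ) %>%
  # back to one row per county x category, each carrying its county's label
  unnest(data)
```

After `unnest()`, every row of a county carries the same full label, e.g. `"A\nDemographics\nWhite: 80.0%\nBlack: 20.0%"` for county A.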

# Advanced `tidycensus`  

## Working with `plotly` 

# {.panel .panel-primary}
## {.panel-heading}
### Example 2. (MN) More advanced `tidycensus` + `ggplotly` {.panel-title}
## {.panel-body}  

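Request the ACS estimates for _median income_ in Hennepin County, MN, at the census tract level from the US Census Bureau (census.gov) API via `tidycensus`. Note that `get_acs()` needs a (free) Census API key, which you can store once with `tidycensus::census_api_key("YOUR_KEY", install = TRUE)`.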

```{r tidy_census, message = FALSE, warning = FALSE}
#Cache the tigris shapefiles so the tract geometries aren't re-downloaded on every run
options(tigris_use_cache = TRUE)

#Call the ACS API; with geometry = TRUE this returns an sf object
mn_income.df <- get_acs(
  geography = "tract",
  variables = "B19013_001", #Code for median income
  state     = "MN",
  county    = "Hennepin",
  year      = 2020,
  geometry  = TRUE
)
```

Now let's plot it with a nice theme and turn it into a `plotly` object.

```{r mn_income}
#Add a text label to mn_income.df
mn_income_plotly.df <- mn_income.df %>%
  mutate(
    tract      = str_split(NAME, ",") %>%
                 map_chr(1) %>%
                 str_remove("Census Tract "),
    text_label = str_c(
                  "Tract: ",
                  tract,
                  "\nMedian Income: ",
                  scales::dollar(estimate)
                 )
  )

#Generate the ggplot
income.gg <- ggplot() + 
  geom_sf(data = mn_income_plotly.df, 
          aes(fill = estimate, text = text_label),
          colour = "black", size = 0.1) + 
  labs(title = "Hennepin County, MN 2020 ACS Median Income") + 
  scale_fill_viridis_c("Median Income", labels = scales::dollar) +
  theme_map() +
  theme(
    plot.title   = element_text(size  = 24,
                                hjust = 0.5),
    legend.text  = element_text(size = 20),
    legend.title = element_text(size = 20),
    legend.position = "right"
  )


#Display
ggplotly(income.gg,
         tooltip = "text",
         height  = 600,
         width   = 800) %>%
    style(hoveron = "fills")
```
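
The tract label above relies on `NAME` following the comma-separated "Census Tract <number>, <County>, <State>" pattern that the code splits on. The sketch below replays the parsing and label steps on two hypothetical `NAME` strings and made-up estimates, so the string logic can be checked without calling the Census API:

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical rows mimicking the NAME / estimate columns from get_acs()
acs_names <- tibble(
  NAME     = c("Census Tract 1.01, Hennepin County, Minnesota",
               "Census Tract 262, Hennepin County, Minnesota"),
  estimate = c(54321, 98765)
)

parsed <- acs_names %>%
  mutate(
    # keep the text before the first comma, then drop the "Census Tract " prefix
    tract      = str_split(NAME, ",") %>%
                 map_chr(1) %>%
                 str_remove("Census Tract "),
    text_label = str_c("Tract: ", tract,
                       "\nMedian Income: ", scales::dollar(estimate))
  )
```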

## Combining with `ggmap`  

# {.panel .panel-primary}
## {.panel-heading}
### Example 3. (MN) More advanced `ggmap` + `tidycensus` + `ggplotly` {.panel-title}
## {.panel-body}  

Now, let's take the same plot as above but overlay it onto a 'nice' map of MN with `ggmap`.  

For reference, [here](https://rdrr.io/cran/ggmap/man/get_map.html) is the complete documentation for the options and map types available in `ggmap`. Alternatively, a nice cheat sheet for `ggmap` can be found [here](https://www.nceas.ucsb.edu/sites/default/files/2020-04/ggmapCheatsheet.pdf). Google basemaps (`source = "google"`) also require a Google Maps API key registered with `ggmap::register_google()`.  

### 1. Create the base map  

```{r ggmap_base, message = FALSE}
#Get the bounding box and county outline
mn.box           <- osmdata::getbb("minnesota")
hennepin.box     <- osmdata::getbb("hennepin")
hennepin.outline <- osmdata::getbb("hennepin", format_out = "polygon")[[1]] %>%
  as_tibble() %>%
  rename(longitude = V1, latitude = V2)


#Get map from ggmap
#Get the base map (foundational layer)
mn_base.map <- get_map(
                location = hennepin.box,
                source   = "google",
                maptype  = "roadmap",
                crop = TRUE
               )

#Create the base map
hennepin_base.gg <- ggmap(mn_base.map) +
  geom_polygon(data = hennepin.outline, aes(x = longitude, y = latitude), colour = "black", size = 1.6, alpha = 0.1) +
  theme_map() +
  theme(
    plot.title   = element_text(size  = 24,
                                hjust = 0.5),
    legend.text  = element_text(size = 20),
    legend.title = element_text(size = 20),
    legend.position = "right"
  )
    
#Display base map
hennepin_base.gg
```
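
In the chunk above, the first element returned by `getbb(..., format_out = "polygon")` is a two-column (longitude, latitude) coordinate matrix without column names, which is why it gets renamed from `V1`/`V2`. Here is a minimal sketch, on a hypothetical rectangle rather than the real Hennepin outline, of two ways to turn such a matrix into something `ggplot2` can draw: a named tibble for `geom_polygon()`, or an `sf` polygon for `geom_sf()`. The `sf` route is an alternative, not what this lecture uses.

```r
library(sf)
library(tibble)

# Hypothetical closed ring of (lon, lat) points; first and last rows must match
outline_mat <- matrix(
  c(-93.7, 44.8,
    -93.2, 44.8,
    -93.2, 45.2,
    -93.7, 45.2,
    -93.7, 44.8),
  ncol = 2, byrow = TRUE
)

# Option 1: explicitly named columns for geom_polygon(aes(x = longitude, y = latitude))
outline_tbl <- tibble(longitude = outline_mat[, 1],
                      latitude  = outline_mat[, 2])

# Option 2: an sf polygon in WGS84 lon/lat (EPSG:4326) for geom_sf()
outline_sf <- st_sfc(st_polygon(list(outline_mat)), crs = 4326)
```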

### 2. Add the income layer(s) from before and transform into `ggplotly`  

```{r ggmap_plotly, message = FALSE, warning = FALSE, results = FALSE}
#First, make sure the coordinate systems of the ggmap basemap and the geom_sf layer match
#st_crs() of the ggmap plot itself is NA (it is not an sf object), but ggmap
#places its tiles on WGS84 longitude/latitude (EPSG:4326) axes
st_crs(mn_income_plotly.df) #NAD83 (EPSG:4269), as returned by tidycensus

#Reproject the ACS layer to WGS84 so it lines up with the basemap
mn_income_4326.df <- st_transform(mn_income_plotly.df, crs = 4326)

#Overlay the sf info from the tidycensus ACS income estimates
mn_income.ggmap <- hennepin_base.gg +
  geom_sf(data = mn_income_4326.df,
          aes(fill = estimate, text = text_label),
          colour = "black", size = 0.1,
          inherit.aes = FALSE) +
  labs(title = "Hennepin County, MN 2020 ACS Median Income") +
  scale_fill_viridis_c("Median Income", labels = scales::dollar) +
  theme_map() +
  theme(
    plot.title   = element_text(size  = 24,
                                hjust = 0.5),
    legend.text  = element_text(size = 20),
    legend.title = element_text(size = 20),
    legend.position = "right"
  )
```
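
A quick way to confirm that two layers really share a CRS before overlaying them is to compare `st_crs()` objects directly. This sketch uses the bundled North Carolina file (stored in NAD27) as a stand-in for the NAD83 `tidycensus` layer, and assumes WGS84 lon/lat (EPSG:4326) as the target CRS for the `ggmap` basemap.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Assumed target CRS for the basemap: WGS84 longitude/latitude
target_crs <- st_crs(4326)

# Comparing crs objects returns a single TRUE/FALSE
st_crs(nc) == target_crs       # FALSE: nc is stored in NAD27

# Reproject before overlaying, then re-check
nc_4326 <- st_transform(nc, target_crs)
st_crs(nc_4326) == target_crs  # TRUE
```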

```{r}
#Display plotly
ggplotly(mn_income.ggmap,
         tooltip = "text",
         height  = 600,
         width   = 800) %>%
    style(hoveron = "fills")
```

## Joining other spatial data (`osmdata`)

# Advanced `ggmap`  

## Working with `plotly`  

## Working with `tidycensus`  



# IV. Leaflet {.jumbotron}  

# Introduction to `leaflet`  

# {.panel .panel-primary}
## {.panel-heading}
### Example 1. NYC Airbnb {.panel-title}
## {.panel-body}
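
A minimal sketch of the basic `leaflet` building blocks (a tile layer, circle markers coloured by a numeric variable, hover labels, and a legend). The three listings, their coordinates, and their prices are made up, and the column names `name`, `lng`, `lat`, and `price` are assumptions rather than the NYC Airbnb dataset's actual columns.

```r
library(leaflet)
library(tibble)

# Hypothetical listings (made-up names, coordinates, and prices)
listings <- tibble(
  name  = c("Listing A", "Listing B", "Listing C"),
  lng   = c(-73.985, -73.968, -74.006),
  lat   = c(40.758, 40.785, 40.713),
  price = c(150, 220, 95)
)

# Continuous viridis colour scale over the observed prices
pal <- colorNumeric(palette = "viridis", domain = listings$price)

airbnb_map <- leaflet(data = listings) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    color = ~pal(price), radius = 6, stroke = FALSE, fillOpacity = 0.8,
    label = ~paste0(name, ": $", price)
  ) %>%
  addLegend(pal = pal, values = ~price, title = "Price (USD)")

airbnb_map
```

To colour the points by a different numeric column, swap `price` for that column in `colorNumeric()`, `addCircleMarkers()`, and `addLegend()`.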

# V. Activity {.jumbotron}  

# {.panel .panel-success}
## {.panel-heading}
### Activity 1 (together). Creating a spatial dashboard of NYC Airbnb's in Manhattan {.panel-title}
## {.panel-body}  

Today's NYC spatial dashboard can be accessed on Canvas under [Week 11 Lecture Materials](https://canvas.umn.edu/courses/293049/pages/week-11-lecture?module_item_id=7496270).  

# {.panel .panel-success}
## {.panel-heading}
### Activity 2 (your turn). Edit the dashboard {.panel-title}
## {.panel-body}  

For today's activity, please -- 

1. Set up a Google Maps API key for `ggmap` and register it in `R` with `ggmap::register_google(key = "YOUR_KEY")`
2. Edit the dashboard as follows -- 
    - Pick another variable from the [ACS Codebook]() (or search the ACS variables in `R` with `tidycensus::load_variables(2020, "acs5")`) and replace the Median Income plot with this new variable
    - Colour the map's points by rating instead of price
    - Show the distribution of ratings rather than price (boxplot)   
