Chapter 37 Drawing maps with Leaflet

Packages for this chapter:

library(ggbiplot)
library(tidyverse)
library(ggrepel)

37.1 The brain of a cat, revisited

Earlier, we looked at an ethics study that had to do with a fictional brain of a fictional cat. I said there was actually a town called Catbrain. It’s in England, near Bristol, and seems to be home to a street of car dealerships.

Find the latitude and longitude of “Catbrain UK” (though I don’t think there are any others).
Draw a map of Catbrain using Leaflet.
Make a dataframe containing some other British cities as well as Catbrain, and find their latitudes and longitudes.
Draw a map containing the places you picked.

37.2 Making a map of Wisconsin

The file http://ritsokiguess.site/datafiles/wisconsin.txt contains the road distances (in miles) between 12 cities in Wisconsin and neighbouring states. We are going to try to draw a map of the area using Leaflet.

Read in the data, displaying the data that you read in.
Make a dataframe containing the names of the locations (get rid of the columns containing distances), and add a column of the abbreviations of the states they are in. All of them are in Wisconsin (WI), except for the last three: Dubuque is in Iowa (IA), St. Paul is in Minnesota (MN) and Chicago is in Illinois (IL).
Create a new column in which the abbreviation for the state is glued on to the end of each location, separated by a space.
Look up the latitudes and longitudes of these twelve places.
Obtain a Leaflet map of the area containing these twelve cities.

37.3 The Cross-City Line

When I went to university (in Birmingham, England, a long time ago), I was very excited because I would be travelling to campus by train. My journey was on the Cross-City Line, a metro-type service with lots of stops short distances apart, but run in those days by diesel trains (the electrification came later).

A list of the stations on the line is in http://ritsokiguess.site/datafiles/cross-city.csv. There is one column in the data file, called station. We are going to draw a map of these.

Read in and display (some of) the station names.
In preparation for geocoding, create a second column in the dataframe that consists of the station names with “station UK” on the end. (This is to improve the chances of the geocoder finding the actual railway station.)
Look up the longitudes and latitudes of all the stations, organizing your dataframe so that they are visible.
Make a Leaflet map of the stations. Use circle markers or the “pin” markers as you prefer.
Zoom in to see whether the geocoding did indeed find each of the stations. Comment briefly on what you find.

My solutions follow:

37.4 The brain of a cat, revisited

Find the latitude and longitude of “Catbrain UK” (though I don’t think there are any others).

Solution

Make sure you have these two packages loaded:

library(leaflet)
library(tmaptools)

To find the latitude and longitude of Catbrain:

catbrain <- tibble(place = "Catbrain UK")
catbrain %>% mutate(ll = list(geocode_OSM(place))) %>% 
  unnest_wider(ll) %>% 
  unnest_wider(coords) -> catbrain

catbrain

## # A tibble: 1 x 5
##   place       query           x     y bbox      
##   <chr>       <chr>       <dbl> <dbl> <list>    
## 1 Catbrain UK Catbrain UK -2.61  51.5 <bbox [4]>

Remember that the output from geocode_OSM is a list, and it has in it a thing called coords that is the longitude and latitude together, and another thing called bbox that we don’t use. So we have to unnest twice to get the longitude (as x) and the latitude (as y) out for drawing in a moment.
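If you want to watch the two-step unnesting happen without going online, here is a minimal sketch. The function fake_geocode is made up for illustration: it returns a list shaped like the one described above (a query, a coords pair and a bbox), the bounding-box numbers are invented, and it is not part of tmaptools.

```r
library(tidyverse)

# Hypothetical stand-in for the geocoder: a list with query, coords and bbox
fake_geocode <- function(place) {
  list(
    query = place,
    coords = c(x = -2.61, y = 51.5),
    bbox = c(xmin = -2.63, ymin = 51.49, xmax = -2.59, ymax = 51.53)
  )
}

d <- tibble(place = "Catbrain UK") %>%
  mutate(ll = list(fake_geocode(place)))

# First unnest: ll becomes query, coords and bbox (the last two still list-columns)
step1 <- d %>% unnest_wider(ll)
step1

# Second unnest: coords becomes the two numeric columns x and y
step2 <- step1 %>% unnest_wider(coords)
step2
```

Looking at step1 before going on is the easiest way to see that coords still needs opening up.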

\(\blacksquare\)

Draw a map of Catbrain using Leaflet.

Solution

That goes this way:

leaflet(data = catbrain) %>% 
  addTiles() %>% 
  addCircleMarkers(lng = ~x, lat = ~y) -> catbrain_map
catbrain_map

The car dealerships are along Lysander Road. Zoom in to see them, or zoom out to see where this is. You can keep zooming out until you know where you are, using the plus and minus buttons or your mouse wheel.

The name Catbrain, according to link, means “rough stony soil”, from Middle English, and has nothing to do with cats or their brains at all.

Extra: I was actually surprised that this worked at all, because with only one point, how does it know what scale to draw the map? Also, unless your UK geography is really good, you won’t have any clue about exactly where this is. That’s the reason for the next part.
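If you would rather choose the scale yourself than rely on whatever Leaflet picks, setView centres the map on a longitude and latitude at a zoom level you supply. A minimal sketch, using the rounded coordinates from above; the zoom level of 14 is my guess at something sensible, not a value taken from the analysis:

```r
library(leaflet)

# Rounded coordinates of Catbrain from the geocoding above
cat_lng <- -2.61
cat_lat <- 51.5

# zoom = 14 is an assumed value; larger numbers zoom in further
leaflet() %>%
  addTiles() %>%
  setView(lng = cat_lng, lat = cat_lat, zoom = 14) %>%
  addCircleMarkers(lng = cat_lng, lat = cat_lat) -> catbrain_zoomed
catbrain_zoomed
```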

\(\blacksquare\)

Make a dataframe containing some other British cities as well as Catbrain, and find their latitudes and longitudes.

Solution

I chose the cities below, mostly somewhere near Catbrain. You could fire up a Google map, zoom it out until it contains something you know, and pick some places you’ve heard of. (I happen to know British geography pretty well, so I just picked some mostly nearby places out of my head. I didn’t really want to pick London, but I figured this was the one you might know.)

catbrain2 <- tribble(
  ~where,
  "Catbrain UK",
  "Bristol UK",
  "Taunton UK",
  "Newport UK",
  "Gloucester UK",
  "Cardiff UK",
  "Birmingham UK",
  "London UK",
  "Caldicot UK"
)
catbrain2 %>%
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(where))) %>% 
  unnest_wider(ll) %>% 
  unnest_wider(coords) -> catbrain2

catbrain2

## # A tibble: 9 x 5
##   where         query              x     y bbox      
##   <chr>         <chr>          <dbl> <dbl> <list>    
## 1 Catbrain UK   Catbrain UK   -2.61   51.5 <bbox [4]>
## 2 Bristol UK    Bristol UK    -2.60   51.5 <bbox [4]>
## 3 Taunton UK    Taunton UK    -3.10   51.0 <bbox [4]>
## 4 Newport UK    Newport UK    -3.00   51.6 <bbox [4]>
## 5 Gloucester UK Gloucester UK -2.25   51.9 <bbox [4]>
## 6 Cardiff UK    Cardiff UK    -3.18   51.5 <bbox [4]>
## 7 Birmingham UK Birmingham UK -1.90   52.5 <bbox [4]>
## 8 London UK     London UK     -0.128  51.5 <bbox [4]>
## 9 Caldicot UK   Caldicot UK   -2.75   51.6 <bbox [4]>

The first time I did this, I forgot the rowwise, which we didn’t need the first time (there was only one place), but here, it causes odd problems if you omit it.
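To see what rowwise changes, here is a sketch with a made-up helper, count_inputs, that reports how many values it was handed at once. It is a stand-in for illustration only, and says nothing about what geocode_OSM does internally:

```r
library(tidyverse)

# Hypothetical helper: returns a list saying how many inputs arrived together
count_inputs <- function(x) list(n_inputs = length(x))

places <- tibble(where = c("Catbrain UK", "Bristol UK", "Taunton UK"))

# Without rowwise: the function is called once, on the whole column
places %>%
  mutate(info = list(count_inputs(where))) %>%
  unnest_wider(info) -> no_rowwise
no_rowwise

# With rowwise: the function is called once per row, on one value each time
places %>%
  rowwise() %>%
  mutate(info = list(count_inputs(where))) %>%
  ungroup() %>%
  unnest_wider(info) -> yes_rowwise
yes_rowwise
```

In the first version n_inputs is 3 in every row, because the single result is recycled down the column; in the second it is 1 in every row.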

\(\blacksquare\)

Draw a map containing the places you picked.

Solution

The map-drawing is almost the same, just changing the dataframe:

leaflet(data = catbrain2) %>% 
  addTiles() %>% 
  addCircleMarkers(lng = ~x, lat = ~y)

Now, if you have any sense of the geography of the UK, you know where you are. The big river (the Severn) is the border between England and Wales, so the places above and to the left of it are in Wales, including Caldicot (see question about Roman pottery).

You can zoom this map in (once you have figured out which of the circles is Catbrain) and find Lysander Road again, and also the M5 (see below).

More irrelevant extra: the M5 is one of the English “motorways” (like 400-series highways or US Interstates). The M5 goes from Birmingham to Exeter. You can tell that this is England because of the huge number of traffic circles, known there as “roundabouts”. One of the first things they teach you in British driving schools is how to handle roundabouts: which lane to approach them in, which (stick-shift) gear to be in, and when you’re supposed to signal where you’re going. I hope I still remember all that for when I next drive in England!

\(\blacksquare\)

37.5 Making a map of Wisconsin

The file http://ritsokiguess.site/datafiles/wisconsin.txt contains the road distances (in miles) between 12 cities in Wisconsin and neighbouring states. We are going to try to draw a map of the area using Leaflet.

Read in the data, displaying the data that you read in.

Solution

my_url <- "http://ritsokiguess.site/datafiles/wisconsin.txt"
wisc <- read_table(my_url)

## 
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   location = col_character(),
##   Appleton = col_double(),
##   Beloit = col_double(),
##   Fort.Atkinson = col_double(),
##   Madison = col_double(),
##   Marshfield = col_double(),
##   Milwaukee = col_double(),
##   Monroe = col_double(),
##   Superior = col_double(),
##   Wausau = col_double(),
##   Dubuque = col_double(),
##   St.Paul = col_double(),
##   Chicago = col_double()
## )

wisc

## # A tibble: 12 x 13
##    location  Appleton Beloit Fort.Atkinson Madison Marshfield Milwaukee Monroe Superior Wausau Dubuque St.Paul
##    <chr>        <dbl>  <dbl>         <dbl>   <dbl>      <dbl>     <dbl>  <dbl>    <dbl>  <dbl>   <dbl>   <dbl>
##  1 Appleton         0    130            98     102        103       100    149      315     91     196     257
##  2 Beloit         130      0            33      50        185        73     33      377    186      94     304
##  3 Fort.Atk…       98     33             0      36        164        54     58      359    166     119     287
##  4 Madison        102     50            36       0        138        77     47      330    139      95     258
##  5 Marshfie…      103    185           164     138          0       184    170      219     45     186     161
##  6 Milwaukee      100     73            54      77        184         0    107      394    181     168     322
##  7 Monroe         149     33            58      47        170       107      0      362    186      61     289
##  8 Superior       315    377           359     330        219       394    362        0    223     351     162
##  9 Wausau          91    186           166     139         45       181    186      223      0     215     175
## 10 Dubuque        196     94           119      95        186       168     61      351    215       0     274
## 11 St.Paul        257    304           287     258        161       322    289      162    175     274       0
## 12 Chicago        186     97           113     146        276        93    130      467    275     184     395
## # … with 1 more variable: Chicago <dbl>

The first time I did this, I had a blank line on the end of the data file, so I had a blank location and missing values for all the distances for it. I tidied that up before sharing the file with you, though.

\(\blacksquare\)

Make a dataframe containing the names of the locations (get rid of the columns containing distances), and add a column of the abbreviations of the states they are in. All of them are in Wisconsin (WI), except for the last three: Dubuque is in Iowa (IA), St. Paul is in Minnesota (MN) and Chicago is in Illinois (IL).

Solution

There seems to be a bit of base R attached to this, however you do it. I am going to create a dataframe pretending they are all in Wisconsin, and then fix it up afterwards:

wisc %>% 
  select(!where(is.numeric)) %>% 
  mutate(state = "WI") -> wisc
wisc

## # A tibble: 12 x 2
##    location      state
##    <chr>         <chr>
##  1 Appleton      WI   
##  2 Beloit        WI   
##  3 Fort.Atkinson WI   
##  4 Madison       WI   
##  5 Marshfield    WI   
##  6 Milwaukee     WI   
##  7 Monroe        WI   
##  8 Superior      WI   
##  9 Wausau        WI   
## 10 Dubuque       WI   
## 11 St.Paul       WI   
## 12 Chicago       WI

(I checked that in this question I didn’t need the road distances for anything, so I saved it back into the original dataframe. Also, the select is unnecessarily fancy: I could have just selected the location column, but this one says “don’t select any of the columns that are numeric”.)
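A quick way to convince yourself that the two versions of the select do the same thing here is to try both on a cut-down dataframe. This is a sketch on two rows that I typed in by hand from the display above, not the full data:

```r
library(tidyverse)

# Two locations and their distance columns, copied from the display above
small <- tibble(
  location = c("Appleton", "Beloit"),
  Appleton = c(0, 130),
  Beloit = c(130, 0)
)

# Select the one column by name
by_name <- small %>% select(location)

# Select everything that is not numeric
by_type <- small %>% select(!where(is.numeric))

all.equal(by_name, by_type)
```

The second form would also keep any other text columns, which is the only way the two could differ.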

Now we have to fix up the states of the last three places, which is where the base R seems to come in (but see the Extra):

wisc$state[10] <- "IA"
wisc$state[11] <- "MN"
wisc$state[12] <- "IL"
wisc

## # A tibble: 12 x 2
##    location      state
##    <chr>         <chr>
##  1 Appleton      WI   
##  2 Beloit        WI   
##  3 Fort.Atkinson WI   
##  4 Madison       WI   
##  5 Marshfield    WI   
##  6 Milwaukee     WI   
##  7 Monroe        WI   
##  8 Superior      WI   
##  9 Wausau        WI   
## 10 Dubuque       IA   
## 11 St.Paul       MN   
## 12 Chicago       IL

The states of the last three locations are now correct.

Extra: I didn’t know about the following until literally just now, but there is a tidyverse way to do this as well (that may look familiar to those of you who know about SQL). Let’s start by pretending again that everything is in Wisconsin:

wisc %>% 
  mutate(state = "WI") -> wisc2
wisc2

## # A tibble: 12 x 2
##    location      state
##    <chr>         <chr>
##  1 Appleton      WI   
##  2 Beloit        WI   
##  3 Fort.Atkinson WI   
##  4 Madison       WI   
##  5 Marshfield    WI   
##  6 Milwaukee     WI   
##  7 Monroe        WI   
##  8 Superior      WI   
##  9 Wausau        WI   
## 10 Dubuque       WI   
## 11 St.Paul       WI   
## 12 Chicago       WI

and then change the ones that need changing. What you do is to make a little dataframe of the ones that need changing:

changes <- tribble(
  ~location, ~state,
  "Dubuque", "IA",
  "St.Paul", "MN",
  "Chicago", "IL"
)
changes

## # A tibble: 3 x 2
##   location state
##   <chr>    <chr>
## 1 Dubuque  IA   
## 2 St.Paul  MN   
## 3 Chicago  IL

Note that the columns in here have exactly the same names as the ones in the original dataframe where everything was in Wisconsin.

I did this by copy-pasting the city names whose states needed changing out of the display of my wisc2. You might think that a left join is what we need now, and it almost is. Note that I want to match the locations but not the states:

wisc2 %>% left_join(changes, by = "location")

## # A tibble: 12 x 3
##    location      state.x state.y
##    <chr>         <chr>   <chr>  
##  1 Appleton      WI      <NA>   
##  2 Beloit        WI      <NA>   
##  3 Fort.Atkinson WI      <NA>   
##  4 Madison       WI      <NA>   
##  5 Marshfield    WI      <NA>   
##  6 Milwaukee     WI      <NA>   
##  7 Monroe        WI      <NA>   
##  8 Superior      WI      <NA>   
##  9 Wausau        WI      <NA>   
## 10 Dubuque       WI      IA     
## 11 St.Paul       WI      MN     
## 12 Chicago       WI      IL

and the story here is that if state.y has a value, use that, otherwise use the value in state.x. This can be done directly: there is a function coalesce³⁶ that does exactly that:

wisc2 %>% left_join(changes, by = "location") %>% 
  mutate(state=coalesce(state.y, state.x))

## # A tibble: 12 x 4
##    location      state.x state.y state
##    <chr>         <chr>   <chr>   <chr>
##  1 Appleton      WI      <NA>    WI   
##  2 Beloit        WI      <NA>    WI   
##  3 Fort.Atkinson WI      <NA>    WI   
##  4 Madison       WI      <NA>    WI   
##  5 Marshfield    WI      <NA>    WI   
##  6 Milwaukee     WI      <NA>    WI   
##  7 Monroe        WI      <NA>    WI   
##  8 Superior      WI      <NA>    WI   
##  9 Wausau        WI      <NA>    WI   
## 10 Dubuque       WI      IA      IA   
## 11 St.Paul       WI      MN      MN   
## 12 Chicago       WI      IL      IL

The idea behind coalesce is that you give it a list of columns, and the first one of those that is not missing gives its value to the new column. So I feed it state.y first, and then state.x, and the new state contains the right things. (Can you explain what happens if you do it the other way around?)
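The "first one that is not missing" idea is easiest to see on bare vectors. A small sketch, with three made-up vectors that have nothing to do with the Wisconsin data:

```r
library(dplyr)

# Made-up inputs: each position is filled by the first vector that has a value there
first <- c(NA, "b", NA)
second <- c("a", NA, NA)
fallback <- c("z", "z", "z")

filled <- coalesce(first, second, fallback)
filled
```

Position 1 comes from second, position 2 from first, and position 3 has to fall all the way through to fallback.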

But, there is a better way. Let’s go back to our all-Wisconsin dataframe:

wisc2

## # A tibble: 12 x 2
##    location      state
##    <chr>         <chr>
##  1 Appleton      WI   
##  2 Beloit        WI   
##  3 Fort.Atkinson WI   
##  4 Madison       WI   
##  5 Marshfield    WI   
##  6 Milwaukee     WI   
##  7 Monroe        WI   
##  8 Superior      WI   
##  9 Wausau        WI   
## 10 Dubuque       WI   
## 11 St.Paul       WI   
## 12 Chicago       WI

and our dataframe of corrections to make:

changes

## # A tibble: 3 x 2
##   location state
##   <chr>    <chr>
## 1 Dubuque  IA   
## 2 St.Paul  MN   
## 3 Chicago  IL

We can make those changes in one step, thus:

wisc2 %>% 
  rows_update(changes) -> wisc

## Matching, by = "location"

wisc

## # A tibble: 12 x 2
##    location      state
##    <chr>         <chr>
##  1 Appleton      WI   
##  2 Beloit        WI   
##  3 Fort.Atkinson WI   
##  4 Madison       WI   
##  5 Marshfield    WI   
##  6 Milwaukee     WI   
##  7 Monroe        WI   
##  8 Superior      WI   
##  9 Wausau        WI   
## 10 Dubuque       IA   
## 11 St.Paul       MN   
## 12 Chicago       IL

This works because the first column of changes, namely location, is the one that is looked up in wisc2. (rows_update has a by in the same way that left_join does if you want to change this.) So all three locations in changes are looked up in wisc2, and any that match have their state changed to the one in changes.

In database terms, the location is known as a “key” column. That means that each city appears once only in the column, and so the replacements in wisc are only happening once. To a statistician, location is an “identifier variable”, identifying the individuals in the dataset. Unless you are doing something like repeated measures, each individual will only give you one measurement, but even then, if you have wide format, the identifier variables will still only appear once.
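If you prefer not to rely on the first-column default (or to see the "Matching" message), you can name the key column yourself. A minimal sketch on a three-row dataframe that I made up for the purpose:

```r
library(dplyr)
library(tibble)

# Everything starts out in Wisconsin
places <- tibble(
  location = c("Madison", "Dubuque", "Chicago"),
  state = "WI"
)

# The corrections, with the same column names as places
fixes <- tibble(
  location = c("Dubuque", "Chicago"),
  state = c("IA", "IL")
)

# Name the key column explicitly rather than letting rows_update choose it
places %>%
  rows_update(fixes, by = "location") -> fixed
fixed
```

Madison is not mentioned in fixes, so it is left alone; the other two pick up their new states.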

Magic. Now that I have learned about this, I will be using it a lot.

\(\blacksquare\)

Create a new column in which the abbreviation for the state is glued on to the end of each location, separated by a space.

Solution

A couple of ways: a literal gluing, using paste:

wisc %>% 
  mutate(lookup = paste(location, state))

## # A tibble: 12 x 3
##    location      state lookup          
##    <chr>         <chr> <chr>           
##  1 Appleton      WI    Appleton WI     
##  2 Beloit        WI    Beloit WI       
##  3 Fort.Atkinson WI    Fort.Atkinson WI
##  4 Madison       WI    Madison WI      
##  5 Marshfield    WI    Marshfield WI   
##  6 Milwaukee     WI    Milwaukee WI    
##  7 Monroe        WI    Monroe WI       
##  8 Superior      WI    Superior WI     
##  9 Wausau        WI    Wausau WI       
## 10 Dubuque       IA    Dubuque IA      
## 11 St.Paul       MN    St.Paul MN      
## 12 Chicago       IL    Chicago IL

or the same idea using str_c from stringr (part of the tidyverse), only this time you have to supply the space yourself:

wisc %>% 
  mutate(lookup = str_c(location, " ", state))

## # A tibble: 12 x 3
##    location      state lookup          
##    <chr>         <chr> <chr>           
##  1 Appleton      WI    Appleton WI     
##  2 Beloit        WI    Beloit WI       
##  3 Fort.Atkinson WI    Fort.Atkinson WI
##  4 Madison       WI    Madison WI      
##  5 Marshfield    WI    Marshfield WI   
##  6 Milwaukee     WI    Milwaukee WI    
##  7 Monroe        WI    Monroe WI       
##  8 Superior      WI    Superior WI     
##  9 Wausau        WI    Wausau WI       
## 10 Dubuque       IA    Dubuque IA      
## 11 St.Paul       MN    St.Paul MN      
## 12 Chicago       IL    Chicago IL

or a way you might have forgotten, using unite (which is the inverse of separate):

wisc %>% 
  unite(lookup, c(location, state), sep = " ") -> wisc
wisc

## # A tibble: 12 x 1
##    lookup          
##    <chr>           
##  1 Appleton WI     
##  2 Beloit WI       
##  3 Fort.Atkinson WI
##  4 Madison WI      
##  5 Marshfield WI   
##  6 Milwaukee WI    
##  7 Monroe WI       
##  8 Superior WI     
##  9 Wausau WI       
## 10 Dubuque IA      
## 11 St.Paul MN      
## 12 Chicago IL

This last one is my favourite, because it gets rid of the two constituent columns location and state that we don’t need any more. The syntax is the name of the new column, a vector of columns to unite together, and (optionally) what to separate the joined-together values with. The default for that is an underscore, but here we want a space because that’s what the geocoder (coming up) expects.
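To see the default separator, and to keep the constituent columns if you do still want them, here is a sketch on a two-row dataframe made up for illustration. The remove = FALSE option is part of unite, but is not something the solution above needs:

```r
library(tidyr)
library(tibble)

d <- tibble(
  location = c("Madison", "Dubuque"),
  state = c("WI", "IA")
)

# Default separator is an underscore
with_default <- unite(d, lookup, c(location, state))
with_default

# A space as separator, keeping location and state as well
kept <- unite(d, lookup, c(location, state), sep = " ", remove = FALSE)
kept
```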

\(\blacksquare\)

Look up the latitudes and longitudes of these twelve places.

Solution

This is geocoding, with the disentangling afterwards that is (I hope) becoming familiar:

wisc %>% 
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(lookup))) %>% 
  unnest_wider(ll) %>% 
  unnest_wider(coords) -> wisc
wisc

## # A tibble: 12 x 5
##    lookup           query                x     y bbox      
##    <chr>            <chr>            <dbl> <dbl> <list>    
##  1 Appleton WI      Appleton WI      -88.4  44.3 <bbox [4]>
##  2 Beloit WI        Beloit WI        -89.0  42.5 <bbox [4]>
##  3 Fort.Atkinson WI Fort.Atkinson WI -88.8  42.9 <bbox [4]>
##  4 Madison WI       Madison WI       -89.4  43.1 <bbox [4]>
##  5 Marshfield WI    Marshfield WI    -90.2  44.7 <bbox [4]>
##  6 Milwaukee WI     Milwaukee WI     -87.9  43.0 <bbox [4]>
##  7 Monroe WI        Monroe WI        -89.6  42.6 <bbox [4]>
##  8 Superior WI      Superior WI      -92.1  46.6 <bbox [4]>
##  9 Wausau WI        Wausau WI        -89.6  45.0 <bbox [4]>
## 10 Dubuque IA       Dubuque IA       -90.7  42.5 <bbox [4]>
## 11 St.Paul MN       St.Paul MN       -93.1  45.0 <bbox [4]>
## 12 Chicago IL       Chicago IL       -87.6  41.9 <bbox [4]>

Yes, I forgot the rowwise as well the first time.

\(\blacksquare\)

Obtain a Leaflet map of the area containing these twelve cities.

Solution

The usual:

leaflet(data = wisc) %>% 
  addTiles() %>% 
  addCircleMarkers(lng = ~x, lat = ~y) -> locs
locs

The nice thing about this map is that you can play with it: zoom it (using the plus/minus on the map or your mouse wheel), or move it around by clicking and dragging. To identify the cities: well, the big ones are obvious, and you can zoom in to identify the others. (You have to zoom in quite a long way to find Monroe, and the geocoder seems to have found its airport, which is not actually in the city.)

I like identifying the cities with circles, but there are other possibilities, such as “icon markers” that look like Google map pins:

leaflet(data = wisc) %>% 
  addTiles() %>% 
  addMarkers(lng = ~x, lat = ~y) -> locs
locs

You can even attach text to the markers that appears when you click on them, but that’s farther than I would go here.
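In case you want to try it anyway, addMarkers has a popup argument that takes a column of text. A minimal sketch, with two of the cities and their rounded coordinates typed in by hand rather than geocoded:

```r
library(leaflet)
library(tibble)

# Rounded coordinates copied from the geocoding output above
cities <- tibble(
  lookup = c("Madison WI", "Chicago IL"),
  x = c(-89.4, -87.6),
  y = c(43.1, 41.9)
)

# popup = ~lookup shows the text in lookup when a marker is clicked
leaflet(data = cities) %>%
  addTiles() %>%
  addMarkers(lng = ~x, lat = ~y, popup = ~lookup) -> locs_popup
locs_popup
```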

\(\blacksquare\)

37.6 The Cross-City Line

A list of the stations on the line is in http://ritsokiguess.site/datafiles/cross-city.csv. There is one column in the data file, called station. We are going to draw a map of these.

Read in and display (some of) the station names.

Solution

Nothing terribly unexpected here:

my_url <- "http://ritsokiguess.site/datafiles/cross-city.csv"
stations <- read_csv(my_url)

## 
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   station = col_character()
## )

stations

## # A tibble: 24 x 1
##    station              
##    <chr>                
##  1 Redditch             
##  2 Alvechurch           
##  3 Barnt Green          
##  4 Longbridge           
##  5 Northfield           
##  6 King's Norton        
##  7 Bournville           
##  8 Selly Oak            
##  9 Birmingham University
## 10 Five Ways            
## # … with 14 more rows

\(\blacksquare\)

In preparation for geocoding, create a second column in the dataframe that consists of the station names with “station UK” on the end. (This is to improve the chances of the geocoder finding the actual railway station.)

Solution

I wrote this back into the original dataframe. Create a new one if you prefer:

stations %>% 
  mutate(longname = str_c(station, " station UK")) -> stations
stations

## # A tibble: 24 x 2
##    station               longname                        
##    <chr>                 <chr>                           
##  1 Redditch              Redditch station UK             
##  2 Alvechurch            Alvechurch station UK           
##  3 Barnt Green           Barnt Green station UK          
##  4 Longbridge            Longbridge station UK           
##  5 Northfield            Northfield station UK           
##  6 King's Norton         King's Norton station UK        
##  7 Bournville            Bournville station UK           
##  8 Selly Oak             Selly Oak station UK            
##  9 Birmingham University Birmingham University station UK
## 10 Five Ways             Five Ways station UK            
## # … with 14 more rows

\(\blacksquare\)

Look up the longitudes and latitudes of all the stations, organizing your dataframe so that they are visible.

Solution

Two steps: the first is to do the geocoding, and the second is to disentangle what comes back.

First, then:

stations %>% 
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(longname))) -> stations
stations

## # A tibble: 24 x 3
## # Rowwise: 
##    station               longname                         ll              
##    <chr>                 <chr>                            <list>          
##  1 Redditch              Redditch station UK              <named list [3]>
##  2 Alvechurch            Alvechurch station UK            <named list [3]>
##  3 Barnt Green           Barnt Green station UK           <named list [3]>
##  4 Longbridge            Longbridge station UK            <named list [3]>
##  5 Northfield            Northfield station UK            <named list [3]>
##  6 King's Norton         King's Norton station UK         <named list [3]>
##  7 Bournville            Bournville station UK            <named list [3]>
##  8 Selly Oak             Selly Oak station UK             <named list [3]>
##  9 Birmingham University Birmingham University station UK <named list [3]>
## 10 Five Ways             Five Ways station UK             <named list [3]>
## # … with 14 more rows

The longitudes and latitudes are hidden in the list-column that I called ll, so the second step is to get them out:

stations %>% unnest_wider(ll) %>% 
  unnest_wider(coords) -> stations
stations

## # A tibble: 24 x 6
##    station               longname                        query                               x     y bbox     
##    <chr>                 <chr>                           <chr>                           <dbl> <dbl> <list>   
##  1 Redditch              Redditch station UK             Redditch station UK             -1.94  52.3 <bbox [4…
##  2 Alvechurch            Alvechurch station UK           Alvechurch station UK           -1.97  52.3 <bbox [4…
##  3 Barnt Green           Barnt Green station UK          Barnt Green station UK          -1.99  52.4 <bbox [4…
##  4 Longbridge            Longbridge station UK           Longbridge station UK           -1.98  52.4 <bbox [4…
##  5 Northfield            Northfield station UK           Northfield station UK           -1.97  52.4 <bbox [4…
##  6 King's Norton         King's Norton station UK        King's Norton station UK        -1.93  52.4 <bbox [4…
##  7 Bournville            Bournville station UK           Bournville station UK           -1.94  52.4 <bbox [4…
##  8 Selly Oak             Selly Oak station UK            Selly Oak station UK            -1.94  52.4 <bbox [4…
##  9 Birmingham University Birmingham University station … Birmingham University station … -1.93  52.5 <bbox [4…
## 10 Five Ways             Five Ways station UK            Five Ways station UK            -1.91  52.5 <bbox [4…
## # … with 14 more rows

The two unnest_widers are because the longitudes and latitudes are hidden inside a thing called coords which is itself nested within ll. Do the first unnest_wider, and see what you have. This will tell you that you need to do another one.

The values seem reasonable; this is the UK, most of which is slightly west of the Greenwich meridian, and the latitudes look sensible given that the UK is north of southern Ontario.
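If you would rather not check "reasonable" by eye, you can flag anything that falls outside a rough box around the UK. This is only a sketch: the limits of the box are my own approximate guesses, and the second row is an invented wrong answer put in to show what a failure looks like.

```r
library(dplyr)
library(tibble)

# One real-looking result and one made-up bad one
coords <- tibble(
  station = c("Redditch", "Somewhere else entirely"),
  x = c(-1.94, 151.2),
  y = c(52.3, -33.9)
)

# Assumed rough limits: longitude -9 to 2, latitude 49 to 61
coords %>%
  mutate(in_uk = between(x, -9, 2) & between(y, 49, 61)) -> checked
checked
```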

\(\blacksquare\)

Make a Leaflet map of the stations. Use circle markers or the “pin” markers as you prefer.

Solution

I used the pin markers (with addMarkers), though addCircleMarkers is as good. The code for drawing the map is always the same; the work here is in setting up the geocoding:

leaflet(data = stations) %>% 
  addTiles() %>% 
  addMarkers(lng = ~x, lat = ~y)

This seems to extend across the city of Birmingham, as it should. There are quite a lot of stations, so the pins overlap each other. Zoom in to see them in a bit more detail, or zoom out to orient yourself if your UK geography needs some work.

\(\blacksquare\)

Zoom in to see whether the geocoding did indeed find each of the stations. Comment briefly on what you find.

Solution

The stations go south to north, so the most southerly one you find should be Redditch and the most northerly is Lichfield Trent Valley.

If you zoom in enough, you’ll see where the railway line goes (a grey line). The points seem to be mainly close to it. But if you zoom in a bit more, some of the pins are right on the railway (such as Alvechurch), and some of them, like Redditch and Barnt Green, are a bit off, because the geocoder found the centre of the place rather than its railway station. This continues as you go north: Northfield and King’s Norton are right where they should be, but Bournville is not (Bournville station is about halfway between where you see Bournville and Stirchley on the map). Likewise, Gravelly Hill station is right where it should be, but Aston is not.³⁷

Extra: geocode_OSM uses a free geocoder called Nominatim. This has some options. The defaults are to return only the first “hit”, and to return just the coordinates and the “bounding box”. These can be changed. Let’s see what we can find for Aston:

tibble(where = "Aston UK") %>% 
  mutate(info = list(geocode_OSM(where, return.first.only = FALSE,
                            details = TRUE))) -> d
d

## # A tibble: 1 x 2
##   where    info       
##   <chr>    <list>     
## 1 Aston UK <list [10]>

There are now 10 things returned. Let’s unnest this and see what we have:

d %>% unnest(info) %>% 
  unnest_wider(info)

## # A tibble: 10 x 13
##    where  query  coords bbox  place_id osm_type osm_id place_rank display_name   class type  importance icon  
##    <chr>  <chr>  <list> <lis> <chr>    <chr>    <chr>  <chr>      <chr>          <chr> <chr> <chr>      <chr> 
##  1 Aston… Aston… <dbl … <bbo… 9814552  node     96663… 18         Aston, Birmin… place town  0.5141099… https…
##  2 Aston… Aston… <dbl … <bbo… 65583750 node     59202… 30         Aston, Lovers… rail… stat… 0.4415759… https…
##  3 Aston… Aston… <dbl … <bbo… 3972412  node     48708… 19         Aston, Flints… place vill… 0.4412913… https…
##  4 Aston… Aston… <dbl … <bbo… 126341   node     26127… 19         Aston, West O… place vill… 0.4268930… https…
##  5 Aston… Aston… <dbl … <bbo… 118240   node     23777… 19         Aston, East H… place vill… 0.385      https…
##  6 Aston… Aston… <dbl … <bbo… 17080467 node     17098… 19         Aston, Cheshi… place vill… 0.385      https…
##  7 Aston… Aston… <dbl … <bbo… 2586055… relation 35941… 20         Aston, East H… boun… admi… 0.36       https…
##  8 Aston… Aston… <dbl … <bbo… 2580670… relation 14404… 20         Aston, Cheshi… boun… admi… 0.36       https…
##  9 Aston… Aston… <dbl … <bbo… 2427860  node     35624… 20         Aston, Claver… place haml… 0.36       https…
## 10 Aston… Aston… <dbl … <bbo… 5573721  node     59965… 20         Aston, Maer, … place haml… 0.36       https…

There are 10 locations it found matching “Aston UK”, and for each of those there is the information you see, a total of 12 columns’ worth in addition to the name of the place we are looking up. Perhaps the most interesting are the columns class and type near the end:

d %>% unnest(info) %>% 
  unnest_wider(info) %>% 
  select(where, class, type)

## # A tibble: 10 x 3
##    where    class    type          
##    <chr>    <chr>    <chr>         
##  1 Aston UK place    town          
##  2 Aston UK railway  station       
##  3 Aston UK place    village       
##  4 Aston UK place    village       
##  5 Aston UK place    village       
##  6 Aston UK place    village       
##  7 Aston UK boundary administrative
##  8 Aston UK boundary administrative
##  9 Aston UK place    hamlet        
## 10 Aston UK place    hamlet

Oh look, the second one is the station.

This makes me think that with sufficient patience we could do this for all our places, and pick out the one that is the station:

stations <- read_csv(my_url)

## 
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## cols(
##   station = col_character()
## )

stations %>% 
  mutate(longname = str_c(station, " UK")) %>% 
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(longname, 
                   return.first.only = FALSE,
                   details = TRUE))) -> stations

stations %>% unnest(ll) %>% 
  unnest_wider(ll) %>% 
  select(station, coords, class, type) %>% 
  filter(class == "railway", type == "station") %>% 
  unnest_wider(coords) -> d

## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1
## New names:
## * `` -> ...1

d

## # A tibble: 22 x 5
##    station                   x     y class   type   
##    <chr>                 <dbl> <dbl> <chr>   <chr>  
##  1 Redditch              -1.95  52.3 railway station
##  2 Alvechurch            -1.97  52.3 railway station
##  3 Barnt Green           -1.99  52.4 railway station
##  4 Longbridge            -1.98  52.4 railway station
##  5 Northfield            -1.97  52.4 railway station
##  6 King's Norton         -1.93  52.4 railway station
##  7 Bournville            -1.93  52.4 railway station
##  8 Selly Oak             -1.94  52.4 railway station
##  9 Five Ways             -1.91  52.5 railway station
## 10 Birmingham New Street -1.90  52.5 railway station
## # … with 12 more rows

Almost. We’re missing University and Lichfield City stations, but it looks as if we got the rest:

leaflet(data = d) %>% 
  addTiles() %>% 
  addMarkers(lng = ~x, lat = ~y)

If you zoom in, you’ll see that the ones we got are all the actual stations, and not the area from which the station takes its name.
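Rather than spotting the missing stations by eye, you can ask for the names that are in the original list but not in the filtered results. A sketch using anti_join on two small made-up dataframes; with the real data you would use the full list of stations and d:

```r
library(dplyr)
library(tibble)

# A cut-down list of station names, typed in for illustration
all_stations <- tibble(
  station = c("Redditch", "Birmingham University", "Lichfield City")
)

# The ones for which a railway station was found
found <- tibble(station = "Redditch", x = -1.95, y = 52.3)

# Rows of all_stations that have no match in found
all_stations %>%
  anti_join(found, by = "station") -> not_found
not_found
```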

\(\blacksquare\)

³⁶ I knew this existed, but I couldn’t remember what it was called, which made it hard to search for. My first port of call was na_if, which converts values to NA, the opposite of what I wanted. But from its See Also I got replace_na, and from the See Also of that, I found out what coalesce does.
³⁷ If you’re a soccer fan, this Aston is what Aston Villa is named after. See if you can find the team’s stadium, Villa Park, on your map.