38  Drawing maps with Leaflet

Packages for this chapter:

library(ggbiplot)
library(tidyverse)
library(ggrepel)

38.1 The Cross-City Line

When I went to university (in Birmingham, England, a long time ago), I was very excited because I would be travelling to campus by train. My journey was on the Cross-City Line, a metro-type service with lots of stops short distances apart, but run in those days by diesel trains (the electrification came later).

A list of the stations on the line is in http://ritsokiguess.site/datafiles/cross-city.csv. There is one column in the data file, called station. We are going to draw a map of these.

  1. Read in and display (some of) the station names.

  2. In preparation for geocoding, create a second column in the dataframe that consists of the station names with “station UK” on the end. (This is to improve the chances of the geocoder finding the actual railway station.)

  3. Look up the longitudes and latitudes of all the stations, organizing your dataframe so that they are visible.

  4. Make a Leaflet map of the stations. Use circle markers or the “pin” markers as you prefer.

  5. Zoom in to see whether the geocoding did indeed find each of the stations. Comment briefly on what you find.

38.2 Making a map of Wisconsin

The file link contains the road distances (in miles) between 12 cities in Wisconsin and neighbouring states. We are going to try to draw a map of the area using Leaflet.

  1. Read in the data, displaying the data that you read in.

  2. Make a dataframe containing the names of the locations (get rid of the columns containing distances), and add a column of the abbreviations of the states they are in. All of them are in Wisconsin (WI), except for the last three: Dubuque is in Iowa (IA), St. Paul is in Minnesota (MN) and Chicago is in Illinois (IL).

  3. Create a new column in which the abbreviation for the state is glued on to the end of each location, separated by a space.

  4. Look up the latitudes and longitudes of these twelve places.

  5. Obtain a Leaflet map of the area containing these twelve cities.

38.3 The brain of a cat, revisited

Earlier, we looked at an ethics study that had to do with a fictional brain of a fictional cat. I said there was actually a town called Catbrain. It’s in England, near Bristol, and seems to be home to a street of car dealerships.

  1. Find the latitude and longitude of “Catbrain UK” (though I don’t think there are any others).

  2. Draw a map of Catbrain using Leaflet.

  3. Make a dataframe containing some other British cities as well as Catbrain, and find their latitudes and longitudes.

  4. Draw a map containing the places you picked.

My solutions follow:

38.4 The Cross-City Line

When I went to university (in Birmingham, England, a long time ago), I was very excited because I would be travelling to campus by train. My journey was on the Cross-City Line, a metro-type service with lots of stops short distances apart, but run in those days by diesel trains (the electrification came later).

A list of the stations on the line is in http://ritsokiguess.site/datafiles/cross-city.csv. There is one column in the data file, called station. We are going to draw a map of these.

  1. Read in and display (some of) the station names.

Solution

Nothing terribly unexpected here:

stations <- read_csv(my_url)
Rows: 24 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): station

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stations

\(\blacksquare\)

  1. In preparation for geocoding, create a second column in the dataframe that consists of the station names with “station UK” on the end. (This is to improve the chances of the geocoder finding the actual railway station.)

Solution

I wrote this back into the original dataframe. Create a new one if you prefer:

stations %>% 
  mutate(longname = str_c(station, " station UK")) -> stations
stations

\(\blacksquare\)

  1. Look up the longitudes and latitudes of all the stations, organizing your dataframe so that they are visible.

Solution

Two steps: the first is to do the geocoding, and the second is to disentangle what comes back.

First, then:

stations %>% 
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(longname))) -> stations
No results found for "King's Norton station UK".
No results found for "Butler's Lane station UK".
stations

The longitudes and latitudes are hidden in the list-column that I called ll, so the second step is to get them out:

stations %>% unnest_wider(ll) %>% 
  unnest_wider(coords) -> stations
stations

The two unnest_widers are because the longitudes and latitudes are hidden inside a thing called coords which is itself nested within ll. Do the first unnest_wider, and see what you have. This will tell you that you need to do another one.

The values seem mostly reasonable; this is the UK, most of which is slightly west of the Greenwich meridian, and the latitudes look sensible given that the UK is north of southern Ontario. You might find that some of the stations were not found at all (which will give you warnings the rest of the way), and you might find that something was found for a station but the wrong thing, which will show up on your map as a place way off the trend of the other places.

\(\blacksquare\)

  1. Make a Leaflet map of the stations. Use circle markers or the “pin” markers as you prefer.

Solution

I used the pin markers (with addMarkers), though addCircleMarkers is as good. The code for drawing the map is always the same; the work here is in setting up the geocoding:

leaflet(data = stations) %>% 
  addTiles() %>% 
  addMarkers(lng = ~x, lat = ~y)
Warning in validateCoords(lng, lat, funcName): Data contains 2 rows with either
missing or invalid lat/lon values and will be ignored

You might find that some of the markers are off a straight line through Birmingham. In these cases, the geocoder found a different place with (apparently) the same name, or a name that was close enough to similar. This seems to be different each time you run it, so you can try again.

This (mostly) seems to extend across the city of Birmingham, as it should. There are quite a lot of stations, so the pins overlap each other. Zoom in to see them in a bit more detail, or zoom out to orient yourself if your UK geography needs some work.

\(\blacksquare\)

  1. Zoom in to see whether the geocoding did indeed find each of the stations. Comment briefly on what you find.

Solution

The stations go south to north, so the most southerly one you find should be Redditch and the most northerly is Lichfield Trent Valley.

If you zoom in enough, you’ll see where the railway line goes (a grey line). The points seem to be mainly close to it. But if you zoom in a bit more, some of the pins are right on the railway, and some of them are a bit off, because the geocoder found the centre of the place (or something else named after the place) rather than its railway station. For example, when I did it, Gravelly Hill station was right where it should be, but Aston was not.1

Extra: geocode_OSM uses a free geocoder called Nominatim. This has some options. The defaults are to return only the first “hit”, and to return just the coordinates and the “bounding box”. These can be changed. Let’s see what we can find for Aston:

tibble(where = "Aston UK") %>% 
  mutate(info = list(geocode_OSM(where, return.first.only = FALSE,
                            details = TRUE))) -> d
d         

There are now 10 things returned. Let’s unnest this and see what we have:

d %>% unnest(info) %>% 
  unnest_wider(info)

There are 10 locations it found matching “Aston UK”, and for each of those there is the information you see, a total of 12 columns’ worth in addition to the name of the place we are looking up. Perhaps the most interesting are the columns class and type near the end (keep scrolling right):

d %>% unnest(info) %>% 
  unnest_wider(info) %>% 
  select(where, class, type)

You can see which one is the railway station.

This makes me think that with sufficient patience we could do this for all our places, and pick out the one that is the station:

stations <- read_csv(my_url)
Rows: 24 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): station

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
stations %>% 
  mutate(longname = str_c(station, " UK")) %>% 
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(longname, 
                   return.first.only = FALSE,
                   details = TRUE))) -> stations2
stations2

The number in the ll column tells you how many things the geocoder found that match the input longname. One of each of these is, we hope, a railway station.

stations2 %>% unnest(ll) %>% 
  unnest_wider(ll, names_sep = "_") %>% 
  select(station, ll_coords, ll_class, ll_type) %>% 
  filter(ll_class == "railway", ll_type == "station") %>% 
  unnest_wider(ll_coords) -> d
d

If you want to see how this works, run it one line at a time.

We almost got there. We’re missing University2 and Lichfield City stations, but it looks as if we got the rest:

leaflet(data = d) %>% 
  addTiles() %>% 
  addMarkers(lng = ~x, lat = ~y)

If you zoom in, you’ll see that the ones we got are all the actual stations, and not the area from which the station takes its name.

\(\blacksquare\)

38.5 Making a map of Wisconsin

The file link contains the road distances (in miles) between 12 cities in Wisconsin and neighbouring states. We are going to try to draw a map of the area using Leaflet.

  1. Read in the data, displaying the data that you read in.

Solution

my_url <- "http://ritsokiguess.site/datafiles/wisconsin.txt"
wisc <- read_table(my_url)

── Column specification ────────────────────────────────────────────────────────
cols(
  location = col_character(),
  Appleton = col_double(),
  Beloit = col_double(),
  Fort.Atkinson = col_double(),
  Madison = col_double(),
  Marshfield = col_double(),
  Milwaukee = col_double(),
  Monroe = col_double(),
  Superior = col_double(),
  Wausau = col_double(),
  Dubuque = col_double(),
  St.Paul = col_double(),
  Chicago = col_double()
)
wisc

The first time I did this, I had a blank line on the end of the data file, so I had a blank location and missing values for all the distances for it. I tidied that up before sharing the file with you, though.

\(\blacksquare\)

  1. Make a dataframe containing the names of the locations (get rid of the columns containing distances), and add a column of the abbreviations of the states they are in. All of them are in Wisconsin (WI), except for the last three: Dubuque is in Iowa (IA), St. Paul is in Minnesota (MN) and Chicago is in Illinois (IL).

Solution

There seems to be a bit of base R attached to this, however you do it. I am going to create a dataframe pretending they are all in Wisconsin, and then fix it up afterwards:

wisc %>% 
  select(!where(is.numeric)) %>% 
  mutate(state = "WI") -> wisc
wisc

(I checked that in this question I didn’t need the road distances for anything, so I saved it back into the original dataframe. Also, the select is unnecessarily fancy: I could have just selected the location column, but this one says “don’t select any of the columns that are numeric”.)

Now we have to fix up the states of the last three places, which is where the base R seems to come in (but see the Extra):

wisc$state[10] <- "IA"
wisc$state[11] <- "MN"
wisc$state[12] <- "IL"
wisc

The states of the last three locations are now correct.

Extra: I didn’t know about the following until literally just now, but there is a tidyverse way to do this as well (that may look familiar to those of you that know about SQL). Let’s start by pretending again that everything is in Wisconsin:

wisc %>% 
  mutate(state = "WI") -> wisc2
wisc2

and then change the ones that need changing. What you do is to make a little dataframe of the ones that need changing:

changes <- tribble(
  ~location, ~state,
  "Dubuque", "IA",
  "St.Paul", "MN",
  "Chicago", "IL"
)
changes

Note that the columns in here have exactly the same names as the ones in the original dataframe where everything was in Wisconsin.

I did this by copy-pasting the city names whose states needed changing out of the display of my wisc2. You might think that a left join is what we need now, and it almost is. Note that I want to match the locations but not the states:

wisc2 %>% left_join(changes, join_by(location))

and the story here is that if state.y has a value, use that, otherwise use the value in state.x. This can even be done: there is a function coalesce3 that will do exactly that:

wisc2 %>% left_join(changes, join_by(location)) %>% 
  mutate(state=coalesce(state.y, state.x))

The idea behind coalesce is that you give it a list of columns, and the first one of those that is not missing gives its value to the new column. So I feed it state.y first, and then state.x, and the new state contains the right things. (Can you explain what happens if you do it the other way around?)

But, there is a better way. Let’s go back to our all-Wisconsin dataframe:

wisc2

and our dataframe of corrections to make:

changes

We can make those changes in one step, thus:

wisc2 %>% 
  rows_update(changes) -> wisc
Matching, by = "location"
wisc

This works because the first column of changes, namely location, is the one that is looked up in wisc2. (rows_update has a by in the same way that left_join does if you want to change this.) So all three locations in changes are looked up in wisc2, and any that match have their state changed to the one in changes.

In database terms, the location is known as a “key” column. That means that each city appears once only in the column, and so the replacements in wisc are only happening once. To a statistician, location is an “identifier variable”, identifying the individuals in the dataset. Unless you are doing something like repeated measures, each individual will only give you one measurement, but even then, if you have wide format, the identifier variables will still only appear once.

Magic. Now that I have learned about this, I will be using it a lot.

\(\blacksquare\)

  1. Create a new column in which the abbreviation for the state is glued on to the end of each location, separated by a space.

Solution

A couple of ways: a literal gluing, using paste:

wisc %>% 
  mutate(lookup = paste(location, state))

or the same idea using str_c from stringr (part of the tidyverse), only this time you have to supply the space yourself:

wisc %>% 
  mutate(lookup = str_c(location, " ", state))

or a way you might have forgotten, using unite (which is the inverse of separate_wider):

wisc %>% 
  unite(lookup, c(location, state), sep = " ") -> wisc
wisc

This last one is my favourite, because it gets rid of the two constituent columns location and state that we don’t need any more. The syntax is the name of the new column, a vector of columns to unite together, and (optionally) what to separate the joined-together values with. The default for that is an underscore, but here we want a space because that’s what the geocoder (coming up) expects.

\(\blacksquare\)

  1. Look up the latitudes and longitudes of these twelve places.

Solution

This is geocoding, with the disentangling afterwards that is (I hope) becoming familiar:

wisc %>% 
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(lookup))) %>% 
  unnest_wider(ll) %>% 
  unnest_wider(coords) -> wisc
wisc
# A tibble: 12 × 5
   lookup           query                x     y bbox      
   <chr>            <chr>            <dbl> <dbl> <list>    
 1 Appleton WI      Appleton WI      -88.4  44.3 <bbox [4]>
 2 Beloit WI        Beloit WI        -89.0  42.5 <bbox [4]>
 3 Fort.Atkinson WI Fort.Atkinson WI -88.8  42.9 <bbox [4]>
 4 Madison WI       Madison WI       -89.4  43.1 <bbox [4]>
 5 Marshfield WI    Marshfield WI    -90.2  44.7 <bbox [4]>
 6 Milwaukee WI     Milwaukee WI     -87.9  43.0 <bbox [4]>
 7 Monroe WI        Monroe WI        -90.6  43.9 <bbox [4]>
 8 Superior WI      Superior WI      -92.1  46.7 <bbox [4]>
 9 Wausau WI        Wausau WI        -89.6  45.0 <bbox [4]>
10 Dubuque IA       Dubuque IA       -90.7  42.5 <bbox [4]>
11 St.Paul MN       St.Paul MN       -93.1  44.9 <bbox [4]>
12 Chicago IL       Chicago IL       -87.6  41.9 <bbox [4]>

Yes, I forgot the rowwise as well the first time.

\(\blacksquare\)

  1. Obtain a Leaflet map of the area containing these twelve cities.

Solution

The usual:

leaflet(data = wisc) %>% 
  addTiles() %>% 
  addCircleMarkers(lng = ~x, lat = ~y) -> locs
locs

The nice thing about this map is that you can play with it: zoom it (using the plus/minus on the map or your mouse wheel), or move it around by clicking and dragging. To identify the cities: well, the big ones are obvious, and you can zoom in to identify the others. (You have to zoom in quite a long way to find Monroe, and the geocoder seems to have found its airport, which is not actually in the city.)

I like identifying the cities with circles, but there are other possibilities, such as “icon markers” that look like Google map pins:

leaflet(data = wisc) %>% 
  addTiles() %>% 
  addMarkers(lng = ~x, lat = ~y) -> locs
locs

You can even attach text to the markers that appears when you click on them, but that’s farther than I would go here.

\(\blacksquare\)

38.6 The brain of a cat, revisited

Earlier, we looked at an ethics study that had to do with a fictional brain of a fictional cat. I said there was actually a town called Catbrain. It’s in England, near Bristol, and seems to be home to a street of car dealerships.

  1. Find the latitude and longitude of “Catbrain UK” (though I don’t think there are any others).

Solution

Make sure you have these two packages loaded:

library(leaflet)
library(tmaptools)

To find the latitude and longitude of Catbrain:

catbrain <- tibble(place = "Catbrain UK")
catbrain %>% mutate(ll = list(geocode_OSM(place))) %>% 
  unnest_wider(ll) %>% 
  unnest_wider(coords) -> catbrain
catbrain

Remember that the output from geocode_OSM is a list, and it has in it a thing called coords that is the longitude and latitude together, and another thing called bbox that we don’t use. So we have to unnest twice to get the longitude (as x) and the latitude (as y) out for drawing in a moment.

\(\blacksquare\)

  1. Draw a map of Catbrain using Leaflet.

Solution

That goes this way:

leaflet(data = catbrain) %>% 
  addTiles() %>% 
  addCircleMarkers(lng = ~x, lat = ~y) -> catbrain_map
catbrain_map

There are car dealerships are along Lysander Road. Zoom in to see them. Or zoom out to see where this is. You can keep zooming out until you know where you are, using the plus and minus buttons, or your mouse wheel.

The name Catbrain, according to link, means “rough stony soil”, from Middle English, and has nothing to do with cats or their brains at all.

Extra: I was actually surprised that this worked at all, because with only one point, how does it know what scale to draw the map? Also, unless your UK geography is really good, you won’t have any clue about exactly where this is. That’s the reason for the next part.

\(\blacksquare\)

  1. Make a dataframe containing some other British cities as well as Catbrain, and find their latitudes and longitudes.

Solution

I chose the cities below, mostly somewhere near Catbrain. You could fire up a Google map, zoom it out until it contains something you know, and pick some places you’ve heard of. (I happen to know British geography pretty well, so I just picked some mostly nearby places out of my head. I didn’t really want to pick London, but I figured this was the one you might know.)

catbrain2 <- tribble(
  ~where,
  "Catbrain UK",
  "Bristol UK",
  "Taunton UK",
  "Newport UK",
  "Gloucester UK",
  "Cardiff UK",
  "Birmingham UK",
  "London UK",
  "Caldicot UK"
)
catbrain2 %>%
  rowwise() %>% 
  mutate(ll = list(geocode_OSM(where))) %>% 
  unnest_wider(ll) %>% 
  unnest_wider(coords) -> catbrain2
catbrain2

The first time I did this, I forgot the rowwise, which we didn’t need the first time (there was only one place), but here, it causes odd problems if you omit it.

\(\blacksquare\)

  1. Draw a map containing the places you picked.

Solution

The map-drawing is almost the same, just changing the dataframe:

leaflet(data = catbrain2) %>% 
  addTiles() %>% 
  addCircleMarkers(lng = ~x, lat = ~y)

Now, if you have any sense of the geography of the UK, you know where you are. The big river (the Severn) is the border between England and Wales, so the places above and to the left of it are in Wales, including Caldicot (see question about Roman pottery).

You can zoom this map in (once you have figured out which of the circles is Catbrain) and find Lysander Road again, and also the M5 (see below).

More irrelevant extra: the M5 is one of the English “motorways” (like 400-series highways or US Interstates). The M5 goes from Birmingham to Exeter. You can tell that this is England because of the huge number of traffic circles, known there as “roundabouts”. One of the first things they teach you in British driving schools is how to handle roundabouts: which lane to approach them in, which (stick-shift) gear to be in, and when you’re supposed to signal where you’re going. I hope I still remember all that for when I next drive in England!

\(\blacksquare\)


  1. If you’re a soccer fan, this Aston is what Aston Villa is named after. See if you can find the team’s stadium Villa Park on your map, which is actually closer to Witton station on another line.↩︎

  2. The station is called just University, not Birmingham University, which makes it hard to find.↩︎

  3. I knew this existed, but I couldn’t remember what it was called, which made it hard to search for. My first port of call was na_if, which converts values to NA, the opposite of what I wanted. But from its See Also I got na_replace, and from the See Also of that, I found out what coalesce does.↩︎