Worksheet 11

Published

March 21, 2025

Packages

library(ggbiplot)
library(tidyverse)
library(tmaptools)
library(leaflet)
library(conflicted)
conflicts_prefer(dplyr::summarize)
conflicts_prefer(dplyr::filter)

Rugby league teams by location

There are 37 teams that play professional or semi-professional rugby league in Europe.[1] These are listed in the file at http://ritsokiguess.site/datafiles/rugby-league-teams.csv, which gives the name of each team, its location, and the league in which it plays (from Super League, the best, to League One, the worst). Our aim is to make a map of the locations of the teams, to see what we can learn about where the teams tend to be from.

  1. Read in and display (some of) the data.
  2. Look up the latitudes and longitudes of the location where each team plays.
  3. Draw a map showing where these 37 teams play.
  4. Where are most of the teams found?
  5. What can you find out about why most of the teams are located where they are?
  6. Re-draw your map, but now colouring the points according to which league the team plays in.
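The mapping steps above can be sketched on a tiny stand-in dataset. This is only an illustration under stated assumptions, not the worksheet solution: the team names are made up, the league assignments are hypothetical, the coordinates are approximate values typed in by hand, and the column names (`team`, `location`, `league`, `lon`, `lat`) are guesses at a convenient layout rather than the names in the real CSV.

```r
library(tidyverse)
library(leaflet)

# Hypothetical stand-in for the geocoded teams (NOT the real data).
# Coordinates are approximate; league assignments are invented.
teams_geo <- tribble(
  ~team,            ~location,  ~league,        ~lon,  ~lat,
  "Example Team 1", "Wigan",    "Super League", -2.63, 53.55,
  "Example Team 2", "Leeds",    "Super League", -1.55, 53.80,
  "Example Team 3", "Toulouse", "League One",    1.44, 43.60
)

# With network access, the real data could be read and geocoded roughly
# like this (the column name `location` is an assumption about the CSV):
#   teams  <- read_csv("http://ritsokiguess.site/datafiles/rugby-league-teams.csv")
#   coords <- tmaptools::geocode_OSM(teams$location)  # should include lat and lon columns

# One colour per league, for the coloured version of the map
league_pal <- colorFactor(c("blue", "red"), domain = teams_geo$league)

team_map <- leaflet(teams_geo) %>%
  addTiles() %>%                                   # OpenStreetMap background
  addCircleMarkers(
    lng = ~lon, lat = ~lat,
    color = ~league_pal(league),                   # colour the points by league
    label = ~team                                  # team name shown on hover
  ) %>%
  addLegend(pal = league_pal, values = ~league, title = "League")

team_map
```

Dropping the `color` argument and the `addLegend()` call gives the plain version of the map, with every point in the default colour.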

Need for cognition

Questionnaires are often used to identify aspects of personality. The data in http://www.utsc.utoronto.ca/~butler/d29/cognition.txt are the results of a questionnaire intended to identify people’s “need for cognition”. The questionnaire items are shown here.[2] Each response is on a scale of 1–5, with 5 denoting “strongly agree” and 1 denoting “strongly disagree”. Some of the items are “reverse-coded”, which means that “strongly disagree” is intended to correspond to a strong “need for cognition”. (This is to stop people from simply answering “strongly agree” all the way down the list.) The questionnaire item file indicates which items were reverse-coded. The column id is a serial number identifying the respondent, and the questionnaire responses are in columns c1 through c18.
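To see what reverse-coding does to the numbers, here is a minimal sketch with made-up responses. It assumes, purely for illustration, that `c1` is coded normally and `c2` is reverse-coded; which items are actually reverse-coded is given in the questionnaire item file. On a 1–5 scale, subtracting a reverse-coded response from 6 puts it on the same footing as the other items. The worksheet does not ask you to recode anything; this only shows why the two kinds of item pull in opposite directions.

```r
library(tidyverse)

# Made-up responses from three hypothetical respondents
responses <- tibble(
  id = 1:3,
  c1 = c(5, 3, 1),   # normally coded: 5 = strong need for cognition
  c2 = c(1, 3, 5)    # assumed reverse-coded: 1 = strong need for cognition
)

# Flip the reverse-coded item so that a high value always means a strong need
rescored <- responses %>%
  mutate(c2_flipped = 6 - c2)

rescored
```

Respondent 1 strongly agrees with `c1` and strongly disagrees with `c2`, and after flipping both answers point the same way.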

  1. Read in and display some of the data. Hint: the values are separated by single spaces, but there are some extra spaces at the beginning of the rows to make the columns line up. You might get one “parsing failure” warning that you can ignore.
  2. Check that all of the responses to the questionnaire items are between 1 and 5, apart from some missing ones. (For this question and the next one, you might want to use a search engine to hunt for ideas. You have seen the solution ideas before, though, so you should be able to look at the search results and say “ah yes, we saw that before”, or something similar that tells you the idea will solve your problem because it has done so before.)
  3. Create and save a dataframe in which the respondents with any missing values at all are dropped.
  4. Run a principal components analysis on your new dataframe, having removed the column of IDs. Make a screeplot.
  5. Based on your screeplot, how many principal components do you think you should use? Explain briefly (in your head, if nowhere else).
  6. Look at the summary output of your principal components analysis. What does this tell you about your proposed number(s) of components?
  7. Look at the loadings on component 1. Can you explain the pattern of plus and minus signs on the loadings? (Hint: look at the description of the items linked above.)
  8. What items does component 2 mainly depend on? (For me, there were five loadings noticeably higher than the rest.) What do these items seem to have in common?
  9. Make a biplot of these data, using the id values from your cleaned (no NAs) dataset.
  10. Find a person who scored low on component 1. By looking at their original data, show that it makes sense that they would have scored low on component 1.
  11. Find a person who scored high on component 2. By looking at their original data, show that this too makes sense.
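The workflow in these questions can be sketched on a small simulated dataset. Everything about the data here is an assumption made for illustration: there are four items `c1` to `c4` instead of eighteen, the responses are randomly generated, and `c3` and `c4` are built to behave like reverse-coded items. The helper `to_scale()` is hypothetical. The sketch uses `princomp()` on the correlation matrix; it is one reasonable way to do the analysis, not necessarily the intended solution.

```r
library(ggbiplot)
library(tidyverse)

set.seed(1)
n <- 40
trait <- rnorm(n)  # simulated underlying "need for cognition"

# Hypothetical helper: turn a continuous value into a 1-5 response
to_scale <- function(z) pmin(5, pmax(1, round(3 + z)))

cognition_toy <- tibble(
  id = 1:n,
  c1 = to_scale(trait + rnorm(n, sd = 0.7)),
  c2 = to_scale(trait + rnorm(n, sd = 0.7)),
  c3 = to_scale(-trait + rnorm(n, sd = 0.7)),  # behaves like a reverse-coded item
  c4 = to_scale(-trait + rnorm(n, sd = 0.7))   # behaves like a reverse-coded item
)
cognition_toy$c2[5] <- NA    # plant two missing responses
cognition_toy$c4[17] <- NA

# Check the responses: count every distinct value across all items
cognition_toy %>%
  pivot_longer(c1:c4, names_to = "item", values_to = "response") %>%
  count(response)

# Drop respondents with any missing values, keeping the ids for later
complete <- cognition_toy %>% drop_na()

# Principal components on the correlation matrix, without the id column
pca <- complete %>%
  select(-id) %>%
  princomp(cor = TRUE)

ggscreeplot(pca)   # screeplot
summary(pca)       # proportion of variance explained by each component
pca$loadings       # signs and sizes of the loadings

# Biplot labelled by respondent id
ggbiplot(pca, labels = complete$id)

# Attach the ids to the scores, then find the lowest scorer on component 1
scores <- as_tibble(pca$scores) %>% mutate(id = complete$id)
lowest <- scores %>% slice_min(Comp.1, n = 1)
complete %>% filter(id %in% lowest$id)   # their original responses
```

The same pattern with `slice_max(Comp.2, n = 1)` would pick out a high scorer on component 2. Note that the sign of a whole component is arbitrary, so what matters in the loadings is which items share a sign and which have opposite signs.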

Footnotes

  1. Rugby league is also played in Australia, traditionally around the city of Sydney, and in places in the Pacific like New Zealand and Papua New Guinea.

  2. This was rather evidently made by putting a textbook on top of a photocopier and saying “scan this”.