library(tidyverse)
library(smmr)
library(MASS)
library(nnet)
library(survival)
library(survminer)
library(car)
library(lme4)
library(ggbiplot)
library(ggrepel)
library(broom)
library(rpart)
library(bootstrap)
library(cmdstanr)
library(posterior)
library(bayesplot)
library(tmaptools)
library(leaflet)
library(conflicted)
conflict_prefer("summarize", "dplyr")
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
conflict_prefer("mutate", "dplyr")
conflict_prefer("count", "dplyr")
conflict_prefer("arrange", "dplyr")
conflict_prefer("rename", "dplyr")
conflict_prefer("id", "dplyr")
Problems and Solutions in Applied Statistics (2nd ed)
Introduction
This book contains a collection of problems, and my solutions to them, in applied statistics with R. These come from my courses STAC32, STAC33, and STAD29 at the University of Toronto Scarborough.
You will occasionally see question parts beginning with a *; this means that other question parts refer back to this one. (One of my favourite question strategies is to ask how two different approaches lead to the same answer, or more generally to demonstrate that there are different ways to see the same thing.)
Thanks to Dann Sioson for spotting some errors and making some useful suggestions.
If you see anything, file an issue on the Github page for now. Likely problems include:
- some LaTeX construction that I didn’t catch (eg. block quotes)
- disappeared footnotes (that will show up as an apparently missing sentence in the text)
- references to “in class” or a lecture or a course by course number, which need to be eliminated (in favour of wording like “a previous course”)
- references to other questions or question parts that are wrong (likely caused by not being “labels” or “refs” in the original LaTeX)
- my contorted English that is difficult to understand.
As I read through looking for problems like these, I realize that there ought to be a textbook that reflects my way of doing things. There isn’t one (yet), though there are lecture notes. Current versions of these are at:
Students at the University of Toronto can use the online R Studio at r.datatools.utoronto.ca
for free as long as they are students here; others can use rstudio.cloud
to try out an online interface (the free tier may not last you very long). Ultimately, the recommended approach for everyone is to install R Studio on their own computer (instructions in the lecture notes above) and then there should be no problems accessing anything. R Studio runs the same whether online or on your own computer.
A little background:
STAC32 is an introduction to R as applied to statistical methods that have (mostly) been learned in previous courses. This course is designed for students who have a second non-mathematical applied statistics course such as this. The idea is that students have already seen a little of regression and analysis of variance (and the things that precede them), and need mainly an introduction of how to run them in R.)
STAC33 is an introduction to R, and applied statistics in general, for students who have a background in mathematical statistics. The way our courses are structured, these students have a strong mathematical background, but not very much experience in applications, which this course is designed to provide. The material covered is similar to STAC32, with a planned addition of some ideas in bootstrap and practical Bayesian statistics. There are some questions on these here.
STAD29 is an overview of a number of advanced statistical methods. I start from regression and proceed to some regression-like methods (logistic regression, survival analysis, log-linear frequency table analysis), then I go a little further with analysis of variance and proceed with MANOVA and repeated measures. I finish with a look at classical multivariate methods such as discriminant analysis, cluster analysis, principal components and factor analysis. I cover a number of methods in no great depth; my aim is to convey an understanding of what these methods are for, how to run them and how to interpret the results. Statistics majors and specialists cannot take this course for credit (they have separate courses covering this material with the proper mathematical background). D29 is intended for students in other disciplines who find themselves wanting to learn more statistics; we have an Applied Statistics Minor program for which C32 and D29 are two of the last courses.
Packages used somewhere in this book
The bottom lines are below used with the conflicted
package: if a function by the name shown is in two or more packages, prefer the one from the package shown.
All of these packages are on CRAN, and may be installed via the usual install.packages
, with the exceptions of:
smmr
on Github: install with
::install_github("nxskok/smmr") devtools
ggbiplot
on Github: install with
::install_github("vqv/ggbiplot") devtools
cmdstanr
,posterior
, andbayesplot
: install with
install.packages("cmdstanr",
repos = c("https://mc-stan.org/r-packages/",
getOption("repos")))
install.packages("posterior",
repos = c("https://mc-stan.org/r-packages/",
getOption("repos")))
install.packages("bayesplot",
repos = c("https://mc-stan.org/r-packages/",
getOption("repos")))