---
title: "Repeated measures analysis"
editor:
markdown:
wrap: 72
---
## Repeated measures by profile analysis
- More than one response *measurement* for each subject. Might be
- measurements of the same thing at different times
- measurements of different but related things
- Generalization of matched pairs ("matched triples", etc.).
- Variation: each subject does several different treatments at
different times (called *crossover design*).
- Expect measurements on same subject to be correlated, so assumptions
of independence will fail.
- Called *repeated measures*. Different approaches, but *profile
analysis* uses `Manova` (set up right way).
- Another approach uses *mixed models* (random effects).
## Packages
```{r bProfile-1, results="hide", message=FALSE}
library(car)
library(tidyverse)
library(lme4) # for mixed models later
```
## Example: histamine in dogs
- 8 dogs take part in experiment.
- Dogs randomized to one of 2 different drugs.
- Response: log of blood concentration of histamine 0, 1, 3 and 5
minutes after taking drug. (Repeated measures.)
- Data in `dogs.txt`, column-aligned.
## Read in data
```{r bProfile-2 }
my_url <- "http://ritsokiguess.site/datafiles/dogs.txt"
dogs <- read_table(my_url)
```
## Setting things up
```{r bProfile-3}
dogs
response <- with(dogs, cbind(lh0, lh1, lh3, lh5))
response
```
## The repeated measures MANOVA
Get list of response variable names; we call them `times`. Save in data
frame.
```{r bProfile-4, echo=FALSE}
options(width = 70)
```
\small
```{r bProfile-5, error=TRUE}
times <- colnames(response)
times
times.df <- data.frame(times=factor(times))
times.df
```
\normalsize
## Fitting the model
```{r}
dogs.1 <- lm(response ~ drug, data = dogs)
dogs.2 <- Manova(dogs.1,
idata = times.df,
idesign = ~times
)
```
## The output (some; there is a lot)
\tiny
```{r}
summary(dogs.2)
```
\normalsize
## What there is here
- three sets of tests, for
- times
- drug
- their interaction
- two *types* of test for each of these:
- multivariate
- univariate
- multivariate is the same as MANOVA
- univariate is more powerful *if* it applies
## Sphericity
- The thing that decides whether the univariate tests apply is called
"sphericity".
- This holds if the outcomes have equal variance (to each other) and
have the same (positive) correlation across subjects.
- Tested using Mauchly's test (part of output)
- If sphericity rejected, there are adjustments to the univariate
P-values due to Huynh-Feldt and Greenhouse-Geisser. Huynh-Feldt
better if responses not actually normal (safer).
## Univariate tests
\scriptsize
```{r}
summary(dogs.2)$sphericity.tests
summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```
\normalsize
## Comments
- The sphericity test for the interaction is almost significant
- The H-F adjusted P-value for the interaction is a bit bigger than
the univariate one, but still strongly significant.
- Therefore any lack of sphericity does not affect our conclusion:
there is an interaction between drug and time
- ie that the effect of time on log-histamine is different for the two
drugs.
## Comments
- Here, univariate test with Huynh-Feldt correction to P-value for
interaction was 0.00073.
- Significant interaction *is* the conclusion here.
- If the interaction had not been significant:
- cannot remove interaction with time
- so look at univariate (better, especially if adjusted for
sphericity) tests of main effects in *this* model
## Next
- Interaction significant. Pattern of response over time different for
the two drugs.
- Want to investigate interaction.
## The wrong shape
- But data frame has several observations per line ("wide format"):
\scriptsize
```{r bProfile-6 }
dogs %>% slice(1:6)
```
\normalsize
- Plotting works with data in "long format": one response per line.
- The responses are log-histamine at different times, labelled
`lh`-something. Call them all `lh` and put them in one column, with
the time they belong to labelled.
## Running `pivot_longer`, try 1
\footnotesize
```{r bProfile-7, size="footnotesize"}
dogs %>% pivot_longer(starts_with("lh"),
names_to = "time", values_to = "lh")
```
\normalsize
## Getting the times
Not quite right: for the times, we want just the numbers, not the
letters `lh` every time. Want new variable containing just number in
`time`: `parse_number`.
\footnotesize
```{r bProfile-8}
dogs %>%
pivot_longer(starts_with("lh"),
names_to = "timex", values_to = "lh") %>%
mutate(time = parse_number(timex))
```
\normalsize
## What I did differently
- I realized that `pivot_longer` was going to produce something like
`lh1`, which I needed to do something further with, so this time I
gave it a temporary name `timex`.
- This enabled me to use the name `time` for the actual numeric time.
- This works now, so next save into a new data frame `dogs.long`.
## Saving the pipelined results
```{r bProfile-9 }
dogs %>%
pivot_longer(starts_with("lh"),
names_to = "timex", values_to = "lh") %>%
mutate(time = parse_number(timex)) -> dogs.long
dogs.long
```
## Comments
This says:
- Take data frame dogs, and then:
- Combine the columns `lh0` through `lh5` into one column called `lh`,
with the column that each `lh` value originally came from labelled
by `timex`, and then:
- Pull out numeric values in `timex`, saving in `time` and then:
- save the result in a data frame `dogs.long`.
## Interaction plot
\small
```{r bProfile-10, fig.height=3.5}
ggplot(dogs.long, aes(x = time, y = lh,
colour = drug, group = drug)) +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun = mean, geom = "line")
```
\normalsize
## Comments
- Plot mean `lh` value at each time, joining points on same drug by
lines.
- drugs same at time 0
- after that, Trimethaphan higher than Morphine.
- Effect of drug not consistent over time: significant interaction.
## Take out time zero
- Lines on interaction plot would then be parallel, and so interaction
should no longer be significant.
- Go back to original "wide" `dogs` data frame.
```{r bProfile-11, size="footnotesize", error=TRUE}
response <- with(dogs, cbind(lh1, lh3, lh5)) # excl time 0
dogs.1 <- lm(response ~ drug, data = dogs)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
dogs.2 <- Manova(dogs.1,
idata = times.df,
idesign = ~times
)
```
## Results (univariate)
\scriptsize
```{r}
summary(dogs.2)$sphericity.tests
summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```
\normalsize
## Comments
- sphericity: no problem (P-value 0.25)
- univariate test for interaction no longer significant (P-value
0.082)
- look at main effects:
- strong significance of time, even after taking out time 0
- actually *not* significant drug effect, despite interaction plot
## Is the non-significant drug effect reasonable?
- Plot *actual data*: `lh` against `days`, labelling observations by
drug: "spaghetti plot".
- Uses long data frame (confusing, yes I know):
- Plot (time,lh) points coloured by drug
- and connecting measurements for each *dog* by lines.
- This time, we want `group = dog` (want the measurements for each
*dog* joined by lines), but `colour = drug`:
```{r platanias}
ggplot(dogs.long, aes(x = time, y = lh,
colour = drug, group = dog)) +
geom_point() + geom_line() -> g
```
## The spaghetti plot
```{r hoverla,fig.height=4.5}
g
```
## Comments
- For each dog over time, there is a strong increase and gradual
decrease in log-histamine. The gradual decrease explains the
significant time effect after we took out time 0.
- The pattern is more or less the same for each dog, regardless of
drug. This explains the non-significant interaction.
- Most of the trimethaphan dogs (blue) have higher log-histamine
throughout (time 1 and after), and some of the morphine dogs have
lower.
- *But* two of the morphine dogs have log-histamine profiles like the
trimethaphan dogs. This ambiguity is probably why the `drug` effect
is not quite significant.
## Mixed models
- Another way to fit repeated measures
- Subjects (on whom repeated measures taken) are *random sample of all
possible subjects* (random effects)
- Times and treatments are *the only ones we care about* (fixed
effects)
- Use package `lme4` function `lmer` (like `lm` in some ways)
- Uses long-format "tidy" data
## Fitting the model (uses `lme4`)
```{r bProfile-13, message=FALSE}
# dogs.long including time zero
dogs.3 <- lmer(lh~drug*time+(1|dog), data=dogs.long)
```
- note specification of random effect: each dog has "random intercept"
that moves log-histamine up or down for that dog over all times
## What can we drop?
- using `drop1`:
```{r bProfile-14}
drop1(dogs.3,test="Chisq")
```
- Interaction again not significant, but P-value smaller than before
## Re-fit without interaction
```{r bProfile-15}
dogs.4 <- update(dogs.3,.~.-drug:time)
drop1(dogs.4,test="Chisq")
```
- This time neither drug nor (surprisingly) time is significant.
- MANOVA and `lmer` methods won't agree, but both valid ways to
approach problem.
## The exercise data
- 30 people took part in an exercise study.
- Each subject was randomly assigned to one of two diets ("low fat" or
"non-low fat") and to one of three exercise programs ("at rest",
"walking", "running").
- There are $2\times3 = 6$ experimental treatments, and thus each one
is replicated $30/6=5$ times.
- Nothing unusual so far.
- However, each subject had their pulse rate measured at three
different times (1, 15 and 30 minutes after starting their
exercise), so have repeated measures.
## Reading the data
Separated by *tabs*:
```{r bProfile-16 }
url <- "http://ritsokiguess.site/datafiles/exercise2.txt"
exercise.long <- read_tsv(url)
exercise.long
```
- This is "long format", which is usually what we want.
- But for repeated measures analysis, we want *wide* format!
- `pivot_wider`.
## Making wide format
- `pivot_wider` needs: a column that is going to be split, and the
column to make the values out of:
\footnotesize
```{r bProfile-18}
exercise.long %>% pivot_wider(names_from=time,
values_from=pulse) -> exercise.wide
exercise.wide %>% sample_n(5)
```
\normalsize
- Normally `pivot_longer` \texttt{min01, min15,
min30} into one column called `pulse` labelled by the number of
minutes. But `Manova` needs it the other way.
## Setting up the repeated-measures analysis
- Make a response variable consisting of `min01, min15, min30`:
\small
```{r bProfile-19 }
response <- with(exercise.wide, cbind(min01, min15, min30))
```
\normalsize
- Predict that from `diet` and `exertype` and interaction using `lm`:
\small
```{r bProfile-20 }
exercise.1 <- lm(response ~ diet * exertype,
data = exercise.wide
)
```
\normalsize
- Run this through `Manova`:
\small
```{r bProfile-21, error=TRUE}
times <- colnames(response)
times.df <- data.frame(times=factor(times))
exercise.2 <- Manova(exercise.1,
idata = times.df,
idesign = ~times)
```
\normalsize
## Sphericity tests
```{r}
summary(exercise.2)$sphericity.tests
```
No problem with sphericity; go to univariate tests.
## Univariate tests
```{r, include=FALSE}
w <- getOption("width")
options(width = 120)
```
\scriptsize
```{r}
summary(exercise.2)$univariate.tests
```
\normalsize
```{r, include=FALSE}
options(width = w)
```
- The three-way interaction is significant
- the effect of diet on pulse rate over time is different for the
different exercise types
## Making some graphs
- Three-way interactions are difficult to understand. To make an
attempt, look at some graphs.
- Plot time trace of pulse rates for each individual, joined by lines,
and make *separate* plots for each `diet-exertype` combo.
- `ggplot` again. Using *long* data frame:
```{r bProfile-24 }
g <- ggplot(exercise.long, aes(
x = time, y = pulse,
group = id
)) + geom_point() + geom_line() +
facet_grid(diet ~ exertype)
```
- `facet_grid(diet~exertype)`: do a separate plot for each combination
of diet and exercise type, with diets going down the page and
exercise types going across. (Graphs are usually landscape, so have
the factor `exertype` with more levels going across.)
## The graph(s)
```{r bProfile-25, fig.height=5}
g
```
## Comments on graphs
- For subjects who were at rest, no change in pulse rate over time,
for both diet groups.
- For walking subjects, not much change in pulse rates over time.
Maybe a small increase on average between 1 and 15 minutes.
- For both running groups, an overall increase in pulse rate over
time, but the increase is stronger for the `lowfat` group.
- No consistent effect of diet over all exercise groups.
- No consistent effect of exercise type over both diet groups.
- No consistent effect of time over all diet-exercise type combos.
## "Simple effects" of diet for the subjects who ran
- Looks as if there is only any substantial time effect for the
runners. For them, does diet have an effect?
- Pull out only the runners from the wide data:
```{r bProfile-26 }
exercise.wide %>%
filter(exertype == "running") -> runners.wide
```
- Create response variable and do MANOVA. Some of this looks like
before, but I have different data now:
\footnotesize
```{r bProfile-27}
response <- with(runners.wide, cbind(min01, min15, min30))
runners.1 <- lm(response ~ diet, data = runners.wide)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
runners.2 <- Manova(runners.1,
idata = times.df,
idesign = ~times
)
```
\normalsize
## Sphericity tests
```{r}
summary(runners.2)$sphericity.tests
```
- No problem, look at univariate tests.
## Univariate tests
\footnotesize
```{r}
summary(runners.2)$univariate.tests
```
\normalsize
- Interaction still significant
- dependence of pulse rate on time still different for the two
diets
## How is the effect of diet different over time?
- Table of means. Only I need long data for this:
```{r bProfile-29 }
runners.wide %>%
pivot_longer(starts_with("min"),
names_to = "time", values_to = "pulse") %>%
group_by(time, diet) %>%
summarize(
mean = mean(pulse),
sd = sd(pulse)
) -> summ
```
- Result of `summarize` is data frame, so can save it (and do more
with it if needed).
## Understanding diet-time interaction
- The summary:
\footnotesize
```{r bProfile-30}
summ
```
\normalsize
- Pulse rates at any given time higher for `lowfat` (diet effect),
- Pulse rates increase over time of exercise (time effect),
- but the *amount by which pulse rate higher* for a diet depends on
time: `diet` by `time` interaction.
## Interaction plot
- We went to trouble of finding means by group, so making interaction
plot is now mainly easy:
```{r bProfile-31, fig.height=2.7}
ggplot(summ, aes(x = time, y = mean, colour = diet,
group = diet)) + geom_point() + geom_line()
```
## Comment on interaction plot
- The lines are not parallel, so there is interaction between diet and
time for the runners.
- The effect of time on pulse rate is different for the two diets,
even though all the subjects here were running.