Tidying data: extras

Packages

library(tidyverse)

The pig feed data again

my_url <- "http://ritsokiguess.site/datafiles/pigs1.txt"
pigs <- read_table(my_url)
pigs

Make longer (as before)

pigs %>% pivot_longer(-pig, names_to="feed", 
                      values_to="weight") -> pigs_longer
pigs_longer

Make wider two ways 1/2

pivot_wider is the inverse of pivot_longer:

pigs_longer %>% 
  pivot_wider(names_from=feed, values_from=weight)

We are back where we started.

Make wider two ways 2/2

pigs_longer %>% 
  pivot_wider(names_from=pig, values_from=weight)
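What this second version does is easiest to see on a tiny example. The sketch below uses a made-up data frame with two pigs and two feeds (the name `toy` and its values are hypothetical, not the pigs data). Taking the new column names from pig gives one row per feed and one column per pig, so the original layout is in effect transposed.

```r
library(tidyr)
library(tibble)

# Hypothetical long data: 2 pigs, 2 feeds (values invented for illustration)
toy <- tribble(
  ~pig, ~feed,   ~weight,
  1,    "feed1", 60,
  1,    "feed2", 70,
  2,    "feed1", 65,
  2,    "feed2", 75
)

# Column names come from pig, so each row is now a feed
toy_by_feed <- pivot_wider(toy, names_from = pig, values_from = weight)
toy_by_feed
```

The new columns are named after the pig numbers, so they need backticks to refer to them, e.g. `toy_by_feed$`1``.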

Disease presence and absence at two locations

Frequencies of plants observed with and without disease at two locations:

Species     Disease present         Disease absent
       Location X Location Y  Location X Location Y
A            44         12          38        10
B            28         22          20        18

This has two rows of headers, so I rewrote the data file:

Species  present_x present_y    absent_x  absent_y
A            44         12          38        10
B            28         22          20        18

Read in

… into a data frame called prevalence:

my_url <- "http://ritsokiguess.site/STAC32/disease.txt"
prevalence <- read_table(my_url)
prevalence

Lengthen and separate

prevalence %>% 
  pivot_longer(-Species, names_to = "column", 
               values_to = "freq") %>% 
  separate_wider_delim(column, "_", 
                       names = c("disease", "location"))

Making longer, the better way

prevalence %>% 
  pivot_longer(-Species, names_to=c("disease", "location"),
               names_sep="_", 
               values_to="frequency") -> prevalence_longer 
prevalence_longer
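As a check that the one-step and two-step approaches agree, here is a sketch that types the eight frequencies in by hand rather than reading the file (the names `prev`, `two_step`, `one_step` and `cols` are made up for this illustration). Both versions use the same `values_to` name so that they can be compared directly.

```r
library(tidyr)
library(dplyr)
library(tibble)

# The frequencies from the rewritten table, typed in by hand
prev <- tribble(
  ~Species, ~present_x, ~present_y, ~absent_x, ~absent_y,
  "A",      44,         12,         38,        10,
  "B",      28,         22,         20,        18
)

# Two steps: lengthen, then split the combined column name
two_step <- prev %>%
  pivot_longer(-Species, names_to = "column", values_to = "frequency") %>%
  separate_wider_delim(column, delim = "_", names = c("disease", "location"))

# One step: split the names while lengthening
one_step <- prev %>%
  pivot_longer(-Species, names_to = c("disease", "location"),
               names_sep = "_", values_to = "frequency")

# Compare column by column, in a fixed column order
cols <- c("Species", "disease", "location", "frequency")
all.equal(as.data.frame(two_step[cols]), as.data.frame(one_step[cols]))
```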

Making wider, different ways 1/2

prevalence_longer %>% 
  pivot_wider(names_from=c(Species, location), 
              values_from=frequency)

Making wider, different ways 2/2

prevalence_longer %>% 
  pivot_wider(names_from=location, values_from=frequency)
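The two widenings differ only in `names_from`, and the difference shows up in the shape of the result. A minimal sketch, again with the frequencies typed in by hand (`prev_long`, `wide_both` and `wide_loc` are names invented here):

```r
library(tidyr)
library(tibble)

# Long version of the disease table, typed in by hand
prev_long <- tribble(
  ~Species, ~disease,  ~location, ~frequency,
  "A",      "present", "x",       44,
  "A",      "present", "y",       12,
  "A",      "absent",  "x",       38,
  "A",      "absent",  "y",       10,
  "B",      "present", "x",       28,
  "B",      "present", "y",       22,
  "B",      "absent",  "x",       20,
  "B",      "absent",  "y",       18
)

# Two name columns: the new column names are glued together with "_"
wide_both <- pivot_wider(prev_long, names_from = c(Species, location),
                         values_from = frequency)
names(wide_both)
dim(wide_both)

# One name column: Species and disease remain as identifying columns
wide_loc <- pivot_wider(prev_long, names_from = location,
                        values_from = frequency)
dim(wide_loc)
```

Any column not named in `names_from` or `values_from` identifies the rows, which is why the first result has one row per disease status and the second has one row per combination of species and disease status.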

Interlude

Pigs data again:

pigs_longer %>% 
  group_by(feed) %>% 
  summarize(weight_mean=mean(weight))

What if summary is more than one number?

e.g. quartiles:

pigs_longer %>% 
  group_by(feed) %>% 
  summarize(r=quantile(weight, c(0.25, 0.75)))

Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.

`summarise()` has grouped output by 'feed'. You can override using the
`.groups` argument.

Following the hint (gives no warning)

pigs_longer %>% 
  group_by(feed) %>% 
  reframe(r=quantile(weight, c(0.25, 0.75)))
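One drawback of the `reframe` output is that nothing says which row is which quartile. A possible fix, sketched here on made-up data (`toy` and `probs` are hypothetical names and values), is to add the probabilities as a column of their own:

```r
library(dplyr)
library(tibble)

# Hypothetical data: two feeds, three weights each
toy <- tribble(
  ~feed, ~weight,
  "a",   10,
  "a",   20,
  "a",   30,
  "b",   40,
  "b",   50,
  "b",   60
)

probs <- c(0.25, 0.75)

# Two rows per feed: one per requested quantile, labelled by prob
toy_q <- toy %>%
  group_by(feed) %>%
  reframe(prob = probs, r = quantile(weight, probs, names = FALSE))
toy_q
```

As the warning above says, the result of `reframe` is ungrouped, so there is no need for a later `ungroup`.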

This also works

pigs_longer %>% 
  group_by(feed) %>% 
  summarize(r=list(quantile(weight, c(0.25, 0.75)))) %>% 
  unnest(r)

Or, even better, use `enframe`:

quantile(pigs_longer$weight, c(0.25, 0.75))

   25%    75% 
65.975 90.225

enframe(quantile(pigs_longer$weight, c(0.25, 0.75)))
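To see what `enframe` is doing, try it on a small named vector. In this sketch the vector `v` is made up; the point is that the names go into a column called `name` and the values into a column called `value`, which is where those two column names come from later on.

```r
library(tibble)

# A hypothetical named vector, standing in for the output of quantile()
v <- c(low = 1.5, high = 7.5)

# One row per element: names in `name`, values in `value`
v_df <- enframe(v)
v_df
```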

A nice look

Run this one line at a time to see how it works:

pigs_longer %>% 
  group_by(feed) %>% 
  summarize(r=list(enframe(quantile(weight, c(0.25, 0.75))))) %>% 
  unnest(r) %>% 
  pivot_wider(names_from=name, values_from=value) -> d
d

A hairy one

18 people each receive one of three treatments. At three different times (pre, post, followup), two variables, y and z, are measured on each person:

my_url <- "http://ritsokiguess.site/STAC32/repmes.txt"
repmes0 <- read_table(my_url)
repmes0

Create unique ids

repmes0 %>% mutate(id=str_c(treatment, ".", rep)) %>% 
  select(-rep) %>% 
  select(id, everything()) -> repmes
repmes
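The reason for gluing treatment and rep together can be seen on a small made-up example (the data frame `toy` below is hypothetical, and it assumes, as the id construction suggests, that rep numbers start again within each treatment). On its own, rep does not identify a person, but the combination does:

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical data: rep restarts at 1 within each treatment
toy <- tribble(
  ~treatment, ~rep,
  "A",        1,
  "A",        2,
  "B",        1,
  "B",        2
)

# rep alone has repeats; treatment and rep together are unique
toy_id <- toy %>% mutate(id = str_c(treatment, ".", rep))
toy_id
n_distinct(toy_id$rep)
n_distinct(toy_id$id)
```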

Attempt 1

repmes %>% pivot_longer(contains("_"),
                        names_to=c("time", "var"),
                        names_sep="_",
                        values_to = "vvv"
                         )

Comment

This is too long! We wanted a column called y and a column called z, but they have been pivoted-longer too.

Attempt 2

repmes %>% pivot_longer(contains("_"),
                        names_to=c("time", ".value"),
                        names_sep="_"
                        ) -> repmes3
repmes3
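The special name `.value` says that this part of the old column name should become the name of a new column, rather than a value in a column. A minimal sketch on made-up data (`toy` is hypothetical, with only two times and two people, to keep it small):

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: column names are time_variable
toy <- tribble(
  ~id,   ~pre_y, ~pre_z, ~post_y, ~post_z,
  "A.1", 1,      10,     2,       20,
  "A.2", 3,      30,     4,       40
)

# First part of each name goes into `time`;
# second part (y or z) becomes a column of its own
toy_long <- pivot_longer(toy, contains("_"),
                         names_to = c("time", ".value"),
                         names_sep = "_")
toy_long
```

There is no `values_to` here, because the names of the value columns come from the data.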

Comment

This has done what we wanted.

Make a graph

ggplot(repmes3, aes(x=fct_inorder(time), y=y, 
                    colour=treatment, group = id)) +
  geom_point() + geom_line()

Comment

A so-called “spaghetti plot”:
- The three measurements for each person are joined by lines
- The lines are coloured by treatment.
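The `fct_inorder` in the plot is there to keep the times in the order they appear in the data. A short sketch of the difference, using a made-up vector of times:

```r
library(forcats)

# Hypothetical times, in the order they might appear in the data
time <- c("pre", "post", "followup", "pre", "post", "followup")

# Default factor levels are alphabetical
levels(factor(time))

# fct_inorder uses order of first appearance
levels(fct_inorder(time))
```

With alphabetical levels, followup would come first on the x-axis, which is not the order the measurements were taken in.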

Or do the plot with means

repmes3 %>% group_by(treatment, ftime=fct_inorder(time)) %>% 
  summarize(mean_y=mean(y)) %>% 
  ggplot(aes(x=ftime, y=mean_y, colour=treatment, 
             group=treatment)) + 
    geom_point() + geom_line()

Comment

On average, the two real treatments go up and level off, but the control group is very different.
