Packages
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The pig feed data again
my_url <- "http://ritsokiguess.site/datafiles/pigs1.txt"
pigs <- read_table (my_url)
pigs
# A tibble: 5 × 5
pig feed1 feed2 feed3 feed4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 60.8 68.7 92.6 87.9
2 2 57 67.7 92.1 84.2
3 3 65 74 90.2 83.1
4 4 58.6 66.3 96.5 85.7
5 5 61.7 69.8 99.1 90.3
Make longer (as before)
pigs %>% pivot_longer (- pig, names_to= "feed" ,
values_to= "weight" ) -> pigs_longer
pigs_longer
# A tibble: 20 × 3
pig feed weight
<dbl> <chr> <dbl>
1 1 feed1 60.8
2 1 feed2 68.7
3 1 feed3 92.6
4 1 feed4 87.9
5 2 feed1 57
6 2 feed2 67.7
7 2 feed3 92.1
8 2 feed4 84.2
9 3 feed1 65
10 3 feed2 74
11 3 feed3 90.2
12 3 feed4 83.1
13 4 feed1 58.6
14 4 feed2 66.3
15 4 feed3 96.5
16 4 feed4 85.7
17 5 feed1 61.7
18 5 feed2 69.8
19 5 feed3 99.1
20 5 feed4 90.3
Make wider two ways 1/2
pivot_wider
is inverse of pivot_longer
:
pigs_longer %>%
pivot_wider (names_from= feed, values_from= weight)
# A tibble: 5 × 5
pig feed1 feed2 feed3 feed4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 60.8 68.7 92.6 87.9
2 2 57 67.7 92.1 84.2
3 3 65 74 90.2 83.1
4 4 58.6 66.3 96.5 85.7
5 5 61.7 69.8 99.1 90.3
we are back where we started.
Make wider 2/2
Or
pigs_longer %>%
pivot_wider (names_from= pig, values_from= weight)
# A tibble: 4 × 6
feed `1` `2` `3` `4` `5`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 feed1 60.8 57 65 58.6 61.7
2 feed2 68.7 67.7 74 66.3 69.8
3 feed3 92.6 92.1 90.2 96.5 99.1
4 feed4 87.9 84.2 83.1 85.7 90.3
Disease presence and absence at two locations
Frequencies of plants observed with and without disease at two locations:
Species Disease present Disease absent
Location X Location Y Location X Location Y
A 44 12 38 10
B 28 22 20 18
This has two rows of headers, so I rewrote the data file:
Species present_x present_y absent_x absent_y
A 44 12 38 10
B 28 22 20 18
Read into data frame called prevalence
.
# A tibble: 2 × 5
Species present_x present_y absent_x absent_y
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 44 12 38 10
2 B 28 22 20 18
Lengthen and separate
prevalence %>%
pivot_longer (- Species, names_to = "column" ,
values_to = "freq" ) %>%
separate_wider_delim (column, "_" ,
names = c ("disease" , "location" ))
# A tibble: 8 × 4
Species disease location freq
<chr> <chr> <chr> <dbl>
1 A present x 44
2 A present y 12
3 A absent x 38
4 A absent y 10
5 B present x 28
6 B present y 22
7 B absent x 20
8 B absent y 18
Making longer, the better way
prevalence %>%
pivot_longer (- Species, names_to= c ("disease" , "location" ),
names_sep= "_" ,
values_to= "frequency" ) -> prevalence_longer
prevalence_longer
# A tibble: 8 × 4
Species disease location frequency
<chr> <chr> <chr> <dbl>
1 A present x 44
2 A present y 12
3 A absent x 38
4 A absent y 10
5 B present x 28
6 B present y 22
7 B absent x 20
8 B absent y 18
Making wider, different ways
prevalence_longer %>%
pivot_wider (names_from= c (Species, location), values_from= frequency)
# A tibble: 2 × 5
disease A_x A_y B_x B_y
<chr> <dbl> <dbl> <dbl> <dbl>
1 present 44 12 28 22
2 absent 38 10 20 18
prevalence_longer %>%
pivot_wider (names_from= location, values_from= frequency)
# A tibble: 4 × 4
Species disease x y
<chr> <chr> <dbl> <dbl>
1 A present 44 12
2 A absent 38 10
3 B present 28 22
4 B absent 20 18
Interlude
# A tibble: 20 × 3
pig feed weight
<dbl> <chr> <dbl>
1 1 feed1 60.8
2 1 feed2 68.7
3 1 feed3 92.6
4 1 feed4 87.9
5 2 feed1 57
6 2 feed2 67.7
7 2 feed3 92.1
8 2 feed4 84.2
9 3 feed1 65
10 3 feed2 74
11 3 feed3 90.2
12 3 feed4 83.1
13 4 feed1 58.6
14 4 feed2 66.3
15 4 feed3 96.5
16 4 feed4 85.7
17 5 feed1 61.7
18 5 feed2 69.8
19 5 feed3 99.1
20 5 feed4 90.3
pigs_longer %>%
group_by (feed) %>%
summarize (weight_mean= mean (weight))
# A tibble: 4 × 2
feed weight_mean
<chr> <dbl>
1 feed1 60.6
2 feed2 69.3
3 feed3 94.1
4 feed4 86.2
What if summary is more than one number?
eg. quartiles:
pigs_longer %>%
group_by (feed) %>%
summarize (r= quantile (weight, c (0.25 , 0.75 )))
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'feed'. You can override using the
`.groups` argument.
# A tibble: 8 × 2
# Groups: feed [4]
feed r
<chr> <dbl>
1 feed1 58.6
2 feed1 61.7
3 feed2 67.7
4 feed2 69.8
5 feed3 92.1
6 feed3 96.5
7 feed4 84.2
8 feed4 87.9
Following the hint…
pigs_longer %>%
group_by (feed) %>%
reframe (r= quantile (weight, c (0.25 , 0.75 )))
# A tibble: 8 × 2
feed r
<chr> <dbl>
1 feed1 58.6
2 feed1 61.7
3 feed2 67.7
4 feed2 69.8
5 feed3 92.1
6 feed3 96.5
7 feed4 84.2
8 feed4 87.9
this also works
pigs_longer %>%
group_by (feed) %>%
summarize (r= quantile (weight, c (0.25 , 0.75 )))
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'feed'. You can override using the
`.groups` argument.
# A tibble: 8 × 2
# Groups: feed [4]
feed r
<chr> <dbl>
1 feed1 58.6
2 feed1 61.7
3 feed2 67.7
4 feed2 69.8
5 feed3 92.1
6 feed3 96.5
7 feed4 84.2
8 feed4 87.9
pigs_longer %>%
group_by (feed) %>%
summarize (r= list (quantile (weight, c (0.25 , 0.75 )))) %>%
unnest (r)
# A tibble: 8 × 2
feed r
<chr> <dbl>
1 feed1 58.6
2 feed1 61.7
3 feed2 67.7
4 feed2 69.8
5 feed3 92.1
6 feed3 96.5
7 feed4 84.2
8 feed4 87.9
or, even better, use enframe
:
quantile (pigs_longer$ weight, c (0.25 , 0.75 ))
enframe (quantile (pigs_longer$ weight, c (0.25 , 0.75 )))
# A tibble: 2 × 2
name value
<chr> <dbl>
1 25% 66.0
2 75% 90.2
A nice look
Run this one line at a time to see how it works:
pigs_longer %>%
group_by (feed) %>%
summarize (r= list (enframe (quantile (weight, c (0.25 , 0.75 ))))) %>%
unnest (r) %>%
pivot_wider (names_from= name, values_from= value) -> d
d
# A tibble: 4 × 3
feed `25%` `75%`
<chr> <dbl> <dbl>
1 feed1 58.6 61.7
2 feed2 67.7 69.8
3 feed3 92.1 96.5
4 feed4 84.2 87.9
A hairy one
18 people receive one of three treatments. At 3 different times (pre, post, followup) two variables y
and z
are measured on each person:
# A tibble: 18 × 8
treatment rep pre_y post_y fu_y pre_z post_z fu_z
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 1 3 13 9 0 0 9
2 A 2 0 14 10 6 6 3
3 A 3 4 6 17 8 2 6
4 A 4 7 7 13 7 6 4
5 A 5 3 12 11 6 12 6
6 A 6 10 14 8 13 3 8
7 B 1 9 11 17 8 11 27
8 B 2 4 16 13 9 3 26
9 B 3 8 10 9 12 0 18
10 B 4 5 9 13 3 0 14
11 B 5 0 15 11 3 0 25
12 B 6 4 11 14 4 2 9
13 Control 1 10 12 15 4 3 7
14 Control 2 2 8 12 8 7 20
15 Control 3 4 9 10 2 0 10
16 Control 4 10 8 8 5 8 14
17 Control 5 11 11 11 1 0 11
18 Control 6 1 5 15 8 9 10
# A tibble: 18 × 8
id treatment pre_y post_y fu_y pre_z post_z fu_z
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A.1 A 3 13 9 0 0 9
2 A.2 A 0 14 10 6 6 3
3 A.3 A 4 6 17 8 2 6
4 A.4 A 7 7 13 7 6 4
5 A.5 A 3 12 11 6 12 6
6 A.6 A 10 14 8 13 3 8
7 B.1 B 9 11 17 8 11 27
8 B.2 B 4 16 13 9 3 26
9 B.3 B 8 10 9 12 0 18
10 B.4 B 5 9 13 3 0 14
11 B.5 B 0 15 11 3 0 25
12 B.6 B 4 11 14 4 2 9
13 Control.1 Control 10 12 15 4 3 7
14 Control.2 Control 2 8 12 8 7 20
15 Control.3 Control 4 9 10 2 0 10
16 Control.4 Control 10 8 8 5 8 14
17 Control.5 Control 11 11 11 1 0 11
18 Control.6 Control 1 5 15 8 9 10
repmes %>% pivot_longer (contains ("_" ),
names_to= c ("time" , ".value" ),
names_sep= "_"
)
# A tibble: 54 × 5
id treatment time y z
<chr> <chr> <chr> <dbl> <dbl>
1 A.1 A pre 3 0
2 A.1 A post 13 0
3 A.1 A fu 9 9
4 A.2 A pre 0 6
5 A.2 A post 14 6
6 A.2 A fu 10 3
7 A.3 A pre 4 8
8 A.3 A post 6 2
9 A.3 A fu 17 6
10 A.4 A pre 7 7
# ℹ 44 more rows
Attempt 1
repmes %>% pivot_longer (contains ("_" ),
names_to= c ("time" , "var" ),
names_sep= "_" ,
values_to = "vvv"
)
# A tibble: 108 × 5
id treatment time var vvv
<chr> <chr> <chr> <chr> <dbl>
1 A.1 A pre y 3
2 A.1 A post y 13
3 A.1 A fu y 9
4 A.1 A pre z 0
5 A.1 A post z 0
6 A.1 A fu z 9
7 A.2 A pre y 0
8 A.2 A post y 14
9 A.2 A fu y 10
10 A.2 A pre z 6
# ℹ 98 more rows
This is too long! We wanted a column called y
and a column called z
, but they have been pivoted-longer too.
Attempt 2
repmes %>% pivot_longer (contains ("_" ),
names_to= c ("time" , ".value" ),
names_sep= "_"
) -> repmes3
repmes3
# A tibble: 54 × 5
id treatment time y z
<chr> <chr> <chr> <dbl> <dbl>
1 A.1 A pre 3 0
2 A.1 A post 13 0
3 A.1 A fu 9 9
4 A.2 A pre 0 6
5 A.2 A post 14 6
6 A.2 A fu 10 3
7 A.3 A pre 4 8
8 A.3 A post 6 2
9 A.3 A fu 17 6
10 A.4 A pre 7 7
# ℹ 44 more rows
This has done what we wanted.
make a graph
ggplot (repmes3, aes (x= fct_inorder (time), y= y,
colour= treatment, group= id)) +
geom_point () + geom_line ()
A so-called spaghetti plot. The three measurements for each person are joined by lines, and the lines are coloured by treatment.
or do the plot with means
repmes3 %>% group_by (treatment, ftime= fct_inorder (time)) %>%
summarize (mean_y= mean (y)) %>%
ggplot (aes (x= ftime, y= mean_y, colour= treatment,
group= treatment)) +
geom_point () + geom_line ()
`summarise()` has grouped output by 'treatment'. You can override using the
`.groups` argument.
On average, the two real treatments go up and level off, but the control group is very different.