The pig feed data again
my_url <- "http://ritsokiguess.site/datafiles/pigs1.txt"
pigs <- read_table(my_url)
pigs
Make longer (as before)
pigs %>% pivot_longer(-pig, names_to = "feed",
                      values_to = "weight") -> pigs_longer
pigs_longer
Make wider two ways 1/2
pivot_wider is the inverse of pivot_longer:
pigs_longer %>%
  pivot_wider(names_from = feed, values_from = weight)
We are back where we started.
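The round trip can be checked on a small made-up data frame. This is only a sketch: `toy` and its weights are invented for illustration (they are not the real pigs data), and it assumes the tidyr and dplyr packages are installed.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: 2 pigs on each of 3 feeds (values invented for illustration)
toy <- tribble(
  ~pig, ~feed1, ~feed2, ~feed3,
  1,    60,     68,     102,
  2,    55,     70,     100
)

# Lengthen: one row per pig-feed combination
toy_longer <- toy %>%
  pivot_longer(-pig, names_to = "feed", values_to = "weight")

# Widen again: feed names become columns, weights fill them in
toy_back <- toy_longer %>%
  pivot_wider(names_from = feed, values_from = weight)

# Compare the original layout with the round-tripped one
all.equal(as.data.frame(toy), as.data.frame(toy_back))
```

If the comparison reports a difference, the usual culprit is a column that was not included in (or excluded from) the pivot as intended.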
Make wider 2/2
Or, with one column per pig instead:
pigs_longer %>%
  pivot_wider(names_from = pig, values_from = weight)
Disease presence and absence at two locations
Frequencies of plants observed with and without disease at two locations:
Species   Disease present           Disease absent
          Location X   Location Y   Location X   Location Y
A         44           12           38           10
B         28           22           20           18
This has two rows of headers, so I rewrote the data file:
Species present_x present_y absent_x absent_y
A 44 12 38 10
B 28 22 20 18
Read in
… into a data frame called prevalence:
my_url <- "http://ritsokiguess.site/STAC32/disease.txt"
prevalence <- read_table(my_url)
prevalence
Lengthen and separate
prevalence %>%
  pivot_longer(-Species, names_to = "column",
               values_to = "freq") %>%
  separate_wider_delim(column, "_",
                       names = c("disease", "location"))
Making longer, the better way
prevalence %>%
  pivot_longer(-Species, names_to = c("disease", "location"),
               names_sep = "_",
               values_to = "frequency") -> prevalence_longer
prevalence_longer
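To see that the two routes agree, here is a sketch that types in the rewritten data file shown earlier (rather than reading it from the URL) and compares the results. The names `prev`, `two_step` and `one_step` are made up for this example, and `separate_wider_delim` assumes a reasonably recent tidyr (1.3.0 or later).

```r
library(tidyr)
library(dplyr)

# The rewritten data file, typed in directly
prev <- tribble(
  ~Species, ~present_x, ~present_y, ~absent_x, ~absent_y,
  "A",      44,         12,         38,        10,
  "B",      28,         22,         20,        18
)

# Two steps: lengthen, then split the combined column name at the underscore
two_step <- prev %>%
  pivot_longer(-Species, names_to = "column", values_to = "frequency") %>%
  separate_wider_delim(column, "_", names = c("disease", "location"))

# One step: let pivot_longer split the names itself
one_step <- prev %>%
  pivot_longer(-Species, names_to = c("disease", "location"),
               names_sep = "_", values_to = "frequency")

# Compare the two results
all.equal(as.data.frame(two_step), as.data.frame(one_step))
```

The one-step version is shorter and avoids the temporary column, which is why it is the better way here.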
Making wider, different ways 1/2
prevalence_longer %>%
  pivot_wider(names_from = c(Species, location),
              values_from = frequency)
Making wider, different ways 2/2
prevalence_longer %>%
  pivot_wider(names_from = location, values_from = frequency)
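What decides the shape of each result: any column not named in names_from or values_from is used to identify the rows. The sketch below rebuilds the longer data frame from the values in the rewritten data file and inspects both shapes; `prev_longer`, `wide1` and `wide2` are names invented for this example.

```r
library(tidyr)
library(dplyr)

# Rebuild the longer data frame from the values in the rewritten data file
prev_longer <- tribble(
  ~Species, ~present_x, ~present_y, ~absent_x, ~absent_y,
  "A",      44,         12,         38,        10,
  "B",      28,         22,         20,        18
) %>%
  pivot_longer(-Species, names_to = c("disease", "location"),
               names_sep = "_", values_to = "frequency")

# Two columns in names_from: new column names are glued together with "_",
# and only disease is left to identify the rows
wide1 <- prev_longer %>%
  pivot_wider(names_from = c(Species, location), values_from = frequency)
names(wide1)

# One column in names_from: Species and disease together identify the rows
wide2 <- prev_longer %>%
  pivot_wider(names_from = location, values_from = frequency)
dim(wide2)
```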
Interlude
Pigs data again:
pigs_longer %>%
  group_by(feed) %>%
  summarize(weight_mean = mean(weight))
What if summary is more than one number?
e.g. quartiles:
pigs_longer %>%
  group_by(feed) %>%
  summarize(r = quantile(weight, c(0.25, 0.75)))
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'feed'. You can override using the
`.groups` argument.
Following the hint (gives no warning)
pigs_longer %>%
  group_by(feed) %>%
  reframe(r = quantile(weight, c(0.25, 0.75)))
This also works
pigs_longer %>%
  group_by(feed) %>%
  summarize(r = list(quantile(weight, c(0.25, 0.75)))) %>%
  unnest(r)
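A sketch comparing the reframe route with the list-and-unnest route, on made-up weights chosen so the quartiles are easy to check by hand. `toy_longer` and its values are hypothetical, not the pigs data; reframe assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical weights, invented for illustration
toy_longer <- tibble(
  feed = rep(c("feed1", "feed2"), each = 5),
  weight = c(50, 60, 70, 80, 90, 10, 20, 30, 40, 50)
)

# Route 1: reframe allows more than one row per group
via_reframe <- toy_longer %>%
  group_by(feed) %>%
  reframe(r = quantile(weight, c(0.25, 0.75)))

# Route 2: wrap the two quartiles in a list-column, then unnest it
via_unnest <- toy_longer %>%
  group_by(feed) %>%
  summarize(r = list(quantile(weight, c(0.25, 0.75)))) %>%
  unnest(r)

# Compare the values (names dropped, since the two routes may label them differently)
all.equal(unname(via_reframe$r), unname(via_unnest$r))
```

In both results the two rows for each feed are not labelled as first and third quartile, which is the gap that enframe fills.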
Or, even better, use enframe:
quantile(pigs_longer$weight, c(0.25, 0.75))
enframe(quantile(pigs_longer$weight, c(0.25, 0.75)))
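What enframe does is turn a named vector into a two-column data frame, with the names in one column and the values in the other. A small sketch on made-up numbers (`x` is hypothetical, chosen so the quartiles are whole numbers):

```r
library(tibble)

# Hypothetical values, invented for illustration
x <- 1:9

# quantile returns a named vector: the names say which quantile each value is
q <- quantile(x, c(0.25, 0.75))
names(q)

# enframe keeps those names as a column called name, next to a column called value
q_df <- enframe(q)
q_df
```

Keeping the names as data is what makes it possible to pivot them into column headings later.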
A nice look
Run this one line at a time to see how it works:
pigs_longer %>%
  group_by(feed) %>%
  summarize(r = list(enframe(quantile(weight, c(0.25, 0.75))))) %>%
  unnest(r) %>%
  pivot_wider(names_from = name, values_from = value) -> d
d
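One way to run it a line at a time is to save each stage under its own name. The sketch below does this on made-up weights; `toy_longer`, `step1`, `step2` and `step3` are hypothetical names, and the values are not the pigs data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical weights, invented for illustration
toy_longer <- tibble(
  feed = rep(c("feed1", "feed2"), each = 5),
  weight = c(50, 60, 70, 80, 90, 10, 20, 30, 40, 50)
)

# Step 1: one row per feed; r is a list-column holding a 2-row data frame each
step1 <- toy_longer %>%
  group_by(feed) %>%
  summarize(r = list(enframe(quantile(weight, c(0.25, 0.75)))))

# Step 2: unnest gives columns feed, name, value (two rows per feed)
step2 <- step1 %>% unnest(r)

# Step 3: one row per feed, one column per quartile
step3 <- step2 %>%
  pivot_wider(names_from = name, values_from = value)
step3
```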
A hairy one
18 people receive one of three treatments. At 3 different times (pre, post, followup), two variables, y and z, are measured on each person:
my_url <- "http://ritsokiguess.site/STAC32/repmes.txt"
repmes0 <- read_table(my_url)
repmes0
Create unique ids
repmes0 %>% mutate(id = str_c(treatment, ".", rep)) %>%
  select(-rep) %>%
  select(id, everything()) -> repmes
repmes
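Why the ids are needed: rep presumably restarts within each treatment, so on its own it does not identify a person. A sketch on a few made-up rows (the data frame `toy` and the column `pre_y` here are hypothetical stand-ins, not the real file):

```r
library(dplyr)
library(stringr)

# Hypothetical rows: rep repeats across treatments, so rep alone is not unique
toy <- tribble(
  ~treatment, ~rep, ~pre_y,
  "A",        1,    3,
  "A",        2,    5,
  "B",        1,    4
)

# Glue treatment and rep together, drop rep, and move id to the front
toy_id <- toy %>%
  mutate(id = str_c(treatment, ".", rep)) %>%
  select(-rep) %>%
  select(id, everything())
toy_id

# Check: one distinct id per row
n_distinct(toy_id$id) == nrow(toy_id)
```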
Attempt 1
repmes %>% pivot_longer(contains("_"),
                        names_to = c("time", "var"),
                        names_sep = "_",
                        values_to = "vvv"
)
Attempt 2
repmes %>% pivot_longer(contains("_"),
                        names_to = c("time", ".value"),
                        names_sep = "_"
) -> repmes3
repmes3
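The difference between the two attempts is easiest to see on a tiny example. In names_to, the special name ".value" says that this part of each old column name should become a new column name, rather than going into a column of labels. The sketch below assumes column names of the form time_variable (such as pre_y); `toy` and its values are invented for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: 2 people, 2 times, variables y and z
toy <- tribble(
  ~id,   ~treatment, ~pre_y, ~pre_z, ~post_y, ~post_z,
  "A.1", "A",        3,      10,     4,       12,
  "B.1", "B",        5,      11,     6,       13
)

# Attempt 1: y and z end up stacked in one column, labelled by var
attempt1 <- toy %>%
  pivot_longer(contains("_"), names_to = c("time", "var"),
               names_sep = "_", values_to = "vvv")
names(attempt1)

# Attempt 2: ".value" keeps y and z as separate columns, one row per person-time
attempt2 <- toy %>%
  pivot_longer(contains("_"), names_to = c("time", ".value"),
               names_sep = "_")
names(attempt2)
```

Attempt 2 has half as many rows as attempt 1, and y and z can be used directly as variables, for example on the axes of a graph.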
Make a graph
ggplot(repmes3, aes(x = fct_inorder(time), y = y,
                    colour = treatment, group = id)) +
  geom_point() + geom_line()
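Why fct_inorder: text categories are otherwise placed on the axis in alphabetical order, which would put followup first. A small sketch, with a hypothetical vector `times` holding the three time labels:

```r
library(forcats)

# Hypothetical vector of time labels, in the order they occur in the data
times <- c("pre", "post", "followup", "pre", "post", "followup")

# A plain factor sorts its levels alphabetically
levels(factor(times))

# fct_inorder keeps the levels in order of first appearance
levels(fct_inorder(times))
```

This relies on the times appearing in the data in the order wanted on the axis.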
Or do the plot with means
repmes3 %>% group_by(treatment, ftime = fct_inorder(time)) %>%
  summarize(mean_y = mean(y)) %>%
  ggplot(aes(x = ftime, y = mean_y, colour = treatment,
             group = treatment)) +
  geom_point() + geom_line()
Comment
This has done what we wanted.