When pivot-wider goes wrong

Packages

The inevitable:

library(tidyverse)

Some long data that should be wide

Six observations of variable y, but three measured before some treatment and three measured after.
Really matched pairs, so want column of y-values for pre and for post.
pivot_wider.

What happens here?

d %>% pivot_wider(names_from = time, values_from = y)

Should be three pre values and three post. Why did this happen?
pivot_wider needs to know which row to put each observation in.
Uses combo of columns not named in pivot_wider, here obs (only).

The problem

d %>% pivot_wider(names_from = time, values_from = y)

There are 6 different obs values, so 6 different rows.
No data for obs 2 and pre, so that cell missing (NA).
Not enough data (6 obs) to fill 12 (\(= 2 \times 6\)) cells.
obs needs to say which subject provided which 2 observations.

Fixing it up

d2

column subject shows which subject provided each pre and post.
when we do pivot_wider, now only 3 rows, one per subject.

Coming out right

d2 %>% pivot_wider(names_from = time, values_from = y)

row each observation goes to determined by other column subject, and now a pre and post for each subject.
right layout for matched pairs \(t\) or to make differences for sign test or normal quantile plot.

Another example

Two independent samples this time

These should be arranged like this
but what if we make them wider?

Wider

d3 %>% pivot_wider(names_from = group, values_from = y)

row determined by what not used for pivot_wider: nothing!
everything smooshed into one row!
this time, too much data for the layout.
Four data values squeezed into each of the two cells: “list-columns”.

Get the data out

To expand list-columns out into the data values they contain, can use unnest:

d3 %>% pivot_wider(names_from = group, values_from = y) %>% 
  unnest(c(control, treatment))

in this case, wrong layout, because data values not paired.