Why you would want to do the opposite of counting
You probably know about count, which tells you how many observations you have in each group:
library(tidyverse)

d <- tribble(
~g, ~y,
"a", 10,
"a", 13,
"a", 14,
"a", 14,
"b", 6,
"b", 7,
"b", 9
)
There are four observations in group a and three in group b:
d %>% count(g) -> counts
counts
# A tibble: 2 × 2
g n
<chr> <int>
1 a 4
2 b 3
I didn’t know about this until fairly recently. Until then, I thought you had to do this:
d %>% group_by(g) %>%
summarize(count=n())
# A tibble: 2 × 2
g count
<chr> <int>
1 a 4
2 b 3
which works, but is a lot more typing.
The other day, I had the opposite problem. I had a table of frequencies, and I wanted to get it back to one row per observation (I was fitting a model in Stan, and I didn’t know how to deal with frequencies). I had no idea how you might do that (without something ugly like loops), and I was almost embarrassed to stumble upon this:
counts %>% uncount(n)
# A tibble: 7 × 1
g
<chr>
1 a
2 a
3 a
4 a
5 b
6 b
7 b
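What uncount does is repeat each row as many times as its count says, then drop the count column. Here is a minimal sketch of that idea, with counts rebuilt by hand so that it stands alone. The names idx, long_base and round_trip are made up for illustration, and the base R version is only one reading of the behaviour, not how tidyr implements it.

```r
library(dplyr)
library(tidyr)

# The same frequency table as `counts` above, typed in directly
counts <- tribble(
  ~g,  ~n,
  "a", 4L,
  "b", 3L
)

# uncount() repeats each row n times and drops the n column
long <- counts %>% uncount(n)

# Hand-rolled equivalent: repeat each row index by its count
idx <- rep(seq_len(nrow(counts)), times = counts$n)
long_base <- counts[idx, "g"]

# Round trip: counting the expanded rows recovers the frequencies
round_trip <- long %>% count(g)
```

uncount also takes .remove = FALSE to keep the count column, and .id to add a column numbering the copies made from each original row.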
My situation was a bit less trivial than that. I had disease category counts of coal miners with different exposures to coal dust:
my_url <- "https://www.utsc.utoronto.ca/~butler/d29/miners-tab.txt"
miners0 <- read_table(my_url)
miners0
# A tibble: 8 × 4
Exposure None Moderate Severe
<dbl> <dbl> <dbl> <dbl>
1 5.8 98 0 0
2 15 51 2 1
3 21.5 34 6 3
4 27.5 35 5 8
5 33.5 32 10 9
6 39.5 23 7 8
7 46 12 6 10
8 51.5 4 2 5
This needs tidying to get the frequencies all into one column:
miners0 %>%
gather(disease, freq, -Exposure) -> miners
miners
# A tibble: 24 × 3
Exposure disease freq
<dbl> <chr> <dbl>
1 5.8 None 98
2 15 None 51
3 21.5 None 34
4 27.5 None 35
5 33.5 None 32
6 39.5 None 23
7 46 None 12
8 51.5 None 4
9 5.8 Moderate 0
10 15 Moderate 2
# … with 14 more rows
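gather still works, but tidyr 1.0.0 and later mark it as superseded in favour of pivot_longer. A sketch of the same tidying step, assuming such a version is installed, with the data typed in from the table above so that it runs without the download:

```r
library(dplyr)
library(tidyr)

# The miners table, copied from the printed output above
miners0 <- tribble(
  ~Exposure, ~None, ~Moderate, ~Severe,
  5.8,  98,  0,  0,
  15,   51,  2,  1,
  21.5, 34,  6,  3,
  27.5, 35,  5,  8,
  33.5, 32, 10,  9,
  39.5, 23,  7,  8,
  46,   12,  6, 10,
  51.5,  4,  2,  5
)

# Every column except Exposure becomes a (disease, freq) pair
miners <- miners0 %>%
  pivot_longer(-Exposure, names_to = "disease", values_to = "freq")
```

The result has the same 24 rows as the gather version, but in a different order: pivot_longer keeps the three disease categories for each exposure together, where gather stacks one whole column after another.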
So I wanted to fit an ordered logistic regression in Stan, predicting disease category from exposure, but I didn’t know how to handle the frequencies. If I had one row per miner, I thought…
miners %>% uncount(freq) %>% rmarkdown::paged_table()
… and so I do. (I scrolled down to check, and eventually got past the 98 miners with 5.8 years of exposure and no disease).
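Scrolling works, but the same check can be done in code: the number of rows after uncount should equal the total of freq, and counting the rows again should give back the nonzero frequencies. A small sketch, using only the first two exposure levels from the table above to keep it short (miners_small, per_miner and recount are names invented here):

```r
library(dplyr)
library(tidyr)

# First two exposure levels only, in long format
miners_small <- tribble(
  ~Exposure, ~disease,   ~freq,
  5.8,       "None",     98,
  5.8,       "Moderate",  0,
  5.8,       "Severe",    0,
  15,        "None",     51,
  15,        "Moderate",  2,
  15,        "Severe",    1
)

# One row per miner
per_miner <- miners_small %>% uncount(freq)

# Check 1: as many rows as miners
rows_match <- nrow(per_miner) == sum(miners_small$freq)

# Check 2: counting again recovers the nonzero frequencies
recount <- per_miner %>% count(Exposure, disease, name = "freq")
```

Combinations with a frequency of zero produce no rows at all, so recount has four rows rather than six. With one row per miner, that is the behaviour you want.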
From there, you can use this to fit the model, though I would rather have weakly informative priors for their beta and c. c is tricky, since it is ordered, but I used the idea here (near the bottom) and it worked smoothly.
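Stan wants the response of an ordered logistic regression as integers from 1 up to the number of categories, so the disease column has to become an ordered factor first. A sketch of that preparation, on the first two exposure levels only. The list element names N, K, x and y are assumptions, and have to match whatever the data block of your Stan model declares:

```r
library(dplyr)
library(tidyr)

# First two exposure levels only, in long format
miners_small <- tribble(
  ~Exposure, ~disease,   ~freq,
  5.8,       "None",     98,
  5.8,       "Moderate",  0,
  5.8,       "Severe",    0,
  15,        "None",     51,
  15,        "Moderate",  2,
  15,        "Severe",    1
)

# One row per miner, with the categories in their natural order
per_miner <- miners_small %>%
  uncount(freq) %>%
  mutate(disease = factor(disease,
                          levels = c("None", "Moderate", "Severe"),
                          ordered = TRUE))

# Data list for Stan (element names are hypothetical)
stan_data <- list(
  N = nrow(per_miner),               # number of miners
  K = nlevels(per_miner$disease),    # number of disease categories
  x = per_miner$Exposure,            # predictor
  y = as.integer(per_miner$disease)  # 1 = None, 2 = Moderate, 3 = Severe
)
```

Giving the levels explicitly matters here: left to itself, factor would sort them alphabetically and put Moderate first.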
For attribution, please cite this work as
Butler (2019, July 13). Ken's Blog: Un-counting. Retrieved from http://ritsokiguess.site/blogg/posts/2019-07-13-un-counting/
BibTeX citation
@misc{butler2019un-counting,
  author = {Butler, Ken},
  title = {Ken's Blog: Un-counting},
  url = {http://ritsokiguess.site/blogg/posts/2019-07-13-un-counting/},
  year = {2019}
}