Same calculation done three different times, by copying, pasting and editing.
Dangerous: what if you forget to change something after you pasted?
Programming principle: “don’t repeat yourself”.
Hadley Wickham: don’t copy-paste more than twice.
Instead: write a function.
Or, more simply (“the R way”, better style):
If the last line of a function calculates a value without saving it, that value is returned.
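Here is a minimal sketch of the idea. It uses a hypothetical rescaling calculation that is not from the original slides. Both versions return the same thing, but the second relies on the value of the last line being returned.

```r
# Hypothetical example: rescale a numeric vector to mean 0 and SD 1.

# Version 1: save the answer, then return it explicitly.
rescale_v1 <- function(x) {
  ans <- (x - mean(x)) / sd(x)
  return(ans)
}

# Version 2 ("the R way"): the last line calculates a value without
# saving it, so that value is what the function returns.
rescale_v2 <- function(x) {
  (x - mean(x)) / sd(x)
}

rescale_v2(c(2, 4, 6))   # -1 0 1
```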
The input is called x. This is the name used inside the function.

[1] 4
Error in x - 1: non-numeric argument to binary operator
Call:
lm(formula = y ~ x, data = d)

Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        2        NaN     NaN      NaN
x                  1        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
By default, the mean of data with a missing value is missing, but if you specify na.rm=TRUE, the missing values are removed before the mean is calculated. That is, na.rm has a default value of FALSE: that’s what it will be unless you change it.
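Here is a small sketch of a default argument in your own function. The wrapper my_mean is hypothetical and not from the slides. Its na.rm argument defaults to FALSE, just as mean’s does, and it passes the value straight on to mean.

```r
# Hypothetical wrapper with a default argument: na.rm is FALSE unless the
# caller changes it, and the value is passed on to mean().
my_mean <- function(x, na.rm = FALSE) {
  mean(x, na.rm = na.rm)
}

v <- c(1, 2, NA, 5)
my_mean(v)                 # NA: the default na.rm = FALSE keeps the missing value
my_mean(v, na.rm = TRUE)   # 2.666667: the NA is removed before averaging
```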
stopifnot

If all of its conditions are true, stopifnot does nothing. stopifnot contains one or more logical conditions, and all of them have to be true for the function to work. So put in everything that you want to be true.

Use pluck:
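This sketch combines the two ideas. It defines a hypothetical slope function, not necessarily the one in the slides. stopifnot checks the inputs before anything is computed, and pluck from purrr pulls the second coefficient, the slope, out of coef.

```r
library(purrr)

# Hypothetical slope function: stopifnot() checks the inputs first, and
# pluck() pulls the second coefficient (the slope) out of coef().
slope <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  mod <- lm(y ~ x)
  pluck(coef(mod), 2)
}

slope(c(1, 2, 3, 4), c(3, 5, 7, 9))   # 2 (up to rounding): points lie on y = 1 + 2x
# slope(c("a", "b"), c(1, 2))          # Error: is.numeric(x) is not TRUE
```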
The function takes an x and a y.

lm has a lot of options, with defaults, that we might want to change. Instead of intercepting all the possibilities and passing them on, we can put ... (three dots) both in the function’s header line and in its call to lm. The ... in the header line means “accept any other input”, and the ... in the lm line means “pass anything other than x and y straight on to lm”.
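This is a sketch of the three-dots technique, using the same hypothetical slope function and made-up data. Any named input that slope does not recognize is handed to lm unchanged. The call below uses lm’s subset argument to fit on only some of the observations. Base R’s [[ is used here to pull out the second coefficient, in place of pluck.

```r
# Hypothetical slope function: the ... in the header accepts any other
# inputs, and the ... in the lm() call passes them straight on to lm().
slope <- function(x, y, ...) {
  mod <- lm(y ~ x, ...)
  coef(mod)[[2]]   # second coefficient is the slope
}

# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 9, 20)

slope(x, y)                 # 4.1, using all five points
slope(x, y, subset = 1:3)   # subset goes on to lm(): first three points only, slope 2
```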
One other thing lm will accept is a vector called subset, containing the list of observations to include in the regression.

x and y:

Three x’s to use in regressions, along with the y we had before:

Use each x in turn, with y from my_df as the response, and collect together the three different slopes. There are two ways: with a for loop, or with map_dbl (less coding, but more thinking).

The for loop way: refer to column i of data frame d as d %>% pull(i). Set up a vector slopes to store the slopes. i goes from 1 to 3 (3 columns, thus 3 slopes):

[1] 1.1000000 -1.1000000 0.5140187
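This runs the three lms, one at a time.

The map_dbl way: for each column of d, run the function slope with inputs “it” (the column) and y, and collect together the answers. Since each answer is a decimal number (a dbl), the appropriate function-running function is map_dbl.

Applied to d from above:

  x1   x2   x3
2.50 6.50 5.25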
The mean of each column, with the columns labelled.
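To see the two approaches side by side, here is a sketch on made-up data. The data frames d and my_df and the slope function are hypothetical stand-ins, not the ones from the slides. The for loop fills a pre-made vector one slope at a time. map_dbl runs slope on each column and returns a named vector. The same map_dbl pattern gives the labelled column means.

```r
library(dplyr)
library(purrr)

# Hypothetical slope function (not necessarily the one in the slides)
slope <- function(x, y) {
  mod <- lm(y ~ x)
  coef(mod)[[2]]
}

# Made-up data: three x's in d, and a y in my_df
d <- tibble(x1 = c(1, 2, 3, 4), x2 = c(4, 3, 2, 1), x3 = c(2, 4, 6, 8))
my_df <- tibble(y = c(2, 4, 6, 8))

# The for loop way: column i of d is d %>% pull(i)
slopes <- numeric(3)             # vector to store the 3 slopes
for (i in 1:3) {
  slopes[i] <- slope(d %>% pull(i), my_df$y)
}
slopes                           # 2 -2 1 (up to rounding)

# The map_dbl way: for each column of d, run slope() with "it" and y
map_dbl(d, \(x) slope(x, my_df$y))   # named: x1 2, x2 -2, x3 1

# map_dbl also gives the mean of each column, labelled by column name
map_dbl(d, mean)                 # x1 2.5, x2 2.5, x3 5
```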
If the function returns more than one number, such as the quartiles from quantile, use map (or map_df) instead of map_dbl:

$x1
 25%  75%
1.75 3.25

$x2
 25%  75%
5.75 7.25

$x3
 25%  75%
3.50 6.75
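This sketch uses a made-up data frame, not the d from the slides. quantile returns two numbers per column here, so map_dbl would stop with an error. map returns a list with one element per column instead, named by the columns.

```r
library(purrr)
library(tibble)

# Made-up data frame (not the d from the slides)
d <- tibble(x1 = c(1, 2, 3, 4), x2 = c(4, 3, 2, 1), x3 = c(2, 4, 6, 8))

# quantile() returns two numbers per column here, so map_dbl() would fail;
# map() returns a list with one element per column instead.
q <- map(d, \(x) quantile(x, probs = c(0.25, 0.75)))
q$x1          # 25% 1.75, 75% 3.25
q$x3          # 25% 3.5,  75% 6.5
```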
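Map in data frames with mutate

map can also be used within data frames to calculate new columns. Let’s do the square roots of 1 through 10 again:

Use whichever map_-whatever matches the type of value the function returns.

hotpo (“half or triple plus one”): if a number is even, halve it; if it is odd, multiply it by 3 and add 1. If you start from a number, find the hotpo of it, then find the hotpo of that, and keep going, what happens?

Keep going as long as x is not 1. For each new x: add it to the end of ans. When I hit 1, I break out of the while and return the whole ans. Starting from 27:

[1] 27 82 41 124 62 31 94 47 142 71 214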
[12] 107 322 161 484 242 121 364 182 91 274 137
[23] 412 206 103 310 155 466 233 700 350 175 526
[34] 263 790 395 1186 593 1780 890 445 1336 668 334
[45] 167 502 251 754 377 1132 566 283 850 425 1276
[56] 638 319 958 479 1438 719 2158 1079 3238 1619 4858
[67] 2429 7288 3644 1822 911 2734 1367 4102 2051 6154 3077
[78] 9232 4616 2308 1154 577 1732 866 433 1300 650 325
[89] 976 488 244 122 61 184 92 46 23 70 35
[100] 106 53 160 80 40 20 10 5 16 8 4
[111] 2 1
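One possible way to write the two functions is sketched below. The helper hotpo and the body of hotpo_seq are assumptions, and the versions in the original slides may differ. hotpo applies one step. hotpo_seq keeps adding each new x to the end of ans and breaks out of the while when it hits 1.

```r
# Sketch only: one possible hotpo() and hotpo_seq(); the versions in the
# original slides may differ.
hotpo <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, x >= 1, x == round(x))
  if (x %% 2 == 0) x / 2 else 3 * x + 1   # half if even, triple plus one if odd
}

hotpo_seq <- function(x) {
  ans <- x                  # the sequence so far, starting with x
  while (TRUE) {
    if (x == 1) break       # when we hit 1, stop
    x <- hotpo(x)
    ans <- c(ans, x)        # add each new x to the end of ans
  }
  ans
}

hotpo_seq(6)               # 6 3 10 5 16 8 4 2 1
length(hotpo_seq(27))      # 112, matching the output above
```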
The length of the vector returned from hotpo_seq says how long it took to get to 1:

tibble(start = 1:100) %>%
  mutate(seq_length = map_int(
    start, \(start) length(hotpo_seq(start)))) %>%
  slice_max(seq_length, n = 10)
Each entry of sequence is itself a vector, so sequence is a “list-column”.

Does this work with rowwise?

tibble(start = 1:7) %>%
  rowwise() %>%
  mutate(sequence = list(hotpo_seq(start))) %>%
  mutate(seq_length = length(sequence)) %>%
  mutate(seq_max = max(sequence))
It does.
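For comparison, the same columns can be built without rowwise by mapping over start. This is a sketch and not from the slides. It includes its own minimal hotpo_seq so that it runs standalone, and that version is also an assumption. map builds the list-column, and map_int and map_dbl then work on each vector inside it.

```r
library(dplyr)
library(purrr)

# Minimal hotpo_seq() so this block runs on its own (a sketch, not
# necessarily the original): collect the hotpo sequence from x down to 1.
hotpo_seq <- function(x) {
  ans <- x
  while (x != 1) {
    x <- if (x %% 2 == 0) x / 2 else 3 * x + 1
    ans <- c(ans, x)
  }
  ans
}

# Without rowwise: map() builds the list-column, then map_int() and
# map_dbl() work on each vector in it.
seqs <- tibble(start = 1:7) %>%
  mutate(sequence = map(start, hotpo_seq)) %>%
  mutate(seq_length = map_int(sequence, length)) %>%
  mutate(seq_max = map_dbl(sequence, max))
seqs   # seq_length: 1 2 8 3 6 9 17; seq_max: 1 2 16 4 16 16 52
```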
These are the quartiles of the columns of d, according to R’s default definition (see help for quantile).