── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom) # some regression stuff later
Don’t repeat yourself
See this:
a <- 50
b <- 11
d <- 3
as <- sqrt(a - 1)
as
[1] 7
bs <- sqrt(b - 1)
bs
[1] 3.162278
ds <- sqrt(d - 1)
ds
[1] 1.414214
What’s the problem?
Same calculation done three different times, by copying, pasting and editing.
Dangerous: what if you forget to change something after you pasted?
Programming principle: “don’t repeat yourself”.
Hadley Wickham: don’t copy-paste more than twice.
Instead: write a function.
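To see the danger of forgetting to change something after pasting, here is a small sketch of how it could go wrong. The slip is invented for illustration; the values are the ones used above.

```r
a <- 50
b <- 11
d <- 3

bs <- sqrt(b - 1)
ds <- sqrt(b - 1) # pasted from the line above; b was never changed to d
ds                # runs without complaint, but it is the answer for b, not d
```

R gives no error or warning here, so a mistake like this is easy to miss.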
Anatomy of function
Header line with function name and input value(s).
Body with calculation of values to output/return.
Return value: the output from function. In our case:
sqrt_minus_1 <- function(x) {
  ans <- sqrt(x - 1)
  return(ans)
}
or more simply (“the R way”, better style)
sqrt_minus_1 <- function(x) {
  sqrt(x - 1)
}
If last line of function calculates value without saving it, that value is returned.
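A quick check that the two styles give the same answer. The names with_return and without_return are made up here so that both versions can exist side by side.

```r
# version with an explicit return()
with_return <- function(x) {
  ans <- sqrt(x - 1)
  return(ans)
}

# version relying on the last calculated value being returned
without_return <- function(x) {
  sqrt(x - 1)
}

identical(with_return(50), without_return(50))
```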
About the input; testing
The input to a function can be called anything. Here we called it x. This is the name used inside the function.
The function is a “machine” for calculating square-root-minus-1. It doesn’t do anything until you call it:
sqrt_minus_1(50)
[1] 7
sqrt_minus_1(11)
[1] 3.162278
sqrt_minus_1(3)
[1] 1.414214
q <- 17
sqrt_minus_1(q)
[1] 4
sqrt_minus_1("text")
Error in x - 1: non-numeric argument to binary operator
It works!
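The name x belongs to the function. A small sketch, assuming there happens to be another x in your workspace, showing that the two do not interfere:

```r
sqrt_minus_1 <- function(x) {
  sqrt(x - 1)
}

x <- 100         # an x in the workspace
sqrt_minus_1(50) # inside the function, x is 50 for this call
x                # the workspace x is unchanged
```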
Vectorization 1/2
We conceived our function to work on numbers:
sqrt_minus_1(3.25)
[1] 1.5
but it actually works on vectors too, as a free bonus of R:
sqrt_minus_1(c(50, 11, 3))
[1] 7.000000 3.162278 1.414214
or… (over)
Vectorization 2/2
or even data frames:
d <- data.frame(x = 1:2, y = 3:4)
d
  x y
1 1 3
2 2 4
sqrt_minus_1(d)
  x        y
1 0 1.414214
2 1 1.732051
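Because the function works on a whole vector, it can also be applied to one column of a data frame to make a new column. A sketch in base R; the column name y_new is made up for illustration.

```r
sqrt_minus_1 <- function(x) {
  sqrt(x - 1)
}

d <- data.frame(x = 1:2, y = 3:4)
d$y_new <- sqrt_minus_1(d$y) # one column in, one new column out
d
```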
More than one input
Let the value that is subtracted before taking the square root be an input to the function as well:
sqrt_minus_value <- function(x, d) {
  sqrt(x - d)
}
Call the function with the x and d inputs in the right order:
sqrt_minus_value(51, 2)
[1] 7
or give the inputs names, in which case they can be in any order:
sqrt_minus_value(d = 2, x = 51)
[1] 7
lm(y ~ x, data = d)
Call:
lm(formula = y ~ x, data = d)
Coefficients:
(Intercept) x
2 1
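Named and unnamed inputs can be mixed, as in the lm call, where the formula comes first and data is named. A sketch with our own function:

```r
sqrt_minus_value <- function(x, d) {
  sqrt(x - d)
}

sqrt_minus_value(51, d = 2) # x by position, d by name
sqrt_minus_value(d = 2, 51) # named inputs are matched first, so 51 becomes x
```

Both calls calculate the square root of 51 minus 2.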
Defaults 1/2
Many R functions have values that you can change if you want to, but usually you don’t want to, for example:
x <- c(3, 4, 5, NA, 6, 7)
mean(x)
[1] NA
mean(x, na.rm = TRUE)
[1] 5
By default, the mean of data with a missing value is missing, but if you specify na.rm=TRUE, the missing values are removed before the mean is calculated.
That is, na.rm has a default value of FALSE: that’s what it will be unless you change it.
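One way to see a default without opening the help file is to look at the function's inputs directly. A sketch, using mean.default, which is the version of mean that R uses for ordinary vectors of numbers:

```r
args(mean.default)          # shows all the inputs with their defaults
formals(mean.default)$na.rm # just the default for na.rm
```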
Defaults 2/2
In our function, set a default value for d like this:
sqrt_minus_value <- function(x, d = 1) {
  sqrt(x - d)
}
If you specify a value for d, it will be used. If you don’t, 1 will be used instead:
sqrt_minus_value(51, 2)
[1] 7
sqrt_minus_value(51)
[1] 7.071068
Catching errors before they happen
What happened here?
sqrt_minus_value(6, 8)
Warning in sqrt(x - d): NaNs produced
[1] NaN
Message not helpful. Actually, function tried to take square root of negative number.
In fact, not even error, just warning.
Check that the square root will be OK first. Here’s how:
sqrt_minus_value <- function(x, d = 1) {
  stopifnot(x - d >= 0)
  sqrt(x - d)
}
What happens with stopifnot
This should be good, and is:
sqrt_minus_value(8, 6)
[1] 1.414214
This should fail, and see how it does:
sqrt_minus_value(6, 8)
Error in sqrt_minus_value(6, 8): x - d >= 0 is not TRUE
Where the function fails, we get informative error, but if everything good, the stopifnot does nothing.
stopifnot contains one or more logical conditions, and all of them have to be true for function to work. So put in everything that you want to be true.
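A sketch of stopifnot with several conditions. The extra is.numeric checks and the wording of the message are additions for illustration; naming a condition to get your own error message assumes R 4.0 or later.

```r
sqrt_minus_value <- function(x, d = 1) {
  stopifnot(
    is.numeric(x), # checked first
    is.numeric(d),
    "x - d must be non-negative" = x - d >= 0 # the name becomes the error message
  )
  sqrt(x - d)
}

sqrt_minus_value(8, 6)      # all conditions true: stopifnot does nothing
try(sqrt_minus_value(6, 8)) # fails the last condition; try() lets the script carry on
try(sqrt_minus_value("text"))
```

The conditions are checked in order, and the function stops at the first one that is not true.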
Using R’s built-ins
When you write a function, you can use anything built-in to R, or even any functions that you defined before.
For example, if you will be calculating a lot of regression-line slopes, you don’t have to do this from scratch: you can use R’s regression calculations, like this:
my_df <- data.frame(x = 1:4, y = c(10, 11, 10, 14))
my_df
  x  y
1 1 10
2 2 11
3 3 10
4 4 14
my_df.1 <- lm(y ~ x, data = my_df)
summary(my_df.1)
Call:
lm(formula = y ~ x, data = my_df)
Residuals:
1 2 3 4
0.4 0.3 -1.8 1.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.5000 1.8775 4.527 0.0455 *
x 1.1000 0.6856 1.605 0.2498
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.533 on 2 degrees of freedom
Multiple R-squared: 0.5628, Adjusted R-squared: 0.3442
F-statistic: 2.574 on 1 and 2 DF, p-value: 0.2498
tidy(my_df.1)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 8.5 1.88 4.53 0.0455
2 x 1.1 0.686 1.60 0.250
Pulling out just the slope
Use pluck:
tidy(my_df.1) %>% pluck("estimate", 2)
[1] 1.1
Making this into a function
First step: make sure you have it working without a function (we do)
lm has a lot of options, with defaults, that we might want to change. Instead of intercepting all the possibilities and passing them on, we can do this:
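A minimal sketch of what such a function might look like, assuming the idea is R's `...` input, which collects any extra inputs and hands them on unchanged. The function name slope is hypothetical, and base R's coef is used in place of tidy and pluck so that the sketch runs without extra packages.

```r
my_df <- data.frame(x = 1:4, y = c(10, 11, 10, 14))

# hypothetical helper: slope of the regression of y on x
slope <- function(x, y, ...) {
  fit <- lm(y ~ x, ...) # anything in ... is handed on to lm
  coef(fit)[[2]]        # the second coefficient is the slope
}

slope(my_df$x, my_df$y)
slope(my_df$x, my_df$y, model = FALSE) # extra input goes to lm; same slope
```

The function itself never needs to know which lm options exist.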
If the “for each” part is simple, go ahead and use map_-whatever.
If not, write a function to do the complicated thing first.
Example: “half or triple plus one”: if the input is an even number, halve it; if it is an odd number, multiply it by three and add one.
This is hard to do as a one-liner: first we have to figure out whether the input is odd or even, and then we have to do the right thing with it.
Odd or even?
Odd or even? Work out the remainder when dividing by 2:
6 %% 2
[1] 0
5 %% 2
[1] 1
5 has remainder 1 so it is odd.
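The remainder check can be wrapped up as a small function of its own. A sketch; the name is_odd is made up for illustration.

```r
# TRUE when dividing by 2 leaves remainder 1
is_odd <- function(x) {
  x %% 2 == 1
}

is_odd(5)
is_odd(6)
is_odd(c(5, 6, 7)) # works on a vector as well
```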
Write the function
First test for integerness, then test for odd or even, and then do the appropriate calculation:
hotpo <- function(x) {
  stopifnot(round(x) == x) # passes if input an integer
  remainder <- x %% 2
  if (remainder == 1) { # odd number
    ans <- 3 * x + 1
  } else { # even number
    ans <- x %/% 2 # integer division
  }
  ans
}
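A sketch of testing the function, then running it "for each" of several inputs. The if inside hotpo handles one value at a time, so the inputs are fed in one by one; base R's vapply is used here as a stand-in for map_dbl so that no extra packages are needed.

```r
hotpo <- function(x) {
  stopifnot(round(x) == x) # passes if input an integer
  remainder <- x %% 2
  if (remainder == 1) { # odd number
    ans <- 3 * x + 1
  } else { # even number
    ans <- x %/% 2 # integer division
  }
  ans
}

hotpo(4)        # even: halved
hotpo(3)        # odd: tripled, plus one
try(hotpo(1.5)) # not an integer: stopped by stopifnot

# for each of 1 to 6, work out hotpo of it
vapply(1:6, hotpo, numeric(1))
```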
Comments
d, according to R’s default definition (see help for quantile).