Durations, intervals, and periods
Packages for this section
Dates and times live in a package called lubridate, but this is now part of the tidyverse.
Exact time intervals
We previously got fractional days (of stays in hospital):
my_url <- "http://ritsokiguess.site/datafiles/hospital.csv"
stays <- read_csv(my_url)
stays %>% mutate(stay_days = (discharge - admit) / ddays(1))
But what if we wanted days, hours and minutes?
Intervals
stays %>% mutate(stay = admit %--% discharge)
These are called intervals: they have a start point and an end point.
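As a small sketch of what an interval holds, the following builds one from two made-up date-times (they are not taken from the hospital data) and pulls the endpoints back out with lubridate's `int_start` and `int_end`. `%within%` checks whether a given moment falls inside the interval.

```r
library(lubridate)

# Made-up admission and discharge times, for illustration only
admit <- ymd_hm("2020-03-03 10:30")
discharge <- ymd_hm("2020-03-07 15:45")

# An interval remembers both endpoints
stay <- admit %--% discharge
int_start(stay)   # the admission time
int_end(stay)     # the discharge time

# Is noon on March 5 inside the stay?
ymd_hm("2020-03-05 12:00") %within% stay   # TRUE
```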
Periods
To work out the exact length of an interval, in human units, turn it into a period:
stays %>% mutate(stay = as.period(admit %--% discharge))
A period is exact as long as it comes from an interval with a known start and end (accounting for daylight saving time, leap years, etc.).
Completed days
Take day of the periods:
stays %>% mutate(stay = as.period(admit %--% discharge)) %>%
  mutate(days_of_stay = day(stay))
Completed hours 1/2
Not quite what you think:
stays %>% mutate(stay = as.period(admit %--% discharge)) %>%
  mutate(hours_of_stay = hour(stay))
These are completed hours within days.
Completed hours 2/2
To get total hours, also count each day as 24 hours:
stays %>% mutate(stay = as.period(admit %--% discharge)) %>%
  mutate(hours_of_stay = hour(stay) + 24 * day(stay))
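To see the arithmetic on one stay, here is a minimal sketch using made-up admission and discharge times (not rows of the hospital data). It assumes the stay is shorter than a month; a longer stay would also have a months component in its period, which this calculation ignores. The last line is a cross-check that goes through a duration instead.

```r
library(lubridate)

# Made-up stay of 4 days, 5 hours and 15 minutes
admit <- ymd_hm("2020-03-03 10:30")
discharge <- ymd_hm("2020-03-07 15:45")
stay <- as.period(admit %--% discharge)

day(stay)    # 4 completed days
hour(stay)   # 5 completed hours within the last partial day

# Total completed hours: each completed day counts as 24 hours
total_hours <- hour(stay) + 24 * day(stay)
total_hours  # 101

# Cross-check: exact length in hours, rounded down
check_hours <- floor(as.duration(admit %--% discharge) / dhours(1))
check_hours  # 101
```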
Durations
What’s the difference between duration and period?
stays %>% mutate(stay = as.duration(admit %--% discharge))
A duration is always a number of seconds.
Also shown is an approximate equivalent on a more human scale (calculated from the seconds).
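A short sketch of a duration as a plain count of seconds, again with made-up times rather than the hospital data. Dividing one duration by another (here `dhours(1)` or `ddays(1)`) gives an ordinary number, which is how the fractional days earlier in the section were obtained.

```r
library(lubridate)

# Made-up stay of 4 days, 5 hours and 15 minutes
admit <- ymd_hm("2020-03-03 10:30")
discharge <- ymd_hm("2020-03-07 15:45")
stay <- as.duration(admit %--% discharge)

as.numeric(stay)    # 364500 seconds
stay / dhours(1)    # 101.25 hours
stay / ddays(1)     # 4.21875 days
```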
Sometimes it matters
Hours are always the same length (as a number of seconds), and so are days unless a daylight saving change falls within them.
Months and years are not always the same length:
months have different numbers of days
years can be leap years or not
The actual length of 2 months depends on which 2 months:
tribble(
  ~start, ~end,
  ymd("2020-01-15"), ymd("2020-03-15"),
  ymd("2020-07-15"), ymd("2020-09-15")
) %>% mutate(period = as.period(start %--% end)) %>%
  mutate(duration = as.duration(start %--% end))
Manchester United
Sometime in December 2019 or January 2020, I downloaded some information about the players that were then in the squad of the famous Manchester United Football (soccer) Club. We are going to use the players’ ages (as given) to figure out exactly when the download happened.
my_url <- "http://ritsokiguess.site/datafiles/manu.csv"
read_csv(my_url) %>%
  select(name, date_of_birth, age) -> man_united
Ages
A player’s age is the number of completed years since their birth.
This suggests:
guessing a download date
working out time since birth as period
extracting number of years
After that, see if our calculations of age match actual ages
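The "time since birth as a period, then extract the years" idea can be sketched on a single made-up date of birth. The helper name `completed_years` is hypothetical, introduced only for this illustration; the age goes up by one on the birthday itself.

```r
library(lubridate)

# Hypothetical helper: completed years between a birth date and another date
completed_years <- function(dob, on_date) {
  year(as.period(dob %--% on_date))
}

# Made-up date of birth, not one of the players
dob <- ymd("1990-06-20")

completed_years(dob, ymd("2020-06-19"))   # 29: the day before the 30th birthday
completed_years(dob, ymd("2020-06-20"))   # 30: on the birthday
```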
Guess download date and work out ages
Guess January 10, 2020 as download date (just to pick a date):
guess <- ymd("2020-01-10")
man_united %>%
  mutate(dob = dmy(date_of_birth)) %>%
  mutate(age_period = as.period(dob %--% guess)) %>%
  mutate(age_years = year(age_period)) -> d
Results (just the ages)
d %>% select(name, age, age_years)
Which ones are different?
d %>% filter(age != age_years) %>%
  select(name, date_of_birth, age, age_years)
These three players’ ages were calculated wrong: we got one year too many.
Our guessed date, January 10, was too late.
These three players had a birthday between the actual download date and our guess,
so the actual download date must have been before Dec 15.
Try an earlier date
guess <- ymd("2019-12-05")
man_united %>%
  mutate(dob = dmy(date_of_birth)) %>%
  mutate(age_period = as.period(dob %--% guess)) %>%
  mutate(age_years = year(age_period)) %>%
  filter(age != age_years) %>%
  select(name, date_of_birth, age, age_years) -> d2
Results
Dec 5 was too early for the download date.
It must have been Dec 8 or later (to get McTominay’s age right),
so it must have been on or after Dec 8 and before Dec 15 (Lingard’s birthday).
Actually I downloaded the data on Dec 10.
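Instead of guessing dates one at a time, the same reasoning can be automated by trying every date in a window and keeping the ones on which all computed ages agree with the recorded ones. This is only a sketch: the birth dates and ages below are made up (they are not the Manchester United data), the helper `ages_on` is hypothetical, and the window of candidate dates is an assumption.

```r
library(lubridate)

# Made-up birth dates and the ages recorded on some unknown date
dob <- ymd(c("1995-03-04", "1991-03-11", "1988-09-30"))
age <- c(25, 28, 31)

# Hypothetical helper: completed years from each birth date to one candidate date
ages_on <- function(dob, date) year(as.period(dob %--% date))

# Try every day in a window that we assume contains the unknown date
candidates <- seq(ymd("2020-02-01"), ymd("2020-04-30"), by = "day")
matches <- vapply(
  seq_along(candidates),
  function(i) all(ages_on(dob, candidates[i]) == age),
  logical(1)
)

# Earliest and latest dates on which every computed age agrees
range(candidates[matches])   # 2020-03-04 to 2020-03-10
```

The lower end is the most recent birthday that had already happened (March 4), and the upper end is the day before the next birthday that had not yet happened (March 11), which mirrors the Dec 8 and Dec 15 bounds found above.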
Comments