Worksheet 12

Published

November 21, 2025

This worksheet is for extra practice on the last week of lecture material. It is not attached to a tutorial, so my solutions will be available immediately. You are still encouraged to try the worksheet first, and then look at my solutions to see how you did.

Packages

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(broom)

I use broom later, but you may not need to.

Writing a function to run a \(t\)-test on wide data

The way we know how to run a two-sample \(t\)-test is to arrange the data “long”: that is, to have one column with all the data in it, and a second column saying which of the two groups each data value belongs to. However, sometimes we get data in two separate columns, and we would like to make a function that will run the \(t\)-test on data in this form.
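To make the "long" layout concrete, here is a minimal sketch on made-up numbers (not the test score data below). The dataframe name toy_long and its columns group and value are names I chose for illustration only.

```r
library(tidyverse)

# Hypothetical toy data: every measurement is in one column (value),
# and a second column (group) says which group each measurement belongs to.
toy_long <- tribble(
  ~group, ~value,
  "a",    10,
  "a",    12,
  "a",    11,
  "b",    14,
  "b",    15,
  "b",    13
)

# The formula value ~ group means "compare value between the levels of group".
# With no other inputs, t.test runs a two-sided Welch test.
toy_test <- t.test(value ~ group, data = toy_long)
toy_test
```

Data that arrive as two separate columns (or two separate vectors) do not fit this formula interface directly, which is the problem the rest of this section works on.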

As an example, suppose we have data on student test scores for some students who took a course online (asynchronously), and some other students who took the same course in-person. The students who took the course online scored 32, 37, 35, 28; the students who took the course in-person scored 35, 31, 29, 25. (There were actually a lot more students than this, but these will do to test with.)

Enter these data into two vectors called online and classroom respectively.

Describe what happens when you pass each of your two vectors into enframe.

The function bind_rows glues together two (or more) dataframes (most commonly, with the same column names), one above the other. Its inputs are as many dataframes as you want to give it, and in addition an input .id which is the name of a column that will identify which dataframe each row came from. Use all of this to glue together the two dataframes you made with enframe.
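As a rough illustration of how bind_rows and .id behave, here is a sketch on two small made-up dataframes (deliberately not the ones this question asks for). The names d1, d2, stacked and source are hypothetical. Giving the inputs names is one way to control what appears in the identifying column; without names, bind_rows numbers the inputs instead.

```r
library(tidyverse)

# Two hypothetical dataframes with the same column name
d1 <- tibble(value = c(1.5, 2.5, 3.5))
d2 <- tibble(value = c(4, 5))

# Stack d2 underneath d1. The names given to the inputs (first, second)
# become the entries of the new column named by .id.
stacked <- bind_rows(first = d1, second = d2, .id = "source")
stacked
```

The result has one row per original row (five here), with the source column recording which dataframe each row came from.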

Explain briefly why the dataframe you just created is suitable for running a two-sample \(t\)-test on, and run a two-sided (Welch) two-sample \(t\)-test.

Write a function that takes two vectors as input. Call them x and y. The function should run a two-sample (two-sided, Welch) \(t\)-test on the two input vectors and return the output from the \(t\)-test.

Test your function on the same data you used earlier, and verify that you get the same P-value.

Modify your function to return just the P-value from the \(t\)-test, as a number. Hint: you will need to find out how to get the P-value out of the t.test output. You might find this page useful.
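If it helps, here is a sketch of two ways to look inside t.test output, shown on made-up numbers rather than on the test score data. The name toy_test is hypothetical. The output of t.test is a list, so its pieces can be pulled out by name; the tidy function from broom turns the same output into a one-row dataframe.

```r
library(broom)

# Hypothetical toy data, passed to t.test as two separate vectors
toy_test <- t.test(c(10, 12, 11), c(14, 15, 13))

# See what the pieces of the output are called
names(toy_test)

# Option 1: pull out one piece by name
p_from_list <- toy_test$p.value

# Option 2: convert to a dataframe with broom, then take the column
p_from_tidy <- tidy(toy_test)$p.value

p_from_list
p_from_tidy
```

Either way the result is a single number, and the two approaches should agree.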

Test your modified function and demonstrate that it does indeed return only the P-value (and not the rest of the \(t\)-test output).

Modify your function to allow any inputs that t.test accepts. Demonstrate that your modified function works by obtaining a pooled \(t\)-test for the test score data, with a one-sided alternative that the online students had a higher mean mark than the classroom students.
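One common way in R to let a function accept "anything the inner function accepts" is the special input ... (three dots), which collects any extra inputs and hands them on unchanged. The sketch below shows the idea with mean rather than t.test, so it is an illustration of the mechanism and not a solution; mean_with_options is a hypothetical name.

```r
# A hypothetical wrapper: whatever extra inputs are given after x
# are passed straight through to mean().
mean_with_options <- function(x, ...) {
  mean(x, ...)
}

v <- c(1, 2, NA, 4)

# No extra inputs: mean() uses its defaults, so the missing value gives NA
default_result <- mean_with_options(v)

# na.rm is not an input of the wrapper itself, but ... passes it on to mean()
na_removed_result <- mean_with_options(v, na.rm = TRUE)

default_result
na_removed_result
```

The wrapper never has to list the optional inputs by name, so it keeps working for any input that the inner function understands.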