Matched pairs

Some data:

Matched pairs 1/2

  • Data are from a comparison of 2 drugs for effectiveness at reducing pain.

    • 12 subjects (cases), all arthritis sufferers
    • Response is number of hours of pain relief from each drug.
  • In the reading example, each child tried only one reading method.

  • But here, each subject tried out both drugs, giving us two measurements per subject.

  • Possible because, if you wait long enough between drugs, the first drug has no influence on the effect of the second.

Matched pairs 2/2

  • Advantage: focused comparison of drugs. Comparing one drug with the other on the same person removes a lot of variability due to differences between people.

  • Matched pairs requires a different analysis.

  • Design: randomly choose 6 of 12 subjects to get drug A first, other 6 get drug B first.
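
The randomization in the design could be sketched as below. This is only an illustration of the idea, not the allocation actually used in the study; the seed and the names `first_a` and `design` are assumptions introduced here.

```r
# Hypothetical sketch of the randomization: pick 6 of the 12 subjects
# at random to take drug A first; the remaining 6 take drug B first.
set.seed(457)  # arbitrary seed, only so the sketch is reproducible
subjects <- 1:12
first_a <- sample(subjects, 6)  # 6 subjects drawn without replacement
design <- data.frame(
  subject = subjects,
  first_drug = ifelse(subjects %in% first_a, "A", "B")
)
table(design$first_drug)  # counts of subjects in each order
```

Randomizing the order guards against a systematic "first drug" effect being mistaken for a drug effect.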

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(smmr) # for a sign test later

Reading the data

Values aligned in columns:

my_url <- 
  "http://ritsokiguess.site/datafiles/analgesic.txt"
pain <- read_table(my_url)

── Column specification ────────────────────────────────────────────────────────
cols(
  subject = col_double(),
  druga = col_double(),
  drugb = col_double()
)
pain
# A tibble: 12 × 3
   subject druga drugb
     <dbl> <dbl> <dbl>
 1       1   2     3.5
 2       2   3.6   5.7
 3       3   2.6   2.9
 4       4   2.6   2.4
 5       5   7.3   9.9
 6       6   3.4   3.3
 7       7  14.9  16.7
 8       8   6.6   6  
 9       9   2.3   3.8
10      10   2     4  
11      11   6.8   9.1
12      12   8.5  20.9
glimpse(pain)
Rows: 12
Columns: 3
$ subject <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
$ druga   <dbl> 2.0, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5
$ drugb   <dbl> 3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9

Paired t-test

with(pain, t.test(druga, drugb, paired = TRUE))

    Paired t-test

data:  druga and drugb
t = -2.1677, df = 11, p-value = 0.05299
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -4.29941513  0.03274847
sample estimates:
mean difference 
      -2.133333 
  • P-value is 0.053, just above 0.05.
  • Not quite significant at the usual 0.05 level: no clear evidence of a difference between drugs.

t-testing the differences

  • Equivalently, you can calculate the differences yourself and then do a 1-sample t-test on them.
pain %>% mutate(diff = druga - drugb) -> pain
pain
# A tibble: 12 × 4
   subject druga drugb    diff
     <dbl> <dbl> <dbl>   <dbl>
 1       1   2     3.5  -1.5  
 2       2   3.6   5.7  -2.1  
 3       3   2.6   2.9  -0.300
 4       4   2.6   2.4   0.200
 5       5   7.3   9.9  -2.6  
 6       6   3.4   3.3   0.100
 7       7  14.9  16.7  -1.80 
 8       8   6.6   6     0.600
 9       9   2.3   3.8  -1.5  
10      10   2     4    -2    
11      11   6.8   9.1  -2.3  
12      12   8.5  20.9 -12.4  

t-test on the differences

  • then throw them into t.test, testing that the mean is zero, with same result as before:
with(pain, t.test(diff, mu = 0))

    One Sample t-test

data:  diff
t = -2.1677, df = 11, p-value = 0.05299
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -4.29941513  0.03274847
sample estimates:
mean of x 
-2.133333 
  • Same P-value (0.053) and conclusion.
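
To see why the two approaches agree, the test statistic can be computed by hand from the differences. The sketch below uses base R only and retypes the differences from the table above; the names `d`, `t_stat` and `p_value` are introduced here for illustration.

```r
# Differences (druga - drugb), copied from the table above
d <- c(-1.5, -2.1, -0.3, 0.2, -2.6, 0.1, -1.8, 0.6, -1.5, -2.0, -2.3, -12.4)
n <- length(d)
# test statistic for H0: mean difference = 0
t_stat <- mean(d) / (sd(d) / sqrt(n))
# two-sided P-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
c(t = t_stat, df = n - 1, p = p_value)
```

Both `t.test` calls reduce to this calculation, which is why they give the same statistic, degrees of freedom and P-value.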

Assessing normality

  • 1-sample and 2-sample t-tests assume that the sample (or each group) comes from a normal distribution.
  • Matched pairs analyses assume (theoretically) that the differences are normally distributed.
  • How to assess normality? A normal quantile plot.

The normal quantile plot (of differences)

ggplot(pain, aes(sample = diff)) + stat_qq() + stat_qq_line()
  • Points should follow the straight line. The bottom-left point is way off the line (a low outlier), so normality is questionable here.

What to do instead?

  • Matched pairs \(t\)-test is based on one sample of differences
  • but the differences are not normal (enough)
  • so do a sign test on the differences, with null median 0:
sign_test(pain, diff, 0)
$above_below
below above 
    9     3 

$p_values
  alternative    p_value
1       lower 0.07299805
2       upper 0.98071289
3   two-sided 0.14599609
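
The two-sided P-value can be reproduced from the binomial distribution in base R. This is a sketch of the idea behind the sign test, not a description of how `smmr` computes it internally; the names `below`, `above` and `p_two_sided` are introduced here.

```r
# Differences (druga - drugb), copied from the table above
d <- c(-1.5, -2.1, -0.3, 0.2, -2.6, 0.1, -1.8, 0.6, -1.5, -2.0, -2.3, -12.4)
below <- sum(d < 0)  # differences below the null median of 0
above <- sum(d > 0)  # differences above the null median of 0
n <- below + above   # any differences exactly equal to 0 are not counted
# If the median difference is 0, each difference is equally likely to be
# above or below 0, so the smaller count is Binomial(n, 0.5).
# Double the lower-tail probability for a two-sided test (capped at 1).
p_two_sided <- min(1, 2 * pbinom(min(below, above), n, 0.5))
p_two_sided
```

Only the signs of the differences are used, so the size of the -12.4 difference has no extra influence on the result.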

Did we need to worry about that outlier?

Bootstrap sampling distribution of sample mean differences:

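# bootstrap: resample the 12 differences with replacement, 10000 times
tibble(sim = 1:10000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(pain$diff, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  # normal quantile plot of the 10000 bootstrap sample means
  ggplot(aes(sample = my_mean)) + stat_qq() + stat_qq_line()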

Yes, we did: the bootstrap sampling distribution of the sample mean is clearly skewed left and not normal, so the \(t\)-test is not to be trusted here.

Comments

  • no evidence of any difference between drugs (sign test P-value 0.1460)
  • in \(t\)-test, the low outlier difference pulled the mean difference downward and made it look more negative than it should have been
  • therefore, we have no good evidence that the drugs differ.
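
The pull of the outlier on the mean can be checked directly. The sketch below, in base R with the differences retyped from the table above, compares the mean of the differences with their median and with the mean of the other 11 differences; the names `mean_all`, `median_all` and `mean_without` are introduced here for illustration.

```r
# Differences (druga - drugb), copied from the table above
d <- c(-1.5, -2.1, -0.3, 0.2, -2.6, 0.1, -1.8, 0.6, -1.5, -2.0, -2.3, -12.4)
mean_all <- mean(d)               # includes the -12.4 difference
median_all <- median(d)           # much less affected by one extreme value
mean_without <- mean(d[d > -10])  # mean of the 11 differences other than -12.4
c(mean = mean_all, median = median_all, mean_without_outlier = mean_without)
```

Dropping the outlier here is for illustration only; the analysis above handles it with the sign test rather than by deleting data.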