The sign test

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(smmr)

smmr is new. See later how to install it.

Duality between confidence intervals and hypothesis tests

  • Tests and CIs really do the same thing, if you look at them the right way. They are both telling you something about a parameter, and they use the same information from the data.
  • To illustrate, some data (two groups):
my_url <- "http://ritsokiguess.site/datafiles/duality.txt"
twogroups <- read_delim(my_url," ")
Rows: 15 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
dbl (2): y, group

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The data

twogroups
# A tibble: 15 × 2
       y group
   <dbl> <dbl>
 1    10     1
 2    11     1
 3    11     1
 4    13     1
 5    13     1
 6    14     1
 7    14     1
 8    15     1
 9    16     1
10    13     2
11    13     2
12    14     2
13    17     2
14    18     2
15    19     2

95% CI (default)

for difference in means, group 1 minus group 2:

t.test(y ~ group, data = twogroups)

    Welch Two Sample t-test

data:  y by group
t = -2.0937, df = 8.7104, p-value = 0.0668
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -5.5625675  0.2292342
sample estimates:
mean in group 1 mean in group 2 
       13.00000        15.66667 

90% CI

t.test(y ~ group, data = twogroups, conf.level = 0.90)

    Welch Two Sample t-test

data:  y by group
t = -2.0937, df = 8.7104, p-value = 0.0668
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
90 percent confidence interval:
 -5.010308 -0.323025
sample estimates:
mean in group 1 mean in group 2 
       13.00000        15.66667 

Hypothesis test

Null is that difference in means is zero:

t.test(y ~ group, mu=0, data = twogroups)

    Welch Two Sample t-test

data:  y by group
t = -2.0937, df = 8.7104, p-value = 0.0668
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -5.5625675  0.2292342
sample estimates:
mean in group 1 mean in group 2 
       13.00000        15.66667 

Comparing results

Recall null here is \(H_0 : \mu_1 - \mu_2 = 0\). P-value 0.0668.

  • 95% CI from \(-5.6\) to \(0.2\), contains \(0\).
  • 90% CI from \(-5.0\) to \(-0.3\), does not contain \(0\).
  • At \(\alpha = 0.05\), would not reject \(H_0\) since P-value \(> 0.05\).
  • At \(\alpha = 0.10\), would reject \(H_0\) since P-value \(< 0.10\).

Test and CI

This is not just a coincidence. Let \(C = 100(1 - \alpha)\), so that the \(C\%\) CI corresponds to the level-\(\alpha\) test. Then the following is always true. (The symbol \(\iff\) means “if and only if”.)

  • Reject \(H_0\) at level \(\alpha\) \(\iff\) \(C\%\) CI does not contain the \(H_0\) value.
  • Do not reject \(H_0\) at level \(\alpha\) \(\iff\) \(C\%\) CI contains the \(H_0\) value.

Idea: “Plausible” parameter value inside CI, not rejected; “Implausible” parameter value outside CI, rejected.

The value of this

  • If you have a test procedure but no corresponding CI:
  • you make a CI by including all the parameter values that would not be rejected by your test (see the sketch after this list).
  • Use:
    • \(\alpha = 0.01\) for a 99% CI,
    • \(\alpha = 0.05\) for a 95% CI,
    • \(\alpha = 0.10\) for a 90% CI, and so on.
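  • As a small sketch of this inversion, using the two-sample \(t\)-test from earlier (the grid of trial values is my own choice, and the output is not shown): a hypothesized difference \(\mu_0\) is inside the 90% CI exactly when the level-0.10 test of \(H_0: \mu_1 - \mu_2 = \mu_0\) does not reject.
# try a grid of null differences; the ones with P-value > 0.10 should be
# exactly the values inside the 90% CI found above (about -5.01 to -0.32)
tibble(mu0 = seq(-6, 1, 0.5)) %>% 
  rowwise() %>% 
  mutate(p_value = t.test(y ~ group, mu = mu0, data = twogroups)$p.value) %>% 
  mutate(in_CI = ifelse(p_value > 0.10, "inside", "outside"))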

Testing for non-normal data

  • The IRS (“Internal Revenue Service”) is the US authority that deals with taxes (like Revenue Canada).
  • One of their forms is supposed to take no more than 160 minutes to complete. A citizen’s organization claims that it takes people longer than that on average.
  • Sample of 30 people; time to complete form recorded.
  • Read in data, and do \(t\)-test of \(H_0 : \mu = 160\) vs. \(H_a : \mu > 160\).
  • For reading in, there is only one column, so can pretend it is delimited by anything.

Read in data

my_url <- "http://ritsokiguess.site/datafiles/irs.txt"
irs <- read_csv(my_url)
irs
# A tibble: 30 × 1
    Time
   <dbl>
 1    91
 2    64
 3   243
 4   167
 5   123
 6    65
 7    71
 8   204
 9   110
10   178
# ℹ 20 more rows

Test whether the mean is greater than 160

with(irs, t.test(Time, mu = 160, 
                 alternative = "greater"))

    One Sample t-test

data:  Time
t = 1.8244, df = 29, p-value = 0.03921
alternative hypothesis: true mean is greater than 160
95 percent confidence interval:
 162.8305      Inf
sample estimates:
mean of x 
 201.2333 

Reject null; mean (for all people to complete form) greater than 160.

But, look at a graph

ggplot(irs, aes(x = Time)) + geom_histogram(bins = 6)

Comments

  • Skewed to right.
  • Should look at median, not mean.

The sign test

  • But how to test whether the median is greater than 160?
  • Idea: if the median really is 160 (\(H_0\) true), the sampled values from the population are equally likely to be above or below 160.
  • If the population median is greater than 160, there will be a lot of sample values greater than 160, not so many less. Idea: test statistic is number of sample values greater than hypothesized median.

Getting a P-value for sign test 1/3

  • How to decide whether “unusually many” sample values are greater than 160? Need a sampling distribution.
  • If \(H_0\) true, pop. median is 160, then each sample value independently equally likely to be above or below 160.
  • So number of observed values above 160 has binomial distribution with \(n = 30\) (number of data values) and \(p = 0.5\) (160 is hypothesized to be median).

Getting P-value for sign test 2/3

  • Count values above/below 160:
irs %>% count(Time > 160)
# A tibble: 2 × 2
  `Time > 160`     n
  <lgl>        <int>
1 FALSE           13
2 TRUE            17
  • 17 above, 13 below. How unusual is that? Need a binomial table.
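  • Before the exact calculation below, a quick simulation gives a rough answer (a sketch; the 10,000 repetitions are my own choice, so without a seed the exact proportion will vary a little):
# 10,000 simulated counts of the number of values above 160 when the null
# is true, that is, draws from a binomial with n = 30 and p = 0.5
tibble(above = rbinom(10000, size = 30, prob = 0.5)) %>% 
  summarize(prop_17_or_more = mean(above >= 17))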

Getting P-value for sign test 3/3

  • The R function dbinom gives the probability of, e.g., exactly 17 successes in a binomial with \(n = 30\) and \(p = 0.5\):
dbinom(17, 30, 0.5)
[1] 0.1115351
  • but we want probability of 17 or more, so get all of those, find probability of each, and add them up:
tibble(x=17:30) %>% 
  mutate(prob=dbinom(x, 30, 0.5)) %>% 
  summarize(total=sum(prob))
# A tibble: 1 × 1
  total
  <dbl>
1 0.292

or

pbinom(17, 30, 0.5) # prob of <= 17
[1] 0.8192027

and hence (note first input):

pbinom(16, 30, 0.5, lower.tail = FALSE)
[1] 0.2923324

This last is \(P(X \ge 17) = P(X > 16)\).
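Base R’s binom.test does the same binomial calculation in one call (an alternative to the pbinom line above; its upper-tail P-value should agree with the 0.292 we found):

# exact binomial test of p = 0.5: the upper tail gives P(X >= 17) for X ~ Bin(30, 0.5)
binom.test(17, 30, p = 0.5, alternative = "greater")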

Using my package smmr

  • I wrote a package smmr to do the sign test (and some other things). Installation is a bit fiddly:
    • Install devtools (once) with
install.packages("devtools")
  • then install smmr using devtools (once):
library(devtools)
install_github("nxskok/smmr")
  • Then load it:
library(smmr)

smmr for sign test

  • smmr’s function sign_test needs three inputs: a data frame, a column and a null median:
sign_test(irs, Time, 160)
$above_below
below above 
   13    17 

$p_values
  alternative   p_value
1       lower 0.8192027
2       upper 0.2923324
3   two-sided 0.5846647

Comments (1/3)

  • We are testing whether the population median is greater than 160, so we want the upper-tail P-value, 0.2923. Same as before.
  • We also get the table of values above and below; this matches what we got earlier.

Comments (2/3)

  • P-values are:
    • \(t\)-test: 0.0392
    • sign test: 0.2923
  • These are very different: we reject a mean of 160 (in favour of the mean being bigger), but clearly fail to reject a median of 160 in favour of a bigger one.
  • Why is that? Obtain mean and median:
irs %>% summarize(mean_time = mean(Time), 
                  median_time = median(Time))
# A tibble: 1 × 2
  mean_time median_time
      <dbl>       <dbl>
1      201.        172.

Comments (3/3)

  • The mean is pulled a long way up by the right skew, and is a fair bit bigger than 160.
  • The median is quite close to 160.
  • We ought to be trusting the sign test and not the \(t\)-test here (median and not mean), and therefore there is no evidence that the “typical” time to complete the form is longer than 160 minutes.
  • Having said that, there are clearly some people who take a lot longer than 160 minutes to complete the form, and the IRS could focus on simplifying its form for these people.
  • In this example, looking at any kind of average is not really helpful; a better question might be “do an unacceptably large fraction of people take longer than (say) 300 minutes to complete the form?”: that is, thinking about worst-case rather than average-case.
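  • That last question is easy to ask of these data (a quick sketch; the 300-minute cutoff is just the figure from the previous bullet):
# what proportion of the sampled people took longer than 300 minutes?
irs %>% summarize(prop_over_300 = mean(Time > 300))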

Confidence interval for the median

  • The sign test does not naturally come with a confidence interval for the median.
  • So we use the “duality” between test and confidence interval to say: the (95%) confidence interval for the median contains exactly those values of the null median that would not be rejected by the two-sided sign test (at \(\alpha = 0.05\)).

For our data

  • The procedure is to try some values for the null median and see which ones are inside and which outside our CI.
  • smmr has pval_sign that gets just the 2-sided P-value:
pval_sign(160, irs, Time)
[1] 0.5846647
  • Try a couple of null medians:
pval_sign(200, irs, Time)
[1] 0.3615946
pval_sign(300, irs, Time)
[1] 0.001430906
  • So 200 inside the 95% CI and 300 outside.

Doing a whole bunch

  • Choose our null medians first:
(d <- tibble(null_median=seq(100,300,20)))
# A tibble: 11 × 1
   null_median
         <dbl>
 1         100
 2         120
 3         140
 4         160
 5         180
 6         200
 7         220
 8         240
 9         260
10         280
11         300

… and then

“for each null median, run the function pval_sign for that null median and get the P-value”:

d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time))
# A tibble: 11 × 2
# Rowwise: 
   null_median  p_value
         <dbl>    <dbl>
 1         100 0.000325
 2         120 0.0987  
 3         140 0.200   
 4         160 0.585   
 5         180 0.856   
 6         200 0.362   
 7         220 0.0428  
 8         240 0.0161  
 9         260 0.00522 
10         280 0.00143 
11         300 0.00143 

Make it easier for ourselves

d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time)) %>% 
  mutate(in_out = ifelse(p_value > 0.05, "inside", "outside"))
# A tibble: 11 × 3
# Rowwise: 
   null_median  p_value in_out 
         <dbl>    <dbl> <chr>  
 1         100 0.000325 outside
 2         120 0.0987   inside 
 3         140 0.200    inside 
 4         160 0.585    inside 
 5         180 0.856    inside 
 6         200 0.362    inside 
 7         220 0.0428   outside
 8         240 0.0161   outside
 9         260 0.00522  outside
10         280 0.00143  outside
11         300 0.00143  outside

Confidence interval for median?

  • 95% CI to this accuracy from 120 to 200.
  • Can get it more accurately by looking more closely in intervals from 100 to 120, and from 200 to 220.

A more efficient way: bisection

  • Know that top end of CI between 200 and 220:
lo <- 200 
hi <- 220
  • Try the value halfway between: is it inside or outside?
try <- (lo + hi) / 2
try
[1] 210
pval_sign(try, irs, Time)
[1] 0.09873715
  • Inside, so upper end is between 210 and 220. Repeat (over):

… bisection continued

lo <- try
try <- (lo + hi) / 2
try
[1] 215
pval_sign(try, irs, Time)
[1] 0.06142835
  • 215 is inside too, so upper end between 215 and 220.
  • Continue until have as accurate a result as you want.

Bisection automatically

  • A loop, but not a for since we don’t know how many times we’re going around. Keep going while a condition is true:
lo <- 200
hi <- 220
while (hi - lo > 1) {
  # try the midpoint: if it is rejected (outside the CI), the upper end of
  # the interval is below it, so bring hi down; otherwise bring lo up
  try <- (hi + lo) / 2
  ptry <- pval_sign(try, irs, Time)
  print(c(try, ptry))
  if (ptry <= 0.05)
    hi <- try
  else
    lo <- try
}

The output from this loop

[1] 210.00000000   0.09873715
[1] 215.00000000   0.06142835
[1] 217.50000000   0.04277395
[1] 216.25000000   0.04277395
[1] 215.62500000   0.04277395
  • 215 inside, 215.625 outside. Upper end of interval to this accuracy is 215.
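  • The same loop idea works for the lower end of the interval, which we know is between 100 (outside) and 120 (inside). A sketch, not run here:
# bisection for the lower end of the 95% CI for the median:
# below the interval the two-sided P-value is <= 0.05, inside it is > 0.05
lo <- 100
hi <- 120
while (hi - lo > 1) {
  try <- (hi + lo) / 2
  ptry <- pval_sign(try, irs, Time)
  if (ptry <= 0.05)
    lo <- try
  else
    hi <- try
}
c(lo, hi)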

Using smmr

  • smmr has function ci_median that does this (by default 95% CI):
ci_median(irs, Time)
[1] 119.0065 214.9955
  • Uses a more accurate bisection than we did.
  • Or get, say, 90% CI for median:
ci_median(irs, Time, conf.level=0.90)
[1] 123.0031 208.9960
  • 90% CI is shorter, as it should be.

Bootstrap

  • But was the sample size (30) big enough to overcome the skewness?
  • Bootstrap, again:
# take 1000 bootstrap samples from the observed times (sampling with
# replacement), find the mean of each, and plot a histogram of those means
tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(irs$Time, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(x=my_mean)) + geom_histogram(bins=10) -> g

The sampling distribution

g

Comments

  • A little skewed to right, but not nearly as much as I was expecting.
  • The \(t\)-test for the mean might actually be OK for these data, if the mean is what you want.
  • In actual data, mean and median very different; we chose to make inference about the median.
  • Thus for us it was right to use the sign test.