Duality between confidence intervals and hypothesis tests
Tests and CIs really do the same thing, if you look at them the right way. They both tell you something about a parameter, and they use the same information from the data.
95% CI
t.test(y ~ group, data = twogroups)
Welch Two Sample t-test
data: y by group
t = -2.0937, df = 8.7104, p-value = 0.0668
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-5.5625675 0.2292342
sample estimates:
mean in group 1 mean in group 2
13.00000 15.66667
90% CI
t.test(y ~ group, data = twogroups, conf.level = 0.90)
Welch Two Sample t-test
data: y by group
t = -2.0937, df = 8.7104, p-value = 0.0668
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
90 percent confidence interval:
-5.010308 -0.323025
sample estimates:
mean in group 1 mean in group 2
13.00000 15.66667
Hypothesis test
Null is that difference in means is zero:
t.test(y ~ group, mu = 0, data = twogroups)
Welch Two Sample t-test
data: y by group
t = -2.0937, df = 8.7104, p-value = 0.0668
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-5.5625675 0.2292342
sample estimates:
mean in group 1 mean in group 2
13.00000 15.66667
Comparing results
Recall null here is \(H_0 : \mu_1 - \mu_2 = 0\). P-value 0.0668.
95% CI from \(-5.6\) to \(0.2\), contains \(0\).
90% CI from \(-5.0\) to \(-0.3\), does not contain \(0\).
At \(\alpha = 0.05\), would not reject \(H_0\) since P-value \(> 0.05\).
At \(\alpha = 0.10\), would reject \(H_0\) since P-value \(< 0.10\).
Test and CI
Not just a coincidence. Let \(C = 100(1 - \alpha)\), so that the \(C\%\) confidence interval corresponds to the level-\(\alpha\) test. Then the following is always true. (The symbol \(\iff\) means “if and only if”.)
Test decision vs. confidence interval:
Reject \(H_0\) at level \(\alpha\) \(\iff\) \(C\%\) CI does not contain the \(H_0\) value.
Do not reject \(H_0\) at level \(\alpha\) \(\iff\) \(C\%\) CI contains the \(H_0\) value.
Idea: a “plausible” parameter value is one inside the CI, and is not rejected; an “implausible” parameter value is one outside the CI, and is rejected.
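A quick check of this duality on our data (a sketch, reusing the twogroups data frame from above and the pieces that t.test returns):
tt <- t.test(y ~ group, data = twogroups, conf.level = 0.90)
tt$p.value < 0.10  # TRUE: reject H0 at alpha = 0.10 (the P-value is 0.0668)
tt$conf.int        # the 90% CI, which indeed does not contain 0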
The value of this
If you have a test procedure but no corresponding CI:
you make a CI by including all the parameter values that would not be rejected by your test.
Use:
\(\alpha = 0.01\) for a 99% CI,
\(\alpha = 0.05\) for a 95% CI,
\(\alpha = 0.10\) for a 90% CI, and so on.
Testing for non-normal data
The IRS (“Internal Revenue Service”) is the US authority that deals with taxes (like Revenue Canada).
One of their forms is supposed to take no more than 160 minutes to complete. A citizen’s organization claims that it takes people longer than that on average.
Sample of 30 people; time to complete form recorded.
Read in data, and do \(t\)-test of \(H_0 : \mu = 160\) vs. \(H_a : \mu > 160\).
For reading in, there is only one column, so can pretend it is delimited by anything.
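A sketch of the reading-in step (the file name irs.txt is a placeholder for wherever the data actually live):
library(tidyverse)
# one column of completion times, so any delimiter will do
irs <- read_delim("irs.txt", " ")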
with(irs, t.test(Time, mu = 160, alternative = "greater"))
One Sample t-test
data: Time
t = 1.8244, df = 29, p-value = 0.03921
alternative hypothesis: true mean is greater than 160
95 percent confidence interval:
162.8305 Inf
sample estimates:
mean of x
201.2333
Reject null; mean (for all people to complete form) greater than 160.
But how to test whether the median is greater than 160?
Idea: if the median really is 160 (\(H_0\) true), the sampled values from the population are equally likely to be above or below 160.
If the population median is greater than 160, there will be a lot of sample values greater than 160, and not so many below. Idea: use as the test statistic the number of sample values greater than the hypothesized median.
Getting a P-value for sign test 1/3
How to decide whether “unusually many” sample values are greater than 160? Need a sampling distribution.
If \(H_0\) true, pop. median is 160, then each sample value independently equally likely to be above or below 160.
So number of observed values above 160 has binomial distribution with \(n = 30\) (number of data values) and \(p = 0.5\) (160 is hypothesized to be median).
Getting a P-value for sign test 2/3
Count values above/below 160:
irs %>% count(Time > 160)
17 above, 13 below. How unusual is that? Need a binomial table.
Getting a P-value for sign test 3/3
The R function dbinom gives the probability of, for example, exactly 17 successes in a binomial with \(n = 30\) and \(p = 0.5\):
dbinom(17, 30, 0.5)
[1] 0.1115351
But we want the probability of 17 or more, so get all of those values, find the probability of each, and add them up:
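One way to do the adding-up (a sketch; the upper tail runs from 17 up to the sample size of 30):
sum(dbinom(17:30, 30, 0.5))  # P(17 or more out of 30), about 0.2923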
We are testing whether the population median is greater than 160, so we want the upper-tail P-value, 0.2923: the same as the value we calculated above.
We also get a table of values above and below 160; this too matches what we got earlier.
Comments (2/3)
P-values are:
\(t\)-test: 0.0392
Sign test: 0.2923
These are very different: we reject a mean of 160 (in favour of the mean being bigger), but clearly fail to reject a median of 160 in favour of a bigger one.
The mean is pulled a long way up by the right skew, and is a fair bit bigger than 160.
The median is quite close to 160.
We ought to trust the sign test rather than the \(t\)-test here (the median rather than the mean), and so there is no evidence that the “typical” time to complete the form is longer than 160 minutes.
Having said that, there are clearly some people who take a lot longer than 160 minutes to complete the form, and the IRS could focus on simplifying its form for these people.
In this example, looking at any kind of average is not really helpful; a better question might be “do an unacceptably large fraction of people take longer than (say) 300 minutes to complete the form?”: that is, thinking about worst-case rather than average-case.
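That worst-case question could be sketched in one line (the 300-minute cutoff being an arbitrary choice for illustration):
irs %>% summarize(prop_over_300 = mean(Time > 300))  # sample fraction taking longer than 300 minutes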
Confidence interval for the median
The sign test does not naturally come with a confidence interval for the median.
So we use the “duality” between test and confidence interval to say: the (95%) confidence interval for the median contains exactly those values of the null median that would not be rejected by the two-sided sign test (at \(\alpha = 0.05\)).
For our data
The procedure is to try some values for the null median and see which ones are inside and which outside our CI.
smmr has pval_sign that gets just the 2-sided P-value:
pval_sign(160, irs, Time)
[1] 0.5846647
Try a couple of null medians:
pval_sign(200, irs, Time)
[1] 0.3615946
pval_sign(300, irs, Time)
[1] 0.001430906
So 200 inside the 95% CI and 300 outside.
Doing a whole bunch
Choose our null medians first:
(d <- tibble(null_median = seq(100, 300, 20)))
… and then
“for each null median, run the function pval_sign for that null median and get the P-value”:
d %>% rowwise() %>% mutate(p_value = pval_sign(null_median, irs, Time))
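By the duality, the null medians whose two-sided P-values are above 0.05 are exactly the ones inside the 95% confidence interval, so one way to read the interval off (a sketch, extending the same pipeline) is to keep only those rows:
d %>%
  rowwise() %>%
  mutate(p_value = pval_sign(null_median, irs, Time)) %>%
  filter(p_value > 0.05)  # the null medians that remain are inside the 95% CI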
Comments