Chapter 4 “Confidence Interval of a Proportion”

Preamble

On proportions, frequencies, and percentages

Like many books, Motulsky starts off by discussing proportional data. Proportional data can also sometimes be called frequencies; a useful, mathematically precise term is binomial proportions. They occur when you have a certain, discrete number of things happen, such as full-term births, and you count the frequency or calculate the proportion of a specific event, such as a child having brown eyes. Mathematically, the “things happening” are often called “trials” and the outcomes of interest are often called “successes”, though “events” or “outcomes” makes more sense to me. In stats books you will often see the term “Bernoulli trial” used to refer to a single binomial trial. Flipping a coin once is a Bernoulli trial.

Proportions are often conveyed in terms of percentages, such as “20% of children born in Pittsburgh have brown eyes.” Percentages are tricky in stats because you have to keep in mind whether the percentage is derived from counting up discrete events that can be counted with absolute precision (babies born with blue eyes) or from a continuous quantity (the percentage of a child’s face they’ve covered with splatter from the food they’ve eaten).

Another term commonly used is binary data. Binary indicates that an event can take on one of two values; in practice, the values “0” and “1” are used when doing the underlying math, even though “0” might mean “eyes not brown” and “1” means “eyes brown.”

Proportional data are very common in biology: the number of fruit flies that are virgin, the number of flowers eaten by deer, the number of tadpoles surviving exposure to a toxin. Proportion data are also easy to work with, and the math for calculating things like confidence intervals and especially p-values is much easier than for continuous variables such as the length of a fruit fly wing or the height of a plant.
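
If it helps to see this concretely, here is a minimal R sketch of simulating Bernoulli trials and a binomial count; the 20% brown-eye probability and the sample size are made-up numbers purely for illustration:

# Simulate 25 full-term births ("trials"), each a Bernoulli trial with
# a 20% chance of the event of interest ("success" = brown eyes)
set.seed(1)
eyes <- rbinom(n = 25, size = 1, prob = 0.20)  # a vector of 0s and 1s

sum(eyes)   # number of brown-eyed babies
mean(eyes)  # observed proportion of brown-eyed babies

# Equivalently, draw the total count in a single step
rbinom(n = 1, size = 25, prob = 0.20)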

On counting “events” versus counting things

Imagine you work for the Demography Department at Magee Women’s Hospital in Pittsburgh. Your job every day is to visit every room in the hospital and count two things: 1) the number of visitors in each room and 2) the number of brown-eyed babies out of the total number of babies. The first task is so that the hospital can determine how many people are visiting, and the second task is to determine the percentage of brown-eyed babies born at the hospital.

For both tasks you are counting, but there is a very important but subtle difference between these two tasks. For the first task, you are counting up the number of people in each room, which could vary from zero (unoccupied) to a potentially large number if many people are visiting a newborn. Each data point is a number, either zero or something larger.

For the second task you can think of it as counting up the number of brown-eyed babies, and this count could take on many different values depending on how many babies are born. However, this count is bounded by the total number of babies born. More importantly, what you want to know is the number of brown-eyed babies out of the total number of babies; you are therefore setting up a proportion. The first task doesn’t involve a proportion and is simply an unbounded count.

I’m belaboring this because I have frequently seen misunderstanding of the different statistical uses of the term “count” result in biologists (to be fair, only ecologists so far) selecting an incorrect statistical procedure.
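
A minimal R sketch of the difference (all the numbers here are made up): the visitor counts are unbounded counts, which you might simulate with a Poisson distribution, while the brown-eyed babies are successes out of a fixed total, which is binomial.

set.seed(2)

# Task 1: visitors per room -- an unbounded count (0, 1, 2, ...)
visitors <- rpois(n = 20, lambda = 3)    # 20 rooms, average of 3 visitors each

# Task 2: brown-eyed babies out of all babies born -- a bounded count
babies.total <- 12
babies.brown <- rbinom(n = 1, size = babies.total, prob = 0.20)
babies.brown / babies.total              # the proportion is the quantity of interest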

On confidence intervals versus p-values

Most books start with p-values then move on to confidence intervals; while the two things are intimately linked and derived from the same calculations, confidence intervals convey much more information. Motulsky starts off with confidence intervals in this chapter.

Vocabulary

Motulsky vocab

  • Bias
  • Binomial variable
  • Binomial distribution
  • Confidence interval (CI)
  • Confidence level
  • Credible interval
  • Point estimate
  • Random sampling
  • Sampling error
  • Simulation
  • Uncertainty interval

Additional vocab

  • binary data
  • binomial proportion
  • frequency
  • proportion

Chapter Notes

4.1 “Data Expressed as Proportions”

4.2 “The Binomial Distribution: From Population to Sample”

4.3 “Example: Free Throws in Basketball”

4.4 (“Example: Deaths of Premature Babies”)

Alive vs. dead is one of the most basic binary conditions in biology.

4.5 “Example: Polling Voters”

4.6 “Assumptions: Confidence Interval of a Proportion”

4.6.1 “Assumption: Random (or representative) sample”

4.6.2 “Assumption: Independent observations”

Proportional data can be tricky because the statistics used to analyze them almost always assume that all of the events are independent. For example, each non-twin child born in a hospital on the first day of the month is essentially an independent trial. Each child has different parents, a different gestational environment, and, most relevant if you are counting up the number with brown hair, different genetics. So the hair color of one child born on the first day of the month has no impact on the hair color of another child; they are unlinked and unrelated.

In contrast, it is possible that the fates of mothers while giving birth are not independent. For example, what if we want to know the number of women who originally intended to give birth vaginally but ended up having a cesarean (C-section)? Each woman is different, but they are likely to be attended by the same attending physician, who can vary in their approach to delivery and in when they recommend a C-section. So if 20 women give birth on the first day of the month, the hair colors of their babies are all independent data points, but whether these women had C-sections or not is potentially not independent.

4.7 “Assumption: Accurate Data”

4.8 “What Does 95% Confidence Really Mean”

4.8.1 “A simulation”

[ ] Figure 4.2: What would happen if you collected many samples and computed a 95% CI for each

This figure is very important. The thought experiment where you hypothetically re-run your study or experiment many times is central to the concept of what confidence intervals and p-values are.
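
A rough version of this thought experiment is easy to run yourself. The sketch below (with a made-up “true” proportion of 0.75) draws many hypothetical samples, computes a 95% CI for each with binom.test(), and checks how often the interval captures the true value:

set.seed(123)
true.p <- 0.75   # the "true" population proportion (made up)
n      <- 39     # sample size for each hypothetical study
n.sims <- 1000   # number of times we "re-run" the study

covered <- replicate(n.sims, {
  S  <- rbinom(1, size = n, prob = true.p)  # one simulated study
  ci <- binom.test(S, n)$conf.int           # its 95% CI
  ci[1] <= true.p & true.p <= ci[2]         # does the CI contain true.p?
})

mean(covered)  # proportion of CIs that captured true.p; should be close to 0.95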

4.8.2 “95% Chance of What?”

4.8.3 “What is Special About 95%”

[ ] Nothing. Absolutely nothing. This cannot be repeated enough. There is nothing sacred scientifically or mathematically about 95% or a p-value that is less than 0.05.

[ ] This is so important I will repeat it again. There is nothing sacred scientifically or mathematically about 95% or a p-value that is less than 0.05.

[ ] There has even recently been a call to get people to not call something “significant” unless p < 0.005 (equivalent to using a 99.5% CI). This has resulted in a lot of discussion in journals, blogs, and on Twitter, with frequentists arguing with frequentists, some Bayesians offering their alternatives to significance tests (e.g., Wagenmakers), and other Bayesians saying we need to get rid of hypothesis testing entirely (Gelman). There are many interesting blog posts and published opinion pieces on this now.

Like most stats books, Motulsky mentions the possibility of calculating 90% CIs, which are more lax, or 99% CIs, which are more stringent. Most books have you do exercises where you calculate different CIs. In the biological sciences I have never seen anything but a 95% CI; I think this is perhaps more common in manufacturing applications of statistics and other fields.
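
If you do want something other than 95%, most R functions take a conf.level argument; here is what that looks like with binom.test() for the 31-out-of-39 example used later in these notes:

# 90%, 95%, and 99% CIs for the same data; higher confidence = wider interval
binom.test(31, 39, conf.level = 0.90)$conf.int
binom.test(31, 39, conf.level = 0.95)$conf.int
binom.test(31, 39, conf.level = 0.99)$conf.int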

[ ] There has been some discussion that it should be more common to think about what level of “confidence” you want or need to make a decision and to adjust your CI accordingly. This is discussed in print and in blogs by the psychologist Daniel Lakens and, I believe, Richard Morey.

4.8.4 (“What If The Assumptions Are Violated”)

This section appeared in previous editions of the book.

4.9 “Are You Quantifying the Event You Really Care About?”

4.10 Lingo

4.10.1 CI versus confidence limits

“Confidence limit” isn’t used much in practice.

4.10.2 Estimate

4.10.3 Confidence level

[ ] Again, there is nothing special about 95%.

4.10.4 Uncertainty Interval

Uncertainty interval is a proposed replacement term for confidence interval. I have never seen it used, except by those who have proposed it.

4.11 “Calculating The CI of a Proportion”

4.11.1 “Several methods are commonly used”

There are many ways to deal with binomial data in general, and in R. The basic ones that usually show up in an intro stats course are:

  • Binomial test: binom.test()
  • Test for equal proportions: prop.test()
  • Chi^2 test: chisq.test()

All of these produce similar results and are probably mathematically related if you start to dig into them, which I haven’t done lately.
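
For reference, here is roughly what those three calls look like for the 31-out-of-39 example analyzed later in these notes; the exact arguments, especially for chisq.test(), are my choices:

S <- 31   # number of "successes"
n <- 39   # number of "trials"

# Exact binomial test; also returns a CI for the proportion
binom.test(S, n)

# Test for equal proportions; returns a CI based on a normal approximation
prop.test(S, n)

# Chi-squared goodness-of-fit test on the two observed counts
# (tests against a 50:50 split; does not return a CI)
chisq.test(c(S, n - S))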

This profusion of different tests is one annoying feature of the traditional way statistics is typically taught and the way most intro-level stats books are written. In contemporary applied statistics, binomial data like these are likely to be analyzed using something called “logistic regression” or a “binomial generalized linear model”. A generalized linear model is often called a GLM for short. Motulsky doesn’t go all the way into developing GLMs, but he is generally oriented in that direction, which is good.

To be more precise, there are both

  • Multiple ways to analyze these data to get a p-value
  • Multiple ways to calculate a confidence interval

The confidence interval issue is what Motulsky focuses on here. I will work through these calculations in R by hand and also show how to use common functions that produce them, which also yield p-values.

4.11.2 “How to compute the modified Wald method by Hand”

This is a good computational exercise and so we’ll work through the details. For published papers use a computer to do the work!

We’ll work with the following quantities

  • S = the observed number of “successes”
  • n = number of binomial “trials”

(These same quantities are defined again in the more compact version of this calculation further below.)

# Step 0: the data
S <- 31   #S = successes = num. infants surviving to 6 months
n <- 39   #n = number of trials = total num. infants in study
z <- 1.96 #z = critical value from the standard normal for a 95% CI

#calculate z^2
z2 <- z^2

#round it off to 3 decimal places
z2 <- round(z2, 3)

# Step 1: calculate the Wald-corrected proportion
## observed proportion
P.obs  <- S/n
P.obs  <-  round(P.obs, 3)           

## "corrected" proportion
### Set up whole formula
P.corr <-  (S+z)/(n+z^2)

### might be easier to see if done in steps
numerator <- S+z
denominator <- n+z^2
P.corr <- numerator/denominator

### round off
P.corr <- round(P.corr, 3)

# Compute half-width of the CI
## In one step
W <- z*sqrt(P.corr*(1-P.corr)/(n+z^2))

## in parts
### calculate numerator and denominator
W.numerator   <- P.corr*(1-P.corr)
W.denominator <- n+z^2

### calculate W
W             <- z*sqrt(W.numerator/W.denominator)
  
## round W off 
W <- round(W, 3)

# Calculate CI bounds
lower.CI <- P.corr - W
upper.CI <- P.corr + W

The calculations can also be laid out in a table or spreadsheet like this (in a spreadsheet, P.observed would be entered as =31/39 and P.corrected as =(31+1.96)/(39+1.96^2)):

Data                      Calculated value
survived      31          P.observed            = 0.795
deceased       8          P.corrected           = 0.769
n.total       39          numerator (S + z)     = 32.96
z              1.96       denominator (n + z^2) = 42.8416
z.squared      3.842      W                     = 0.126
                          CI.lower              = 0.643
                          CI.upper              = 0.895

We can code Motulsky’s analysis using the modified Wald method more compactly like this:

# Step 0: the data
S <- 31   #S = successes = num. infants surviving to 6 months
z <- 1.96 #z = critical value from the standard normal for a 95% CI
n <- 39   #n = number of trials = total num. infants in study

# Step 1: calculate the Wald-corrected proportion
P.obs  <-  S/n           #observed proportion
P.corr <-  (S+z)/(n+z^2) #corrected proportion

#or, approximately, since 1.96^2 requires a calculator
## here what he is doing is just rounding 1.96 up to 2 (and 1.96^2 up to 4)
## this, if anything, makes the confidence interval a bit wider and
## therefore a bit more conservative
## of course, 33/43 will typically still require a calculator
P.corr.alt <-  (S+2)/(n+4)

#If you get confused by the parentheses you can always break things up

numerator <- S+z
denominator <- n+z^2

P.corr <- numerator/denominator

# Compute half-width of the CI
W <- z*sqrt(P.corr*(1-P.corr)/(n+z^2))

lower.CI <- P.corr - W
upper.CI <- P.corr + W

Here is the output of the three statistical “tests” that can be applied to these data:

#packages to clean up the output
library(tidyr)
library(broom)
library(pander)
estimate    p.value      conf.low    conf.high    method
0.7949      0.0002941    0.6354      0.907        binom.test
0.7949      0.000427     0.6306      0.9013       prop.test
NA          0.0002306    NA          NA           chi^2

Here is the output using a binomial GLM (aka logistic regression)

estimate    std.error    p.value
0.775       0.3786       0.00109
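
The code that produced this output isn’t shown above; a minimal sketch of an intercept-only binomial GLM for these data would look something like the following. Note that glm() reports the estimate on the log-odds (logit) scale by default, so the numbers will not match the proportion directly, and my use of broom::tidy() to clean up the output is an assumption.

# Intercept-only binomial GLM (logistic regression) for 31 successes out of 39 trials
fit <- glm(cbind(31, 39 - 31) ~ 1, family = binomial)

summary(fit)       # estimate and std. error on the log-odds scale
broom::tidy(fit)   # the same, as a tidy data frame

# Back-transform the intercept and its profile-likelihood CI to the proportion scale
plogis(coef(fit))
plogis(confint(fit))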

4.11.3 Shortcut for proportion near 50% (OPTIONAL)

4.11.4 Shortcut for proportion far from 50% (OPTIONAL)

4.11.5 Shortcut when the numerator is zero: The rule of three (OPTIONAL)

4.12 “Ambiguity if The Proportion Is 0% or 100%”

4.13 “An Alternative Approach: Bayesian Credible Intervals”

4.14 “Common Mistakes: CI of A Proportion”

4.14.1 “Mistake: Using 100 as the denominator when the value is a percentage”

4.14.2 “Mistake: Computing binomial CIs from percentage change in a continuous variable”

4.14.3 “Mistake: Computing a CI from data that look like a proportion but really are not”

4.14.4 “Mistake: Interpreting a Bayesian credible interval without knowing what prior probabilities (or probability distribution) were assumed for the analysis”

4.15 Q & A

All of these are good points.

[ ] Figure 4.3. Effect of sample size on the width of a CI. This is a very important idea: sample size is key to increasing the confidence we have in a result.

[ ] Figure 4.4. Asymmetrical CI. Proportions, percentages, etc. are all bounded between 0 and 1, or 0% and 100%. Most methods of calculating confidence intervals for this type of data (but not all!) will produce asymmetric CIs. If you see a CI for this type of data that extends below 0% or above 100%, there’s a good chance the authors did not use an appropriate method for calculating the confidence intervals. I see this most often when the data are percentages, like the mean percentage of the ground covered by an invasive species.
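
You can see the asymmetry directly in R. Using a made-up example of 38 successes out of 39 trials, the exact CI from binom.test() reaches much farther below the point estimate than above it (and it cannot go past 100%):

ci  <- binom.test(38, 39)$conf.int  # exact 95% CI for a proportion near 100%
est <- 38/39                        # point estimate

est - ci[1]   # distance from the estimate down to the lower limit (long)
ci[2] - est   # distance from the estimate up to the upper limit (short)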