Tidy Binomial Calculator Using the probs Package

Danny Morris

2019/04/08

This short post introduces a simple R package called probs that I created for tidy binomial probability calculations.

Install

The probs package lives on GitHub.

library(devtools)

devtools::install_github("dannymorris/probs")

Simulated Data

Recently, I needed to communicate the probability of detecting a particular event to a larger audience. Specifically, an existing process required humans to sift through a sample of 100 PDF documents out of a batch of 40,000. The intent of this process was to determine whether the entire batch was faulty or passable.

Based on previous data, there was reason to believe that a batch of 40,000 documents may contain 50 faulty documents. So the humans would manually inspect a sample of 100 documents, hoping to find at least 1 faulty document.

I created the probs package to quickly determine the probability of detecting 1 or more faulty documents in a sample of 100, given that the batch of 40,000 documents may contain 50 faulty documents.

# probability that a single randomly selected document is faulty, assuming 50 of the 40,000 documents in the batch are faulty
prob_faulty <- 50/40000

paste("The probability that any randomly selected document is faulty is", prob_faulty)
## [1] "The probability that any randomly selected document is faulty is 0.00125"

In other words, the probability that any one inspected document turns out to be faulty is .00125, or 1 in 800.

The question now is: what is the probability that no faulty documents will be found in a sample of 100 documents, assuming 50 of the 40,000 documents in the batch are faulty?
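
As a sanity check, the answer can also be worked out by hand under the binomial model. That model treats each inspected document as an independent draw with the same probability $p$ of being faulty, which is an approximation here, since the real sample is drawn without replacement. The symbols $X$ (number of faulty documents found), $n$ (sample size), and $p$ are notation introduced here for illustration, not part of the package:

$$
P(X = 0) = \binom{n}{0} p^{0} (1 - p)^{n} = (1 - 0.00125)^{100} \approx 0.8824
$$

The package call that follows reports this quantity along with its complement.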

library(probs)

probs::cumul_binom_prob(
  n_success = 0,
  prob_success = prob_faulty,
  sample_size = 100
)
## $results
## # A tibble: 1 x 5
##   lower_tail upper_tail n_success prob_success sample_size
##        <dbl>      <dbl>     <dbl>        <dbl>       <dbl>
## 1      0.882      0.118         0      0.00125         100
## 
## $explanation
## [1] "The probability of 0 or fewer successes is 0.8824" 
## [2] "The probability of more than 0 successes is 0.1176"

The printout from the function shows that the probability of finding 0 faulty documents in a sample of 100 is about .88, which leaves only about a 12% chance of finding at least one. In other words, the humans are most likely wasting their time if they expect a sample of this size to surface even a single faulty document.
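
The result can be cross-checked with base R alone. The sketch below is not part of the probs package, and the variable names are made up for illustration. It recomputes the probability of finding no faulty documents with `pbinom()` and with the closed form `(1 - p)^n`. It also compares against `phyper()`, on the assumption that the 100 documents are sampled without replacement from the batch, in which case the hypergeometric distribution is the exact model and the binomial is an approximation.

```r
# Base R cross-check of the probs output (uses only the stats package,
# which is attached by default). Variable names are illustrative.
n_docs   <- 40000   # batch size (from the post)
n_faulty <- 50      # assumed number of faulty documents in the batch
n_sample <- 100     # number of documents inspected

prob_faulty <- n_faulty / n_docs

# Binomial model: P(0 faulty documents in the sample)
p_none_binom <- pbinom(0, size = n_sample, prob = prob_faulty)

# Same quantity from the closed form (1 - p)^n
p_none_closed <- (1 - prob_faulty)^n_sample

# Hypergeometric model: sampling without replacement from the finite batch
# (m = faulty documents, n = non-faulty documents, k = documents drawn)
p_none_hyper <- phyper(0, m = n_faulty, n = n_docs - n_faulty, k = n_sample)

round(c(binomial       = p_none_binom,
        closed_form    = p_none_closed,
        hypergeometric = p_none_hyper), 4)
```

Because the sample is tiny relative to the batch, the binomial and hypergeometric answers differ by well under 0.001, so the binomial shortcut is reasonable for this problem.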