The New York Times has reported that the Internal Revenue Service has given one of its most rigorous types of audits to James B. Comey, the former FBI director, and Andrew G. McCabe, his former deputy.
This has led to a lot of perfectly reasonable questions, most of them variants of: What are the odds? As the article noted, the chances that two high-ranking political enemies of President Donald J. Trump were audited by sheer coincidence are minuscule.
But minuscule is not zero.
If we were to believe this was a coincidence, how unlikely would we say it was? Here we try to estimate that chance as seriously as possible.
The facts first: Both men were chosen for audits under the National Research Program (NRP), a small subset of all audits the IRS conducts each year. These audits examine a sample of returns to collect tax compliance data.
According to the IRS, there were about 5,000 such audits in 2017, 4,000 in 2018 and 8,000 in 2019, chosen from about 154 million individual tax returns each year. Mr. Comey’s audit was for his 2017 tax return; Mr. McCabe’s was for his 2019 return.
Many aspects of the NRP complicate our calculations, including the sampling method used by IRS auditors and the different years of the audits themselves. We will come back to these matters later. For now, we assume that all taxpayers have an equal chance of being audited and that both men were audited in 2017.
If this problem were to appear in a textbook on probability, it might read:
If there are 154 million marbles (the estimated number of tax returns filed each year) in a giant urn, and a small number of them are red (including those of Mr. Comey and Mr. McCabe), what is the probability that you will pull two or more red marbles if you randomly draw a few thousand from the urn (the number of audits in that year)?
It may sound complicated, but it is a relatively well-studied problem, something many math or statistics majors would encounter in their college courses. People have already derived equations to estimate these probabilities, with names like the hypergeometric distribution, which has applications such as election auditing and card counting.
We can just enter our estimates for the total number of marbles, the number of red marbles, and the number of draws, and we’ll have an answer. If we think there are only two red marbles, that is, if we limit the exercise to Mr. McCabe and Mr. Comey alone, this equation gives a probability of about one in 950 million.
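As a rough check on that figure, here is a minimal sketch of the two-marble case using the hypergeometric formula and exact fractions. The constant and function names are illustrative, not taken from the original analysis, and the inputs are the simplifying assumptions stated above: 154 million returns, 5,000 audits and exactly two red marbles.

```python
from fractions import Fraction
from math import comb

TOTAL_RETURNS = 154_000_000  # marbles in the urn (assumed constant)
AUDITS_2017 = 5_000          # marbles drawn (assumed: all audits in 2017)
RED_MARBLES = 2              # Mr. Comey and Mr. McCabe, no one else


def prob_exactly(k, population, red, draws):
    """Hypergeometric probability of drawing exactly k red marbles."""
    ways = comb(red, k) * comb(population - red, draws - k)
    return Fraction(ways, comb(population, draws))


# With two red marbles, "two or more" is the same as "exactly two".
p_both = prob_exactly(2, TOTAL_RETURNS, RED_MARBLES, AUDITS_2017)
print(f"about one in {round(1 / p_both):,}")
```

Under these assumptions the result should land close to the one-in-950-million figure quoted above.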
Those are significantly steeper odds than your chances of winning the Powerball. It is also an almost meaningless result. At best, it’s the right answer to the wrong question.
To understand why, we must recognize an absurdity inherent in our exercise: to estimate as best we can the probability of an unlikely event, we must set aside the fact that we know it has already happened. (The odds of it happening are 100 percent.)
Jordan Ellenberg, a professor at the University of Wisconsin who has written books on math and reasoning, described it this way: “In some counterfactual universe, what are the chances of this thing that has already happened in our universe?”
It may seem strange, but the same problems arise even with probabilistic exercises as simple as flipping a coin.
If you flip a coin 20 times, your particular order of heads and tails is extremely rare, about one in a million, but it happened. And some sequence of flips will always happen. It’s only a surprising coincidence if that’s the order you envisioned before you flipped.
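The arithmetic behind the coin example can be sketched in a few lines, assuming a fair coin. The variable names are only illustrative.

```python
FLIPS = 20

# Every order of heads and tails is equally likely with a fair coin.
sequences = 2 ** FLIPS

# Chance of matching one particular order that was named before flipping.
p_named_in_advance = 1 / sequences

# Chance that some order occurs: every possible order counts as a success.
p_some_order = sequences * p_named_in_advance

print(f"{sequences:,} possible orders")
print(f"P(order named in advance) = {p_named_in_advance:.2e}")
print(f"P(some order occurs) = {p_some_order}")
```

The gap between the last two numbers is the whole point: the rarity only means something if the target was fixed beforehand.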
Likewise, limiting our search to Mr. Comey and Mr. McCabe is too narrow, because it’s likely we’d be exploring these odds if we found that two other notable political enemies of an administration had been audited in place of these two men.
A better question is: What are the chances that two or more people like Mr. Comey and Mr. McCabe would be audited during this period?
Should this group of people include two top FBI officials? Two top officials from the Department of Justice? It is this framing, a subjective decision rather than a factual one, that drives the probability estimate the most, more than any choice of statistical distribution or sample weights.
Here’s a graph of the probability our equation yields over several choices for the number of red marbles, ranging from two (Mr. Comey and Mr. McCabe and no one else) to 400 (a conservative estimate of the number of Americans Mr. Trump insulted by name on Twitter since the beginning of his run for president).
The probability increases dramatically with the choice of who besides Mr. Comey and Mr. McCabe should be considered a red marble.
The point is not to decide on a number, but to recognize that our choice of group size drives our answer. While some guesses are certainly better than others, many choices are defensible.
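To see how strongly the group size drives the answer, here is a sketch that computes the chance of drawing two or more red marbles for a few group sizes. It keeps the simplified single-year setup (154 million returns, 5,000 audits). The function name and the particular group sizes are illustrative choices, not values from the original analysis.

```python
from fractions import Fraction
from math import comb


def prob_at_least_two(red, population=154_000_000, draws=5_000):
    """Hypergeometric P(two or more red marbles among the draws).

    Uses the symmetric form of the formula, which counts where the red
    marbles fall among drawn and undrawn positions.
    """
    total = comb(population, red)
    p_none = Fraction(comb(population - draws, red), total)
    p_one = Fraction(draws * comb(population - draws, red - 1), total)
    return 1 - p_none - p_one


for red in (2, 10, 50, 100, 400):
    p = prob_at_least_two(red)
    print(f"{red:>3} red marbles: about one in {round(1 / p):,}")
```

Going from two red marbles to 400 moves the estimate by several orders of magnitude, which is why the framing matters more than the formula.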
Tackle the details
Now let’s try to refine this into something more realistic and return to some of the things we ignored in our simple interpretation of this problem.
First, the two men were not audited for the same year. Extending our scope to the three-year period from 2017 to 2019 significantly increases our resulting odds. This is simple: if a person has a certain probability of being audited in a particular year, more years means more opportunities to be audited.
Second, we are only interested in the probability of at least two people being chosen. We do not consider the probability of the same person being chosen twice; it seems unlikely given that the audits could extend over a year, according to Mr. Comey’s account. Note that we are looking at the probability of at least two people being selected, not exactly two, because it would also be significant if three or more individuals were selected from a group.
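One way to sketch these two adjustments together is a binomial approximation. It assumes each person in the group has the same, independent chance of being audited at least once over 2017 to 2019, and then asks how likely it is that two or more different people are picked. This is an illustrative simplification, not necessarily the method behind the published estimates, and the names below are hypothetical.

```python
from fractions import Fraction

TOTAL_RETURNS = 154_000_000  # assumed constant across years
NRP_AUDITS = {2017: 5_000, 2018: 4_000, 2019: 8_000}


def prob_audited_at_least_once(audits_by_year, population=TOTAL_RETURNS):
    """Chance that one person is picked in at least one of the years."""
    p_never = Fraction(1)
    for audits in audits_by_year.values():
        p_never *= 1 - Fraction(audits, population)
    return 1 - p_never


def prob_two_or_more_people(group_size, audits_by_year=NRP_AUDITS):
    """Binomial approximation: at least two distinct group members audited."""
    p = prob_audited_at_least_once(audits_by_year)
    p_nobody = (1 - p) ** group_size
    p_exactly_one = group_size * p * (1 - p) ** (group_size - 1)
    return 1 - p_nobody - p_exactly_one


for size in (2, 10, 100, 400):
    p = prob_two_or_more_people(size)
    print(f"group of {size:>3}: about one in {round(1 / p):,}")
```

Because a person who is audited in more than one year still counts only once, the sketch tracks distinct people rather than total audits.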
Finally, the IRS doesn’t really select people at random. Instead, the agency tends to select certain types of taxpayers, including high earners, more often than others. For the 2001 tax year, for example, the NRP sample included returns of people around the 90th percentile of income at about 1.7 times the rate one would expect if returns were chosen independent of income. That rate rose through the highest income ranks, so those with incomes in the top 0.5 percent were more than 10 times as likely to be in the sample as someone closer to the median income.
We can probably assume that any group of enemies of Mr. Trump would earn more than a random sample of Americans. But we can’t realistically estimate the full income of everyone in our group in each year. We also know that the IRS has taken into account other factors in its sampling, such as the kind of returns that taxpayers file, and that sampling methods may change from year to year. This leaves us little guidance on how to match the IRS’s methods. Therefore, we leave our estimates unweighted by income. As a back-of-the-envelope exercise, if you’re concerned about how income affects these results, you can double the resulting probability if you think the members of a group have very high income, and multiply it by 10 if you think they are extraordinarily rich.
Put them all together
With these choices, the table below lists some estimated odds depending on the group size being considered.
Alternatively, if our picks aren’t satisfying, we’ve created a simple calculator for you to compute your own odds:
So which estimate is “correct”?
Most realistic results of this equation can be accurately described as “very rare” or even “extremely rare,” but none are evidence of wrongdoing.
“It’s a bit like the irresistible force and the immovable object,” said Andrew Gelman, a professor of statistics and political science at Columbia University, when he talked about this exercise in the abstract. “On the one hand, you say it’s completely random. On the other hand, you suspect not.”
Mr. Gelman, like every other statistician who spoke to The Times on this issue, said the biggest hurdle was not the details, but defining the question itself.
When we try to calculate the probability of a particular event because we suspect it may not be random, we end up in the complicated position of imagining how we would have predicted the probability of the event before it happened, said David Spiegelhalter. He leads the Winton Center for Risk and Evidence Communication at the University of Cambridge, an organization dedicated to improving the way quantitative evidence is used in society.
The math is simple, he said, but formulating the question is tricky, bordering on “meaningless,” largely because of how difficult it is to determine the group we care about.
“‘What are the chances of this happening?’ is an easy statement to make,” he said. “It’s a well-known statement to make. But actually it is a very difficult question to answer.”
Mathematics has its limits. The point of trying to estimate a probability like this, Mr. Gelman said, isn’t to overvalue the numbers, but to let the result prompt you to learn more.
In this case, the best question is not one with an answer that you can look up in a stat book.
Instead, said Mr. Gelman, the question to be asked is, “What’s going on?”
Matthew Cullen contributed reporting.