
Suppose you have a coin that you suspect might be biased. Here’s how you can debias it so that there’s a 50-50 chance of heads or tails, thanks to a neat idea often attributed to John von Neumann (1951/1963, p. 768):
“… in tossing a coin it is probably easier to make two consecutive tosses independent than to toss heads with probability exactly one-half. If independence of successive tosses is assumed, we can reconstruct a 50-50 chance out of even a badly biased coin by tossing twice. If we get heads-heads or tails-tails, we reject the tosses and try again. If we get heads-tails (or tails-heads), we accept the result as heads (or tails). The resulting process is rigorously unbiased, although the amended process is at most 25 percent as efficient as ordinary coin-tossing.”
Why does this work? First, the probability of heads followed by tails (\(HT\)) is the following product, since coin flips are independent:
\(P(HT) = P(H) [1-P(H)]\)
We get the same answer for tails followed by heads (\(TH\)):
\(P(TH) = [1-P(H)] P(H) = P(H) [1-P(H)]\)
So, \(P(HT) = P(TH)\), which already hints at why this works.
For example, if a coin is so ridiculously biased that it only has a 10% chance of a heads outcome, then
\(\displaystyle P(HT) = P(TH) = \frac{1}{10} \frac{9}{10} = \frac{9}{100}\)
We really want a debiased coin to have a 50-50 chance of a heads or tails outcome. That’s where ignoring \(HH\) and \(TT\) outcomes helps: we condition on \(HT\) or \(TH\).
Assuming \(0 < P(H) < 1\), then
\(\displaystyle P(HT|HT \lor TH) = \frac{P(HT)}{P(HT) + P(TH)}\)
\(\displaystyle = \frac{P(HT)}{P(HT) + P(HT)}\)
\(\displaystyle = \frac{1}{2}\)
Same sums for \(P(TH|HT \lor TH)\). Et voila, a fair coin from two or more tosses of a biased one!
Let’s simulate it using rbinom in R.
Here’s a biased coin, with 1/10 chance of heads, flipped 20,000 times (\(H = 1\) and \(T = 0\)):
test <- rbinom(20000, 1, .1) table(test)
This gives:
test 0 1 17982 2018
Around 10% of outcomes were heads.
Now the debiaser:
debias_iid <- function(x) { stopifnot(length(x) >= 2) stopifnot(length(x) %% 2 == 0) res <- rep(NA, length(x)/2) j <- 1 for (i in seq(1,length(x), 2)) { res[j] <- case_when( x[i] == 1 && x[i+1] == 0 ~ 1, x[i] == 0 && x[i+1] == 1 ~ 0 ) j <- j+1 } res }
Try it:
debias_iid(test) |> table()
That looks better:
0 1 900 924
50.7% of outcomes were heads, which is the sort of value we would expect, given sampling error, for a fair coin.
References
von Neumann, J. (1951) Various Techniques Used in Connection with Random Digits, Notes by G. E. Forsythe, National Bureau of Standards Applied Math Series, 12, 36-38. Reprinted in von Neumann’s Collected Works (1963), Pergamon Press, pp. 768-770.