Debiasing a biased coin

Suppose you have a coin that you suspect might be biased. Here’s how you can debias it so that there’s a 50-50 chance of heads or tails, thanks to a neat idea often attributed to von Neumann (1951/1963, p. 768):

“… in tossing a coin it is probably easier to make two consecutive tosses independent than to toss heads with probability exactly one-half. If independence of successive tosses is assumed, we can reconstruct a 50-50 chance out of even a badly biased coin by tossing twice. If we get heads-heads or tails-tails, we reject the tosses and try again. If we get heads-tails (or tails-heads), we accept the result as heads (or tails). The resulting process is rigorously unbiased, although the amended process is at most 25 percent as efficient as ordinary coin-tossing.”

Why does this work? First, the probability of heads followed by tails ($$HT$$) is the following product, since coin flips are independent:

$$P(HT) = P(H) [1-P(H)]$$

We get the same answer for tails followed by heads ($$TH$$):

$$P(TH) = [1-P(H)] P(H) = P(H) [1-P(H)]$$

So, $$P(HT) = P(TH)$$, which already hints at why this works.

For example, if a coin is so ridiculously biased that it only has a 10% chance of a heads outcome, then

$$\displaystyle P(HT) = P(TH) = \frac{1}{10} \frac{9}{10} = \frac{9}{100}$$

We really want a debiased coin to have a 50-50 chance of a heads or tails outcome. That’s where ignoring $$HH$$ and $$TT$$ outcomes helps; we condition on $$HT$$ or $$TH$$.

Assuming $$0 < P(H) < 1$$, then

$$\displaystyle P(HT|HT \lor TH) = \frac{P(HT)}{P(HT) + P(TH)}$$

$$\displaystyle = \frac{P(HT)}{P(HT) + P(HT)}$$

$$\displaystyle = \frac{1}{2}$$

Same sums for $$P(TH|HT \lor TH)$$. Et voila, a fair coin from two or more tosses of a biased one!

Let’s simulate it using rbinom in R.

Here’s a biased coin, with 1/10 chance of heads, flipped 20,000 times ($$H = 1$$ and $$T = 0$$):

test <- rbinom(20000, 1, .1)
table(test)


This gives:

test
0     1
17982  2018

Around 10% of outcomes were heads.

Now the debiaser:

debias_iid <- function(x) {
stopifnot(length(x) >= 2)
stopifnot(length(x) %% 2 == 0)

res <- rep(NA, length(x)/2)
j <- 1

for (i in seq(1,length(x), 2)) {
res[j] <- case_when(
x[i] == 1 && x[i+1] == 0 ~ 1,
x[i] == 0 && x[i+1] == 1 ~ 0
)
j <- j+1
}
res
}


Try it:

debias_iid(test) |> table()

That looks better:

  0   1
900 924

50.7% of outcomes were heads, which is the sort of value we would expect, given sampling error, for a fair coin.

References

von Neumann, J. (1951) Various Techniques Used in Connection with Random Digits, Notes by G. E. Forsythe, National Bureau of Standards Applied Math Series, 12, 36-38. Reprinted in von Neumann’s Collected Works (1963), Pergamon Press, pp. 768-770.