Debiasing a biased coin


Suppose you have a coin that you suspect might be biased. Here’s how you can debias it so that there’s a 50-50 chance of heads or tails, thanks to a neat idea often attributed to John von Neumann (1951/1963, p. 768):

“… in tossing a coin it is probably easier to make two consecutive tosses independent than to toss heads with probability exactly one-half. If independence of successive tosses is assumed, we can reconstruct a 50-50 chance out of even a badly biased coin by tossing twice. If we get heads-heads or tails-tails, we reject the tosses and try again. If we get heads-tails (or tails-heads), we accept the result as heads (or tails). The resulting process is rigorously unbiased, although the amended process is at most 25 percent as efficient as ordinary coin-tossing.”

Why does this work? First, the probability of heads followed by tails (\(HT\)) is the following product, since coin flips are independent:

\(P(HT) = P(H) [1-P(H)]\)

We get the same answer for tails followed by heads (\(TH\)):

\(P(TH) = [1-P(H)] P(H) = P(H) [1-P(H)]\)

So, \(P(HT) = P(TH)\), which already hints at why this works.

For example, if a coin is so ridiculously biased that it only has a 10% chance of a heads outcome, then

\(\displaystyle P(HT) = P(TH) = \frac{1}{10} \frac{9}{10} = \frac{9}{100}\)

We really want a debiased coin to have a 50-50 chance of a heads or tails outcome. That’s where ignoring \(HH\) and \(TT\) outcomes helps: we condition on \(HT\) or \(TH\).

Assuming \(0 < P(H) < 1\), we have

\(\displaystyle P(HT|HT \lor TH) = \frac{P(HT)}{P(HT) + P(TH)}\)

\(\displaystyle = \frac{P(HT)}{P(HT) + P(HT)}\)

\(\displaystyle = \frac{1}{2}\)

The same calculation gives \(P(TH|HT \lor TH) = 1/2\). Et voilà, a fair coin from two (or more) tosses of a biased one!
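To check with the ridiculously biased coin from earlier:

\(\displaystyle P(HT|HT \lor TH) = \frac{9/100}{9/100 + 9/100} = \frac{1}{2}\)

The bias cancels, whatever \(P(H)\) happens to be.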

Let’s simulate it using rbinom() in R.

Here’s a biased coin, with 1/10 chance of heads, flipped 20,000 times (\(H = 1\) and \(T = 0\)):

test <- rbinom(20000, 1, .1)
table(test)

This gives:

test
    0     1 
17982  2018

Around 10% of outcomes were heads.

Now the debiaser:

library(dplyr)  # provides case_when()

debias_iid <- function(x) {
  stopifnot(length(x) >= 2)
  stopifnot(length(x) %% 2 == 0)
  
  res <- rep(NA, length(x)/2)
  j <- 1
  
  # Walk through the flips in non-overlapping pairs
  for (i in seq(1, length(x), 2)) {
    res[j] <- case_when(
      x[i] == 1 && x[i+1] == 0 ~ 1,  # HT counts as heads
      x[i] == 0 && x[i+1] == 1 ~ 0   # TH counts as tails
    )                                # HH and TT fall through to NA (rejected)
    j <- j+1
  }
  res
}
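
The loop spells out the pairwise logic, but the same idea can be vectorised in base R with no {dplyr} dependency. Here’s a sketch (debias_iid_vec is my name for it, not from any package):

# A vectorised sketch of the same von Neumann debiaser, base R only
debias_iid_vec <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  first  <- x[c(TRUE, FALSE)]   # first flip of each pair
  second <- x[c(FALSE, TRUE)]   # second flip of each pair
  keep   <- first != second     # accept only HT and TH pairs
  first[keep]                   # HT gives 1 (heads), TH gives 0 (tails)
}

It returns the accepted outcomes with the rejected pairs already dropped, so debias_iid_vec(test) |> table() gives the same counts as debias_iid().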

Try it:

debias_iid(test) |> table()

That looks better:

  0   1 
900 924

50.7% of outcomes were heads, which is the sort of value we would expect, given sampling error, for a fair coin.
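
Note that table() silently drops the NAs from the rejected HH and TT pairs. The expected proportion of pairs kept is \(2P(H)[1-P(H)]\), here \(2 \times 0.1 \times 0.9 = 0.18\), so we’d expect around 1,800 of the 10,000 pairs to survive; we got 1,824. That’s the efficiency cost von Neumann warned about: the stronger the bias, the more tosses get thrown away.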

References

von Neumann, J. (1951). Various techniques used in connection with random digits (notes by G. E. Forsythe). National Bureau of Standards Applied Mathematics Series, 12, 36–38. Reprinted in von Neumann’s Collected Works (1963), Pergamon Press, pp. 768–770.

True random numbers in R

Random numbers are important for lots of things, including statistical analysis (e.g., bootstrapping) and cryptography (e.g., producing a one-time pad). But computers can’t produce truly random numbers by computation alone: they produce pseudorandom numbers, which are deterministic and hence predictable given the seed.

Pseudorandom numbers more than suffice for most purposes (so long as everyone isn’t using the same seeds, like 1234 or 42). But if you want or need the real thing, there are various ways to pipe true randomness into computers. Two are available via R packages.
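
You can see the determinism directly: reset the seed and the same “random” numbers come back.

set.seed(1234)
runif(3)   # three "random" uniform numbers

set.seed(1234)
runif(3)   # exactly the same three numbers again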

The best known is {random}, which pipes in randomness from the random.org service in Ireland. The basic idea is that a radio receiver is tuned to a frequency where no station broadcasts; the atmospheric noise it picks up is continuously sampled to generate random numbers.
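
As a sketch of usage (this calls the random.org web service, so it needs an internet connection; check the package docs for the full argument list):

library(random)
# 10 draws between 1 and 100, in one column, fetched from random.org
randomNumbers(n = 10, min = 1, max = 100, col = 1)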

Another is {qrandom}, which draws on the Australian National University’s quantum random number generator. This works by measuring the quantum fluctuations of the vacuum.
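
Again a hedged sketch; as I recall, the package exposes qrandom() and qrandomunif(), both of which call the ANU server over the internet:

library(qrandom)
qrandom(n = 10)       # 10 true random integers (uint8 by default, if memory serves)
qrandomunif(n = 10)   # 10 true random uniform numbers on [0, 1]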

01001110 01101111 00101100 00100000 01110100 01101000 01101001 01110011 00100000 01101001 01110011 01101110 11100010 10000000 10011001 01110100 00100000 01110010 01100001 01101110 01100100 01101111 01101101