Blog

“This is exactly what gentleness is”

“Another day, in the rain, we’re waiting for the boat at the lake; from happiness, this time, the same outburst of annihilation sweeps through me. This is how it happens sometimes, misery or joy engulfs me, without any particular tumult ensuing: nor any pathos: I am dissolved, not dismembered; I fall, I flow, I melt. Such thoughts—grazed, touched, tested (the way you test the water with your foot)—can recur. Nothing solemn about them. This is exactly what gentleness is.”

– Roland Barthes (1977), A lover’s discourse: fragments

Gorbachev

“Gorbachev’s desperate attempts to preserve socialism and the Soviet Union eventually failed utterly, turning him into an accidental hero in the West. I won’t even give him the minimal credit some offer for not sending in the proverbial tanks to crush the anti-Communist uprisings that were taking place all across the Soviet Bloc, especially since Gorbachev did send in military to Latvia and Lithuania, where he believed he could get away with it. He was hardly a risk taker where his own neck was concerned and didn’t want to end up like Romanian Communist dictator Nicolae Ceaușescu, whose rapid overthrow and execution in December 1989 was still fresh in everyone’s mind.”

– Garry Kasparov (2015), Winter Is Coming

“What is the actual difference between Russia and Ukraine?”

‘As the Director of the Ukrainian Institute London and a historian, I received numerous requests for commentary in the context of Russia’s war against Ukraine. Most began with a question asking me to elaborate on the actual difference between Russia and Ukraine. The question was well meant; it was intended to debunk Putin’s weaponised mythology. But the interviewers were oblivious to their own entrapment in the imperialist framework even as they attempted to give Ukraine a voice. […] Weary of giving a “proper” answer (starting with Volodymyr the Great and ending with Volodymyr Zelenskiy) for the umpteenth time, I asked one journalist a question in return: “What, exactly, is the difference between Ireland and England?” Instead of an answer, I heard a nervous giggle.’

– Olesya Khromeychuk (2022), Where is Ukraine?

The arithmetic of the Deutsch algorithm

The Deutsch algorithm (named after its originator David Deutsch) is a vivid and simple illustration of the power of quantum computing. Beyond achieving this admirable goal, it has no practical purpose, though it does seem to have inspired more useful algorithms. This blog post works through the arithmetic to show that the algorithm works (I did the sums to convince myself as I couldn’t grasp what was going on in Dirac notation). I don’t attempt to explain why it works.

The problem

Here’s the problem the Deutsch algorithm solves.

Suppose you are presented with a function \(f : \{0, 1\} \rightarrow \{0, 1\}\). You don’t know how \(f\) is defined, but you can call it with a \(0\) or a \(1\) and see what result it gives. The result will always be a \(0\) or a \(1\).

Your task is to decide whether or not \(f\) is constant, that is, whether \(f(0) = f(1)\). If \(f\) isn’t constant, it is said to be balanced, since for one input the answer is \(0\) and for the other it is \(1\).

There are four possible functions, \(f_i\), as follows:

Function \(f_i(0)\) \(f_i(1)\) Type
\(f_0\) 0 0 Constant
\(f_1\) 0 1 Balanced
\(f_2\) 1 0 Balanced
\(f_3\) 1 1 Constant

Classically, to distinguish between the constant and balanced functions we would simply call \(f(0)\) and \(f(1)\), look at the results, and see which function it is. The Deutsch quantum algorithm can distinguish between a constant and balanced function by calling the function only once. It can’t tell us which function it is, just whether it is constant or balanced.
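The classical two-call procedure is worth writing down as a baseline. Here is a quick sketch in Python (the four \(f_i\) written as plain functions; all names are mine, not from any library):

```python
# The four possible functions f_i : {0, 1} -> {0, 1}
fs = {
    "f0": lambda x: 0,      # constant
    "f1": lambda x: x,      # balanced
    "f2": lambda x: 1 - x,  # balanced
    "f3": lambda x: 1,      # constant
}

def is_constant(f):
    """Classical test: two calls to f, then compare the results."""
    return f(0) == f(1)

for name, f in fs.items():
    print(name, "constant" if is_constant(f) else "balanced")
```

The quantum circuit gets the same constant/balanced answer with a single call to the gate that wraps \(f\).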

The circuit

Here is the circuit that solves it (diagram mildly simplified from Alice Flarend and Bob Hilborn, 2022, Fig 11.3, p. 163):

The gate \(U_f\) is a representation of the unknown function, \(H\) is the Hadamard gate (named after Jacques Hadamard), and \(M\) is measurement. Let’s work through the arithmetic. I’ll start with the easy bits at the input and output before tackling \(U_f\) in the middle.

The inputs

To get it going, first we need to apply \(H\) to the basis state \(|1\rangle\).

\(\displaystyle H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\),

\(\displaystyle |1\rangle = \begin{bmatrix} 0 \\ 1  \end{bmatrix}\), so

\(\displaystyle H|1\rangle = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1  \end{bmatrix}\).

Next, since the circuit starts with two qubits both in the state \(H|1\rangle\), we need to calculate the tensor product \(H|1\rangle \otimes H|1\rangle\):

\(\displaystyle H|1\rangle \otimes H|1\rangle = \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1  \end{bmatrix}\).

In ket notation, reading from the top to the bottom of the column matrix, this is equivalent to:

\(\displaystyle \frac{1}{2}|00\rangle-\frac{1}{2}|01\rangle-\frac{1}{2}|10\rangle+\frac{1}{2}|11\rangle\).

\(|00\rangle\) means the inputs (reading the circuit top to bottom) were \(0\) and \(0\), \(|01\rangle\) means \(0\) and \(1\), and so on.

This pair of qubits is in superposition. If we measured after this step, then each of the four basis states would be observed with equal probability. (Use Born’s rule – square the magnitudes of the amplitudes; each probability is \((1/2)^2 = 1/4\).)
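These sums are easy to check mechanically. Here’s a minimal sketch in Python using plain lists (no quantum library; `mat_vec` and `kron_vec` are my own little helpers):

```python
from math import sqrt

# Hadamard gate
H = [[1 / sqrt(2),  1 / sqrt(2)],
     [1 / sqrt(2), -1 / sqrt(2)]]

def mat_vec(m, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def kron_vec(a, b):
    """Tensor (Kronecker) product of two column vectors."""
    return [x * y for x in a for y in b]

h1 = mat_vec(H, [0, 1])   # H|1> = (1/sqrt(2)) [1, -1]
inp = kron_vec(h1, h1)    # H|1> tensor H|1>

print([round(c, 4) for c in inp])      # amplitudes: [0.5, -0.5, -0.5, 0.5]
print([round(c * c, 4) for c in inp])  # Born probabilities: all 0.25
```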

The output

After the \(U_f\) magic happens (next section), we undo the Hadamard on the top path (register) of the circuit. Another tensor product gives us the appropriate matrix, this time of Hadamard and the identity matrix, \(I\), defined:

\(\displaystyle I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\).

So, \(\displaystyle H \otimes I = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}\).

The middle

Now we are left with the four implementations of \(U_f\), the overall structure of which is as follows (Mermin, 2007, p. 42):

Note that the first input, \(|x\rangle\), passes through the gate unchanged, while the second output is the exclusive-OR (XOR) of \(y\) and \(f(x)\). (In a future blog post I explain why this helps.) Here are the sums for each \(U_f\) (the pictures are from Mermin, 2007, p. 42; I did the ’rithmetic):

\(U_{f_0}\):

This is the easiest – it’s just the \(4 \times 4\) identity matrix, which arises as the tensor product of two \(2 \times 2\) identities:

\(\displaystyle U_{f_0} = I \otimes I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\),

the structure of which pleases me for some reason (how the \(4 \times 4\) identity matrix has a couple of \(2 \times 2\) identity matrices in top-left and bottom-right corners, and how \(\otimes\) makes it happen!).

\(U_{f_1}\):

This is the controlled NOT gate:

\(U_{f_1} = \mbox{CNOT} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}\).

\(U_{f_2}\):

This takes the NOT (a.k.a. \(X\)) of the second input and then applies CNOT to both inputs. Let’s take the first step first:

\(I \otimes X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}\).

(This is mildly pleasing: note the \(2 \times 2\) NOT gate at the top left and bottom right.)

Now multiply CNOT by it:

\(U_{f_2} = \mbox{CNOT} (I \otimes X) = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\).

\(U_{f_3}\):

Another easy one: just NOT the second input, so we need \(I \otimes X\), which we already calculated above:

\(U_{f_3} = I \otimes X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}\).
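Rather than building each \(U_{f_i}\) by hand, all four matrices can be generated from the defining rule \(U_f|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle\). Here’s a sketch in Python (the helper `u_f` is mine):

```python
def u_f(f):
    """Matrix of U_f from the rule U_f|x>|y> = |x>|y XOR f(x)>.
    Basis order: |00>, |01>, |10>, |11>; the first qubit is x."""
    m = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            col = 2 * x + y            # input basis state
            row = 2 * x + (y ^ f(x))   # output basis state
            m[row][col] = 1
    return m

# f_0, f_1, f_2, f_3 from the table at the top of the post
fs = [lambda x: 0, lambda x: x, lambda x: 1 - x, lambda x: 1]

for i, f in enumerate(fs):
    print(f"U_f{i}:")
    for row in u_f(f):
        print(row)
```

The printed matrices match the four worked out above: \(I \otimes I\), CNOT, \(\mbox{CNOT}(I \otimes X)\), and \(I \otimes X\).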

Gluing it all together

Now the fun bit in a big table. Let

\(\mbox{Inp} = H|1\rangle \otimes H|1\rangle\) and
\(\mbox{Undo} = H \otimes I\).

That is, \(\mbox{Inp}\) is the input state and \(\mbox{Undo}\) is the step in the Deutsch circuit just before measurement, which applies the Hadamard gate to the first qubit.

Function \(U_{f_i}\ \mbox{Inp}\) \(\mbox{Undo}\ U_{f_i}\ \mbox{Inp}\)
\(f_0\)
(constant)
\(\displaystyle \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1  \end{bmatrix}\) \(\displaystyle \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1  \end{bmatrix}\)
\(f_1\)
(balanced)
\(\displaystyle \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1  \end{bmatrix}\) \(\displaystyle \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0  \end{bmatrix}\)
\(f_2\)
(balanced)
\(\displaystyle \frac{1}{2} \begin{bmatrix} -1 \\ 1 \\ -1 \\ 1  \end{bmatrix}\) \(\displaystyle \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0  \end{bmatrix}\)
\(f_3\)
(constant)
\(\displaystyle \frac{1}{2} \begin{bmatrix} -1 \\ 1 \\ 1 \\ -1  \end{bmatrix}\) \(\displaystyle \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1  \end{bmatrix}\)

To see what we would measure, use Born’s rule and square the amplitudes. This gives the following probabilities for each qubit pair:

\(P(|00\rangle)\) \(P(|01\rangle)\) \(P(|10\rangle)\) \(P(|11\rangle)\)
\(f_0\)
(constant)
\(0\) \(0\) \(\frac{1}{2}\) \(\frac{1}{2}\)
\(f_1\)
(balanced)
\(\frac{1}{2}\) \(\frac{1}{2}\) \(0\) \(0\)
\(f_2\)
(balanced)
\(\frac{1}{2}\) \(\frac{1}{2}\) \(0\) \(0\)
\(f_3\)
(constant)
\(0\) \(0\) \(\frac{1}{2}\) \(\frac{1}{2}\)

If the function is constant, the probability that the first qubit is a \(1\), which I’ll write \(P(|1\_\rangle)\), is:

\(\displaystyle P(|1\_\rangle) = P(|10\rangle) + P(|11\rangle) = \frac{1}{2} + \frac{1}{2} = 1\).

We observe either \(|10\rangle\) or \(|11\rangle\) with equal probability but the first qubit is always \(1\).

If the function is balanced, the probability that the first qubit is a \(0\) is:

\(\displaystyle P(|0\_\rangle) = P(|00\rangle) + P(|01\rangle) = \frac{1}{2} + \frac{1}{2} = 1\).

For this possibility, we observe either \(|00\rangle\) or \(|01\rangle\) with equal probability but the first qubit is always a \(0\).

It worked! Looking at this first qubit tells us whether the function was constant or balanced. The second qubit has a 50-50 chance of being a \(1\) or \(0\) so doesn’t tell us anything helpful.
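Both tables can be reproduced by multiplying the matrices out in code. Here’s a sketch in Python with plain lists (my own helpers again; `u_f` builds each gate from \(U_f|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle\)):

```python
from math import sqrt

s = 1 / sqrt(2)

# Input state: H|1> tensor H|1> = (1/2)[1, -1, -1, 1]
inp = [0.5, -0.5, -0.5, 0.5]

# Undo = H tensor I
undo = [[s, 0,  s,  0],
        [0, s,  0,  s],
        [s, 0, -s,  0],
        [0, s,  0, -s]]

def mat_vec(m, v):
    return [sum(row[j] * v[j] for j in range(4)) for row in m]

def u_f(f):
    """Matrix of U_f via U_f|x>|y> = |x>|y XOR f(x)>."""
    m = [[0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            m[2 * x + (y ^ f(x))][2 * x + y] = 1
    return m

fs = [lambda x: 0, lambda x: x, lambda x: 1 - x, lambda x: 1]

for i, f in enumerate(fs):
    out = mat_vec(undo, mat_vec(u_f(f), inp))
    p1 = out[2] ** 2 + out[3] ** 2   # P(|10>) + P(|11>): first qubit is 1
    kind = "constant" if round(p1) == 1 else "balanced"
    print(f"f{i}: P(first qubit = 1) = {p1:.0f} -> {kind}")
```

Constant functions give probability 1 of measuring the first qubit as \(1\); balanced functions give probability 0.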

See this post for the output when I ran the algorithm on an actual quantum computer, ibmq_quito.

But why does it work?!

As I mentioned at the outset, I am not going to attempt to explain why it works, but Mermin’s (2007, p. 41) potted history of the algorithm may be of interest:

“Deutsch’s problem is the simplest example of a quantum tradeoff that sacrifices particular information to acquire relational information. A crude version of it appeared in a 1985 paper by David Deutsch that, together with a 1982 paper by Richard Feynman, launched the whole field. In that early version the trick could be executed successfully only half the time. It took a while for people to realize that the trick could be accomplished every single time.”

References

Flarend, A., & Hilborn, B. (2022). Quantum computing: from Alice to Bob. Oxford University Press.

Mermin, N. D. (2007). Quantum computer science: an introduction. Cambridge University Press.

Quantum computing in R

Quantum computing is developing rapidly. You can now write and run programs on real quantum computers(!) via the web(!!) for free(!!!), e.g., on IBM Quantum. The most powerful quantum computer open to the general public only has 7 qubits. But that’s enough to play with quantum goodies like superposition and entangled states.

All the cool quantum kids use Qiskit, an open-source Python package that simulates programs and talks to IBM’s actual quantum computers so you can run them for real. I quite like R, so I wondered whether R has any quantum computing packages. qsimulatR caught my eye; it simulates programs with up to 24 qubits and automagically translates R descriptions of programs to Qiskit so they can run on IBM’s machines.

I have started reading Alice Flarend and Bob Hilborn’s (2022) intro to quantum computing, so I thought I’d give qsimulatR a go to see how easy it is to use for answering problems on the Bell circuit (pp. 140–143). The Bell circuit is interesting because it demonstrates both superposition and entanglement.

Here’s a picture of the Bell circuit (drawn automatically by qsimulatR – I’ll get to that shortly):

The H is the Hadamard gate which converts a computational basis state into a superposition state.

\(\displaystyle H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\)

Let’s start with that. First, here’s the qsimulatR code for \(|0\rangle\):

> qstate(nbits = 1, coefs = c(1,0))
( 1 ) * |0>

And for \(|1\rangle\):

> qstate(nbits = 1, coefs = c(0,1))
( 1 ) * |1>

To apply Hadamard to these, we just multiply. Here is \(H|0\rangle = \frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle\):

> H(1) * qstate(nbits = 1, coefs = c(1,0))
  ( 0.7071068 ) * |0> 
+ ( 0.7071068 ) * |1>

And here is \(H|1\rangle = \frac{1}{\sqrt{2}} |0\rangle- \frac{1}{\sqrt{2}} |1\rangle\):

> H(1) * qstate(nbits = 1, coefs = c(0,1))
   ( 0.7071068 ) * |0> 
+ ( -0.7071068 ) * |1>

The second stage of the Bell circuit is the controlled NOT gate (CNOT):

\(\displaystyle \mbox{CNOT} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}\)

We will need two qubits for CNOT. Here’s how to make the four combinations of computational basis states, by default labelled \(|00\rangle\), \(|01\rangle\), \(|10\rangle\), and \(|11\rangle\):

> qstate(nbits = 2, coefs = c(1,0,0,0))
( 1 ) * |00> 
> qstate(nbits = 2, coefs = c(0,1,0,0))
( 1 ) * |01> 
> qstate(nbits = 2, coefs = c(0,0,1,0))
( 1 ) * |10> 
> qstate(nbits = 2, coefs = c(0,0,0,1))
( 1 ) * |11>

Gluing it all together is easy. The H function has a parameter that tells it which qubit it should be applied to (we want the first, as per the circuit diagram). If the whole circuit is called \(\mbox{BC}\), then \(\mbox{BC}|00\rangle\) is set up using:

> bc00 <- CNOT() * (H(1) *
+                   qstate(nbits = 2, coefs = c(1,0,0,0)))
> bc00
  ( 0.7071068 ) * |00> 
+ ( 0.7071068 ) * |11>

It worked!

\(\displaystyle \mbox{BC}|00\rangle = \frac{1}{\sqrt{2}} |00\rangle+ \frac{1}{\sqrt{2}} |11\rangle\)

Given a single qubit in superposition, we can’t peek inside to see its state, since measuring a qubit collapses it to one of its basis states. Given the entangled pair \(\frac{1}{\sqrt{2}} |00\rangle+ \frac{1}{\sqrt{2}} |11\rangle\), there’s a 50% chance we will observe \(|00\rangle\), a 50% chance we will observe \(|11\rangle\), and a 0% chance we will observe either of the other basis states, \(|01\rangle\) or \(|10\rangle\). The probabilities come from Born’s rule – square the absolute values of the amplitudes (the absolute value matters because, in general, amplitudes are complex numbers):

\(\left( \frac{1}{\sqrt{2}}\right)^2 = \frac{1}{2}\).
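As a mechanical cross-check of Born’s rule (plain Python, nothing from qsimulatR assumed):

```python
from math import sqrt

# Amplitudes of (1/sqrt(2))|00> + (1/sqrt(2))|11>, in order |00>, |01>, |10>, |11>
bell = [1 / sqrt(2), 0, 0, 1 / sqrt(2)]

# Born's rule: probability = |amplitude| squared
probs = [abs(a) ** 2 for a in bell]
print([round(p, 4) for p in probs])   # [0.5, 0.0, 0.0, 0.5]
```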

However, we can measure the states of a sequence of qubits that have all been prepared in the same way and use the observed frequencies to estimate the state we prepared. This is quantum state tomography, and qsimulatR allows us to simulate it. Here I have simulated 10,000 trials.

> measure(bc00, rep = 10000) |> summary()
All bits have been measured 10000 times with the outcome:
  ( 4996 ) * |00> 
+ ( 5004 ) * |11>

As expected (with some sampling error), \(P(|00\rangle) \approx P(|11\rangle) \approx \frac{1}{2}\).

We can wrap the Bell circuit in a function like so:

bell <- function(coefs) {
  inputs <- qstate(nbits = 2,
                   coefs = coefs,
                   basis = c("|00>","|10>","|01>","|11>"))
  
  cat("Input: ")
  print(inputs)
  cat("\n")

  CNOT() * (H(1) * inputs)
}

Note, I have explicitly named the basis states to match the bit order used in the book, which differs from the default order in the package (the package numbers bits right to left, the book uses left to right).

Here are all four combinations:

> bell(c(1,0,0,0))
Input:    ( 1 )	* |00> 

   ( 0.7071068 )	* |00> 
 + ( 0.7071068 )	* |11> 
 
> bell(c(0,1,0,0))
Input:    ( 1 )	* |10> 

   ( 0.7071068 )	* |00> 
 + ( -0.7071068 )	* |11> 
 
> bell(c(0,0,1,0))
Input:    ( 1 )	* |01> 

   ( 0.7071068 )	* |10> 
 + ( 0.7071068 )	* |01> 
 
> bell(c(0,0,0,1))
Input:    ( 1 )	* |11> 

   ( -0.7071068 )	* |10> 
 + ( 0.7071068 )	* |01>

Exercise (“Try It”) 10.6 asks us to reverse \(\mbox{BC}|00\rangle\), i.e., the reverse Bell circuit. That’s easy to do:

> RBC <- H(1) * (CNOT() * bell(c(1,0,0,0)))
> RBC
   ( 1 )	* |00>

So we’re back to \(|00\rangle = |0\rangle \otimes |0\rangle\).

The very cool thing about qsimulatR is that it draws circuit diagrams for us. Here’s the Bell circuit piped into the reverse Bell circuit:

RBC |> plot()

If that was very cool, then this is surely exceedingly cool – a one-liner to export the R object to IBM’s Qiskit Python format, so you can run it on one of their quantum computers:

export2qiskit(RBC, filename = "rbc.py")

This gives:

# automatically generated by qsimulatR
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.cx(0, 1)
qc.h(0)

A bit of trial and error and copy and paste and ta-da, it worked. IBM Quantum Lab parsed the code and drew the circuit diagram. A small tweak was needed to tell it the name of the circuit:

Now, I’m trying to wrap my head around the Deutsch–Jozsa algorithm. Watch this space…

Edited to add: here’s what happened next.

References

Flarend, A., & Hilborn, B. (2022). Quantum computing: from Alice to Bob. Oxford University Press.

Stock photos – beyond white cishet abled bodies

From a LinkedIn post by Lennart Nacke (2022)

  1. Queer in Tech
  2. CreateHerStock
  3. Disabled And Here
  4. The Gender Spectrum
  5. Photoability
  6. Canva Natural Women
  7. Getty Images Lean-In Collection
  8. UK Black Tech
  9. WOCinTechChat (also on Unsplash)
  10. Jopwell
  11. Shopify Burst Women Collection
  12. TONL collections
  13. Nappy collection
  14. People of Colour on Unsplash
  15. AllGo Plus Size collection (also on Unsplash)
  16. Elevate and represent a diverse Internet
  17. Disability Inclusive Stock Photography
  18. LGBTQ+ on Pexels
  19. Iwaria
  20. Mocha stock (this one didn’t work when I tried 9 Aug 2022 – maybe just overloaded)
  21. POCStock
  22. PICNOI
  23. Diversity photos
  24. Eye for Ebony
  25. ColorJoy Stock

True random numbers in R

Random numbers are important for lots of things, including statistical analysis (e.g., bootstrapping) and cryptography (e.g., producing a one-time pad). But computers can’t produce truly random numbers by themselves. They produce pseudorandom numbers, which are deterministic and hence predictable given the seed.

Pseudorandom numbers more than suffice for most purposes (so long as everyone isn’t using the same seeds like 1234 or 42); however, if you want or need the real thing there are various ways to pipe true randomness into computers. Two are available via R packages.

The best known is {random}, which exports pure randomness from Ireland (the RANDOM.ORG service). The basic idea is that a radio receiver is tuned to a frequency where no station broadcasts; the atmospheric noise is continuously recorded and sampled to generate random numbers.

Another is {qrandom}, provided by the Australian National University. This works by measuring the quantum fluctuations of the vacuum.

01001110 01101111 00101100 00100000 01110100 01101000 01101001 01110011 00100000 01101001 01110011 01101110 11100010 10000000 10011001 01110100 00100000 01110010 01100001 01101110 01100100 01101111 01101101

>1 million z-values

The distribution of more than one million z-values from Medline (1976–2019).

You need \(|z| > 1.96\) for “statistical significance” at the usual 5% level. This picture suggests a significant problem of papers not being published if that threshold isn’t crossed.

Source: van Zwet, E. W., & Cator, E. A. (2021). The significance filter, the winner’s curse and the need to shrink. Statistica Neerlandica, 75(4), 437–452.

Enumerating permutations – an example of recursion

One way to solve a problem of size \(n\) is to pretend you can solve it for \(n-1\), and use that imaginary solution to solve it for \(n\). The approach is called recursion. Here’s an example showing how it works: enumerating all the ways to rearrange the elements of a list.

Given a list with \(n\) elements, \(\langle x_1, x_2, x_3, \ldots, x_n \rangle\), where the order of the elements matters, there are

\(n! = n \times (n-1) \times (n-2) \times  \ldots \times  1\)

(“\(n\) factorial”) different ways to rearrange them (permutations). We can also write this out more formally as follows:

\(0! = 1\);
\(n! = n \times (n-1)!\), for \(n > 0\).

For example, consider the list \(\langle 1, 2, 3 \rangle\), which I’ll just write 123 to save keystrokes. There are 3 elements and \(n! = 3 \times 2 \times 1 = 6\) permutations:

  1. 123
  2. 213
  3. 231
  4. 132
  5. 312
  6. 321

Using the more formal definition, the arithmetic goes:

\(3!\)
\(= 3 \times 2!\)
\(= 3 \times (2 \times 1!)\)
\(= 3 \times (2 \times (1 \times 0!))\)
\(= 3 \times (2 \times (1 \times 1))\)
\(= 6\)

To see how to enumerate all these, first start with the simplest list, \(\langle \ \rangle\). This is the empty list, the zero of the list world. There is exactly one way to arrange emptiness (\(0! = 1\)):

  1. \(\langle \ \rangle\)

That’s the problem solved for one list. There are infinitely many more, so we have a bit of work to do.

Now let’s return to the list 123, a list of length 3. Suppose (i.e., pretend) we knew all the permutations of 23 (a list of length 2). How could we use those to work out the permutations of 123?

It’s easy to see that there are two permutations of 23:

  1. 23
  2. 32

We go from those solutions to the solution for 123 by inserting 1 in all positions (beginning, middle, and end) of 23 and 32 in the following way:

  1. 123
  2. 213
  3. 231
  4. 132
  5. 312
  6. 321

I cheated a bit here, though. We haven’t developed a way to enumerate the permutations of 23. In a moment or two I want to write an R program to enumerate permutations of any list and R will need more explicit instructions.

Let’s go back a step and suppose we don’t know the permutations of 23 and want to enumerate them. Assume we know what the permutations are of a list with a single item, 3.

Now go back yet another step. How do we get the permutations of 3, assuming we know what they are for the next smallest list, the empty list, \(\langle \ \rangle\)?

We finally have an answer. As we saw above, there is one way to arrange emptiness: \(\langle \ \rangle\).

Following the same logic that allowed us to go from permutations of 23 to permutations of 123, we can go from permutations of \(\langle \ \rangle\) to permutations of 3. We just have to insert 3 in all positions of the empty list. There is only one place to put it, giving a list with a single element, 3.

Now how do we go from the permutations of 3 to the permutations of 23? Simply place 2 in all positions of the one-item list 3:

  1. 23
  2. 32

And so now our imagined solution actually exists.

We needed two steps:

  1. The base step, which solved the problem for the simplest list, the empty list.
  2. The inductive step, which uses permutations of the list \(\langle x_2, x_3, \ldots, x_n \rangle\) and inserts \(x_1\) in all positions of those permutations to give us the permutations of \(\langle x_1, x_2, x_3, \ldots, x_n \rangle\).

R

This translates straightforwardly to R as follows.

First, a function to insert an element \(x\) into all positions of a vector \(\mathit{xs}\):

insert_all <- function(x, xs) {
  res <- list()

  # insert x after position i, for i = 0 (the start) up to length(xs) (the end)
  for (i in 0:length(xs))
    res <- append(res, list(append(xs, x, i)))

  res
}

For example:

> insert_all(1, 2:4)
[[1]]
[1] 1 2 3 4

[[2]]
[1] 2 1 3 4

[[3]]
[1] 2 3 1 4

[[4]]
[1] 2 3 4 1

Now the actual permutation function, which takes a vector \(\mathit{xs}\) and returns a list of all its permutations:

perms <- function(xs) {
  res <- list()

  if (length(xs) == 0)
    res <- list(c())  # base step: the empty list has exactly one permutation
  else {
    # inductive step: permute the tail, then insert the head everywhere
    ys <- perms(tail(xs, length(xs) - 1))
    for (y in ys)
      res <- append(res, insert_all(xs[1], y))
  }

  res
}

Note how perms is defined in terms of itself. The line

perms(tail(xs, length(xs) - 1))

removes the first element of the vector and calls perms on what remains. This keeps happening until perms is called with an empty vector, at which point it returns a list containing one empty vector,

list(c())

Then insert_all pieces everything together.

Let’s give perms a go with the list 1234. There should be \(4! = 24\) permutations:

> perms(1:4)
[[1]]
[1] 1 2 3 4

[[2]]
[1] 2 1 3 4

[[3]]
[1] 2 3 1 4

[[4]]
[1] 2 3 4 1

[[5]]
[1] 1 3 2 4

[[6]]
[1] 3 1 2 4

[[7]]
[1] 3 2 1 4

[[8]]
[1] 3 2 4 1

[[9]]
[1] 1 3 4 2

[[10]]
[1] 3 1 4 2

[[11]]
[1] 3 4 1 2

[[12]]
[1] 3 4 2 1

[[13]]
[1] 1 2 4 3

[[14]]
[1] 2 1 4 3

[[15]]
[1] 2 4 1 3

[[16]]
[1] 2 4 3 1

[[17]]
[1] 1 4 2 3

[[18]]
[1] 4 1 2 3

[[19]]
[1] 4 2 1 3

[[20]]
[1] 4 2 3 1

[[21]]
[1] 1 4 3 2

[[22]]
[1] 4 1 3 2

[[23]]
[1] 4 3 1 2

[[24]]
[1] 4 3 2 1

It works!

Factorial

Understanding how to enumerate permutations also shows us how to count them. In other words, it shows us how to define factorial.

Pretend we don’t know what factorial is. We’re making a new one called “factorial?” (pronounced with rising intonation), such that \(n?\) is the number of permutations of a list of length \(n\). We hope that by the end of the process, \(n? = n!\).

Recall again that there’s only one way to arrange emptiness, so

\(0? = 1\).

Suppose we know how to count the number of permutations of a list of length \(n-1\), that is, we know what \((n-1)?\) is. How would we use the answer to that to tell us what \(n?\) is?

If we know \((n-1)?\), then we know the number of permutations of a list of length \(n-1\).

Each permutation counted by \((n-1)?\) has length \(n-1\), and we saw when enumerating permutations that we get the permutations of the longer list by inserting the extra element at every position of each of them. A list of length \(n-1\) has \(n\) insertion positions (including the two ends), and we do this for each of the \((n-1)?\) permutations, so we just multiply:

\(n? = n \times (n-1)?\), for \(n > 0\).

It turns out that indeed \(n? = n!\), as defined at the top of this post.
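The two clauses of “factorial?” translate directly into code. Here’s a minimal sketch in Python (the function name is mine), cross-checked against the standard library’s factorial:

```python
from math import factorial

def count_perms(n):
    """n? : the number of permutations of a list of length n."""
    if n == 0:
        return 1                   # one way to arrange emptiness
    return n * count_perms(n - 1)  # n insertion positions per shorter permutation

# n? agrees with n! for the first few values
for n in range(6):
    print(n, count_perms(n), factorial(n))
```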