Time for counterfactuals

I have just discovered Scriven’s stimulating (if grim) challenge to a counterfactual understanding of causation (see the debate recorded in Cook et al., 2010, p. 108):

“The classic example of this is the guy who has jumped off the top of a skyscraper and as he passes the 44th floor somebody shoots him through the head with a .357 magnum. Well, it’s clear enough that the shooter killed him but it’s clearly not true that he would not have died if the shooter hadn’t shot him; so the counterfactual condition does not apply, so it can’t be an essential part of the meaning of cause.”

I love this example because it illustrates a common form of programme effect and summarises the human condition – all in a couple of sentences. Let’s reshape it into an analogous example that extends the timeline by a couple of decades:

“A 60 year old guy chooses not to get a Covid vaccine. A few months later, he gets Covid and dies. Average male life expectancy is about 80 years.”

(Implicitly I guess jumping is analogous to being born!)

By the end of the first sentence, I reason that if he had got the vaccine, he probably wouldn’t have died. By the end of the second sentence, I am reminded of the finiteness of life. So the vaccine wouldn’t have prevented death, only postponed it, just as the absence of a gunshot wouldn’t have prevented death in the skyscraper example. How can we think about this using counterfactuals?

In a programme evaluation, it is common to gather data at a series of fixed time points, for instance a few weeks, months, and, if you are lucky, years after baseline. We are often happy to see improvement, even if it doesn’t endure. For instance, if I take a painkiller, I don’t expect its effects to persist forevermore. If a vaccine extends life by two decades, that’s rather helpful. Programme effects are defined at each time point.

To make sense of the original example, we need to add in time. There are three key time points:

  1. Jumping (T0).
  2. Mid-flight after the gunshot (T1).
  3. Hitting the ground (T2).

When considering counterfactuals, the world may be different at each of these times, e.g., at T0 the main character might have decided to take the lift.

Here are counterfactuals that make time explicit:

  • If the guy hadn’t jumped at T0, then he wouldn’t have hit the ground at T2.
  • If the guy hadn’t jumped at T0, then he wouldn’t have been shot with the magnum and killed at T1.
  • If the guy had jumped, but hadn’t been shot by the magnum, he would still have been alive at T1 but not at T2.

To assign truth values or probabilities to each of these requires a model of some description, e.g., a causal Bayesian network, which formalises your understanding of the intentions and actions of the characters in the text – something like the DAG below, with conditional probabilities filled in appropriately.

So, for instance, the probability of being dead at T2 given jumping at T0 is high – if you haven’t added variables about parachutes. What happens mid-flight governs T1 outcomes. Alternatively, you could just use informal intuition. Exercise for the reader: give it a go.
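One way to have a go is with a tiny deterministic structural model rather than a full Bayesian network. The sketch below is only an illustration: the function `simulate`, the `World` fields, and the simplifying assumptions (the shooter only fires at someone falling past, the shot is always fatal, no parachutes) are mine, not part of Scriven’s example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    jumped: bool      # T0: did he jump?
    shot: bool        # T1: was he shot mid-flight?
    alive_t1: bool    # T1: alive just after the 44th floor?
    hit_ground: bool  # T2: did he hit the ground?
    alive_t2: bool    # T2: alive at the end?


def simulate(jumps: bool, shooter_fires: bool) -> World:
    """Run the toy model forward from T0 to T2 (assumed structure)."""
    shot = jumps and shooter_fires           # nobody to shoot if he took the lift
    alive_t1 = not shot                      # assumption: the shot is always fatal
    hit_ground = jumps                       # assumption: no parachutes
    alive_t2 = alive_t1 and not hit_ground   # hitting the ground is fatal too
    return World(jumps, shot, alive_t1, hit_ground, alive_t2)


actual = simulate(jumps=True, shooter_fires=True)
no_jump = simulate(jumps=False, shooter_fires=True)   # counterfactuals 1 and 2
no_shot = simulate(jumps=True, shooter_fires=False)   # counterfactual 3

for label, world in [("actual", actual), ("no jump", no_jump), ("no shot", no_shot)]:
    print(label, world)
```

In this toy world the shooter changes the T1 outcome but not the T2 outcome, which is the distinction the three counterfactuals are drawing.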

Back then to the vaccine example, the counterfactuals rewrite to something like:

  • If the guy hadn’t been born at T0, then he wouldn’t have died at T2.
  • If the guy hadn’t been born at T0, then he wouldn’t have chosen not to get a vaccine and died at T1.
  • If the guy had been born, but had decided to get the vaccine, he would still have been alive at T1 aged 60, but possibly not at T2 aged 80.
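To connect this back to programme effects being defined at each time point, here is a minimal sketch for one hypothetical person. The potential outcomes are assumptions chosen to match the story (in particular, I assume he would have died by 80 even with the vaccine, where the text only says “possibly”); they are not data.

```python
# Potential outcomes for one hypothetical person: 1 = alive, 0 = dead.
# These values are illustrative assumptions, not estimates.
potential_outcomes = {
    "T1 (aged 60)": {"vaccine": 1, "no vaccine": 0},
    "T2 (aged 80)": {"vaccine": 0, "no vaccine": 0},
}


def effect_at(timepoint: str) -> int:
    """Individual effect of the vaccine on being alive at one time point."""
    outcomes = potential_outcomes[timepoint]
    return outcomes["vaccine"] - outcomes["no vaccine"]


effects = {timepoint: effect_at(timepoint) for timepoint in potential_outcomes}

for timepoint, effect in effects.items():
    print(f"{timepoint}: effect on survival = {effect}")
```

Under these assumptions the effect is 1 at T1 and 0 at T2: the vaccine matters at one time point and not at the other, and both statements are true.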


Cook, T. D., Scriven, M., Coryn, C. L. S., & Evergreen, S. D. H. (2010). Contemporary Thinking About Causation in Evaluation: A Dialogue With Tom Cook and Michael Scriven. American Journal of Evaluation, 31(1), 105–117.

Russian state disinformation campaigns

Two interesting reports:

European Commission, Directorate-General for Communications Networks, Content and Technology (2023). Digital Services Act: Application of the risk management framework to Russian disinformation campaigns. Publications Office of the European Union.

“During the first year of Russia’s illegal war in Ukraine, social media companies enabled the Kremlin to run a large-scale disinformation campaign targeting the European Union and its allies, reaching an aggregate audience of at least 165 million and generating at least 16 billion views. Preliminary analysis suggests that the reach and influence of Kremlin-backed accounts has grown further in the first half of 2023, driven in particular by the dismantling of Twitter’s safety standards.”

Microsoft Threat Analysis Center (2023). Russia’s African coup strategy.

“Today we are sharing a report from the Microsoft Threat Analysis Center (MTAC) on Russian influence operations in Africa, principally focused on the Niger coup. We believe it is vital there is wider understanding of the ways in which the internet is being used to stoke political instability around the world.”

Carol Fitz-Gibbon (1938 – 2017), author of first description of theory-based evaluation, on importance of RCTs

“[…] I produced the first description of theory based evaluation […]. The point of theory based evaluation is to see, firstly, to what extent the theory is being implemented and, secondly, if the predicted outcomes then follow. It is particularly useful as an interim measure of implementation when the outcomes cannot be measured until much later. But most (if not all) theories in social science are only sets of persuasively stated hypotheses that provide a temporary source of guidance. In order to see if the hypotheses can become theories one must measure the extent to which the predicted outcomes are achieved. This requires randomised controlled trials. Even then the important point is to establish the direction and magnitude of the causal relation, not the theory. Many theories can often fit the same data.”

Fitz-Gibbon, C. T. (2002). Researching outcomes of educational interventions. BMJ, 324(7346), 1155.

Asking counterfactuals to think about controversial policies

“As an anecdote: one of us lives in a 1930s neighbourhood. The local municipality proposed converting a road connecting the neighbourhood to other neighbourhoods from a two-direction road to a one-way road. This would apply to motorized traffic only, not to bicycles. Many people protested because of the detour they would have to take by car. We asked the counterfactual question: suppose the city would have introduced the one-way road decades ago, would they then heavily support a policy to make it a two-direction road? Several people we asked did not know, the main reason for not supporting that change being that it would result in more traffic, noise and pollution, and a reduction in safety.”

– Van Wee et al. (2023, p. 84)


Van Wee, B., Annema, J. A., & Van Barneveld, S. (2023). Controversial policies: Growing support after implementation. A discussion paper. Transport Policy, 139, 79–86.

Distinguishing between = and <-

‘Perhaps because of the use of = for assignment in FORTRAN […] assignment is often read as “x equals E”. This causes great confusion. The first author learned to distinguish between = and := while giving a lecture in Marktoberdorf, Germany, in 1975. At one point, he wrote “:=” on the board but pronounced it “equals”. Immediately, the voice of Edsger W. Dijkstra boomed from the back of the room: “becomes!”. After a disconcerted pause, the first author said, “Thank you; if I make the same mistake again, please let me know.”, and went on. Once more during the lecture the mistake was made, followed by a booming “becomes” and a “Thank you”. The first author has never made that mistake again! The second author, having received his undergraduate education at Cornell, has never experienced this difficulty.’

– David Gries and Fred B. Schneider (1993, p. 17, footnote 6) [A logical approach to discrete math. Springer.]

R lets you use both = and <- for assignment, FYI.
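The same distinction can be shown in Python, which (like FORTRAN) writes assignment as `=` and keeps `==` for the equality test. This is just a small illustration of “becomes” versus “equals”; the variable names are arbitrary.

```python
x = 1
x = x + 1  # assignment: "x becomes x + 1", so x is now 2

# Read as an equation, "x equals x + 1" has no numerical solution,
# so the equality test is False whatever value x holds.
is_equal = (x == x + 1)

print(x)         # the value after "becoming"
print(is_equal)  # the result of the equality test
```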

Little Miss and Mr Men name binariness

Pownall and Heflick (2023) investigated gender stereotypes in all 47 Mr Men and 34 Little Miss books. One of the studies asked (adult) participants to rate the masculinity/femininity of the character names on a scale from 1 (“entirely feminine”) to 5 (“entirely masculine”). The “Mr” and “Little Miss” were stripped off, so participants were asked about words such as “Quick”, “Princess”, and “Greedy”.

I created a name binariness index for each mean rating, \(r\): the distance between the mean rating and 3 (the mid-point), as a proportion of the maximum that distance could be (2):

\(\displaystyle \frac{|r-3|}{2}\)
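As a minimal sketch, the index can be computed like this. The function name and default arguments are my own choices, and the ratings in the example are the scale’s end points and mid-point rather than ratings from the paper.

```python
def name_binariness(mean_rating: float, midpoint: float = 3.0,
                    max_distance: float = 2.0) -> float:
    """Distance of a mean rating from the scale mid-point, scaled to 0-1."""
    return abs(mean_rating - midpoint) / max_distance


# Illustrative inputs only: the extremes and centre of the 1-5 scale.
for rating in (1, 3, 4, 5):
    print(rating, name_binariness(rating))
```

An index of 0 means the name was rated exactly at the mid-point; an index of 1 means it was rated at either end of the scale.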

Here’s a plot of the names, sorted by name binariness. So good candidates for nonbinary characters would be Mx Quick or Mx Lucky. Alternatively, you could rightly reject the premise that any of them are gendered and go for Mx Princess.


Pownall, M., & Heflick, N. (2023). Mr. Active and Little Miss Passive? The Transmission and Existence of Gender Stereotypes in Children’s Books. Sex Roles.

Experiments to protect against harvest now, decrypt later

Quantum computing needs thousands of qubits to crack the public key encryption we currently rely on and at present the largest (publicly announced) quantum computer only has 433 qubits. The sales pitch for protecting against quantum attacks now is that baddies could be quietly harvesting encrypted data, so that when quantum computers are ready, off they go and decrypt.

There are two main approaches to protect against quantum attacks.

One approach, quantum key distribution (QKD), can be implemented one qubit at a time, e.g., by sending photons along fibre optic cables. Each photon is a realisation of a qubit. HSBC is trialling a QKD approach developed by BT and Toshiba, which builds on BB84 from forty years ago. I’ve attempted to explain BB84 in a previous blog post.

Vodafone has opted for a non-quantum approach to defend against attacks on crypto by trialling new public key encryption algorithms, on classical computers, that are thought to be quantum-safe: post-quantum cryptography (PQC).

The next major revision of Google Chrome will include a hybrid approach combining X25519, a standard method already used by Chrome and other browsers, with Kyber-768, a PQC method that has so far resisted attack.

Since PQC methods are new, the hybrid approach offers protection while the maths continues to be stress-tested. This is important since there has already been an embarrassingly quick crack of a supposedly quantum-safe approach.
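The intuition behind a hybrid scheme can be sketched as follows: derive the session key from both shared secrets, so an attacker has to recover both to recover the key. This is a toy illustration and not Chrome’s actual construction. The function `combine_shared_secrets` is hypothetical, the two “secrets” are random stand-ins rather than real X25519 or Kyber-768 outputs, and a plain SHA-256 hash stands in for a proper key derivation step.

```python
import hashlib
import secrets


def combine_shared_secrets(classical_secret: bytes, pq_secret: bytes) -> bytes:
    """Derive one key from two shared secrets (toy stand-in for a real KDF)."""
    return hashlib.sha256(classical_secret + pq_secret).digest()


# Random stand-ins for the secrets each key exchange would produce.
classical_secret = secrets.token_bytes(32)  # would come from X25519
pq_secret = secrets.token_bytes(32)         # would come from Kyber-768

session_key = combine_shared_secrets(classical_secret, pq_secret)
print(session_key.hex())
```

If the post-quantum method were cracked, the key would still depend on the classical secret, and vice versa, which is why the hybrid offers protection while the newer maths is stress-tested.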

What is a counterfactual?

What’s a counterfactual? Philosophers love the example, “If Oswald hadn’t killed Kennedy, someone else would have”. More generally, Y would be y had X been x in situation U = u (Judea Pearl’s 2011 rendering).
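Here is a minimal sketch of that rendering applied to the Oswald example. The structural equation and the background variable `backup_assassin` are assumptions I have made up for illustration; they are not taken from Pearl’s chapter.

```python
def someone_else_kills_kennedy(oswald_shoots: bool, backup_assassin: bool) -> bool:
    """Y: assumed structural equation for 'someone else kills Kennedy'."""
    return (not oswald_shoots) and backup_assassin


def counterfactual(x: bool, u: dict) -> bool:
    """Y would be this value had X (Oswald shoots) been x in situation U = u."""
    return someone_else_kills_kennedy(oswald_shoots=x, **u)


# Two possible background situations, differing only in whether a backup existed.
with_backup = {"backup_assassin": True}
lone_gunman = {"backup_assassin": False}

print(counterfactual(False, with_backup))  # someone else would have
print(counterfactual(False, lone_gunman))  # nobody else would have
```

The counterfactual’s truth value depends on the situation u, which is why it has to be fixed before the question can be answered.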


Pearl, J. (2011). The structural theory of causation. In P. McKay Illari, F. Russo, & J. Williamson (Eds.), Causality in the Sciences (pp. 697–727). Oxford University Press.

It’s all theory-based and counterfactual

Two of my favourite articles on #evaluation are Cook’s (2000) argument that all impact evaluations, RCTs included, are theory-based and Reichardt’s (2022) argument that there’s always a counterfactual, if not explicitly articulated then not far beneath the surface. I think both arguments are irrefutable, but building on their and others’ work to improve evaluation commissioning and delivery seems a formidable challenge given the fiercely defended dichotomies in the field.

If all impact evaluation really is theory-based then it’s clear there’s huge variation in the quality of theories and theorising. If all impact evaluation depends on counterfactuals then there is huge variation in how compelling the evidence is for the counterfactual outcomes, particularly when there is no obvious comparison group.

Clarifying these kinds of distinctions is, I think, important for improving evaluations and the public services and other programmes they evaluate.


Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. In A. Petrosino, P. J. Rogers, T. A. Huebner, & T. A. Hacsi (Eds.), New directions in evaluation: Program Theory in Evaluation: Challenges and Opportunities (pp. 27–34). Jossey-Bass.

Reichardt, C. S. (2022). The Counterfactual Definition of a Program Effect. American Journal of Evaluation, 43(2), 158–174.