## Time for counterfactuals

I have just discovered Scriven’s stimulating (if grim) challenge to a counterfactual understanding of causation (see the debate recorded in Cook et al., 2010, p. 108):

“The classic example of this is the guy who has jumped off the top of a skyscraper and as he passes the 44th floor somebody shoots him through the head with a .357 magnum. Well, it’s clear enough that the shooter killed him but it’s clearly not true that he would not have died if the shooter hadn’t shot him; so the counterfactual condition does not apply, so it can’t be an essential part of the meaning of cause.”

I love this example because it illustrates a common form of programme effect and summarises the human condition – all in a couple of sentences! Let’s reshape it into an analogous example that extends the timeline by a couple of decades:

“A 60 year old guy chooses not to get a Covid vaccine. A few months later, he gets Covid and dies. Average male life expectancy is about 80 years.”

(I guess jumping is analogous to being born!)

By the end of the first sentence, I reason that if he had got the vaccine, he probably wouldn’t have died. By the end of the second sentence, I am reminded of the finiteness of life. So, the vaccine didn’t prevent death – similarly to an absence of a gunshot in the skyscraper example. How can we think about this using counterfactuals?

In a programme evaluation, it is common to gather data at a series of fixed time points, for instance a few weeks, months, and, if you are lucky, years after baseline. We are often happy to see improvement, even if it doesn’t endure. For instance, if I take a painkiller, I don’t expect its effects to persist forevermore. If a vaccine extends life by two decades, that’s rather helpful. Programme effects are defined at each time point.
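One way to write this down, using potential-outcomes notation that is assumed here for illustration: let $$Y_t(1)$$ be the outcome at time $$t$$ with the programme and $$Y_t(0)$$ the outcome at time $$t$$ without it. The effect at time $$t$$ is then

$$\tau_t = Y_t(1) - Y_t(0)$$

So a painkiller can have $$\tau_{t_1} \ne 0$$ an hour after the dose and $$\tau_{t_2} = 0$$ the next day, without the earlier effect being any less real.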

To make sense of the original example, we need to add in time. There are three key timepoints:

1. Jumping (T0).
2. Mid-flight after the gunshot (T1).
3. Hitting the ground (T2).

When considering counterfactuals, the world may be different at each of these times, e.g., at T0 the main character might have decided to take the lift.

Here are counterfactuals that make time explicit:

• If the guy hadn’t jumped at T0, then he wouldn’t have hit the ground at T2.
• If the guy hadn’t jumped at T0, then he wouldn’t have been shot with the magnum and killed at T1.
• If the guy had jumped, but hadn’t been shot by the magnum, he would still have been alive at T1 but not at T2.

To assign truth values or probabilities to each of these requires a model of some description, e.g., a causal Bayesian network, which formalises your understanding of the intentions and actions of the characters in the text – something like the DAG below, with conditional probabilities filled in appropriately. So for instance, the probability of being dead at T2 given jumping at T0 is high – if you haven’t added variables about parachutes. What happens mid-flight governs T1 outcomes. Alternatively, you could just use informal intuition. Exercise to the reader: give it a go.
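Here is one minimal way to have a go, as a sketch rather than the model itself. It is written in Python (not the R script mentioned elsewhere) and swaps the probabilistic network for a deterministic structural model. The variable names and equations are assumptions made for illustration: the shooter fires only if the guy jumps, the shot is always fatal, and so is the fall (no parachutes).

```python
def world(do=None):
    """Solve a toy skyscraper model; variables listed in `do` are set by intervention.

    All equations are illustrative assumptions, not taken from the original text.
    """
    do = do or {}
    v = {}
    v["jump_t0"] = do.get("jump_t0", 1)               # he jumps in the actual world
    v["shot_t1"] = do.get("shot_t1", v["jump_t0"])    # shooter fires only at a jumper
    v["dead_t1"] = do.get("dead_t1", v["shot_t1"])    # the shot is fatal mid-flight
    # Dead at T2 if already dead at T1, or if he jumped and so hits the ground.
    v["dead_t2"] = do.get("dead_t2", max(v["dead_t1"], v["jump_t0"]))
    return v


actual = world()
no_jump = world({"jump_t0": 0})   # "if the guy hadn't jumped at T0"
no_shot = world({"shot_t1": 0})   # "had jumped, but hadn't been shot"

for label, w in [("actual", actual), ("no jump", no_jump), ("no shot", no_shot)]:
    print(label, w)
```

Under these assumptions, removing the jump removes both deaths, whereas removing the shot only postpones death from T1 to T2, which matches the three counterfactuals listed above.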

Using the Halpern-Pearl definitions of causality on this model (Halpern, 2016), jumping caused death at both T1 and T2. The shooting caused death at T1 but not T2. (R code here – proper explanation to be completed, but you could try this companion blog post and citation therein.)

Back then to the vaccine example, the counterfactuals rewrite to something like:

• If the guy hadn’t been born at T0, then he wouldn’t have died at T2.
• If the guy hadn’t been born at T0, then he couldn’t have chosen not to get a vaccine and died at T1.
• If the guy had been born, but had decided to get the vaccine, he would still have been alive at T1 aged 60, but possibly not at T2 aged 80.

### References

Cook, T. D., Scriven, M., Coryn, C. L. S., & Evergreen, S. D. H. (2010). Contemporary Thinking About Causation in Evaluation: A Dialogue With Tom Cook and Michael Scriven. American Journal of Evaluation, 31(1), 105–117.

Halpern, J. Y. (2016). Actual causality. The MIT press.

## Visualising programme theories

Lovely collection of examples of the following ways of visualising a programme theory:

1. Logic model
2. Logical framework
3. Theory of change
4. Context-mechanism-outcome configuration
5. Causal loop diagram
6. Stock and flow diagram
7. Concept map
8. Network map
9. Path model
10. Nested/Hybrid model

Also includes links to tools for reasoning about the representations (where they have some genre of formal semantics).

### References

Lemire, S., Porowski, A., & Mumma, K. (2023). How We Model Matters: Visualizing Program Theories. Abt Associates.

## Actual causes: two examples using the updated Halpern-Pearl definition

Halpern (2015) provides three variants of the Halpern-Pearl definitions of actual causation. I’m trying to get my head around the formalism, which is elegant, concise, and precise, but tedious to use in practice, so I wrote an R script to do the sums. This blog post is not self-contained – you will need to read the original paper for an introduction to the model. However, it works through two examples, which may help if you’re also struggling with the paper.

The second (“updated”) definition of an actual cause asserts that $$\vec{X} = \vec{x}$$ is a cause of $$\varphi$$ in $$(M,\vec{u})$$ iff the following conditions hold:

AC1 $$(M,\vec{u}) \models (\vec{X} = \vec{x}) \land \varphi$$.

This says, if $$\vec{X} = \vec{x}$$ is an actual cause of $$\varphi$$ then they both hold in the actual world, $$(M,\vec{u})$$. Note, for this condition, we are just having a look at the model and not doing anything to it.

AC2 There is a partition of the endogenous variables in $$M$$ into $$\vec{Z} \supseteq \vec{X}$$ and $$\vec{W}$$ and there are settings $$\vec{x'}$$ and $$\vec{w}$$ such that

(a) $$(M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x'}, \vec{W} \leftarrow \vec{w}] \neg \varphi$$.

So, we’re trying to show that undoing the cause, i.e., setting $$\vec{X}$$ to $$\vec{x'} \ne \vec{x}$$, prevents the effect. We are allowed to modify $$\vec{W}$$ however we want to show this, whilst leaving $$\vec{Z}-\vec{X}$$ free to do whatever the model tells these variables to do.

(b) If $$(M,\vec{u}) \models \vec{Z} = \vec{z^{\star}}$$, for some $$\vec{z^{\star}}$$, then for all $$\vec{W'} \subseteq \vec{W}$$ and $$\vec{Z'} \subseteq \vec{Z}-\vec{X}$$,
$$(M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}, \vec{W'} \leftarrow \vec{w'}, \vec{Z'} \leftarrow \vec{z^{\star}}] \varphi$$.

This says, trigger the cause (unlike AC1, we aren’t just looking to see if it holds) and check whether it leads to the effect under all subsets of $$\vec{Z}$$ (as per actual world) that aren’t $$\vec{X}$$ and all subsets of the modified $$\vec{W}$$ that we found for AC2(a). Note how we are setting $$\vec{Z}$$ for those subsets, rather than just observing it.

AC3 $$\vec{X}$$ is minimal: no strict subset of $$\vec{X}$$, with the corresponding values from $$\vec{x}$$, satisfies AC1 and AC2.

This says, there’s no superfluous stuff in $$\vec{X}$$. You taking a painkiller and waving a magic wand doesn’t cause your headache to disappear, under AC3, if the painkiller works without the wand.

### Example 1: an (actual) actual cause

Let’s give it a go with an overdetermined scenario (lightly edited from Halpern): Alice and Bob both lob bricks at a glasshouse and smash the glass. Define

$$\mathit{AliceThrow} = 1$$
$$\mathit{BobThrow} = 1$$
$$\mathit{GlassBreaks} = \mathit{max}(\mathit{AliceThrow},\mathit{BobThrow})$$

So, if either Alice or Bob (or both) hit the glasshouse, then the glass breaks. Strictly speaking, I should have set up one or more exogenous variables, $$\vec{u}$$, that define the context and then defined $$\mathit{AliceThrow}$$ and $$\mathit{BobThrow}$$ in terms of $$\vec{u}$$, but it works fine to skip that step as I have here since I’m holding $$\vec{u}$$ constant anyway.

Is $$\mathit{AliceThrow} = 1$$ an actual cause of $$\mathit{GlassBreaks} = 1$$?

AC1 holds since $$(M,\vec{u}) \models \mathit{AliceThrow} = 1 \land \mathit{GlassBreaks} = 1$$. The first conjunct comes directly from one of the model equations. Spelling out the second conjunct,

$$\mathit{GlassBreaks} = \mathit{max}(\mathit{AliceThrow},\mathit{BobThrow})$$
$$= \mathit{max}(1, 1)$$
$$= 1$$

For AC2, we need to find a partition of the endogenous variables such that AC2(a) and AC2(b) hold. Try $$\vec{Z} = \{ \mathit{AliceThrow}, \mathit{GlassBreaks} \}$$ and $$\vec{W}= \{ \mathit{BobThrow} \}$$.

AC2(a) holds since $$(M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 0, \mathit{BobThrow} \leftarrow 0] \mathit{GlassBreaks} = 0$$.

For AC2(b), we begin with $$\vec{Z} = \{ \mathit{AliceThrow}, \mathit{GlassBreaks} \}$$ and the settings as per the unchanged model, so

$$(M,\vec{u}) \models \mathit{AliceThrow} = 1 \land \mathit{GlassBreaks} = 1$$.

We need to check that for all $$\vec{W'} \subseteq \vec{W}$$ and $$\vec{Z'} \subseteq \vec{Z}-\vec{X}$$,
$$(M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}, \vec{W'} \leftarrow \vec{w'}, \vec{Z'} \leftarrow \vec{z^{\star}}] \varphi$$.

Here are the combinations and $$\varphi \equiv \mathit{GlassBreaks} = 1$$ holds for all of them:

$$(M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{GlassBreaks} \leftarrow 1, \mathit{BobThrow} \leftarrow 0 ] \varphi$$
$$(M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{BobThrow} \leftarrow 0 ] \varphi$$
$$(M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{GlassBreaks} \leftarrow 1 ] \varphi$$
$$(M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1 ] \varphi$$

AC3 is easy since the cause only has one variable, so there’s nothing superfluous.
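The hand calculation can be double-checked by brute force. What follows is a minimal Python sketch, not the R script mentioned earlier. It assumes binary variables and a single-variable candidate cause (so AC3 is trivial), and the helper names `solve`, `subsets` and `is_actual_cause` are made up for illustration. It searches every choice of $$\vec{W}$$ and setting $$\vec{w}$$ for one that passes both AC2(a) and AC2(b).

```python
from itertools import chain, combinations, product

# Structural equations for the brick example, listed in causal order.
EQUATIONS = {
    "AliceThrow": lambda v: 1,
    "BobThrow": lambda v: 1,
    "GlassBreaks": lambda v: max(v["AliceThrow"], v["BobThrow"]),
}


def solve(do=None):
    """Evaluate the model; variables named in `do` are fixed by intervention."""
    do = do or {}
    values = {}
    for name, equation in EQUATIONS.items():  # dicts keep insertion order
        values[name] = do[name] if name in do else equation(values)
    return values


def subsets(items):
    """All subsets of `items`, from the empty set up to the full set."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def is_actual_cause(cause, value, phi):
    """Check AC1 and AC2 for a single binary variable `cause` set to `value`."""
    actual = solve()
    if actual[cause] != value or not phi(actual):      # AC1
        return False
    others = [n for n in EQUATIONS if n != cause]
    for W in subsets(others):
        z_rest = [n for n in others if n not in W]     # Z minus X
        for w in product([0, 1], repeat=len(W)):
            setting = dict(zip(W, w))
            # AC2(a): flipping the cause, with W set to w, must prevent phi.
            if phi(solve({cause: 1 - value, **setting})):
                continue
            # AC2(b): restoring the cause must give phi for all subsets of W
            # (at w) and all subsets of Z - X (at their actual values).
            if all(
                phi(solve({cause: value,
                           **{n: setting[n] for n in w_sub},
                           **{n: actual[n] for n in z_sub}}))
                for w_sub in subsets(W)
                for z_sub in subsets(z_rest)
            ):
                return True
    return False


print(is_actual_cause("AliceThrow", 1, lambda v: v["GlassBreaks"] == 1))
```

The first witness the search finds is the one used above: $$\vec{W} = \{\mathit{BobThrow}\}$$ with $$\mathit{BobThrow} \leftarrow 0$$.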

### Example 2: not an actual cause

Now let’s try an example that isn’t an actual cause: the glass breaking causes Alice to throw the brick. It’s obviously false; however, it wasn’t clear to me exactly where it would fail until I worked through this…

AC1 holds since in the actual world, $$\mathit{GlassBreaks} = 1$$ and $$\mathit{AliceThrow} = 1$$ hold.

Examining the function definitions, they don’t provide a way to link $$\mathit{AliceThrow}$$ to a change in $$\mathit{GlassBreaks}$$, so the only apparent way to do so is through $$\vec{W}$$. Therefore, use the partition $$\vec{W} = \{\mathit{AliceThrow}\}$$ and $$\vec{Z} = \{\mathit{GlassBreaks}, \mathit{BobThrow}\}$$.

Now for AC2(a), we can easily get $$\mathit{AliceThrow} = 0$$ as required, since we can do what we like with $$\vec{W}$$. It doesn’t help when we move onto AC2(b) since we have to hold $$\mathit{AliceThrow} = 0$$, which is the negation of what we want. The same is the case for the other partition including $$\mathit{AliceThrow}$$ in $$\vec{W}$$, i.e., $$\vec{W} = \{ \mathit{AliceThrow}, \mathit{BobThrow} \}$$.

So, the broken glass does not cause Alice to throw a brick. The setup we needed to get through AC2(a) set us up to fail AC2(b).
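To see where every candidate witness fails, here is a small Python sketch (again an illustration, not the R script) that tabulates AC2(a) and AC2(b) for each choice of $$\vec{W}$$ and $$\vec{w}$$, taking $$\mathit{GlassBreaks} = 1$$ as the candidate cause and $$\mathit{AliceThrow} = 1$$ as $$\varphi$$. The helper names are assumptions and the variables are treated as binary.

```python
from itertools import chain, combinations, product


def solve(do):
    """Brick model; any variable listed in `do` is overridden by intervention."""
    v = {}
    v["AliceThrow"] = do.get("AliceThrow", 1)
    v["BobThrow"] = do.get("BobThrow", 1)
    v["GlassBreaks"] = do.get("GlassBreaks", max(v["AliceThrow"], v["BobThrow"]))
    return v


def subsets(items):
    """All subsets of `items`, from the empty set up to the full set."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def phi(v):
    """The candidate effect: Alice throws."""
    return v["AliceThrow"] == 1


actual = solve({})
others = ["AliceThrow", "BobThrow"]   # every variable except the candidate cause
report = []
for W in subsets(others):
    z_rest = [n for n in others if n not in W]        # Z minus X
    for w in product([0, 1], repeat=len(W)):
        setting = dict(zip(W, w))
        # AC2(a): with GlassBreaks flipped to 0 and W set to w, phi must fail.
        ac2a = not phi(solve({"GlassBreaks": 0, **setting}))
        # AC2(b): with GlassBreaks back at 1, phi must hold for all subsets.
        ac2b = all(
            phi(solve({"GlassBreaks": 1,
                       **{n: setting[n] for n in w_sub},
                       **{n: actual[n] for n in z_sub}}))
            for w_sub in subsets(W)
            for z_sub in subsets(z_rest)
        )
        report.append((setting, ac2a, ac2b))

for setting, ac2a, ac2b in report:
    print(setting, "AC2(a):", ac2a, "AC2(b):", ac2b)
```

In this sketch AC2(a) passes only for settings that force $$\mathit{AliceThrow} \leftarrow 0$$, and those are exactly the settings for which AC2(b) fails, so no row passes both.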

### References

Halpern, J. Y. (2015). A Modification of the Halpern-Pearl Definition of Causality. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), 3022–3033.

## How theory-infused forms of evaluation proliferate

In case you’re wondering why we’re blessed with a multitude of terms for evaluations that use theory in some shape or fashion – theory-oriented evaluation, theory-based evaluation, theory-driven evaluation, program theory evaluation, intervening mechanism evaluation, theoretically relevant evaluation research, and program theory-driven evaluation science (Donaldson, 2022, p. 9) – the answer is in an XKCD comic.

### References

Donaldson, S. I. (2022). Introduction to Theory-Driven Program Evaluation (2nd ed.). Routledge.

## Carol Fitz-Gibbon (1938 – 2017), author of first description of theory-based evaluation, on importance of RCTs

“[…] I produced the first description of theory based evaluation […]. The point of theory based evaluation is to see, firstly, to what extent the theory is being implemented and, secondly, if the predicted outcomes then follow. It is particularly useful as an interim measure of implementation when the outcomes cannot be measured until much later. But most (if not all) theories in social science are only sets of persuasively stated hypotheses that provide a temporary source of guidance. In order to see if the hypotheses can become theories one must measure the extent to which the predicted outcomes are achieved. This requires randomised controlled trials. Even then the important point is to establish the direction and magnitude of the causal relation, not the theory. Many theories can often fit the same data.”

Fitz-Gibbon, C. T. (2002). Researching outcomes of educational interventions. BMJ, 324(7346), 1155.

“As an anecdote: one of us lives in a 1930s neighbourhood. The local municipality proposed converting a road connecting the neighbourhood to other neighbourhoods from a two-direction road to a one-way road. This would apply to motorized traffic only, not to bicycles. Many people protested because of the detour they would have to take by car. We asked the counterfactual question: suppose the city would have introduced the one-way road decades ago, would they then heavily support a policy to make it a two-direction road? Several people we asked did not know, the main reason for not supporting that change being that it would result in more traffic, noise and pollution, and a reduction in safety.”

– Van Wee et al. (2023, p. 84)

### Reference

Van Wee, B., Annema, J. A., & Van Barneveld, S. (2023). Controversial policies: Growing support after implementation. A discussion paper. Transport Policy, 139, 79–86.

## What is a counterfactual?

What’s a counterfactual? Philosophers love the example, “If Oswald hadn’t killed Kennedy, someone else would have”. More generally, Y would be y had X been x in situation U = u (Judea Pearl’s rendering, 2011).
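In Pearl’s structural notation this is usually written as follows (a sketch of the notation, with $$M$$ standing for whatever structural model is assumed):

$$Y_x(u) = y$$

This reads: solve the model $$M$$ in context $$U = u$$ after replacing the equation for $$X$$ with the constant $$X = x$$, and $$Y$$ takes the value $$y$$.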

### References

Pearl, J. (2011). The structural theory of causation. In P. McKay Illari, F. Russo, & J. Williamson (Eds.), Causality in the Sciences (pp. 697–727). Oxford University Press.

## It’s all theory-based and counterfactual

Two of my favourite articles on #evaluation are Cook’s (2000) argument that all impact evaluations, RCTs included, are theory-based and Reichardt’s (2022) argument that there’s always a counterfactual, if not explicitly articulated then not far beneath the surface. I think both arguments are irrefutable, but how we can build on theirs and others’ work to improve evaluation commissioning and delivery seems a formidable challenge given the fiercely defended dichotomies in the field.

If all impact evaluation really is theory-based then it’s clear there’s huge variation in the quality of theories and theorising. If all impact evaluation depends on counterfactuals then there is huge variation in how compelling the evidence is for the counterfactual outcomes, particularly when there is no obvious comparison group.

Clarifying these kinds of distinctions is, I think, important for improving evaluations and the public services and other programmes they evaluate.

### References

Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. In A. Petrosino, P. J. Rogers, T. A. Huebner, & T. A. Hacsi (Eds.), New directions in evaluation: Program Theory in Evaluation: Challenges and Opportunities (pp. 27–34). Jossey-Bass.

Reichardt, C. S. (2022). The Counterfactual Definition of a Program Effect. American Journal of Evaluation, 43(2), 158–174.

## Process and implementation evaluations: A primer (Patricia Rogers and Michael Woolcock)

I tend to focus on the experimental and quasi-experimental elements of programme evaluations, but most of the work we do also includes implementation and process evaluations.

This looks interesting, by Patricia Rogers and Michael Woolcock:

In this working paper for the Center for International Development at Harvard University, Patricia Rogers and Michael Woolcock argue that implementation and process evaluations serve the vital purpose of jointly promoting accountability and learning.

This focus on accountability and learning can expand evaluations’ role from external instruments of compliance to internal drivers of partnership, innovation, and improvement. Process evaluations can offer a deeper understanding of interventions, guiding informed decision-making, fostering continuous learning, and cultivating adaptable organizations and sustainable positive impacts for those served.

The paper explores six process and implementation evaluation types, highlighting their strengths and weaknesses in different contexts.

### References

Rogers, P.J. & Woolcock, M. (2023). Process and Implementation Evaluations: A Primer. Center for International Development at Harvard University.

## A cynical view of SEMs

It is all too common for a box and arrow diagram to be cobbled together in an afternoon and christened a “theory of change”. One formalised version of such a diagram is a structural equation model (SEM), the arrows of which are annotated with coefficients estimated using data. Here is John Fox (2002) on SEM and informal boxology:

“A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data. A more charitable interpretation is that SEMs are close to the kind of informal thinking about causal relationships that is common in social-science theorizing, and that, therefore, these models facilitate translating such theories into data analysis.”

### References

Fox, J. (2002). Structural Equation Models: Appendix to An R and S-PLUS Companion to Applied Regression. Last corrected 2006.