
Social policy and programme evaluations often report findings in terms of causal estimands such as the average treatment effect (ATE) or the average treatment effect on the treated (ATT or ATET). An estimand is a quantity we are trying to estimate – but what exactly does that mean? This post explains, using simple examples.
Suppose a study has two conditions, treatment (=1) and control (=0). Causal estimands are defined in terms of potential outcomes: the outcome if someone had been assigned to treatment, \(Y(1)\), and the outcome if someone had been assigned to control, \(Y(0)\).
We only get to see one of these two outcomes realised, depending on which condition someone was actually assigned to. The other is a counterfactual outcome. Assume, for a moment, that you are omniscient and can observe both potential outcomes. The treatment effect (TE) for an individual is \(Y(1)-Y(0)\) and, since you are omniscient, you can see it for everyone.
Here is a table of potential outcomes and treatment effects for 10 fictional study participants. A higher score represents a better outcome.
Person | Condition | Y(0) | Y(1) | TE |
---|---|---|---|---|
1 | 1 | 0 | 7 | 7 |
2 | 0 | 3 | 0 | -3 |
3 | 1 | 2 | 9 | 7 |
4 | 1 | 1 | 8 | 7 |
5 | 0 | 4 | 1 | -3 |
6 | 1 | 3 | 10 | 7 |
7 | 0 | 4 | 1 | -3 |
8 | 0 | 8 | 5 | -3 |
9 | 0 | 7 | 4 | -3 |
10 | 1 | 3 | 10 | 7 |
Note the pattern in the table. People who were assigned to treatment have a treatment effect of \(7\) and people who were assigned to control have a treatment effect of \(-3\), i.e., if they had been assigned to treatment, their outcome would have been worse. So everyone in this fictional study was lucky: they were assigned to the condition that led to the best outcome they could have had.
The average treatment effect (ATE) is simply the average of the treatment effects:
\(\displaystyle \frac{7 - 3 + 7 + 7 - 3 + 7 - 3 - 3 - 3 + 7}{10}=2\)
The average treatment effect on the treated (ATT or ATET) is the average of treatment effects for people who were assigned to the treatment:
\(\displaystyle \frac{7 + 7 + 7 + 7 + 7}{5}=7\)
The average treatment effect on control (ATC) is the average of treatment effects for people who were assigned to control:
\(\displaystyle \frac{(-3) + (-3) + (-3) + (-3) + (-3)}{5}=-3\)
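The three averages above can be reproduced in a few lines of Python. This is a minimal sketch: the tuple layout `(condition, Y(0), Y(1))` and the helper `mean` are illustrative choices made here, not part of any library.

```python
# Potential outcomes from the first table: (condition, Y(0), Y(1)).
# We are pretending to be omniscient, so both Y(0) and Y(1) are known.
people = [
    (1, 0, 7), (0, 3, 0), (1, 2, 9), (1, 1, 8), (0, 4, 1),
    (1, 3, 10), (0, 4, 1), (0, 8, 5), (0, 7, 4), (1, 3, 10),
]

def mean(xs):
    return sum(xs) / len(xs)

# Individual treatment effect: TE = Y(1) - Y(0).
te = [y1 - y0 for _, y0, y1 in people]

ate = mean(te)  # averaged over everyone
att = mean([t for t, (d, _, _) in zip(te, people) if d == 1])  # assigned to treatment
atc = mean([t for t, (d, _, _) in zip(te, people) if d == 0])  # assigned to control

print(ate, att, atc)  # 2.0 7.0 -3.0
```

The same list of individual effects feeds all three estimands; they differ only in which people are averaged over.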
Alas, we aren’t really omniscient, so in reality we see a table like this:
Person | Condition | Y(0) | Y(1) | TE |
---|---|---|---|---|
1 | 1 | ? | 7 | ? |
2 | 0 | 3 | ? | ? |
3 | 1 | ? | 9 | ? |
4 | 1 | ? | 8 | ? |
5 | 0 | 4 | ? | ? |
6 | 1 | ? | 10 | ? |
7 | 0 | 4 | ? | ? |
8 | 0 | 8 | ? | ? |
9 | 0 | 7 | ? | ? |
10 | 1 | ? | 10 | ? |
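This table highlights the fundamental problem of causal inference: for every person, one of the two potential outcomes, and therefore the treatment effect, is missing. This is why causal inference is sometimes seen as a missing data problem.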
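A small Python sketch of how the observed table arises from the omniscient one. It assumes the same illustrative tuple layout `(condition, Y(0), Y(1))` and uses what is sometimes called the switching equation, \(Y = D\,Y(1) + (1-D)\,Y(0)\), where \(D\) is the assigned condition. `None` marks a value we cannot observe.

```python
# Potential outcomes from the first table: (condition, Y(0), Y(1)).
people = [
    (1, 0, 7), (0, 3, 0), (1, 2, 9), (1, 1, 8), (0, 4, 1),
    (1, 3, 10), (0, 4, 1), (0, 8, 5), (0, 7, 4), (1, 3, 10),
]

def show(value):
    # Render a missing value as "?", as in the table above.
    return "?" if value is None else str(value)

observed = []
for d, y0, y1 in people:
    # Switching equation: only the assigned condition's outcome is realised.
    y = d * y1 + (1 - d) * y0
    observed.append({
        "condition": d,
        "Y": y,
        "Y(0)": y if d == 0 else None,  # counterfactual for treated people
        "Y(1)": y if d == 1 else None,  # counterfactual for control people
        "TE": None,                     # needs both potential outcomes
    })

for i, row in enumerate(observed, start=1):
    # First line printed: 1 | 1 | ? | 7 | ?
    print(i, row["condition"], show(row["Y(0)"]), show(row["Y(1)"]),
          show(row["TE"]), sep=" | ")
```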
Don’t confuse estimands and methods for estimation
One of the barriers to understanding these estimands is that we are used to taking a between-participant difference in group means to estimate the average effect of a treatment. But the estimands are defined in terms of a within-participant difference between two potential outcomes, only one of which is observed.
The causal effect is a theoretical quantity defined for each individual, and it cannot be directly measured.
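To make the distinction concrete, here is a sketch that applies the familiar between-group estimator to the observed outcomes from the first example and compares it with the ATE computed from both potential outcomes. For these ten people the difference in means is 3.6, which matches none of ATE (2), ATT (7) or ATC (-3). The data layout and helper are illustrative assumptions.

```python
# Potential outcomes from the first table: (condition, Y(0), Y(1)).
people = [
    (1, 0, 7), (0, 3, 0), (1, 2, 9), (1, 1, 8), (0, 4, 1),
    (1, 3, 10), (0, 4, 1), (0, 8, 5), (0, 7, 4), (1, 3, 10),
]

def mean(xs):
    return sum(xs) / len(xs)

# Estimator: between-participant difference in group means,
# using only each person's realised outcome.
treated_outcomes = [y1 for d, _, y1 in people if d == 1]
control_outcomes = [y0 for d, y0, _ in people if d == 0]
diff_in_means = mean(treated_outcomes) - mean(control_outcomes)

# Estimand: average of within-participant differences,
# which needs the unobservable counterfactuals.
ate = mean([y1 - y0 for _, y0, y1 in people])

print(round(diff_in_means, 2), ate)  # 3.6 2.0
```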
Here is another example where the causal effect is zero for everyone, so ATT, ATE, and ATC are all zero too:
Person | Condition | Y(0) | Y(1) | TE |
---|---|---|---|---|
1 | 1 | 7 | 7 | 0 |
2 | 0 | 3 | 3 | 0 |
3 | 1 | 7 | 7 | 0 |
4 | 1 | 7 | 7 | 0 |
5 | 0 | 3 | 3 | 0 |
6 | 1 | 7 | 7 | 0 |
7 | 0 | 3 | 3 | 0 |
8 | 0 | 3 | 3 | 0 |
9 | 0 | 3 | 3 | 0 |
10 | 1 | 7 | 7 | 0 |
However, people have been assigned to treatment and control in such a way that, given the outcomes realised, it appears that treatment is better than control. Here is the table again, this time with the outcomes we couldn’t observe replaced by question marks:
Person | Condition | Y(0) | Y(1) | TE |
---|---|---|---|---|
1 | 1 | ? | 7 | ? |
2 | 0 | 3 | ? | ? |
3 | 1 | ? | 7 | ? |
4 | 1 | ? | 7 | ? |
5 | 0 | 3 | ? | ? |
6 | 1 | ? | 7 | ? |
7 | 0 | 3 | ? | ? |
8 | 0 | 3 | ? | ? |
9 | 0 | 3 | ? | ? |
10 | 1 | ? | 7 | ? |
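So, if we take the average of the realised treatment outcomes we get 7, and the average of the realised control outcomes is 3. The mean difference is then 4. This estimate is biased: the people assigned to treatment would have had better outcomes than those assigned to control even without treatment (a \(Y(0)\) of 7 versus 3). The correct answer is zero, but we couldn’t tell that from the available data.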
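One way to see where the 4 comes from is an algebraic identity: adding and subtracting the treated group's mean \(Y(0)\) splits the difference in means into the ATT plus a selection bias term, the gap in average \(Y(0)\) between the treated and control groups. The sketch below checks this on the zero-effect table; the names `att` and `selection_bias` are labels chosen here for illustration.

```python
import math

# Potential outcomes from the zero-effect table: (condition, Y(0), Y(1)).
people = [
    (1, 7, 7), (0, 3, 3), (1, 7, 7), (1, 7, 7), (0, 3, 3),
    (1, 7, 7), (0, 3, 3), (0, 3, 3), (0, 3, 3), (1, 7, 7),
]

def mean(xs):
    return sum(xs) / len(xs)

treated = [(y0, y1) for d, y0, y1 in people if d == 1]
control = [(y0, y1) for d, y0, y1 in people if d == 0]

# Observable: mean realised outcome under treatment minus under control.
diff_in_means = mean([y1 for _, y1 in treated]) - mean([y0 for y0, _ in control])

# Unobservable pieces, computable only because we are omniscient here.
att = mean([y1 - y0 for y0, y1 in treated])
# Selection bias: how the groups would have compared if nobody were treated.
selection_bias = mean([y0 for y0, _ in treated]) - mean([y0 for y0, _ in control])

# Identity: difference in means = ATT + selection bias.
assert math.isclose(diff_in_means, att + selection_bias)
print(diff_in_means, att, selection_bias)  # 4.0 0.0 4.0
```

Here the whole difference of 4 is selection bias; the treatment itself contributes nothing.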
The easiest way to estimate ATE is through a randomised controlled trial. In this kind of study, the mean difference in observed outcomes is an unbiased estimate of ATE. For other estimators that don’t require random treatment assignment and for other estimands, try Scott Cunningham’s Causal Inference: The Mixtape.
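"Unbiased" means that, averaged over all the ways random assignment could have turned out, the difference in means equals the ATE. A minimal sketch, assuming complete randomisation in which exactly five of the ten people from the first table are treated: it enumerates all 252 possible assignments with `itertools.combinations` and averages the resulting estimates. The original assignments in the table are ignored because we re-randomise.

```python
from itertools import combinations

# (Y(0), Y(1)) for the ten people in the first table.
potential = [(0, 7), (3, 0), (2, 9), (1, 8), (4, 1),
             (3, 10), (4, 1), (8, 5), (7, 4), (3, 10)]
n = len(potential)
n_treated = 5

def mean(xs):
    return sum(xs) / len(xs)

true_ate = mean([y1 - y0 for y0, y1 in potential])

# Every set of five people is equally likely to be the treated group.
estimates = []
for treated in combinations(range(n), n_treated):
    t = set(treated)
    treated_mean = mean([potential[i][1] for i in t])
    control_mean = mean([potential[i][0] for i in range(n) if i not in t])
    estimates.append(treated_mean - control_mean)

print(len(estimates), true_ate, round(mean(estimates), 10))  # 252 2.0 2.0
```

Individual estimates in `estimates` vary from one assignment to another; it is only their average that equals the ATE.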
How do you choose between ATE, ATT, and ATC?
Firstly, if you are running a randomised controlled trial, you don’t need to choose: ATE, ATT, and ATC are the same in expectation. This is because random assignment makes the treatment and control groups alike, on average across trials, in all their characteristics, including their potential outcomes.
So the distinction between these three estimands only matters for quasi-experimental studies, for example where treatment assignment is not under the control of the researcher.
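The claim about randomised trials can be checked on the first table. A sketch, assuming complete randomisation of five out of ten people: across all 252 possible assignments, the average ATT and average ATC both equal the ATE, even though a single assignment (such as the one in the first table) can give very different values.

```python
from itertools import combinations

# (Y(0), Y(1)) for the ten people in the first table.
potential = [(0, 7), (3, 0), (2, 9), (1, 8), (4, 1),
             (3, 10), (4, 1), (8, 5), (7, 4), (3, 10)]
n, n_treated = len(potential), 5

def mean(xs):
    return sum(xs) / len(xs)

te = [y1 - y0 for y0, y1 in potential]
ate = mean(te)

# ATT and ATC for every possible random assignment.
atts, atcs = [], []
for treated in combinations(range(n), n_treated):
    t = set(treated)
    atts.append(mean([te[i] for i in t]))
    atcs.append(mean([te[i] for i in range(n) if i not in t]))

# Averaged over assignments, all three estimands coincide...
print(ate, round(mean(atts), 10), round(mean(atcs), 10))  # 2.0 2.0 2.0
# ...but a single assignment can give an ATT anywhere in this range.
print(min(atts), max(atts))  # -3.0 7.0
```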
Noah Greifer and Elizabeth Stuart offer a neat set of example research questions to help decide (here lightly edited to make them less medical):
- ATT: should an intervention currently being offered continue to be offered or should it be withheld?
- ATC: should an intervention be extended to people who don’t currently receive it?
- ATE: should an intervention be offered to everyone who is eligible?
How does intention to treat fit in?
The distinction between ATE and ATT is unrelated to the distinction between intention to treat and per-protocol analyses. Intention to treat analysis means we analyse people according to the group they were assigned to, even if they didn’t comply, e.g., by not engaging with the treatment. Per-protocol analysis only analyses data from participants who did comply. It is prone to bias, because people who comply may differ systematically from those who don’t, and is generally not recommended.
For instance, it is possible to conduct a quasi-experimental study that uses intention to treat and estimates the average treatment effect on the treated. In this case, ATT might be better called something like average treatment effect for those we intended to treat (ATETWITT). Sadly this term hasn’t yet been used in the literature.
Summary
Causal effects are defined in terms of potential outcomes following treatment and following control. Only one potential outcome is observed, depending on whether someone was assigned to treatment or control, so causal effects cannot be directly observed. The fields of statistics and causal inference find ways to estimate these estimands using observable data. The easiest way to estimate ATE is through a randomised controlled trial. In this kind of study, the mean difference in observed outcomes is an unbiased estimate of ATE. Quasi-experimental designs allow the estimation of additional estimands: ATT and ATC.