Intersectionality, in under 200 words

If we try to eliminate pay gaps by monitoring only single characteristics like gender or ethnicity, we can still end up with pay gaps between combinations of characteristics. One way this could happen is if white women and Black men are appointed to senior management positions, but no Black women are.

The idea of an intersection comes from set theory and describes where two sets overlap. For instance, the intersection of the set of Black people and the set of women is the set of Black women.
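Here is a minimal sketch in R of how single-characteristic monitoring can miss an intersectional gap. The counts are invented purely for illustration, and senior_rate is a hypothetical helper, not something from a package.

```r
# Hypothetical workforce counts, invented purely for illustration.
staff <- data.frame(
  gender    = c("man", "woman", "man", "woman"),
  ethnicity = c("white", "white", "Black", "Black"),
  n         = c(60, 20, 20, 20),  # staff in each group
  senior    = c(10, 10, 10, 0)    # of whom are in senior management
)

# Share of staff in senior management, grouped by the named columns.
senior_rate <- function(data, vars) {
  totals <- aggregate(data[c("n", "senior")], by = data[vars], FUN = sum)
  totals$rate <- totals$senior / totals$n
  totals
}

senior_rate(staff, "gender")                  # one characteristic
senior_rate(staff, "ethnicity")               # the other characteristic
senior_rate(staff, c("gender", "ethnicity"))  # their intersection
```

In this toy example the rate is 25% for men, women, white staff and Black staff alike, yet it is 0% for Black women.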

Intersectionality is a broad framework that promotes the study and elimination of oppression and exploitation of people in terms of combinations of characteristics.

Is intersectionality a theory, explaining why this form of discrimination occurs? Here’s Patricia Hill Collins (2019, p.51), a leading scholar in this area:

“Every time I encounter an article that identifies intersectionality as a social theory, I wonder what conception of social theory the author has in mind. I don’t assume that intersectionality is already a social theory. Instead, I think a case can be made that intersectionality is a social theory in the making.”


Collins, P. H. (2019). Intersectionality As Critical Social Theory. Duke University Press.

Two incontrovertible facts about RCTs

“… the following are two incontrovertible facts about a randomized clinical trial:

1. over all randomizations the groups are balanced;

2. for a particular randomization they are unbalanced.

Now, no [statistically] ‘significant imbalance’ can cause 1 to be untrue and no lack of a significant balance can make 2 untrue. Therefore the only reason to employ such a test must be to examine the process of randomization itself. Thus a significant result should lead to the decision that the treatment groups have not been randomized…”

– Senn (1994, p. 1716)
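One way to see both facts at once is to enumerate every possible randomization in a tiny trial. The sketch below is my own illustration rather than anything from Senn's paper. It assumes a made-up baseline covariate for six participants, three of whom are assigned to treatment.

```r
# Hypothetical baseline covariate for six participants.
x <- c(1, 2, 4, 8, 16, 32)

# Every way of assigning three of the six to treatment.
treated_sets <- combn(length(x), 3)

# Treatment-minus-control difference in covariate means for each assignment.
imbalance <- apply(treated_sets, 2, function(treated) {
  mean(x[treated]) - mean(x[-treated])
})

mean(imbalance)      # zero, up to floating-point error: fact 1
any(imbalance == 0)  # FALSE: no single randomization is exactly balanced, fact 2
```

The covariate values were chosen so that no split into two groups of three can have equal means. With other values an occasional randomization could be exactly balanced on this one covariate.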

Senn, S. (1994). Testing for baseline balance in clinical trials. Statistics in Medicine, 13, 1715–1726.

Parametric versus non-parametric statistics

There is no such thing as parametric or non-parametric data. There are parametric and non-parametric statistical models.

“The term nonparametric may have some historical significance and meaning for theoretical statisticians, but it only serves to confuse applied statisticians.”

– Noether, G. E. (1984, p. 177)

“. . . the distribution functions of the various stochastic variables which enter into their problems are assumed to be of known functional form, and the theories of estimation and of testing hypotheses are theories of estimation of and of testing hypotheses about, one or more parameters, finite in number, the knowledge of which would completely determine the various distribution functions involved. We shall refer to this situation for brevity as the parametric case, and denote the opposite situation, where the functional forms of the distributions are unknown, as the non-parametric case.”

– Wolfowitz, J. (1942, p. 264)
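To make the point concrete, here is a small sketch in which the same simulated data are analysed with a parametric and a non-parametric model. The data, seed and group names are all assumptions for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
group_a <- rnorm(30, mean = 0)
group_b <- rnorm(30, mean = 0.5)

# Parametric model: assumes normal distributions, which are fully
# determined by a finite number of parameters (means and variances).
parametric <- t.test(group_a, group_b)

# Non-parametric model: does not assume a known functional form
# for the distributions.
nonparametric <- wilcox.test(group_a, group_b)

parametric$p.value
nonparametric$p.value
```

Nothing about the data changed between the two calls; only the model did.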


Noether, G. E. (1984). Nonparametrics: The early years—impressions and recollections. American Statistician, 38(3), 173–178.

Wolfowitz, J. (1942). Additive Partition Functions and a Class of Statistical Hypotheses. The Annals of Mathematical Statistics, 13(3), 247–279.

ACME: average causal mediation effect

Suppose there are two groups in a study: treatment and control. There are two potential outcomes for an individual, \(i\): outcome under treatment, \(Y_i(1)\), and outcome under control, \(Y_i(0)\). Only one of the two potential outcomes can be realised and observed as \(Y_i\).

The treatment effect for an individual is defined as the difference in potential outcomes for that individual:

\(\mathit{TE}_i = Y_i(1) - Y_i(0)\).

Since we cannot observe both potential outcomes for any individual, we usually make do with a sample or population average treatment effect (SATE and PATE). Although these are unobservable (they are the averages of unobservable differences in potential outcomes), they can be estimated. For example, with random treatment assignment, the difference in observed sample mean outcomes for the treatment and control groups is an unbiased estimator of SATE. If we also have a random sample from the population of interest, then this difference in sample means gives us an unbiased estimate of PATE.
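Unbiasedness can be checked by brute force in a toy example. The sketch below assumes made-up potential outcomes for eight people (which we could never fully observe in practice) and averages the difference in means over every possible assignment of four people to treatment.

```r
# Hypothetical potential outcomes for eight people. In a real study only
# one of y0 and y1 would ever be observed for each person.
y0 <- c(3, 5, 2, 8, 6, 4, 7, 5)  # outcome under control
y1 <- c(4, 9, 2, 9, 8, 7, 7, 6)  # outcome under treatment

sate <- mean(y1 - y0)  # sample average treatment effect

# Every way of assigning four of the eight people to treatment.
assignments <- combn(length(y0), 4)

# Difference in observed group means under each possible assignment.
estimates <- apply(assignments, 2, function(treated) {
  mean(y1[treated]) - mean(y0[-treated])
})

sate
mean(estimates)   # equals SATE, up to floating-point error
range(estimates)  # any single randomization can miss SATE
```

The estimator is right on average over randomizations, even though the estimate from any one randomization may be off.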

Okay, so what happens if we add a mediator? The potential outcome is expanded to depend on both treatment group and mediator value.

Let \(Y_i(t, m)\) denote the potential outcome for \(i\) under treatment \(t\) and with mediator value \(m\).

Let \(M_i(t)\) denote the potential value of the mediator under treatment \(t\).

The (total) treatment effect is now:

\(\mathit{TE}_i = Y_i(1, M_i(1)) - Y_i(0, M_i(0))\).

Informally, the idea here is that we calculate the potential outcome under treatment, with the mediator value as it is under treatment, and subtract from that the potential outcome under control with the mediator value as it is under control.

The causal mediation effect (CME) is what we get when we hold the treatment assignment constant, but work out the difference in potential outcomes when the mediators are set to values they have under treatment and control:

\(\mathit{CME}_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0))\)

The direct effect (DE) holds the mediator constant and varies treatment:

\(\mathit{DE}_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t))\)

Note how both CME and DE depend on the treatment group. If there is no interaction between treatment and mediator, then

\(\mathit{CME}_i(0) = \mathit{CME}_i(1) = \mathit{CME}_i\)

\(\mathit{DE}_i(0) = \mathit{DE}_i(1) = \mathit{DE}_i\).

ACME and ADE are the averages of these effects. Again, since they are defined in terms of potential values (of outcome and mediator), they cannot be directly observed, but – given some assumptions – there are estimators.
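To get a feel for the individual-level definitions, here is a sketch for a single imaginary individual. The functions M and Y and all their coefficients are invented, and I have deliberately included a treatment-mediator interaction so that the effects differ by treatment group.

```r
# Hypothetical potential-outcome functions for one individual.
# The t * m term is a treatment-mediator interaction.
M <- function(t) 1 + 2 * t
Y <- function(t, m) 1 + 2 * t + 3 * m + 0.5 * t * m

TE  <- Y(1, M(1)) - Y(0, M(0))
CME <- function(t) Y(t, M(1)) - Y(t, M(0))
DE  <- function(t) Y(1, M(t)) - Y(0, M(t))

c(CME(0), CME(1))  # differ because of the interaction
c(DE(0), DE(1))    # likewise

# The total effect decomposes in two ways:
TE == CME(1) + DE(0)
TE == CME(0) + DE(1)
```

Both decompositions follow from adding and subtracting a cross-world term such as Y(1, M(0)) in the definition of the total effect. Dropping the interaction term makes CME(0) equal CME(1) and DE(0) equal DE(1).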

Baron and Kenny (1986) provide an estimator in terms of regression equations. I’ll focus on two of their steps and assume there is no need to adjust for any covariates. I’ll also assume that there is no interaction between treatment and mediator.

First, regress the mediator (\(m\)) on the binary treatment indicator (\(t\)):

\(m = \alpha_1 + \beta_1 t\).

The slope \(\beta_1\) tells us how much the mediator changes between the two treatment conditions on average.

Second, regress the outcome (\(y\)) on both mediator and treatment indicator:

\(y = \alpha_2 + \beta_2 t + \beta_3 m\).

The slope \(\beta_2\) provides an estimate of the average direct effect (ADE), since this model holds the mediator constant (note how this mirrors the definition of DE in terms of potential outcomes).

Now to work out the average causal mediation effect (ACME), we need to wiggle the outcome by however much the mediator moves between treatment and control, whilst holding the treatment group constant. Slope \(\beta_1\) tells us how much the treatment shifts the mediator. Slope \(\beta_3\) tells us how much the outcome increases for every unit increase in the mediator, holding treatment constant. So \(\beta_1 \beta_3\) is an estimate of ACME.
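The two regressions can be tried on simulated data. The sketch below assumes a made-up data-generating process with a randomised treatment, no treatment-mediator interaction, and no confounding of the mediator and outcome. The true ADE is 0.5 and the true ACME is 2 × 1.5 = 3.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 5000

# Hypothetical data-generating process, invented for illustration.
d <- data.frame(t = rbinom(n, size = 1, prob = 0.5))  # randomised treatment
d$m <- 1 + 2 * d$t + rnorm(n)                # treatment shifts the mediator by 2
d$y <- 1 + 0.5 * d$t + 1.5 * d$m + rnorm(n)  # direct effect 0.5, mediator slope 1.5

mediator_model <- lm(m ~ t, data = d)
outcome_model  <- lm(y ~ t + m, data = d)

beta_1 <- coef(mediator_model)[["t"]]
beta_2 <- coef(outcome_model)[["t"]]
beta_3 <- coef(outcome_model)[["m"]]

ade  <- beta_2           # should be close to 0.5
acme <- beta_1 * beta_3  # should be close to 3

# With these linear models, the total effect from regressing y on t alone
# equals ADE + ACME.
total <- coef(lm(y ~ t, data = d))[["t"]]
c(ADE = ade, ACME = acme, total = total)
```

The estimates only recover the true effects here because the simulation satisfies the assumptions by construction. Real data offer no such guarantee.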

For more, especially on complicating the Baron and Kenny approach, see Imai et al. (2010).


Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statistical Science, 25, 51–71.


Assumptions not often assessed or satisfied in published mediation analyses in psychology and psychiatry

Elizabeth Stuart et al. (2021) reviewed 206 articles using mediation analysis “in top academic psychiatry or psychology journals” from 2013-2018, to determine how many satisfied assumptions of mediation analysis.

Here are the headline results (% of papers):

(The assumption of no interaction of exposure and mediator is as a percentage of the 97% of studies that used the Baron and Kenny approach.)

Although 42% of studies discussed mediation assumptions, “in most cases this discussion was simply an acknowledgement that the data were cross sectional and thus results should be interpreted with caution.”


Stuart, E. A., Schmid, I., Nguyen, T., Sarker, E., Pittman, A., Benke, K., Rudolph, K., Badillo-Goicoechea, E., & Leoutsakos, J.-M. (2021). Assumptions not often assessed or satisfied in published mediation analyses in psychology and psychiatry. Epidemiologic Reviews, In Press.

The Halting Problem in R

It’s possible to write a program that peeks at another program and tells you something about it. Here’s an R function that tells you how long a function is:


library(stringr)  # provides str_length()
function_length <- function(func) {
  body(func) |>
    as.character() |>
    str_length() |>
    sum()
}

It can even be applied to itself: function_length(function_length) happens to be 42.

It might be handy to have an R function that tells you whether a function terminates in finite time, returning TRUE if it does and FALSE otherwise. Something like this, with magic in the ellipsis to make it work:

terminates <- function(f) {
  if (...) TRUE else FALSE
}

For instance, consider this function:

is_a_loop_is_a_loop <- function() {
  while (TRUE) {}
}

Calling terminates(is_a_loop_is_a_loop) should return FALSE, since it has an infinite while loop.

It turns out that there is no such general function – a result due to Alan Turing (1937).

Here is an R version of Christopher Strachey’s (1965) proof, apparently inspired by a proof Turing told him verbally “in a railway carriage on the way to a conference” in 1953. Though Strachey confesses that he forgot the detail.

Suppose for the sake of argument that terminates does exist, with the ellipsis above filled in, and that it works as we hope.

Now, define another function, taking_the_p, as follows:

taking_the_p <- function() {
  if (terminates(taking_the_p))
    while (TRUE) {}
}

The argument has a Russell’s paradox flavour to it.

Suppose terminates(taking_the_p) is TRUE. Then the if statement will dutifully trigger an infinite loop, meaning that taking_the_p doesn’t terminate. So terminates got it wrong. Suppose taking_the_p does actually terminate. That means terminates(taking_the_p) must have returned FALSE – again it was wrong.

But we started the proof by assuming that terminates actually works. Contradiction. Therefore there is no general function terminates.

There are countably infinitely many computable functions. The terminates function is an example of a function that is not computable.

This general result does not mean it is never possible to determine whether a function terminates. There’s an easy heuristic: run a function for a bit and if it hasn’t finished after a period of time, decide that it doesn’t terminate. It’s far from the general solution, but for what it’s worth here’s the R:


library(R.utils)  # provides withTimeout()

terminates <- function(expr, secs = 1) {
  terminates <- FALSE
  res <- NULL
  try({
    res <- withTimeout(expr,
                       timeout = secs)
    terminates <- TRUE
  }, silent = TRUE)
  terminates
}


Note that this takes an expression rather than a function, so that it easily applies to functions of any arguments.

Let’s try two examples. The identity function (which simply returns its argument) applied to itself, and is_a_loop_is_a_loop. For the latter, note the empty parentheses, (), which is how to ask R to run a function rather than inspect it.

terminates(identity(identity))
# [1] TRUE

terminates(is_a_loop_is_a_loop())
# [1] FALSE

The timeout is doing most of the heavy lifting. I have chosen 1 second, which would rule out a large number of functions that do terminate.


Strachey, C. (1965). An impossible program. The Computer Journal, 7(4), 313–313.

Turing, A. M. (1937). On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42(1), 230–265.

“The tendency of empiricism, unchecked, is always anti-realist…”

“The tendency of empiricism, unchecked, is always anti-realist; it has a strong tendency to degenerate into some form of verificationism: to treat the question of what there is (and even the question of what we can – intelligibly – talk about) as the same question as the question of what we can find out, or know for certain; to reduce questions of metaphysics and ontology to questions of epistemology.”
– Strawson, G. (1987, p. 267)

Strawson, G. (1987). Realism and causation. The Philosophical Quarterly, 37, 253–277.

Passage of time in the block universe

Bullet point notes for now… To be continued.

  • Minkowski posited a structure of four dimensional spacetime which treated time as another dimension like the three dimensions of space. “From now onwards space by itself and time by itself will recede completely to become mere shadows and only a type of union of the two will still stand independently on its own.”
  • One way to think about spacetime, known as eternalism or the block universe, is that everything that ever happened, is currently experienced by someone as happening (at a point in spacetime), or is going to happen, has a location in this 4D structure.
  • This challenges the idea that we have free will, since our future selves already exist somewhere in the 4D structure. But free will left us long before anyway if we assume that all events have causes (see Spinoza’s, 1677, Ethics Part 2 Proposition 48; or Strawson, 1994, on the Basic Argument). It is apparently very difficult to defend the possibility of free will.
  • Each point in spacetime depends on others close by (potential headaches here caused by quantum entanglement), and the dependencies stretch back to the Big Bang, which is down one end of the structure. For “dependencies” think information, passed between the points in spacetime, and satisfying the weird geometries of general relativity.
  • Time is ordered according to these dependencies. (And because of something about entropy which makes zero sense to me.)
  • Consciousness is part of the physical world, located in points in spacetime alongside bodies (and there is a lot of conscious stuff about according to panpsychism). Regions of spacetime containing someone from birth till death have millions of conscious experiences along the time dimension, each one frozen in time.
  • But if each conscious experience at each point in spacetime is frozen, how come it feels to us that the world is moving…?
  • The feeling of time passing is produced by us using perceptual memories, passed along the causal chain with everything else, and a comparison of these memories with whatever is perceived at a given location in spacetime.
  • It’s a hyper-complicated variation on how we recognise change in a graph of something changing over time, even though the graph itself is stationary. The graph has a record of what happened and we can use it to infer change.
  • So, although each conscious experience is frozen in time, it can feel like that conscious experience is moving if it is different to a perception – or series of perceptions – in memory.
  • If you have no perception, no memory, no way to compare memory representations, or if there is no information passed between two points in spacetime, then you have no sense of a passage of time between those two points.
  • This is a very rough sketch of how you could have a feeling of movement in an instant at a point in spacetime. BUT it doesn’t explain how all those instants are piped together. There would be millions of instants across spacetime, not a journey between them.
  • Possible way out: “some approaches to these questions center on the idea that an experience of something temporally extended is itself temporally extended. The experience itself takes time to unfold – in fact it takes as much time as the process that it is an experience of. Arguably, this makes it easier to understand the nature of temporal experience. We no longer have to ask how it is that multiple individual experiences, each succeeding the other, can add up to an experience of succession. Instead, we recognize that the fundamental experiential unit is itself temporally extended, and use this to explain how there can be an experience of a temporally extended content.” See Deng (2019).

Maybe more clues here…?

Pop sci

What Einstein May Have Gotten Wrong (2020)

Time’s Passage is Probably an Illusion (2014)

Does Time Really Flow? New Clues Come From a Century-Old Approach to Math (2020)

Journal articles and other peer-reviewed work

Dainton, B. (2018). Temporal Consciousness. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2018 Edition).

Deng, N. (2017). Making Sense of the Growing Block View. Philosophia, 45(3), 1113–1127.

Deng, N. (2019). One Thing After Another: Why the Passage of Time Is Not an Illusion. In The Illusions of Time (pp. 3–15). Springer International Publishing.

Lee, G. (2007). Consciousness in a Space-Time World. Philosophical Perspectives, 21, 341–374.

Gruber, R. P., Smith, R. P., & Block, R. A. (2018). The Illusory Flow and Passage of Time within Consciousness: A Multidisciplinary Analysis. Timing and Time Perception, 6(2), 125–153.

Prosser, S. (2012). Why Does Time Seem to Pass? Philosophy and Phenomenological Research, 85(1), 92–116.

Skow, B. (2009). Relativity and the Moving Spotlight. Journal of Philosophy, 106(12), 666–678.



MPs who left or were removed from the Commons Chamber

Here’s an interesting Excel dataset on MPs who left or were removed from the Commons Chamber, compiled by Sarah Priddy of the Commons Library.

Dennis Skinner got kicked out in 2005 for saying, “The only thing growing then were the lines of coke in front of boy George [Osborne]…”

Dawn Butler now added for calling out Boris Johnson’s lies.

MPs who have withdrawn from the Commons Chamber or who have been suspended