## Sex versus gender

This comes up a lot – here’s my take:

The distinction between sex and gender seems straightforward. Sex refers to biology, e.g., chromosomes and genitalia. Gender refers to psychosocial processes, e.g., roles and expression. However, they are more complex than this neat division. Like happiness and pain, gender is partly ontologically subjective. Gender identity is ‘deeply felt’ and ‘not necessarily visible to others’ (American Psychological Association, 2015, p. 862).

Sex has social facets. The classic sociological work Doing Gender by West and Zimmerman (1987, p. 127) illustrates how a sex category is assigned at birth:

Sex is a determination made through the application of socially agreed upon biological criteria for classifying persons as females or males. […] Placement in a sex category is achieved through application of the sex criteria, but in everyday life, categorization is established and sustained by the socially required identificatory displays that proclaim one’s membership in one or the other category. In this sense, one’s sex category presumes one’s sex and stands as proxy for it in many situations […].

This assigned sex category affects children’s social life in a variety of arbitrary ways, e.g., how they are dressed and expectations about behaviour, and feeds into the development of gender. For the majority of social contexts where sex category is applied, chromosomes and genitals are concealed and irrelevant.

Gender has biological facets: minds do gender identity and are embodied in the brain and nervous system, so even the phenomenology of gender identity has a biological correlate somewhere (Serano, 2013, pp. 138–168). There is emerging evidence that gender identity is a complex trait that is part-heritable and polygenic. A recent systematic review suggests that its heritability is likely in the range 30−60% (Polderman et al., 2018).

The interwoven biopsychosocial nature of sex and gender has led some scientists to use the combined concept sex/gender (e.g., Rippon et al., 2014). This does not mean that the multiplicity of facets blend into an amorphous blob. It does mean that it is important to clarify what particular facets are intended when discussing measurement and theory: chromosomes, genitalia, gender identity, socialisation, etc. The view of sex as only biological and gender as only psychosocial is too simplistic to progress theorising.

Fugard, A. (2020). Should trans people be postmodernist in the streets but positivist in the spreadsheets? A reply to Sullivan. International Journal of Social Research Methodology, 23, 525–531.

## Conceptual Framework for Public Mental Health

“This Conceptual Framework for Public Mental Health is an interactive web-based tool that brings together evidence from academic research, reports, and practitioner and public consultations to map out the factors affecting mental health across all stages of a person’s life, including links to key evidence and lived experiences.”

www.publicmentalhealth.co.uk

## Sample size determination for propensity score weighting

If you’re using propensity score weighting (e.g., inverse probability weighting), one question that will arise is how big a sample you need.

Solutions have been proposed that rely on a variance inflation factor (VIF). You calculate the sample size for a simple design and then multiply that by the VIF to take account of weighting.
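To make the multiplication step concrete, here is a minimal sketch in R. The effect size, power, and the VIF of 1.5 are made-up values purely for illustration, and `power.t.test` (from base R's stats package) stands in for whatever simple-design calculation you would normally use.

```r
# Sample size per group for a simple two-arm comparison of means
# (illustrative inputs: standardised difference 0.5, 80% power, 5% alpha)
simple <- power.t.test(delta = 0.5, sd = 1, power = 0.8, sig.level = 0.05)
n_simple <- ceiling(simple$n)

# Hypothetical variance inflation factor to account for weighting
vif <- 1.5

# Inflate the simple-design sample size by the VIF
n_weighted <- ceiling(n_simple * vif)

c(simple = n_simple, weighted = n_weighted)
```

The arithmetic is trivial once a VIF is in hand; the hard part is justifying the value plugged in for `vif`.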

But the problem is that it is difficult to choose a VIF in advance.

Austin (2021) has developed a simple method (R code in the paper) to estimate VIFs from c-statistics (area under the curve; AUC) of the propensity score models. These c-statistics are often published.

A larger c-statistic means a greater separation between treatment and control, which in turn leads to a larger VIF and requirement for a larger sample.
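As a rough illustration of what separation means here, the sketch below simulates two made-up scenarios, one where a single covariate weakly predicts treatment and one where it strongly predicts it. It computes the c-statistic of each propensity score model as the proportion of treated-control pairs in which the treated unit has the higher propensity score. The function names `c_statistic` and `simulate_c`, the slopes, and the sample size are my own assumptions for illustration; this is not Austin's code and it does not compute a VIF.

```r
set.seed(42)

# c-statistic: probability that a randomly chosen treated unit has a higher
# propensity score than a randomly chosen control unit (ties count as half)
c_statistic <- function(ps, treated) {
  p1 <- ps[treated == 1]
  p0 <- ps[treated == 0]
  mean(outer(p1, p0, ">") + 0.5 * outer(p1, p0, "=="))
}

# Simulate treatment assignment that depends on one covariate, fit a
# logistic propensity score model, and return its c-statistic
simulate_c <- function(slope, n = 1000) {
  x <- rnorm(n)
  treated <- rbinom(n, 1, plogis(slope * x))
  ps <- fitted(glm(treated ~ x, family = binomial))
  c_statistic(ps, treated)
}

c_weak   <- simulate_c(slope = 0.5)  # covariate weakly predicts treatment
c_strong <- simulate_c(slope = 2)    # covariate strongly predicts treatment

c(weak = c_weak, strong = c_strong)
```

The stronger the covariate predicts treatment, the less the propensity score distributions of the two groups overlap and the closer the c-statistic gets to 1.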

Picture illustrating different c-statistics.

The magnitude of the VIF also depends on the estimand of interest, e.g., whether it is the average treatment effect (ATE), average treatment effect on the treated (ATET/ATT), or average treatment effect where treatment and control groups overlap (ATO).

### References

Austin, P. C. (2021). Informing power and sample size calculations when using inverse probability of treatment weighting using the propensity score. Statistics in Medicine.

## Intersectionality, in under 200 words

If we try to eliminate pay gaps by monitoring only single characteristics like gender or ethnicity, we can still end up with pay gaps between combinations of characteristics. One way this could happen would be to appoint white women and Black men to senior management positions, but not appoint any Black women.

The idea of an intersection comes from set theory and describes where two sets overlap. For instance, the intersection of the set of Black people and the set of women is the set of Black women.
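Here is a toy version in R, using a made-up senior management team of three white women and three Black men. The data frame and its column names are hypothetical; `intersect` is base R's set intersection.

```r
# Made-up senior management team: three white women and three Black men
managers <- data.frame(
  gender    = c("woman", "woman", "woman", "man", "man", "man"),
  ethnicity = c("white", "white", "white", "Black", "Black", "Black")
)

# Monitoring one characteristic at a time looks balanced
by_gender    <- table(managers$gender)
by_ethnicity <- table(managers$ethnicity)

# The intersection of the set of women and the set of Black managers
women       <- which(managers$gender == "woman")
black       <- which(managers$ethnicity == "Black")
black_women <- intersect(women, black)

length(black_women)
```

Each single-characteristic table shows an even split, yet the intersection is empty.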

Intersectionality is a broad framework that promotes the study and elimination of oppression and exploitation of people in terms of combinations of characteristics.

Is intersectionality a theory, explaining why this form of discrimination occurs? Here’s Patricia Hill Collins (2019, p.51), a leading scholar in this area:

“Every time I encounter an article that identifies intersectionality as a social theory, I wonder what conception of social theory the author has in mind. I don’t assume that intersectionality is already a social theory. Instead, I think a case can be made that intersectionality is a social theory in the making.”

### References

Collins, P. H. (2019). Intersectionality As Critical Social Theory. Duke University Press.

## Two incontrovertible facts about RCTs

“… the following are two incontrovertible facts about a randomized clinical trial:

1. over all randomizations the groups are balanced;

2. for a particular randomization they are unbalanced.

Now, no [statistically] ‘significant imbalance’ can cause 1 to be untrue and no lack of a significant balance can make 2 untrue. Therefore the only reason to employ such a test must be to examine the process of randomization itself. Thus a significant result should lead to the decision that the treatment groups have not been randomized…”

– Senn (1994, p. 1716)

Senn, S. (1994). Testing for baseline balance in clinical trials. Statistics in Medicine, 13, 1715–1726.
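Both facts are easy to see in a small simulation. This is only a sketch under made-up settings (100 participants, one normally distributed baseline covariate, 5,000 re-randomisations), and the function name `randomise_once` is my own.

```r
set.seed(1994)

# One fixed sample of 100 participants with a baseline covariate
n <- 100
baseline <- rnorm(n)

# Difference in baseline means between arms for one random allocation
randomise_once <- function() {
  arm <- sample(rep(c("treatment", "control"), each = n / 2))
  mean(baseline[arm == "treatment"]) - mean(baseline[arm == "control"])
}

imbalance <- replicate(5000, randomise_once())

mean(imbalance)        # fact 1: average imbalance over many randomisations
mean(imbalance == 0)   # fact 2: proportion of allocations exactly balanced
range(imbalance)       # how far individual allocations stray
```

The average imbalance should sit very close to zero, whilst any individual allocation has some non-zero difference in baseline means.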

## Parametric versus non-parametric statistics

There is no such thing as parametric or non-parametric data. There are parametric and non-parametric statistical models.

“The term nonparametric may have some historical significance and meaning for theoretical statisticians, but it only serves to confuse applied statisticians.”

– Noether, G. E. (1984, p. 177)

“. . . the distribution functions of the various stochastic variables which enter into their problems are assumed to be of known functional form, and the theories of estimation and of testing hypotheses are theories of estimation of and of testing hypotheses about, one or more parameters, finite in number, the knowledge of which would completely determine the various distribution functions involved. We shall refer to this situation for brevity as the parametric case, and denote the opposite situation, where the functional forms of the distributions are unknown, as the non-parametric case.”

– Wolfowitz, J. (1942, p. 264)
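To illustrate that the label belongs to the model rather than the data, the sketch below analyses one and the same made-up dataset twice, once with a parametric model and once with a non-parametric one. The group names and simulated values are assumptions for illustration only; `t.test` and `wilcox.test` are base R.

```r
set.seed(1942)

# One made-up dataset: outcomes in two groups
group_a <- rexp(30, rate = 1)
group_b <- rexp(30, rate = 0.6)

# Parametric model: assumes normal distributions, which are fully
# determined by a finite number of parameters (means and variances)
parametric <- t.test(group_a, group_b)

# Non-parametric model: no functional form assumed for the distributions
nonparametric <- wilcox.test(group_a, group_b)

c(t_test = parametric$p.value, wilcoxon = nonparametric$p.value)
```

Nothing about `group_a` or `group_b` changes between the two calls; only the assumptions made about where the numbers came from.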

### References

Noether, G. E. (1984). Nonparametrics: The early years—impressions and recollections. American Statistician, 38(3), 173–178.

Wolfowitz, J. (1942). Additive Partition Functions and a Class of Statistical Hypotheses. The Annals of Mathematical Statistics, 13(3), 247–279.

## ACME: average causal mediation effect

Suppose there are two groups in a study: treatment and control. There are two potential outcomes for an individual, $$i$$: outcome under treatment, $$Y_i(1)$$, and outcome under control, $$Y_i(0)$$. Only one of the two potential outcomes can be realised and observed as $$Y_i$$.

The treatment effect for an individual is defined as the difference in potential outcomes for that individual:

$$\mathit{TE}_i = Y_i(1) - Y_i(0)$$.

Since we cannot observe both potential outcomes for any individual, we usually make do with a sample or population average treatment effect (SATE and PATE). Although these are unobservable (they are the averages of unobservable differences in potential outcomes), they can be estimated. For example, with random treatment assignment, the difference in observed sample mean outcomes for the treatment and control is an unbiased estimator of SATE. If we also have a random sample from the population of interest, then this difference in sample means gives us an unbiased estimate of PATE.
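A simulation makes this tangible, since in a simulation (unlike real life) we get to see both potential outcomes. The sketch below uses a made-up data-generating process; the variable names and the effect of about 2 are assumptions for illustration.

```r
set.seed(1)
n <- 1000

# Both potential outcomes for every individual (never jointly observable
# in a real study)
y0 <- rnorm(n, mean = 10)        # outcome under control, Y_i(0)
y1 <- y0 + rnorm(n, mean = 2)    # outcome under treatment, Y_i(1)

# Sample average treatment effect: the mean of the individual effects
sate <- mean(y1 - y0)

# Random assignment of half the sample to treatment
treat <- sample(rep(0:1, each = n / 2))

# Only one potential outcome is realised and observed per individual
y <- ifelse(treat == 1, y1, y0)

# Difference in observed means estimates SATE
sate_hat <- mean(y[treat == 1]) - mean(y[treat == 0])

c(sate = sate, estimate = sate_hat)
```

The estimate will not equal SATE exactly for any one randomisation, but it is centred on it across randomisations.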

Okay, so what happens if we add a mediator? The potential outcome is expanded to depend on both treatment group and mediator value.

Let $$Y_i(t, m)$$ denote the potential outcome for $$i$$ under treatment $$t$$ and with mediator value $$m$$.

Let $$M_i(t)$$ denote the potential value of the mediator under treatment $$t$$.

The (total) treatment effect is now:

$$\mathit{TE}_i = Y_i(1, M_i(1)) - Y_i(0, M_i(0))$$.

Informally, the idea here is that we calculate the potential outcome under treatment, with the mediator value as it is under treatment, and subtract from that the potential outcome under control with the mediator value as it is under control.

The causal mediation effect (CME) is what we get when we hold the treatment assignment constant, but work out the difference in potential outcomes when the mediators are set to values they have under treatment and control:

$$\mathit{CME}_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0))$$

The direct effect (DE) holds the mediator constant and varies treatment:

$$\mathit{DE}_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t))$$

Note how both CME and DE depend on the treatment group. If there is no interaction between treatment and mediator, then

$$\mathit{CME}_i(0) = \mathit{CME}_i(1) = \mathit{CME}$$

and

$$\mathit{DE}_i(0) = \mathit{DE}_i(1) = \mathit{DE}$$.

ACME and ADE are the averages of these effects. Again, since they are defined in terms of potential values (of outcome and mediator), they cannot be directly observed, but – given some assumptions – there are estimators.
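Here is a sketch of these definitions in R, again relying on the fact that a simulation lets us see every potential value. The linear data-generating process and its coefficients (0.5, 1.0, 0.8) are made up, and there is deliberately no treatment-mediator interaction. The functions `M`, `Y`, `CME`, and `DE` are my own names, chosen to mirror the notation above.

```r
set.seed(2010)
n <- 1000

# Individual-level noise, the same whichever treatment is received
e_m <- rnorm(n)
e_y <- rnorm(n)

# Hypothetical potential mediator M_i(t) and potential outcome Y_i(t, m)
M <- function(t) 1 + 0.5 * t + e_m
Y <- function(t, m) 2 + 1.0 * t + 0.8 * m + e_y

# Individual effects, following the definitions in the text
CME <- function(t) Y(t, M(1)) - Y(t, M(0))
DE  <- function(t) Y(1, M(t)) - Y(0, M(t))
TE  <- Y(1, M(1)) - Y(0, M(0))

# Averages; with no treatment-mediator interaction these do not depend on t
acme <- c(mean(CME(0)), mean(CME(1)))
ade  <- c(mean(DE(0)), mean(DE(1)))

# The total effect splits into a mediated part and a direct part
all.equal(TE, CME(1) + DE(0))
```

In this made-up setup ACME is 0.5 × 0.8 = 0.4 and ADE is 1.0, so the total effect is 1.4.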

Baron and Kenny (1986) provide an estimator in terms of regression equations. I’ll focus on two of their steps and assume there is no need to adjust for any covariates. I’ll also assume that there is no interaction between treatment and mediator.

First, regress the mediator ($$m$$) on the binary treatment indicator ($$t$$):

$$m = \alpha_1 + \beta_1 t$$.

The slope $$\beta_1$$ tells us how much the mediator changes between the two treatment conditions on average.

Second, regress the outcome ($$y$$) on both mediator and treatment indicator:

$$y = \alpha_2 + \beta_2 t + \beta_3 m$$.

The slope $$\beta_2$$ provides the average direct effect (ADE), since this model holds the mediator constant (note how this mirrors the definition of DE in terms of potential outcomes).

Now to work out the average causal mediation effect (ACME), we need to wiggle the outcome by however much the mediator moves between treatment and control, whilst holding the treatment group constant. Slope $$\beta_1$$ tells us how much the treatment shifts the mediator. Slope $$\beta_3$$ tells us how much the outcome increases for every unit increase in the mediator, holding treatment constant. So $$\beta_1 \beta_3$$ is ACME.
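The two regressions can be sketched in R with `lm`. The data are simulated from a made-up linear model with no treatment-mediator interaction and no confounding, so the true ACME is 0.5 × 0.8 = 0.4 and the true ADE is 1.0. Those coefficients and the variable names are assumptions for illustration.

```r
set.seed(1986)
n <- 5000

# Simulated study: randomised binary treatment, mediator, and outcome
t <- rbinom(n, 1, 0.5)
m <- 1 + 0.5 * t + rnorm(n)
y <- 2 + 1.0 * t + 0.8 * m + rnorm(n)

# Step 1: regress the mediator on treatment
fit_m <- lm(m ~ t)
beta1 <- coef(fit_m)[["t"]]

# Step 2: regress the outcome on treatment and mediator
fit_y <- lm(y ~ t + m)
beta2 <- coef(fit_y)[["t"]]
beta3 <- coef(fit_y)[["m"]]

ade_hat  <- beta2
acme_hat <- beta1 * beta3

# In least squares, the total effect from regressing y on t alone
# equals the sum of the two pieces
total_hat <- coef(lm(y ~ t))[["t"]]

c(ADE = ade_hat, ACME = acme_hat, total = total_hat)
```

The estimates recover the simulated values only because the simulation satisfies the assumptions; with real data, confounding of the mediator-outcome relationship is the usual worry.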

For more, especially on complicating the Baron and Kenny approach, see Imai et al. (2010).

### References

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statistical Science, 25, 51–71.

## Updating R

For ease of copy-paste:

install.packages("installr")
library(installr)
updateR()

## Assumptions not often assessed or satisfied in published mediation analyses in psychology and psychiatry

Elizabeth Stuart et al. (2021) reviewed 206 articles using mediation analysis “in top academic psychiatry or psychology journals” from 2013-2018, to determine how many satisfied assumptions of mediation analysis.

Here are the headline results (% of papers):

(The assumption of no interaction of exposure and mediator is as a percentage of the 97% of studies that used the Baron and Kenny approach.)

Although 42% of studies discussed mediation assumptions, “in most cases this discussion was simply an acknowledgement that the data were cross sectional and thus results should be interpreted with caution.”

### References

Stuart, E. A., Schmid, I., Nguyen, T., Sarker, E., Pittman, A., Benke, K., Rudolph, K., Badillo-Goicoechea, E., & Leoutsakos, J.-M. (2021). Assumptions not often assessed or satisfied in published mediation analyses in psychology and psychiatry. Epidemiologic Reviews, In Press.

## The Halting Problem in R

It’s possible to write a program that peeks at another program and tells you something about it. Here’s an R function that tells you how long a function is:

library(stringr)

function_length <- function(func) {
  body(func) |>
    as.character() |>
    str_length() |>
    sum()
}

It can even be applied to itself: function_length(function_length) happens to be 42.

It might be handy to have an R function that tells you whether a function terminates in finite time, returning TRUE if it does and FALSE otherwise. Something like this, with magic in the ellipsis to make it work:

terminates <- function(f) {
  if (...)
    TRUE
  else
    FALSE
}

For instance, consider this function:

is_a_loop_is_a_loop <- function() {
  while (TRUE) {}
}

Calling terminates(is_a_loop_is_a_loop) should return FALSE, since it has an infinite while loop.

It turns out that there is no such general function – a result due to Alan Turing (1937).

Here is an R version of Christopher Strachey’s (1965) proof, apparently inspired by a proof Turing told him verbally “in a railway carriage on the way to a conference” in 1953. Though Strachey confesses that he forgot the detail.

Suppose for the sake of argument that terminates does exist, with the ellipsis above filled in, and that it works as we hope.

Now, define another function, taking_the_p, as follows:

taking_the_p <- function() {
  if (terminates(taking_the_p))
    while (TRUE) {}
}

The argument has a Russell’s paradox flavour to it.

Suppose terminates(taking_the_p) is TRUE. Then the if statement will dutifully trigger an infinite loop, meaning that taking_the_p doesn’t terminate. So terminates got it wrong. Suppose taking_the_p does actually terminate. That means terminates(taking_the_p) must have returned FALSE – again it was wrong.

But we started the proof by assuming that terminates actually works. Contradiction. Therefore there is no general function terminates.

There are countably infinitely many computable functions. The terminates function is an example of a function that is not computable.

This general result does not mean it is never possible to determine whether a function terminates. There’s an easy heuristic: run a function for a bit and if it hasn’t finished after a period of time, decide that it doesn’t terminate. It’s far from the general solution, but for what it’s worth here’s the R:

library(R.utils)

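terminates <- function(expr, secs = 1) {
  finished <- FALSE
  res <- NULL
  # try() swallows the error that withTimeout() raises on timeout
  try({
    res <- withTimeout(expr,
                       timeout = secs)
    # Only reached if expr finished within the time limit
    finished <- TRUE
  }, silent = TRUE)

  finished
}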

Note that this takes an expression rather than a function, so that it easily applies to functions of any arguments.

Let’s try two examples. The identity function (which simply returns its argument) applied to itself, and is_a_loop_is_a_loop. For the latter, note the empty parentheses, (), which is how to ask R to run a function rather than inspect it.

terminates(identity(identity))
# [1] TRUE

terminates(is_a_loop_is_a_loop())
# [1] FALSE

The timeout is doing most of the heavy lifting. I have chosen 1 second, which would rule out a large number of functions that do terminate.
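To see how much the verdict depends on the timeout, here is a small sketch. So that it runs without extra packages, it swaps in base R's `setTimeLimit` for `R.utils::withTimeout`; the function name `finishes_within` and the timings are my own choices for illustration.

```r
# Heuristic check using base R's setTimeLimit instead of R.utils::withTimeout
finishes_within <- function(expr, secs = 1) {
  setTimeLimit(elapsed = secs, transient = TRUE)
  # Always remove the limit again when this function exits
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE))
  tryCatch({
    expr   # forces the lazily evaluated argument
    TRUE
  }, error = function(e) FALSE)
}

# Sys.sleep(0.5) certainly terminates, but not within 0.1 seconds
impatient <- finishes_within(Sys.sleep(0.5), secs = 0.1)
patient   <- finishes_within(Sys.sleep(0.5), secs = 2)

c(impatient = impatient, patient = patient)
```

The same expression is judged non-terminating or terminating depending only on how long we are prepared to wait. As with the version above, an expression that throws an error of its own is also reported as not finishing.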

### References

Strachey, C. (1965). An impossible program. The Computer Journal, 7(4), 313–313.

Turing, A. M. (1937). On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42(1), 230–265.