Do you often cry, with a warm feeling in your heart, because you find something beautiful? If so, you might be intrigued by this study of the Geneva Sentimentality Scale!
Estimating causal effects with optimization-based methods
Cousineau et al. (2023) compared seven optimisation-based methods for estimating causal effects, using 7700 datasets from the 2016 Atlantic Causal Inference Competition. These datasets use real covariates with simulated treatment assignment and response functions, so it’s real-world-inspired data, with the advantage that the true effect (here, the sample average treatment effect on the treated; SATT) is known. See the supplementary material of Dorie et al.’s (2019) paper for more info on how the simulations were set up.
The methods they compared were:
Method | R package | Function used |
---|---|---|
Approximate residual balancing (ARB) | balanceHD 1.0 | residualBalance.ate |
Covariate balancing propensity score (CBPS) | CBPS 0.21 | CBPS |
Entropy balancing (EBal) | ebal 0.1-6 | ebalance |
Genetic matching (GenMatch) | Matching 4.9-9 | GenMatch |
Kernel balancing (KBal) | kbal 0.1 | kbal |
Stable balancing weights (SBW) | sbw 1.1.1 | sbw |
(That’s six packages for seven methods: CBPS was run in two variants, exact and over-identified, which appear separately in the results below.) I’m hearing entropy balancing discussed a lot, so I had my eye on this one.
Bias was the estimated SATT minus the true SATT (i.e., the +/- sign was kept; I’m not sure what to make of that when averaging biases from analyses of multiple datasets, since positive and negative biases cancel). The root-mean-square error (RMSE) squares the bias from each estimate first, removing the sign, before averaging and square-rooting, which seems easier to interpret.
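In symbols (my notation, not the paper’s), with estimates \(\widehat{\tau}_k\) and true values \(\tau_k\) for datasets \(k = 1, \dots, K\):

\[
\text{mean bias} = \frac{1}{K} \sum_{k=1}^{K} \left( \widehat{\tau}_k - \tau_k \right),
\qquad
\text{RMSE} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \left( \widehat{\tau}_k - \tau_k \right)^2}.
\]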
Findings below. N gives the number of datasets (out of 7700) where the SATT could be estimated; in the original post, red marked where my eyebrows were raised and pink marked entropy balancing and its RMSE:
Method | N | Mean bias | SD of bias | RMSE | Mean time (sec)
---|---|---|---|---|---
kbal | 7700 | 0.036 | 0.083 | 0.091 | 2521.3 |
balancehd | 7700 | 0.041 | 0.099 | 0.107 | 2.0 |
sbw | 4513 | 0.041 | 0.102 | 0.110 | 254.9 |
cbps_exact | 7700 | 0.041 | 0.105 | 0.112 | 6.4 |
ebal | 4513 | 0.041 | 0.110 | 0.117 | 0.2 |
cbps_over | 7700 | 0.044 | 0.117 | 0.125 | 17.3 |
genmatch | 7700 | 0.052 | 0.141 | 0.151 | 8282.4 |
This particular implementation of entropy balancing failed to find a solution for about 40% of the datasets! Note, however:
“All these optimization-based methods are executed using their default parameters on R 4.0.2 to demonstrate their usefulness when directly used by an applied researcher” (emphasis added).
Maybe tweaking the settings would have improved the success rate. And #NotAllAppliedResearchers 🙂
Below is a comparison with a bunch of other methods from the competition, for which findings were already available on a GitHub repo (see Dorie et al., 2019, Tables 2 and 3, for more info on each method).
Method | N | Mean bias | SD of bias | RMSE | 95% CI coverage (%)
---|---|---|---|---|---
bart on pscore | 7700 | 0.001 | 0.014 | 0.014 | 88.4 |
bart tmle | 7700 | 0.000 | 0.016 | 0.016 | 93.5 |
mbart symint | 7700 | 0.002 | 0.017 | 0.017 | 90.3 |
bart mchains | 7700 | 0.002 | 0.017 | 0.017 | 85.7 |
bart xval | 7700 | 0.002 | 0.017 | 0.017 | 81.2 |
bart | 7700 | 0.002 | 0.018 | 0.018 | 81.1 |
sl bart tmle | 7689 | 0.003 | 0.029 | 0.029 | 91.5 |
h2o ensemble | 6683 | 0.007 | 0.029 | 0.030 | 100.0 |
bart iptw | 7700 | 0.002 | 0.032 | 0.032 | 83.1 |
sl tmle | 7689 | 0.007 | 0.032 | 0.032 | 87.6 |
superlearner | 7689 | 0.006 | 0.038 | 0.039 | 81.6 |
calcause | 7694 | 0.003 | 0.043 | 0.043 | 81.7 |
tree strat | 7700 | 0.022 | 0.047 | 0.052 | 87.4 |
balanceboost | 7700 | 0.020 | 0.050 | 0.054 | 80.5 |
adj tree strat | 7700 | 0.027 | 0.068 | 0.074 | 60.0 |
lasso cbps | 7108 | 0.027 | 0.077 | 0.082 | 30.5 |
sl tmle joint | 7698 | 0.010 | 0.101 | 0.102 | 58.9 |
cbps | 7344 | 0.041 | 0.099 | 0.107 | 99.7 |
teffects psmatch | 7506 | 0.043 | 0.099 | 0.108 | 47.0 |
linear model | 7700 | 0.045 | 0.127 | 0.135 | 22.3 |
mhe algorithm | 7700 | 0.045 | 0.127 | 0.135 | 22.8 |
teffects ra | 7685 | 0.043 | 0.133 | 0.140 | 37.5 |
teffects ipwra | 7634 | 0.044 | 0.161 | 0.166 | 35.3 |
teffects ipw | 7665 | 0.042 | 0.298 | 0.301 | 39.0 |
I’ll leave you to read the original for commentary on this, but check out the RMSE and CI coverage. Linear model is summarised as “Linear model/ordinary least squares”. I assume covariates were just entered as main effects, which is a little unfair: the simulations included non-linearity, and diagnostic checks on models, such as partial residual plots, would have spotted this (see the sketch below). It still doesn’t do too badly – better than genetic matching!
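For instance, here’s a hypothetical sketch (simulated data, not the competition’s) of how a component + residual plot from the car package flags a missed non-linearity:

```r
# Hypothetical sketch: a main-effects linear model misses a quadratic
# effect, and a component + residual (partial residual) plot reveals it.
library(car)  # provides crPlots()

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 * x1 + x2^2 + rnorm(n)  # the true effect of x2 is non-linear

fit <- lm(y ~ x1 + x2)  # main effects only, as assumed above
crPlots(fit)            # the x2 panel shows clear curvature
```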
Interestingly, the RMSE was a tiny bit worse for entropy balancing than for Stata’s teffects psmatch, which in the simulations was set up to use nearest-neighbour matching on propensity scores estimated using logistic regression (I presume the defaults – I’m an R user).
The winners were all regression-based or what the authors called “mixed methods” – in this context meaning some genre of doubly robust method that combined matching/weighting with regression adjustment. Bayesian additive regression trees (BART) feature towards the best end of the table. These sorts of regression-based methods don’t allow the design phase to be clearly separated from the estimation phase. For matching approaches where this separation is possible, the outcome data can be held back from analysts until matches have been found or weights estimated based only on covariates. Where the analysis also demands access to outcomes, a robust approach is needed, including a highly specified and published statistical analysis plan and, e.g., holding back some data for a training and validation phase before fitting the final model.
No info is provided on CI coverage for the seven optimisation-based methods they tested. This is why (Cousineau et al., 2023, p. 377):
“While some of these methods did provide some functions to estimate the confidence intervals (i.e., balancehd, sbw), these did not work due to the collinearity of the covariates. While it could be possible to obtain confidence intervals with bootstrapping for all methods, we did not pursue this avenue due to the computational resources that would be needed for some methods (e.g., kbal) and to the inferior results in Table 5 that did not warrant such resources.”
It would be interesting to zoom in on a smaller set of options and datasets and perhaps allow some more researcher input on how analyses are carried out.
References
Cousineau, M., Verter, V., Murphy, S. A., & Pineau, J. (2023). Estimating causal effects with optimization-based methods: A review and empirical comparison. European Journal of Operational Research, 304(2), 367–380.
Dorie, V., Hill, J., Shalit, U., Scott, M., & Cervone, D. (2019). Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition. Statistical Science, 34(1), 43–68.
Mermin’s (1981) variant of Bell’s theorem – in R
Entanglement is the weirdest feature of quantum mechanics. David Mermin (1981) provides an accessible introduction to experiments showing that local determinism doesn’t hold in the quantum world, simplifying Bell’s theorem and tests thereof. This knitted Markdown file shows the sums in R. It’s probably only going to make sense if you have been here before but haven’t got around to doing the sums yourself (that was me before writing this today!).
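Here’s a compact version of those sums (my reconstruction of Mermin’s set-up: each detector has three settings 120° apart, and quantum mechanics predicts the two outcomes agree with probability \(\cos^2(\theta/2)\), where \(\theta\) is the angle between the settings chosen on the two sides):

```r
# Mermin's gedanken experiment: quantum prediction vs local determinism.
settings <- c(0, 120, 240) * pi / 180

# Quantum: average agreement over the 9 equally likely setting pairs
pairs   <- expand.grid(a = settings, b = settings)
quantum <- mean(cos((pairs$a - pairs$b) / 2)^2)
quantum  # 0.5

# Local determinism: each particle pair carries an "instruction set" fixing
# the colour (R or G) it will flash for each of the three settings
instruction_sets <- expand.grid(s1 = c("R", "G"), s2 = c("R", "G"),
                                s3 = c("R", "G"), stringsAsFactors = FALSE)
agree <- apply(instruction_sets, 1, function(inst) {
  mean(outer(inst, inst, "=="))  # fraction of the 9 setting pairs agreeing
})
min(agree)  # 5/9 = 0.556
```

Every instruction set produces agreement on at least 5/9 of the setting pairs, but quantum mechanics (and experiment) gives 1/2 – so no local deterministic instruction-set story can reproduce the predictions.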
Superdeterminism
Bell (1981, C2-57):
“… it may be that it is not permissible to regard the experimental settings a and b in the analyzers as independent variables, as we did. We supposed them in particular to be independent of the supplementary [a.k.a. hidden] variables λ, in that a and b could be changed without changing the probability distribution ρ(λ). Now even if we have arranged that a and b are generated by apparently random radioactive devices, housed in separate boxes and thickly shielded, or by Swiss national lottery machines, or by elaborate computer programmes, or by apparently free willed experimental physicists, or by some combination of all of these, we cannot be sure that a and b are not significantly influenced by the same factors λ that influence A and B [measurement outcomes]. But this way of arranging quantum mechanical correlations would be even more mind boggling than one in which causal chains go faster than light. Apparently separate parts of the world would be deeply and conspiratorially entangled, and our apparent free will would be entangled with them.”
Hance and Hossenfelder (2022, p. 1382) on the assumption of statistical independence of supplementary/hidden variables and experimental settings:
“Types of hidden variables theories which violate statistical independence include those which are superdeterministic, retrocausal, and supermeasured. Some have dismissed them on metaphysical grounds, by associating a violation of statistical independence with the existence of ‘free will’ or ‘free choice’ and then arguing that these are not assumptions we should give up.
“It is, in hindsight, difficult to understand how this association came about. We believe it originated in the idea that a correlation between the hidden variables and the measurement setting would somehow prevent the experimentalist from choosing the setting to their liking. However, this is mistaking a correlation with a causation. And any serious philosophical discussion of free will acknowledges that human agency is of course constrained by the laws of nature anyway.”
References
Bell, J. S. (1981). Bertlmann’s socks and the nature of reality. Le Journal de Physique Colloques, 42(C2), C2-41–C2-62. Reprinted in Bell (2004).
Bell, J. S. (2004). Speakable and unspeakable in quantum mechanics: Collected papers on quantum philosophy (2nd ed.). Cambridge University Press.
A cynical view of SEMs
It is all too common for a box and arrow diagram to be cobbled together in an afternoon and christened a “theory of change”. One formalised version of such a diagram is a structural equation model (SEM), the arrows of which are annotated with coefficients estimated using data. Here is John Fox (2002) on SEM and informal boxology:
“A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data. A more charitable interpretation is that SEMs are close to the kind of informal thinking about causal relationships that is common in social-science theorizing, and that, therefore, these models facilitate translating such theories into data analysis.”
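To make the translation Fox describes concrete, here’s a minimal sketch of fitting such a box-and-arrow diagram in R with the lavaan package (the variables are invented and the data simulated; a sketch, not a recommendation):

```r
# A box-and-arrow "theory of change" fitted as an SEM:
# intervention -> mediator -> outcome, plus a direct path.
library(lavaan)

set.seed(42)
n <- 300
intervention <- rnorm(n)
mediator     <- 0.5 * intervention + rnorm(n)
outcome      <- 0.4 * mediator + 0.2 * intervention + rnorm(n)
dat <- data.frame(intervention, mediator, outcome)

# Each line of the model syntax is an arrow (or set of arrows)
model <- "
  mediator ~ intervention
  outcome  ~ mediator + intervention
"
fit <- sem(model, data = dat)
summary(fit)
```

The coefficients will dutifully appear on the arrows either way; whether they deserve a causal reading is exactly Fox’s point.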
References
Fox, J. (2002). Structural Equation Models: Appendix to An R and S-PLUS Companion to Applied Regression. Last corrected 2006.
Flowcharts in R
ggflowchart looks fun. Quick example here.
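Something along these lines, going by the package docs (node names are mine; ggflowchart takes a data frame of edges with from and to columns):

```r
# Minimal flowchart sketch with invented nodes
library(tibble)
library(ggflowchart)

edges <- tibble(
  from = c("Idea", "Draft", "Draft"),
  to   = c("Draft", "Review", "Publish")
)
ggflowchart(edges)
```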
A simple circuit
Here is a simple quantum computing circuit:
There are two qubits (quantum bits), q[0] and q[1], and two classical bits, c[0] and c[1]. The latter will be used to store results of measuring the former.
Read the circuit left to right.
∣0⟩ is a qubit that will always have a measurement outcome of 0 (in the computational basis).
H is a Hadamard gate that puts that ∣0⟩ into a “superposition” (a sum) of both the “basis states” ∣0⟩ and ∣1⟩. The resulting superposition will collapse to either ∣0⟩ or ∣1⟩ with equal probability when measured (again, assuming the computational basis is used).
The next items on the circuit that look like little dials with cables attached denote measurement. Qubit q[0] is measured first and the result saved into c[0], then q[1] is measured and the result is saved into c[1]. The two qubits are unentangled, which means that measuring one has no effect on the other. (See this post for an example with entanglement.)
So basically this circuit is a fancy way to flip two coins, using quantum objects in superposition rather than metal discs. You can run it on a real quantum computer for free at IBM Quantum. I used such a circuit to decide what to do at the weekend, choosing randomly from four options. With \(n\) qubits you can do this for \(2^n\) options. It took about an hour to get the answer. There may be better things to do with quantum computers…
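If you’d rather not wait an hour, a state-vector simulation of the same circuit takes a few lines of R (my sketch – just the linear algebra, not a real quantum device):

```r
# Two-qubit circuit: H on each qubit, then measure both in the
# computational basis (states labelled 00, 01, 10, 11).
H    <- matrix(c(1, 1, 1, -1), nrow = 2) / sqrt(2)  # Hadamard gate
ket0 <- c(1, 0)                                     # the |0> state

psi <- kronecker(H %*% ket0, H %*% ket0)  # joint state of the two qubits

probs <- as.vector(abs(psi)^2)            # Born rule: all four outcomes 1/4
names(probs) <- c("00", "01", "10", "11")
probs

# 1000 simulated "shots" of the circuit
table(sample(names(probs), 1000, replace = TRUE, prob = probs))
```

Each shot gives two fair, independent bits – the two coin flips.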
Appeal to consequences fallacy in understanding Bell’s theorem
Joan Vaccaro (2018, p. 11) on arguments against superdeterminism:
“An argument that has been advocated by leading physicists is that humans are necessarily independent of the universe that surrounds them because the practice of science requires the independence of the experimenter from the subject of study. For example, Bell et al. state that unless the experimenter and subject are independent, we would need to abandon ‘…the whole enterprise of discovering the laws of nature by experimentation’, and Zeilinger claims that if the experimenter and subject were not independent ‘…such a position would completely pull the rug out from underneath science.’ However, this argument contains a logical fallacy called an appeal to consequences. Specifically, arguing for experimenter–subject independence on the basis that the alternative has undesirable consequences does not prove that experimenters are independent of their subjects. Rather, the alternative may well be true, in which case we would need to deal with the consequences.”
Beautiful friendships have been jeopardised
This is an amusing opening to a paper on face validity, by Mosier (1947):
“Face validity is a term that is bandied about in the field of test construction until it seems about to become a part of accepted terminology. The frequency of its use and the emotional reaction which it arouses – ranging almost from contempt to highest approbation – make it desirable to examine its meaning more closely. When a single term variously conveys high praise or strong condemnation, one suspects either ambiguity of meaning or contradictory postulates among those using the term. The tendency has been, I believe, to assume unaccepted premises rather than ambiguity, and beautiful friendships have been jeopardized when a chance remark about face validity has classed the speaker among the infidels.”
I think dozens of beautiful friendships have been jeopardized by loose talk about randomised controlled trials, theory-based evaluation, realism, and positivism, among many others. I’ve just seen yet another piece arguing that you wouldn’t evaluate a parachute with an RCT and I can’t even.