Be careful when tweaking only one variable

Nice example of how tweaking only one variable can have an unintended effect on another:

“I began thinking about all this after reading an NBER working paper by Christopher Carpenter and Mark Stehr titled Intended and Unintended Effects of Youth Bicycle Helmet Laws. More than 20 states in the US have adopted laws requiring young people to wear a helmet when riding a bicycle. The authors found that these laws increased helmet use and reduced youth fatalities from cycling accidents by roughly 19 per cent, which is a good thing. But they also found that the laws significantly reduced, by between 4 and 5 per cent, the amount of cycling done by young people, which was not the intention.

“… The moral of this tale is to be careful when making policy, as you may get a response you didn’t plan for.”

Reporting standardised/simple effect size

I’ve moaned a bit about (what felt at the time to be a religion of) “effect size”. Recently Thom Baguley has published a paper on the topic, comparing standardised effects measures, which involve scaling with respect to the sample variance, with simple effects measures, which are expressed in the original units of measurement.

Baguley reviews some of the problems with standardised measures, all related to factors affecting sample variance. In general he advises reporting simple effect sizes, preferably with confidence intervals. If you really want to use standardised measures, for instance to compare conceptually similar measures on different scales, then he advises against absolute, “canned” judgements like “small”, “medium”, and “large”, arguing instead for descriptions of the relative size of effects.

I like his Tukey quote:

“… being so disinterested in our variables that we do not care about their units can hardly be desirable.”

It does seem odd to focus on, e.g., how much variance is explained rather than actually characterising the nature of relationships between variables.
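To make the distinction concrete, here’s a minimal sketch in Python (my own made-up two-group example, nothing from Baguley’s paper): the simple effect is the raw difference in the original units, while the standardised effect rescales by the pooled SD, so it shifts whenever sample variance does.

```python
import math
from statistics import mean, stdev

# Hypothetical reaction times (ms) for two groups -- invented data
control   = [520, 540, 560, 530, 550, 545, 535, 555]
treatment = [500, 510, 525, 505, 515, 520, 498, 512]

diff = mean(control) - mean(treatment)                         # simple effect, in ms
sp = math.sqrt((stdev(control)**2 + stdev(treatment)**2) / 2)  # pooled SD
d = diff / sp                                                  # standardised effect (Cohen's d)

# Rough 95% CI for the simple effect (normal approximation; with samples
# this small a t-based interval would be more appropriate)
se = math.sqrt(stdev(control)**2 / len(control)
               + stdev(treatment)**2 / len(treatment))
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"simple effect: {diff:.1f} ms, 95% CI ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"standardised effect d = {d:.2f}")
```

The simple effect stays interpretable in milliseconds whatever happens to the spread of the samples; `d` does not.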


Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100, 603–617.


Winsorising

(Here’s a Wikipedia link.)  Apparently named in honour of Charlie Winsor (Huber, 2002), with whom Tukey had (a mean of) 1.9 meals per day over a period of 3 years (Fernholz and Morgenthaler, 2003).  Winsor, an “engineer-turned-physiologist-turned-statistician”, converted Tukey to stats (Brillinger, 2002).
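For what it’s worth, the technique named after him is easy to sketch. A minimal Python version (my own, and note that implementations differ in how they pick the percentile cut-offs): values beyond chosen percentiles are replaced by the values at those percentiles, rather than discarded as in trimming.

```python
def winsorise(xs, lower_pct=0.05, upper_pct=0.95):
    """Clamp values below/above the given empirical percentiles.

    Crude percentile rule (nearest sorted index); real libraries
    interpolate, so results may differ slightly at the margins.
    """
    s = sorted(xs)
    n = len(s)
    lo = s[int(lower_pct * (n - 1))]
    hi = s[int(upper_pct * (n - 1))]
    return [min(max(x, lo), hi) for x in xs]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]   # one wild outlier
print(winsorise(data, 0.10, 0.90))        # the 100 is pulled in to 5
```

Unlike trimming, the sample size is preserved; only the extreme values are tamed.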

A nice biographical detail about Winsor (from here):

I have heard gossip
that he was brilliant,
lazy and died young.


Peter J. Huber (2002).  John W. Tukey’s Contributions to Robust Statistics.  The Annals of Statistics, 30(6), 1640-1648.

Luisa Turrin Fernholz and Stephan Morgenthaler (2003).  A Conversation with John W. Tukey.   Statistical Science, 18(3), 346-356.

David R. Brillinger (2002).  John W. Tukey: his life and professional contributions. Annals of Statistics, 30, 1535-1575.

Individual differences (continued)

“I am surprised that the author has used this data set. In my lab, when we collect data with such large individual differences, we refer to the data as ‘junk’. We then redesign our stimuli and/or experimental procedures, and run a new experiment. The junk data never appear in publications”

—An anonymous reviewer in 2005, commenting on research that sought to model individual differences in cognition.

From the introduction to Navarro, D. J., Griffiths, T. L., Steyvers, M., & Lee, M. D. (2006). Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology, 50, 101-122.

Maximum likelihood

From an old post to the R mailing list by Peter Dalgaard:

“the possibility of finding maximum likelihood estimates by writing up the likelihood and maximizing it is often overlooked…”

R really is magical sometimes. Suppose you want to fit a distribution, M. All you need is to maximise \(\prod_{i=1}^N P(x_i | M)\), or equivalently, \(\sum_{i=1}^N \log P(x_i | M)\). Here’s an example of fitting a Gaussian, starting by breaking a fairly good first guess…

> x = rnorm(1000, 100, 15)
> f = function(p) -2*sum(dnorm(x, p[1], p[2], log=TRUE))
> optim(c(mean(x)-50, sd(x)+15), f)
$par
[1] 100  15

$value
[1] 8193

$counts
function gradient
      69       NA

$convergence
[1] 0


(Well, actually the code minimises −2 times the log-likelihood, which comes to the same thing.) Now to have a look at your estimate:

hist(x, probability=T)
curve(dnorm(x,100,15), min(x), max(x), add=T)
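The same idea translated to Python, as a rough stdlib-only sketch (my own code, not from the original post): write down −2 times the log-likelihood and confirm that the analytic estimates minimise it, beating a deliberately broken first guess like the one above.

```python
import math
import random

random.seed(1)
x = [random.gauss(100, 15) for _ in range(1000)]

def neg2_loglik(mu, sigma):
    # -2 * log-likelihood of an i.i.d. Gaussian sample
    return sum(2 * math.log(sigma * math.sqrt(2 * math.pi))
               + ((xi - mu) / sigma) ** 2
               for xi in x)

mu_hat = sum(x) / len(x)                                        # analytic MLE for mu
sd_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / len(x))  # MLE for sigma

# The analytic MLE is the global minimiser, so this must print True
print(neg2_loglik(mu_hat, sd_hat) < neg2_loglik(mu_hat - 50, sd_hat + 15))
```

In practice you would hand `neg2_loglik` to a numerical optimiser (as `optim` does above); the point is just that the likelihood itself is a few lines.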

It’s funny how the same names keep popping up…

I first heard of Per Martin-Löf through his work in intuitionistic logic, which turned out to be important in computer science (see Nordström, Petersson, and Smith, 1990).  His name has popped up again (Martin-Löf, 1973), this time in the context of his conditional likelihood ratio test, apparently used by Item Response Theory folk to assess whether two groups of items test the same ability (see Wainer et al., 1980).  Small world.


Martin-Löf, P. (1973). Statistiska modeller: Anteckningar från seminarier läsåret 1969–1970, utarbetade av Rolf Sundberg [Statistical models: Notes from seminars in the academic year 1969–1970, prepared by Rolf Sundberg]. Obetydligt ändrat nytryck [slightly revised reprint], October 1973 (photocopied manuscript). Institutet för Försäkringsmatematik och Matematisk Statistik vid Stockholms Universitet.

Bengt Nordström, Kent Petersson, and Jan M. Smith. (1990). Programming in Martin-Löf’s Type Theory. Oxford University Press.

Howard Wainer, Anne Morgan and Jan-Eric Gustafsson (1980).  A Review of Estimation Procedures for the Rasch Model with an Eye toward Longish Tests.  Journal of Educational Statistics, 5, 35-64

Some advice on factor analysis from the 60s

“What are the alternatives to factor (or component) analysis if one has a correlation matrix whose analysis one cannot escape? There is only one alternative method of analysing a correlation matrix which needs to be mentioned, and that is to LOOK AT IT.”

“Quite the best alternative to factor analysis is to avoid being saddled with the analysis of a correlation matrix in the first place. (Just to collect a lot of people, to measure them all on a lot of variables, and then to compute a correlation matrix is, after all, not a very advanced way of investigating anything.)”

From Andrew S. C. Ehrenberg (1962). Some Questions About Factor Analysis. The Statistician, 12(3), 191-208

13 ways to look at (Galton-Pearson) correlation

Found this paper while having a nosy around for different ways of correlating non-Gaussian variables: Joseph Lee Rodgers and W. Alan Nicewander (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42(1), 59-66.

Therein you’ll find details of the history (apparently Gauss got there first, but didn’t care about the special case of bivariate correlation) and a range of ways to arrive at the coefficient (e.g., standardised covariance, standardised regression slope, a geometric interpretation in “person space”, the balloon rule). There’s also a nice reminder that, in terms of the maths, the dichotomy between experimental and observational analysis is false: the difference lies in interpretation. Still, many people seem to think that ANOVA is for experiments and regression is for observational studies (or that SEM magically deals with causation in observational studies).

All amusing stuff.
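A quick numeric check (my own toy data, not an example from the paper) that three of these routes land on the same number: the standardised covariance, the standardised regression slope, and the cosine of the angle between the centred variables.

```python
import math

# Arbitrary toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

mx, my = sum(x) / len(x), sum(y) / len(y)
cx = [xi - mx for xi in x]          # centred x
cy = [yi - my for yi in y]          # centred y

cov = sum(a * b for a, b in zip(cx, cy)) / (len(x) - 1)
sx = math.sqrt(sum(a * a for a in cx) / (len(x) - 1))
sy = math.sqrt(sum(b * b for b in cy) / (len(y) - 1))

r_std_cov = cov / (sx * sy)         # route 1: standardised covariance
b_yx = cov / sx ** 2                # slope of y regressed on x
r_slope = b_yx * sx / sy            # route 2: standardised regression slope
cos_angle = (sum(a * b for a, b in zip(cx, cy))
             / (math.sqrt(sum(a * a for a in cx))
                * math.sqrt(sum(b * b for b in cy))))  # route 3: cosine in "person space"

print(round(r_std_cov, 6), round(r_slope, 6), round(cos_angle, 6))
```

Three different stories, one number, which is rather the paper’s point.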

A couple of properties of correlation

Spotted these in Langford, E., Schwertman, N., & Owens, M. (2001). Is the Property of Being Positively Correlated Transitive? The American Statistician, 55, 322-325.

1. Let U, V, and W be independent random variables, and define X = U + V, Y = V + W, and Z = W − U. Then the correlation between X and Y is positive and the correlation between Y and Z is positive, but the correlation between X and Z is negative.

It’s easy to see why.  X and Y share V, with different uncorrelated noise terms; Y and Z share W, again with different noise terms. But X and Z share U with opposite signs: X is U plus some noise, while Z is −U plus some noise that is uncorrelated with the noise in X.
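A quick simulation of the construction (my own sketch; the sample size and seed are arbitrary):

```python
import random

random.seed(0)
n = 50_000
U = [random.gauss(0, 1) for _ in range(n)]
V = [random.gauss(0, 1) for _ in range(n)]
W = [random.gauss(0, 1) for _ in range(n)]
X = [u + v for u, v in zip(U, V)]
Y = [v + w for v, w in zip(V, W)]
Z = [w - u for w, u in zip(W, U)]

def corr(a, b):
    """Plain Pearson correlation, no libraries."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

print(corr(X, Y), corr(Y, Z), corr(X, Z))   # roughly 0.5, 0.5, -0.5
```

With standard normal components the theoretical values are exactly +1/2, +1/2, and −1/2.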

2. If X, Y, and Z are random variables, X and Y are positively correlated (call the coefficient \(r_1\)), Y and Z are positively correlated (\(r_2\)), and \(r_1^2 + r_2^2 > 1\), then X and Z are positively correlated.
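Why the condition works: given \(r_1\) and \(r_2\), positive semidefiniteness of the \(3 \times 3\) correlation matrix constrains the third correlation to an interval (a standard result), and the stated condition makes the lower end of that interval positive:

```latex
r_1 r_2 - \sqrt{(1 - r_1^2)(1 - r_2^2)}
  \;\le\; r_{XZ} \;\le\;
r_1 r_2 + \sqrt{(1 - r_1^2)(1 - r_2^2)}
```

The lower bound exceeds zero exactly when \(r_1^2 r_2^2 > (1 - r_1^2)(1 - r_2^2)\), which simplifies to \(r_1^2 + r_2^2 > 1\) (for positive \(r_1\) and \(r_2\)).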

Mediation analysis

Glad it’s not just me…

… mediation: a crucial issue in causal inference and a difficult issue to think about. The usual rhetorical options here are:

– Blithe acceptance of structural equation models (of the form, “we ran the analysis and found that A mediates the effects of X on Y”)

– Blanket dismissal (of the form, “estimating mediation requires uncheckable assumptions, so we won’t do it”)

– Claims of technological wizardry (of the form, “with our new method you can estimate mediation from observational data”)

For example, in our book, Jennifer and I illustrate that regression estimates of mediation make strong assumptions, and we vaguely suggest that something better might come along. We don’t provide any solutions or even much guidance.

This is from a blog post by Andrew Gelman.  He links to a paper which purports to solve the problem, but it looks Hard.