On rulers and see-saws

Once upon a time, tests were invented for measuring a property of people called H. Let’s call them rulers, though by today’s standards they look a bit odd — mostly straight, but with bends and loops, and not the same width the whole way along. Tick marks on the rulers were almost but not quite equally spaced. The rulers also work in a slightly odd way. You start at the smallest tick mark and score 1 if the person to be measured is taller than that position and 0 otherwise. You continue until you run out of tick marks on the ruler and then sum the 1s and 0s to get an estimate of H.

Others have come up with similar devices, though the resulting measurements of H do not map one-to-one onto each other. Some tests are easier to score highly on than others. Sometimes the rulers slip, or are too bendy, or the tick marks are hard to read in places, so people pass tick marks they shouldn’t be able to reach, or fail to reach a tick mark they should easily clear. Through correlational studies, the scores can be put into correspondence, and are often normalised to have a mean of 100 and an SD of 15 for a given population.

Similar devices have been made to measure W. One works by having participants sit on a see-saw whilst large objects of various shapes and sizes are placed on the other end. You score 1 for a test object if adding the object to the other end of the see-saw does not lift you off the ground and 0 otherwise.

Now something funny happened a few years after these tests were developed. It was found that H and W correlated with each other! One researcher argued that this was evidence of a general factor in anthropometry, and developed a statistical method to infer values for this factor, called factor analysis, given scores on a range of tests. It also turns out that this value, the g factor as it’s called, is predictive of a range of things including work productivity, social mobility, and longevity.  Different tests “load” to varying degrees on g.

Many researchers attacked g, arguing that nobody has any idea what it actually is. It has also been suggested that the original, more specific measures H and W are more informative than g. But of course it remains interesting to know why H and W are linked. Is the correlation down to nutrition, perhaps shaped by what parents and schools teach about how much and what to eat (it has been observed that eating more increases W, but this doesn’t seem to transfer reliably to H)?  Genetics?  Mechanisms for digestion?

It has been argued that the best measures of g actually measure a range of different things — which makes a lot of sense given how g was discovered. The best tests look a bit like a combination of rulers and see-saws. One is called the Advanced Progressive Complicated Measuring Device (APCMD). Of course with such complicated tests, simultaneously measuring different things to get a high loading on g, it is difficult to get a nice test which appears to measure one thing without experimentation on large samples from various populations. But with the help of statistical tools (one is called a three-parameter logistic item response model — gosh!), it’s possible to fiddle with the tick marks, change the shapes of the weights, saw off bits of the see-saw, so that it appears that a single dimension is being measured.
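For the curious, the three-parameter logistic model mentioned above is a real tool from item response theory: it gives the probability that a person with ability θ answers an item correctly, given the item’s discrimination a, difficulty b, and pseudo-guessing floor c. A minimal sketch, with parameter values invented purely for illustration:

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) IRT model: probability that a
    person with ability theta answers an item correctly.
    a: discrimination (slope), b: difficulty, c: pseudo-guessing floor."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An easy item (b = -1) vs a hard item (b = 2), both with some guessing (c = 0.2),
# for a person of average ability (theta = 0):
print(round(p_correct_3pl(0.0, a=1.5, b=-1.0, c=0.2), 3))  # 0.854
print(round(p_correct_3pl(0.0, a=1.5, b=2.0, c=0.2), 3))   # 0.238
```

Fitting a, b, and c to response data is what lets test designers “fiddle with the tick marks”: items whose curves misbehave get re-parameterised or dropped.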

A group of researchers have observed for the APCMD that some items of the test are better predictors of W than others, and some are better predictors of H than others. This is easy to see. For some items the see-saw isn’t even used at all, for others the ruler is left aside.

But the original designers of APCMD refuse to accept that the see-saw ruler — essential for the test — is really a see-saw ruler. Look! It’s a dynamic system being measured! Of course it will look like a complicated test. Just because it appears to have a see-saw and ruler bolted together, and that a see-saw measures W and a ruler measures H, it does not imply that together they measure W and H. In fact together they measure g, as is obvious to see because the total scores load highly on g. And, regardless, how can you explain the nice properties of the scores if they are the result of a messy combination of two dimensions?

How to get someone’s g

“Intelligence”, “IQ”, “g” (due to Spearman), are terms that are bandied around.

The following gives the gist of how to calculate someone’s g score, which is often used as the measure of their “intelligence”.

For example, that’s the “IQ”/”intelligence” referred to in the recentish BBC article on research linking childhood intelligence and adult vegetarianism (clever children grow into clever vegetarian adults).

  1. Give hundreds or thousands of people a dozen tests of ability.
  2. Zap everyone’s scores with PCA or factor analysis.
  3. g is the first component and usually explains around half the variance.  A common genre of analysis relates g to other facets of psychometric intelligence.
  4. Use the component to calculate a score.  For factor analysis there are many ways to do this, e.g. Thomson’s regression scores or Bartlett’s weighted least-squares scores.  The gist is that for each person you compute a weighted sum of their test scores, where the weights reflect how strongly each test loads on g.
  5. To get something resembling an IQ score, scale it so it has a mean of 100 and an SD of 15.
  6. Talk about it as if it were a substantive psychological construct, rather than a statistical artefact 😉
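The steps above can be sketched in a few lines of code. This is a toy illustration under invented assumptions: the “test battery” is simulated from a single latent factor plus noise, and the first principal component (computed via an SVD of the z-scored data) stands in for g:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Simulate a battery: 1000 people, 12 tests, each loading on one latent factor.
n_people, n_tests = 1000, 12
latent = rng.normal(size=(n_people, 1))              # the "true" common factor
loadings = rng.uniform(0.5, 0.9, size=(1, n_tests))  # how much each test loads
scores = latent @ loadings + rng.normal(scale=0.7, size=(n_people, n_tests))

# 2. PCA on standardised scores, via SVD of the z-scored data matrix.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)

# 3. The first component's share of total variance (often around half).
explained = s[0] ** 2 / np.sum(s ** 2)

# 4. Each person's weighted sum of scores; weights are the first component.
g_raw = z @ vt[0]

# 5. Scale to mean 100, SD 15, IQ-style.
g_iq = 100 + 15 * (g_raw - g_raw.mean()) / g_raw.std()

print(f"variance explained by first component: {explained:.2f}")
print(f"mean {g_iq.mean():.1f}, SD {g_iq.std():.1f}")
```

Step 6 is left as an exercise for the reader.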

What is this mysterious g thing?

Ye olde Spearman

A good test of whether someone understands g is whether they characterise it as a general factor in intelligence test scores (a statistical construct) and not as general intelligence (a substantive construct). But it’s interesting to see what Spearman originally said in his 1904 “General Intelligence,” Objectively Determined and Measured. On p. 272:

“… we reach the profoundly important conclusion that there really exists a something that we may provisionally term “General Sensory Discrimination” and similarly a “General Intelligence,” and further that the functional correspondence between these two is not appreciably less than absolute.”

He goes on to describe this as a “general theorem”, refining it to (p. 273):

“Whenever branches of intellectual activity are at all dissimilar, then their correlations with one another appear wholly due to their being all variously saturated with some common fundamental Function (or group of Functions).”

(Enthusiastic emphasis in original.)

There’s a recent argument against this (though perhaps not quite, given Spearman’s parenthetical “group of Functions”) by van der Maas et al. (2006). The abstract:

“Scores on cognitive tasks used in intelligence tests correlate positively with each other, i.e., they display a positive manifold of correlations. The positive manifold is often explained by positing a dominant latent variable, the g-factor, associated with a single quantitative cognitive or biological process or capacity. In this paper we propose a new explanation of the positive manifold based on a dynamical model, in which reciprocal causation or mutualism plays a central role. It is shown that the positive manifold emerges purely by positive beneficial interactions between cognitive processes during development. A single underlying g-factor plays no role in the model. The model offers explanations of important findings in intelligence research, such as the hierarchical factor structure of intelligence, the low predictability of intelligence from early childhood performance, the integration/differentiation effect, the increase in heritability of g, the Jensen effect, and is consistent with current explanations of the Flynn effect.”