On rulers and see-saws

Once upon a time, tests were invented for measuring a property of people called H. Let’s call them rulers, though by today’s standards they look a bit odd: mostly straight, but with bends and loops, and not the same width the whole way along. The tick marks on the rulers are almost but not completely equally spaced. The rulers also work in a slightly odd way. You start at the smallest tick mark and score 1 if the person being measured is taller than that position and 0 otherwise. You continue until you run out of tick marks on the ruler, then sum the 1’s and 0’s to get an estimate of H.
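For the curious, the scoring rule can be sketched in a few lines of Python. This is only an illustration of the sum-score idea: the function name ruler_score and the tick positions are made up for the example, not taken from any real ruler.

```python
def ruler_score(height, tick_marks):
    """Sum-score a person against a ruler: 1 for every tick mark they are taller than."""
    # Walk the tick marks from smallest to largest, as in the story,
    # scoring 1 if the person is taller than the mark and 0 otherwise.
    return sum(1 if height > tick else 0 for tick in sorted(tick_marks))


# Hypothetical ruler whose tick marks are almost, but not quite, equally spaced.
ticks = [150.0, 160.5, 169.0, 180.2, 190.0]
score = ruler_score(172.0, ticks)
print(score)
```

Note that the score only counts tick marks cleared, so two rulers with different tick spacing can give different scores to the same person.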

Others have come up with similar devices, though there is no one-to-one mapping between the resulting H scores. Some tests are easier to score highly on than others. Sometimes the rulers slip, or are too bendy, or the tick marks are hard to read in some places, so people pass higher tick marks than they should be able to, or fail to reach a tick mark they should reach easily. Through correlational studies, the scores can be put into correspondence, and are often normalised to have a mean of 100 and SD of 15 for a given population.
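The normalisation step might look something like the sketch below. It assumes a simple linear rescaling of raw scores within one sample, using the population standard deviation; real norming may work from percentile ranks instead. The function name normalise and the raw scores are invented for the example.

```python
from statistics import mean, pstdev


def normalise(raw_scores, target_mean=100.0, target_sd=15.0):
    """Linearly rescale raw scores to a chosen mean and SD (an assumed, simple norming rule)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of this sample
    if sd == 0:
        raise ValueError("all scores are identical, so there is nothing to rescale")
    # Convert each score to a z-score, then stretch and shift it.
    return [target_mean + target_sd * (x - m) / sd for x in raw_scores]


# Hypothetical raw sum-scores from one ruler.
scaled = normalise([1, 2, 3, 4, 5])
print([round(s, 1) for s in scaled])
```

The rescaled numbers depend entirely on who else is in the sample, which is why the 100 and 15 only mean something for a given population.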

Similar devices have been made to measure W. One works by having participants sit on a see-saw whilst large objects of various shapes and sizes are placed on the other end. You score 1 for a test object if adding the object to the other end of the see-saw does not lift you off the ground and 0 otherwise.

Now something funny happened a few years after these tests were developed. It was found that H and W correlated with each other! One researcher argued that this was evidence of a general factor in anthropometry, and developed a statistical method, called factor analysis, to infer values for this factor from scores on a range of tests. It also turns out that this value, the g factor as it’s called, is predictive of a range of things including work productivity, social mobility, and longevity. Different tests “load” to varying degrees on g.

Many researchers attacked g, arguing that people have no idea what it actually is. It has also been suggested that the original, more specific measures H and W are more informative than g. But of course it remains interesting to know why H and W are linked. Is the correlation related to nutrition, perhaps influenced by parents and schooling on how much and what to eat (it has been observed that eating more increases W, but this doesn’t seem to transfer reliably to H)? Genetics? Mechanisms for digestion?

It has been argued that the best measures of g actually measure a range of different things — which makes a lot of sense given how g was discovered. The best tests look a bit like a combination of rulers and see-saws. One is called the Advanced Progressive Complicated Measuring Device (APCMD). Of course with such complicated tests, simultaneously measuring different things to get a high loading on g, it is difficult to get a nice test which appears to measure one thing without experimentation on large samples from various populations. But with the help of statistical tools (one is called a three-parameter logistic item response model — gosh!), it’s possible to fiddle with the tick marks, change the shapes of the weights, saw off bits of the see-saw, so that it appears that a single dimension is being measured.
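For readers who haven’t met it, the three-parameter logistic model gives the probability of passing an item as a function of the person’s position on a single latent dimension. Here is a minimal sketch of the standard formula; the parameter values are invented for illustration and are not from the APCMD.

```python
import math


def three_pl(theta, a, b, c):
    """Probability of passing an item under the three-parameter logistic model.

    theta: the person's position on the (assumed single) latent dimension
    a: discrimination, how sharply the item separates people near b
    b: difficulty, the position of the item on the latent dimension
    c: lower asymptote, the chance of passing by guessing (or a slipping ruler)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderately discriminating, average difficulty, 20% guessing floor.
p = three_pl(theta=0.0, a=1.2, b=0.0, c=0.2)
print(p)
```

The model has one theta per person by construction, so a good fit shows the items can be made to behave as if one thing is being measured, not that only one thing is.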

A group of researchers have observed for the APCMD that some items of the test are better predictors of W than others, and some are better predictors of H than others. This is easy to see: for some items the see-saw isn’t used at all; for others the ruler is left aside.

But the original designers of the APCMD refuse to accept that the see-saw ruler, essential for the test, is really a see-saw ruler. Look! It’s a dynamic system being measured! Of course it will look like a complicated test. Just because it appears to have a see-saw and a ruler bolted together, and a see-saw measures W and a ruler measures H, it does not follow that together they measure W and H. In fact together they measure g, as is obvious because the total scores load highly on g. And, regardless, how can you explain the nice properties of the scores if they are the result of a messy combination of two dimensions?