Arbitrary metrics?

Reading a preprint of Uses and Abuses of Intelligence (blurb here) at the moment. I’ll try to include some excerpts and comments on here over the next while, when I get the chance.

The first bit: the term “arbitrary metrics” in the preface.

The term arbitrary metrics refers to all tests which do not have properties approximating those of a tape measure (or “interval scale” in psychological terminology). These tests:

  • Are not constructed to have a graded series of items of increasing difficulty such that people “pass” all the items up to a given point and “fail” all subsequent items.
  • Do not generate a score distribution such that a given difference between scores means the same thing at different points in the scale. (A common example of failure to meet this requirement is that there are many “easy” items, closely spaced in terms of difficulty, and only a few, more widely spaced, “difficult” items. As a result, any given “change” or “difference” score means very different things in different sectors of the distribution.)
  • Function in different ways in different cultural groups, that is to say the “centimetre marks” on the “tape measure” arrange themselves in different orders when the test is used in different cultural groups.

Some comments.

Fair enough: the idea of items getting harder and harder until people reach a point where they can do no better is admirable. But issues of within-task learning and well-known features of problem solving (Einstellung, representation, functional fixedness, and so on) need to be addressed here. Different orders of item solution might affect how far one can go. One facet of ability you might want to test is how well people can uncover this order for themselves. Randomise the task order, allow any sequence of solution, and record the sequence chosen. It’s not enough just to give people a series of tasks of escalating difficulty. The mechanisms of the brain and its development are more complex than length measured by a ruler, so the analogy is misleading.
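The randomise-and-record idea can be sketched in a few lines. Everything here (the item pool, the `easiest_first` solver, the `administer` helper) is a hypothetical illustration, not anything from the book:

```python
import random

# Sketch: instead of a fixed easy-to-hard sequence, present the items
# in a random order, let the solver attempt them in any order, and
# record the sequence actually chosen.
def administer(items, solver, rng=random.Random(0)):
    remaining = list(items)
    rng.shuffle(remaining)            # randomise presentation order
    chosen_sequence = []
    while remaining:
        item = solver(remaining)      # solver picks any remaining item
        remaining.remove(item)
        chosen_sequence.append(item)  # record the order of solution
    return chosen_sequence

# A solver that happens to seek out the easiest remaining item first:
easiest_first = lambda pool: min(pool)
print(administer([3, 1, 4, 1.5, 9], easiest_first))  # → [1, 1.5, 3, 4, 9]
```

The recorded sequence itself becomes data: a solver who reliably reconstructs the difficulty ordering is showing something a fixed-order test never measures.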

It’s unavoidable that different cultures (and indeed different kinds of people within a culture!) will differ in what they find easy and what they find hard. The reader is invited to produce some examples of things they find easier than their friends, and vice versa. There is an enormous literature on this. Many good examples come from the peaks and troughs of ability in autistic populations. Many more, I conjecture, would come from item-level analyses of supposedly unidimensional tests, though such effects appear to be difficult to replicate.*

Agreed, the ability distance(?) between a score x and the next score x + 1 should be the same for all x. (Sticking here with discrete measures.) But this is going to be a nightmare in practice. Impossible? Maybe it’ll work for populations of people. It’s a miracle that sumscores work at all.
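To make the nightmare concrete, here is a toy sketch of the book’s own example. The difficulty values are invented for illustration, and the scoring is idealised Guttman-style (you pass every item at or below your ability level):

```python
# Hypothetical item difficulties: many closely spaced "easy" items and
# a few widely spaced "hard" ones, as in the preface's example.
difficulties = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 5.0, 9.0, 14.0]

def ability_needed(score):
    """Minimum ability needed to reach a given sumscore (Guttman scoring)."""
    return difficulties[score - 1]

# The same one-point score difference at different points of the scale:
print(ability_needed(2) - ability_needed(1))  # ~0.2 near the easy end
print(ability_needed(9) - ability_needed(8))  # 5.0 near the hard end
```

An interval scale would require those gaps to be equal everywhere; a sumscore over items spaced like this cannot deliver that, which is exactly why a one-point “change” means different things in different sectors of the distribution.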

* For a hint of the mess, follow this thread:

  1. What do Raven’s Matrices measure? An analysis in terms of sex differences
  2. Advanced progressive matrices and sex differences: Comment to Mackintosh and Bennett (2005)
  3. Reply to Colom and Abad (2006)

BUT the interesting stuff will come from embracing this kind of mess, not from pinning hopes on unidimensional constructs that resemble a measuring tape!!!