Statistical models as cognitive models…

“Models of data have a deep influence on the kinds of theorising that researchers do. A structural equation model with latent variables named Shifting, Updating, and Inhibition (Miyake et al. 2000) might suggest a view of the mind as inter-connected Gaussian distributed variables. These statistical constructs are driven by correlations between variables, rather than by the underlying cognitive processes […]. Davelaar and Cooper (2010) argued, using a more cognitive-process-based mathematical model of the Stop Signal task and the Stroop task, that the inhibition part of the statistical model does not actually model inhibition, but rather models the strength of the pre-potent response channel. Returning to the older example introduced earlier of g (Spearman 1904), although the scores from a variety of tasks are positively correlated, this need not imply that the correlations are generated by a single cognitive (or social, or genetic, or whatever) process. The dynamical model proposed by van der Maas et al. (2006) shows that correlations can emerge due to mutually beneficial interactions between quite distinct processes.”

Fugard, A. J. B., & Stenning, K. (2013). Statistical models as cognitive models of individual differences in reasoning. Argument & Computation, 4, 89–102.

Correlation structure of the big 5, huge 2, and general 1

Here’s a picture from van der Linden, D., et al. (in press) [The General Factor of Personality: A meta-analysis of Big Five intercorrelations and a criterion-related validity study. Journal of Research in Personality.]:

(α is also more memorably called “Stability” and β is “Plasticity”; GFP is the General Factor in Personality.)

This suggests that, on self-reported questionnaire responses, openness and extraversion tend to go together, and that conscientiousness, agreeableness, and emotional stability tend to go together.  Furthermore, all five tend to go together, e.g., pick a bunch of random extraverted people and, more likely than not, they’ll be agreeable and emotionally stable.  Or at least they’ll report that they are so.

The authors allude to possible statistical artefacts causing the correlations, or to a general social desirability factor.

The evidence for a “substantive” interpretation of the correlations comes from studies showing heritability of the general factor and correlations with other measures.  The latter tend to be low, e.g., on average rs around .15, peaking at .3, between the factor scores and boss-reports of various factors of job performance.

What’s missing: explanations from the perspective of social and neural process theories.

SEM again

As yet I haven’t convinced myself that SEM is a good idea, or at least not in the examples I’ve seen in psychology. Two reasons for starters: (i) fit statistic chaos and (ii) weird analyses driven almost entirely by correlations, or at best with vague theorizing based on trivial analyses of the tasks (it has a bit of this and a bit of that…).

Anyway, recently Andrew Gelman noted:

“… there’s a research paradigm in which you fit a model—maybe a regression, maybe a structural equations model, maybe a multilevel model, whatever—and then you read off the coefficients, with each coefficient telling you something. You gather these together and those are your conclusions.

“My paradigm is a bit different. I sometimes say that each causal inference requires its own analysis and maybe its own experiment. I find it difficult to causally interpret several different coefficients from the same model.”

He goes on in the discussion to add:

“I have no problem with multiple-equation models, including measurement-error models, multilevel models, and instrumental variables. But I’m skeptical of trying to answer several causal questions by fitting a single model to a dataset.”

Interesting discussion starting over here.

SEM fits

What makes a good SEM fit? Thought I’d take a peek at Bentler (2007).

He gives a range of fairly general suggestions of things to report about model fits, including (a) which bits of the model were and weren’t decided a priori; (b) tests of assumptions, e.g., multivariate normality; (c) descriptives and a correlation matrix; and (d) a summary of parameters before and after paths have been added or removed.

Regarding fit, he suggests that

“At least one statistical test of model fit … should be reported for each major SEM model, unless the author verifies that no appropriate test exists for their design and model.”

In addition, he suggests reporting SRMR or the average absolute standardized residual, as well as the largest several residuals in a correlation metric. He also suggests reporting CFI or RMSEA.

The all-important small sample problem: he says “small” is \(N < 100\) and that in such cases at least one alternative model should be provided which is successfully rejected even with the small sample size.

Now the interesting thing is what he makes of \(\chi^2\). He argues that any test is unlikely to have an exact \(\chi^2\) distribution, and thus it would be ill-advised to rely too much on an exact test p-value, though importantly he adds that

“The toolkit of possible \(\chi^2\) tests has recently vastly expanded … and it does not make sense to talk about “the” \(\chi^2\) test. Including F-tests, EQS 6 provides more than a dozen model tests… I certainly favor the use of a carefully chosen model test, but even the best of these can fail in applications…”

One of the articles he cites to justify suspicion of test distributions is a simulation study he co-authored (Yuan & Bentler, 2004). They summarise the problem faced by applied researchers:

“In practice, many reported chi-square statistics are significant even when sample sizes are not large, and in the context of nested models, the chi-square difference test is often not significant.”

Elaborating on this, they mean that when you don’t want a significant test result, you often get one; and when you do want significance, you don’t get it. (NHST: a different issue for another day.) They also summarise another important problem:

“There are many model fit indices in the literature of SEM. For example, SAS CALIS provides about 20 fit indices in its default output. Consequently, there is no unique criterion for judging whether a model fits the data.”


So the gist from their simulations: often you can’t trust “the” \(\chi^2\) tests; nor can you trust the Wald zs testing the individual parameters. Hurrah.

Wald tests have been attacked elsewhere, e.g., by Hauck and Donner (1977) for logistic regression; they demonstrated the problem in an analysis of what predicts the presence of the T. vaginalis organism in college students. The gist: the further your estimate is away from the null value (usually zero; recall your null hypothesis is often that the slope is zero), the lower the power of the Wald test.
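To see the effect in miniature, here’s an illustrative logistic regression on made-up two-group counts (not the original T. vaginalis data); the helper z_for is invented for this sketch. As the group difference grows towards separation, the Wald z for the slope eventually shrinks rather than grows.

```r
# Illustrative sketch of the Hauck-Donner effect with invented counts.
# z_for fits a two-group logistic regression, where k0 and k1 are the
# numbers of "successes" out of 50 in each group, and returns the
# Wald z statistic for the slope.
z_for <- function(k0, k1) {
  y <- c(rep(1, k0), rep(0, 50 - k0), rep(1, k1), rep(0, 50 - k1))
  x <- rep(c(0, 1), each = 50)
  fit <- glm(y ~ x, family = binomial)
  summary(fit)$coefficients["x", "z value"]
}
z_for(20, 30)  # modest group difference
z_for(5, 45)   # strong difference: larger z
z_for(1, 49)   # bigger effect still, but a *smaller* z than the previous one
```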

Ah, the joys of stats! Give me a nice graph or table any day.


Bentler, P. M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825–829.

Hauck, W. W., Jr., & Donner, A. (1977). Wald’s test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72, 851–853.

Yuan, K.-H., & Bentler, P. M. (2004). On chi-square difference and z-tests in mean and covariance structure analysis when the base model is misspecified. Educational and Psychological Measurement, 64, 737–757.

Chi-square in SEM

Playing around again with SEM. Just where does that \(\chi^2\) come from?

You start with the sample covariance matrix (\(S\)) and a model description (quantitative boxology; CFA tied together with regression). The fitting machinery iterates over estimates of the various parameters until a discrepancy function between \(S\) and the “implied” covariance matrix (i.e., the one predicted by the model, \(C\)) is minimised, and out pops the final set of estimates. Then you multiply that minimised discrepancy by \(N - 1\) to get something with (approximately) a \(\chi^2\) distribution.


First how do we get \(C\)? Loehlin (2004, p. 41) to the rescue:

\(C = F \cdot (I-A)^{-1} \cdot S \cdot \left((I-A)^{-1}\right)' \cdot F'\)

Here \(A\) and \(S\) are square, with a row and column for every variable in the model, latent ones included. (This is a different \(S\) to the one I mentioned above. Don’t be confused yet.)

\(A\) contains the (asymmetric) path estimates, \(S\) contains the (symmetric) covariances and residual variances, and \(F\) is the so-called filter matrix, which picks out the measured variables. (\(I\) is the identity matrix and \(M'\) is the transpose of \(M\).)
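To see how the pieces slot together, here’s a toy calculation of \(C\) for a single regression path y <- x with both variables observed (so \(F\) is just the identity); all the parameter values are invented.

```r
# Toy RAM computation of C = F (I-A)^{-1} S ((I-A)^{-1})' F'
# for y <- x, both variables observed. All values are made up.
b  <- 2                                   # path coefficient x -> y
vx <- 1.5                                 # variance of x
ve <- 0.5                                 # residual variance of y
A <- matrix(c(0, 0,
              b, 0), 2, 2, byrow = TRUE)  # asymmetric paths (order: x, y)
S <- diag(c(vx, ve))                      # symmetric (co)variances
F <- diag(2)                              # filter: both variables measured
I <- diag(2)
C <- F %*% solve(I - A) %*% S %*% t(solve(I - A)) %*% t(F)
C  # cov(x, y) = b*vx = 3; var(y) = b^2*vx + ve = 6.5
```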

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…

So now we have a \(C\). Take \(S\) again—the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off \(C\) is. I’m playing with SEM in R using John Fox’s package which uses this one:

\(\mbox{tr}(SC^{-1}) + \log(|C|) - \log(|S|) - n\)

where \(\mbox{tr}\) is the trace of a matrix, i.e., the sum of its diagonal elements, and \(|M|\) is the determinant of \(M\). Oh, and \(n\) is the number of observed variables.

The R code for this (pulled and edited from the null \(\chi^2\) calculation in the sem fit function) is

sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n

Here you can see trace is implemented as a sum after a diag. The solve function applied to only one matrix (as here) gives you the inverse of the matrix.
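A quick sanity check of those two building blocks, on a made-up matrix:

```r
# Check that trace = sum after diag, and that solve really inverts
M <- matrix(c(2, 1,
              1, 3), 2, 2)   # arbitrary invertible matrix
sum(diag(M))    # trace: 2 + 3 = 5
solve(M) %*% M  # inverse times the original recovers the identity
```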

Let’s have a quick poke around with the sem package using a simple linear regression:



library(sem)

N = 100
x1 = rnorm(N, 20, 20)
x2 = rnorm(N, 50, 10)
x3 = rnorm(N, 100, 15)
e = rnorm(N, 0, 100)

y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e

thedata = data.frame(x1, x2, x3, y)

mod1 = specify.model()
y <-> y, e.y, NA
x1 <-> x1, e.x1, NA
x2 <-> x2, e.x2, NA
x3 <-> x3, e.x3, NA
y <- x1, bx1, NA
y <- x2, bx2, NA
y <- x3, bx3, NA

sem1 = sem(mod1, cov(thedata), N=dim(thedata)[1], debug=T)

When I ran this, the model \(\chi^2 = 4.6454\).

The \(S\) and \(C\) matrices can be extracted using

S = sem1$S
C = sem1$C

Then plugging these into the formula …

N = 100
n = 4

(N - 1) *
  (sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)

… gives… 4.645429.

One other thing: to get the null \(\chi^2\) you just set \(C\) as the diagonal of \(S\).
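A sketch of that calculation, with a made-up three-variable \(S\):

```r
# Null chi-square sketch: replace C by diag(diag(S)), i.e. the model
# in which the observed variables are uncorrelated. S is made up here.
set.seed(42)
X <- matrix(rnorm(300), 100, 3)
X[, 2] <- X[, 2] + 0.5 * X[, 1]   # induce a correlation
S <- cov(X); N <- nrow(X); n <- ncol(X)
C0 <- diag(diag(S))
null.chisq <- (N - 1) *
  (sum(diag(S %*% solve(C0))) + log(det(C0)) - log(det(S)) - n)
null.chisq  # positive whenever the variables are actually correlated
```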



Loehlin, J. C. (2004). Latent Variable Models (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

John Fox on SEM

From an Appendix to An R and S-PLUS Companion to Applied Regression:

“A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data. A more charitable interpretation is that SEMs are close to the kind of informal thinking about causal relationships that is common in social-science theorizing, and that, therefore, these models facilitate translating such theories into data analysis.”

SEM and multiple (linear) regression in R

After much peer pressure I’ve started to play with SEM. I thought I’d try John Fox’s sem package in R, to start with just doing a simple multiple regression.

Should mention Fox (2006) [Structural Equation Modeling With the sem Package in R. Structural Equation Modeling, 13, 465–486] as a useful starting point.

First get the package in:

library(sem)

Using simulated data to start with, here’s 100 values for x1, x2, and x3 from Gaussian distributions.

x1 = rnorm(100, 20, 20)
x2 = rnorm(100, 50, 10)
x3 = rnorm(100, 100, 15)

And also some noise:

e = rnorm(100, 0, 50)

Now set an arbitrarily chosen relationship between y and the x’s:

y = 20*x1 + 3*x2 + 1.5*x3 + 40 + e

And throw this all into a data frame

thedata = data.frame(x1,x2,x3,y)

Eventually after much impolite language use, the following model specification made sense.

thedata.mod1 = specify.model()
y <-> y, e.y, NA,
x1 <-> x1, e.x1, NA,
x2 <-> x2, e.x2, NA,
x3 <-> x3, e.x3, NA,
y <- x1, bx1, NA,
y <- x2, bx2, NA,
y <- x3, bx3, NA

The first four rows specify the error terms, the last three specify the regression I want to fit. NA is used to specify no starting value.

The sem function requires the model specification, a covariance matrix of the data (from cov), and the number of observations (using dim):

sem.thedata1 = sem(thedata.mod1, cov(thedata), N=dim(thedata)[1], debug=T)
summary(lm(y ~ x1 + x2 + x3))

Let’s look at the output for the last two:

> summary(sem.thedata1)

Model Chisquare = 1.7523 Df = 3 Pr(>Chisq) = 0.62537
Chisquare (null model) = 397.79 Df = 6
Goodness-of-fit index = 0.99112
Adjusted goodness-of-fit index = 0.9704
RMSEA index = 0 90% CI: (NA, 0.13780)
Bentler-Bonnett NFI = 0.9956
Tucker-Lewis NNFI = 1.0064
Bentler CFI = 1
SRMR = 0.049717
BIC = -12.063

Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.861 -0.357 -0.122 -0.139 0.000 0.993

Parameter Estimates
Estimate Std Error z value Pr(>|z|)
e.y 3011.4667 428.11714 7.0342 2.0040e-12 y <–> y
e.x1 390.1715 55.46767 7.0342 2.0040e-12 x1 <–> x1
e.x2 107.8570 15.33320 7.0342 2.0040e-12 x2 <–> x2
e.x3 254.7381 36.21417 7.0342 2.0040e-12 x3 <–> x3
bx1 20.3897 0.28029 72.7436 0.0000e+00 y <— x1
bx2 3.3891 0.53566 6.3269 2.5015e-10 y <— x2
bx3 1.7304 0.34732 4.9822 6.2860e-07 y <— x3

Iterations = 0
> summary(lm(y ~ x1 + x2 + x3))

lm(formula = y ~ x1 + x2 + x3)

Min 1Q Median 3Q Max
-127.84 -30.55 -5.85 32.10 128.30

Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.5809 43.1330 -0.083 0.934
x1 20.3897 0.2846 71.633 < 2e-16 ***
x2 3.3891 0.5440 6.230 1.23e-08 ***
x3 1.7304 0.3527 4.906 3.79e-06 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 55.73 on 96 degrees of freedom
Multiple R-Squared: 0.9817, Adjusted R-squared: 0.9811
F-statistic: 1716 on 3 and 96 DF, p-value: < 2.2e-16

Reassuringly, the estimates for bx1, bx2, and bx3 match the corresponding estimates from lm. I don’t yet understand the fit statistics. I also don’t understand the error model.

There’s no intercept. There are multiple ways to get one, apparently. One is to use the raw moments matrix.

thedata.mod2 = specify.model()
(Intercept) <-> (Intercept), e.i, NA,
y <-> y, e.y, NA,
x1 <-> x1, e.x1, NA,
x2 <-> x2, e.x2, NA,
x3 <-> x3, e.x3, NA,
y <- x1, bx1, NA,
y <- x2, bx2, NA,
y <- x3, bx3, NA,
y <- (Intercept), b0, NA

raw.thedata = raw.moments(~ ., data = thedata)
sem.thedata2 = sem(thedata.mod2, raw.thedata, N=dim(thedata)[1], debug=T)

Running this gave a couple of warnings: “Negative parameter variances. Model is probably underidentified.” Let’s compare again with lm anyway:

> summary(sem.thedata2)

Model Chisquare = 741.02 Df = 6 Pr(>Chisq) = 0
Chisquare (null model) = 1273 Df = 10
Goodness-of-fit index = 0.37618
Adjusted goodness-of-fit index = -0.55955
RMSEA index = 1.1124 90% CI: (1.0455, 1.1807)
Bentler-Bonnett NFI = 0.41789
Tucker-Lewis NNFI = 0.030061
Bentler CFI = 0.41804
SRMR = 0.63208
BIC = 713.39

Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 5.89 6.60 6.60 9.65 11.70

Parameter Estimates
Estimate Std Error z value Pr(>|z|)
e.i 1.0000 0.14211 7.0368 1.9669e-12 (Intercept) <–> (Intercept)
e.y 2981.3520 423.70206 7.0364 1.9722e-12 y <–> y
e.x1 689.3882 97.78209 7.0503 1.7859e-12 x1 <–> x1
e.x2 2627.5144 372.62751 7.0513 1.7724e-12 x2 <–> x2
e.x3 10049.2181 1427.00874 7.0422 1.8929e-12 x3 <–> x3
bx1 20.3897 0.26869 75.8850 0.0000e+00 y <— x1
bx2 3.3891 0.27530 12.3104 0.0000e+00 y <— x2
bx3 1.7304 NaN NaN NaN y <— x3
b0 -3.5809 NaN NaN NaN y <— (Intercept)

Iterations = 0

Aliased parameters: bx3 b0
Warning message:
NaNs produced in: sqrt(diag(object$cov))

The estimates look okay but something has gone wrong 🙁 I shall try again later.

Next up, some latent variable magic. Maybe, if sem allows me, it would also be nice to play with, say, binomially distributed data. And what do all those fit statistics mean…?