Playing around again with SEM. Just where does that \(\chi^2\) come from?

You start with the sample covariance matrix (\(S\)) and a model description (quantitative boxology; CFA tied together with regression). The fit machinery adjusts the estimates of the various parameters over several iterations until a measure of the discrepancy between \(S\) and the “implied” covariance matrix (i.e., the one predicted by the model, \(C\)) is minimised, and out pops the final set of estimates. Then you multiply that minimised discrepancy by \(N - 1\) to get something with (approximately, if the model holds) a \(\chi^2\) distribution.

Marvellous.

First how do we get \(C\)? Loehlin (2004, p. 41) to the rescue:

\(C = F \cdot (I-A)^{-1} \cdot S \cdot \left((I-A)^{-1}\right)' \cdot F'\)

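Here \(A\) and \(S\) are square matrices with one row and one column per variable in the model. When every variable is measured, as in the regression example below, that gives them the same dimensions as the sample covariance matrix; latent variables add extra rows and columns. (This is a different \(S\) to the one I mentioned above, so don’t be confused yet.)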

\(A\) contains the (asymmetric) path estimates, \(S\) contains the (symmetric) covariances and residual variances, and \(F\) is the so-called filter matrix, which marks which variables are measured variables. (\(I\) is the identity matrix and \(M'\) is the transpose of \(M\).)
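To make the formula concrete, here is a minimal sketch in base R for a regression of y on three predictors in which every variable is measured, so \(F\) is just the identity. The numbers are assumed population values (they match the simulation further down the post), and the names `vars`, `S_ram`, `F_mat` and `B` are illustrative only, not anything from the sem package.

```r
# Variable order: x1, x2, x3, y (all measured, no latent variables)
vars <- c("x1", "x2", "x3", "y")

# A: asymmetric paths; row = variable being predicted, column = predictor
A <- matrix(0, 4, 4, dimnames = list(vars, vars))
A["y", c("x1", "x2", "x3")] <- c(2, -1.2, 1.5)

# S_ram: symmetric part; variances of x1..x3 and the residual variance of y
# (called S_ram here to keep it apart from the sample covariance matrix S)
S_ram <- diag(c(400, 100, 225, 10000))

# F: filter matrix; the identity because every variable is measured
F_mat <- diag(4)

# C = F (I - A)^-1 S ((I - A)^-1)' F'
B <- solve(diag(4) - A)
C <- F_mat %*% B %*% S_ram %*% t(B) %*% t(F_mat)
dimnames(C) <- list(vars, vars)
C
```

The implied variance of y comes out as the squared paths times the predictor variances plus the residual variance, and the implied covariance of each predictor with y is its path times its variance, which is what ordinary regression algebra gives for uncorrelated predictors.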

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
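One way to see it, as a sketch (the vectors \(v\) and \(u\) are notation assumed here, not taken from Loehlin): stack all the variables in the model into a vector \(v\), and write each one as a weighted sum of the others (the paths in \(A\)) plus its own exogenous or residual part \(u\), where \(u\) has covariance matrix \(S\). Then

\[
v = Av + u \quad\Rightarrow\quad (I - A)\,v = u \quad\Rightarrow\quad v = (I - A)^{-1} u
\]

\[
\mathrm{Cov}(v) = (I - A)^{-1} \, \mathrm{Cov}(u) \, \left((I - A)^{-1}\right)' = (I - A)^{-1} \, S \, \left((I - A)^{-1}\right)'
\]

Multiplying by \(F\) on the left and \(F'\) on the right then just picks out the rows and columns that belong to the measured variables, which gives \(C\).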

So now we have a \(C\). Take \(S\) again—the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off \(C\) is. I’m playing with SEM in R using John Fox’s package which uses this one:

\(\mbox{tr}(SC^{-1}) + \log|C| - \log|S| - n\)

where \(\mbox{tr}\) is the trace of a matrix and is the sum of the diagonal, and \(|M|\) is the determinant of \(M\). Oh and \(n\) is the number of observed variables.

The R code for this (pulled and edited from the null \(\chi^2\) calculation in the sem fit function) is

sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n

Here you can see trace is implemented as a *sum* after a *diag*. The *solve* function applied to only one matrix (as here) gives you the inverse of the matrix.
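As a small sanity check, the expression can be wrapped in a function and tried on a toy matrix. This is only a sketch: the function name `ml_discrepancy` and the 2 x 2 matrix are made up for illustration and are not part of the sem package.

```r
# Discrepancy between a sample covariance matrix S and an implied matrix C
ml_discrepancy <- function(S, C) {
  n <- nrow(S)  # number of observed variables
  sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n
}

# Toy sample covariance matrix (hypothetical numbers)
S <- matrix(c(4, 1,
              1, 9), nrow = 2, byrow = TRUE)

# Perfect fit: tr(S S^-1) = n and the log determinants cancel, so this is
# zero up to rounding error
ml_discrepancy(S, S)

# A worse-fitting C (covariance forced to zero) gives a positive value
ml_discrepancy(S, diag(diag(S)))
```

So the criterion is zero when \(C\) reproduces \(S\) exactly and grows as the two drift apart, which is why minimising it makes sense.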

Let’s have a quick poke around with the sem package using a simple linear regression:

require(sem)

N=100

x1 = rnorm(N, 20, 20)

x2 = rnorm(N, 50, 10)

x3 = rnorm(N, 100, 15)

e = rnorm(N,0,100)

y = 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e

thedata = data.frame(x1,x2,x3,y)

mod1 = specify.model()

y <-> y, e.y, NA

x1 <-> x1, e.x1, NA

x2 <-> x2, e.x2, NA

x3 <-> x3, e.x3, NA

y <- x1, bx1, NA

y <- x2, bx2, NA

y <- x3, bx3, NA

sem1 = sem(mod1, cov(thedata), N=nrow(thedata), debug=TRUE)

summary(sem1)

When I ran this, the model \(\chi^2 = 4.6454\).
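To read that number against a \(\chi^2\) distribution you also need the degrees of freedom. A rough sketch, assuming the usual counting rule (unique variances and covariances in \(S\) minus free parameters in the model); the variable names are made up for illustration, and the result should match the df that the summary output reports.

```r
n <- 4                          # observed variables: x1, x2, x3, y
n_moments <- n * (n + 1) / 2    # unique variances and covariances in S
n_params <- 4 + 3               # four variances plus three regression paths
df <- n_moments - n_params      # degrees of freedom for the model chi-square

chisq <- 4.6454                 # the value from the run reported above
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
p_value
```

The three degrees of freedom come from the three covariances among x1, x2 and x3, which this model leaves out and so fixes at zero.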

The \(S\) and \(C\) matrices can be extracted using

sem1$S

sem1$C

Then plugging these into the formula …

N = 100

n = 4

S = sem1$S

C = sem1$C

(N - 1) *

(sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)

… gives… 4.645429.

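One other thing: to get the null \(\chi^2\) you just set \(C\) as the diagonal of \(S\). That is, leave \(S\) alone and make \(C\) the matrix with the variances from \(S\) on its diagonal and zeros everywhere else (*diag(diag(S))* in R), i.e., a model in which the observed variables are all uncorrelated.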
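A minimal sketch of the null calculation in base R, without the sem package. The data are simulated the same way as above but with a seed, and `C0`, `null_chisq`, `alt_chisq` and `null_df` are names assumed here for illustration.

```r
set.seed(1)
N <- 100
x1 <- rnorm(N, 20, 20)
x2 <- rnorm(N, 50, 10)
x3 <- rnorm(N, 100, 15)
e  <- rnorm(N, 0, 100)
y  <- 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e
thedata <- data.frame(x1, x2, x3, y)

S  <- cov(thedata)        # sample covariance matrix, left as it is
n  <- ncol(thedata)       # number of observed variables
C0 <- diag(diag(S))       # null model: variances of S kept, covariances zero

null_chisq <- (N - 1) *
  (sum(diag(S %*% solve(C0))) + log(det(C0)) - log(det(S)) - n)

# With this C0 the trace term equals n, so the statistic reduces to
# -(N - 1) * log(determinant of the correlation matrix)
alt_chisq <- -(N - 1) * log(det(cor(thedata)))

null_df <- n * (n - 1) / 2   # one df per covariance fixed at zero

c(null_chisq = null_chisq, alt_chisq = alt_chisq, df = null_df)
```

So \(S\) and \(C\) are not swapped in the expression; only \(C\) changes.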

**Reference**

Loehlin, J. C. (2004). *Latent Variable Models* (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Dear Andy, this post is a really nice description of the chi-squared statistic. However, I read “to get the null \(\chi^2\) you just set \(C\) as the diagonal of \(S\).” Do you mean we need to interchange \(S\) with \(C\) and vice versa in the expression?