# Chi-square in SEM

Playing around again with SEM. Just where does that $$\chi^2$$ come from?

You start with the sample covariance matrix ($$S$$) and a model description (quantitative boxology; CFA tied together with regression). The fit machinery iterates over parameter estimates until a measure of the discrepancy between $$S$$ and the "implied" covariance matrix (i.e., the one predicted by the model, $$C$$) is minimised, and out pops the final set of estimates. Then you multiply that minimised discrepancy by $$N - 1$$, where $$N$$ is the sample size, to get something out with a $$\chi^2$$ distribution.

Marvellous.

First, how do we get $$C$$? Loehlin (2004, p. 41) to the rescue:

$$C = F \cdot (I - A)^{-1} \cdot S \cdot \left( (I - A)^{-1} \right)' \cdot F'$$

Here $$A$$ and $$S$$ are square matrices spanning all the variables in the model, latent and observed alike; with no latent variables, as in the example below, they have the same dimensions as the sample covariance matrix. (This is a different $$S$$ from the one I mentioned above, so don't be confused yet.)

$$A$$ contains the (asymmetric) path estimates, $$S$$ contains the (symmetric) covariances and residual variances, and $$F$$ is the so-called filter matrix, which marks which variables are measured. ($$I$$ is the identity matrix and $$M'$$ is the transpose of $$M$$.)

I don’t quite get WHY the implied matrix is plugged together this way, but onwards…
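One reassuring fact, though: since $$(I - A)^{-1} = I + A + A^2 + A^3 + \cdots$$, sandwiching $$S$$ between $$(I - A)^{-1}$$ and its transpose accumulates effects along paths of every length, and $$F$$ then filters the result down to the observed variables. Here's a minimal sketch for a one-predictor regression $$y = bx + e$$ with hypothetical numbers, which you can check by hand: the implied $$\mathrm{cov}(x, y) = b \cdot \mathrm{var}(x)$$ and the implied $$\mathrm{var}(y) = b^2 \, \mathrm{var}(x) + e_y$$.

```r
# Minimal sketch: implied covariance matrix for y = b*x + e,
# variables ordered (x, y); both are observed, so the filter is the identity
b  <- 2   # hypothetical path coefficient
vx <- 4   # hypothetical variance of x
ey <- 9   # hypothetical residual variance of y

A <- matrix(c(0, 0,
              b, 0), 2, 2, byrow = TRUE)  # asymmetric paths: A[2, 1] is x -> y
S <- diag(c(vx, ey))                      # symmetric variances/residuals
Fmat <- diag(2)                           # filter matrix (plain F is risky in R)
I <- diag(2)                              # identity matrix

C <- Fmat %*% solve(I - A) %*% S %*% t(solve(I - A)) %*% t(Fmat)
C  # [vx, b*vx; b*vx, b^2*vx + ey] = [4, 8; 8, 25]
```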

So now we have a $$C$$. Take $$S$$ again, the sample covariance matrix. Loehlin gives a number of different criterion measures which tell you how far off $$C$$ is. I'm playing with SEM in R using John Fox's sem package, which uses this one (the maximum likelihood fit function):

$$\mathrm{tr}(SC^{-1}) + \log|C| - \log|S| - n$$

where $$\mathrm{tr}(M)$$ is the trace of $$M$$, i.e., the sum of its diagonal elements, $$|M|$$ is the determinant of $$M$$, and $$n$$ is the number of observed variables.

The R code for this (pulled and edited from the null $$\chi^2$$ calculation in the sem fit function) is:

```r
sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n
```

Here you can see the trace implemented as a sum after a diag. The solve function, applied to a single matrix as here, returns that matrix's inverse.
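A quick sanity check with made-up numbers: if the model reproduced the data perfectly then $$C = S$$, so $$SC^{-1}$$ is the identity matrix (whose trace is $$n$$), the two log-determinants cancel, and the criterion is exactly zero.

```r
# Sketch: the criterion is 0 when C reproduces S exactly
S <- matrix(c(4, 2,
              2, 3), 2, 2)  # a made-up 2x2 covariance matrix
C <- S                      # pretend the model fits perfectly
n <- 2
sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n  # 0
```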

Let’s have a quick poke around with the sem package using a simple linear regression:

```r
require(sem)

# Simulate data: y is a linear function of three predictors plus noise
N <- 100

x1 <- rnorm(N, 20, 20)
x2 <- rnorm(N, 50, 10)
x3 <- rnorm(N, 100, 15)
e  <- rnorm(N, 0, 100)

y <- 2*x1 - 1.2*x2 + 1.5*x3 + 40 + e

thedata <- data.frame(x1, x2, x3, y)

# Model: residual/exogenous variances (<->) and regression paths (<-);
# the blank line ends the specification
mod1 <- specify.model()
y  <-> y,  e.y,  NA
x1 <-> x1, e.x1, NA
x2 <-> x2, e.x2, NA
x3 <-> x3, e.x3, NA
y  <-  x1, bx1,  NA
y  <-  x2, bx2,  NA
y  <-  x3, bx3,  NA

sem1 <- sem(mod1, cov(thedata), N = nrow(thedata), debug = TRUE)
summary(sem1)
```

When I ran this, the model $$\chi^2 = 4.6454$$.

The $$S$$ and $$C$$ matrices can be extracted from the fitted object using:

```r
sem1$S
sem1$C
```
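As an aside, the fitted object seems to carry the RAM matrices from Loehlin's formula too, so you can rebuild $$C$$ by hand. A hedged sketch: the component names sem1$A, sem1$P, and sem1$J are my assumptions about the package's internals (its $$P$$ plays the role of Loehlin's $$S$$, and its $$J$$ the role of $$F$$); run names(sem1) to check in your version.

```r
# Hedged sketch: rebuild the implied covariance matrix from the RAM matrices.
# NOTE: the component names A, P, J are assumptions; check names(sem1).
A  <- sem1$A                       # asymmetric paths (Loehlin's A)
P  <- sem1$P                       # symmetric (co)variances (Loehlin's S)
J  <- sem1$J                       # filter/selection matrix (Loehlin's F)
IA <- solve(diag(nrow(A)) - A)
J %*% IA %*% P %*% t(IA) %*% t(J)  # should match sem1$C
```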

Then plugging $$S$$ and $$C$$ into the criterion formula …

```r
N <- 100
n <- 4

S <- sem1$S
C <- sem1$C

(N - 1) *
  (sum(diag(S %*% solve(C))) + log(det(C)) - log(det(S)) - n)
```

… gives… 4.645429.
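For completeness, the degrees of freedom (my arithmetic, not the package's output): four observed variables give $$4 \times 5 / 2 = 10$$ distinct variances and covariances, and the model frees seven parameters, so the $$\chi^2$$ sits on $$10 - 7 = 3$$ degrees of freedom.

```r
# Sketch: p-value for the model chi-square on 10 - 7 = 3 df
pchisq(4.645429, df = 3, lower.tail = FALSE)  # ~0.20, so no evidence of misfit
```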

One other thing: to get the null $$\chi^2$$ you keep $$S$$ as it is and set $$C$$ to the diagonal matrix formed from the diagonal of $$S$$, since the null (independence) model implies all the covariances are zero.
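In R, reusing the S, N, and n from above, that looks like this:

```r
# Null (independence) model: keep the variances, zero the covariances
C0 <- diag(diag(S))  # diagonal matrix built from the diagonal of S
(N - 1) *
  (sum(diag(S %*% solve(C0))) + log(det(C0)) - log(det(S)) - n)
# should match the null chi-square reported by summary(sem1)
```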

## Reference

Loehlin, J. C. (2004). *Latent Variable Models: An Introduction to Factor, Path, and Structural Equation Analysis* (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

## One thought on “Chi-square in SEM”

1. kamakshaiah says:

Dear Andy, this post is a really nice description of the chi-squared statistic. However, I read "to get the null $$\chi^2$$ you just set $$C$$ as the diagonal of $$S$$." Do you mean we need to interchange $$S$$ with $$C$$ and vice versa in the expression?