Factor analysis digression


Set up some factor scores

f = rnorm(100,0,15)

Some noise for the five manifest variables

e1 = rnorm(100,0,5)
e2 = rnorm(100,0,5)
e3 = rnorm(100,0,20)
e4 = rnorm(100,0,50)
e5 = rnorm(100,0,5)

And then the variables themselves, generated from the latent variable. Lots of stuff to play with, e.g. the slopes and the noise standard deviations, to see what affects the factor loadings…

x1 = 200 * f + e1
x2 = 5 * f + e2
x3 = 2 * f + e3
x4 = 2 * f + e4
x5 = 2 * f + e5
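As a rough guide to what to expect, here is a small sketch of the population loadings implied by this setup. It assumes the one-factor model holds exactly, so that each standardized loading is the square root of signal variance over total variance, and the uniqueness is the remaining share. The names sd_f, slope, sd_e and so on are my own, introduced only for illustration.

```r
# Population values copied from the simulation above
sd_f  <- 15                                            # sd of the latent factor f
slope <- c(x1 = 200, x2 = 5, x3 = 2, x4 = 2, x5 = 2)   # slopes on f
sd_e  <- c(x1 = 5, x2 = 5, x3 = 20, x4 = 50, x5 = 5)   # noise sds

signal_var <- (slope * sd_f)^2   # variance contributed by f
noise_var  <- sd_e^2             # variance contributed by the noise

# Standardized loading = cor(x, f); uniqueness = noise share of total variance
expected_loading    <- sqrt(signal_var / (signal_var + noise_var))
expected_uniqueness <- noise_var / (signal_var + noise_var)

round(cbind(expected_loading, expected_uniqueness), 3)
```

Under these assumptions x3 should load at roughly 0.83 and x4 at roughly 0.51, so a bigger noise standard deviation relative to slope times sd(f) pulls the loading down. Estimates from a sample of 100 will wobble around these values.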

Now let’s use factor analysis to get back f:

fa1 = factanal(~ x1 + x2 + x3 + x4 + x5, factors = 1, scores = "Bartlett")

The output

> fa1

Call:
factanal(x = ~x1 + x2 + x3 + x4 + x5, factors = 1, scores = "Bartlett")

Uniquenesses:
   x1    x2    x3    x4    x5
0.005 0.005 0.348 0.677 0.029

Loadings:
   Factor1
x1 0.998
x2 0.998
x3 0.808
x4 0.569
x5 0.986

               Factor1
SS loadings      3.940
Proportion Var   0.788

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 45.24 on 5 degrees of freedom.
The p-value is 1.3e-08

Compare the scores from FA (mean 0, variance close to 1) with f rescaled to mean 0 and variance 1.

> as.vector(fa1$scores)
[1] 0.46229282 -0.60935524 1.68547486 0.32820617 -0.31099996 …
> as.vector(scale(f))
[1] 0.45208732 -0.66695823 1.63456194 0.33728772 -0.35131222  …
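Eyeballing the first few values is fine, but a single number makes the comparison easier. The sketch below reruns the same simulation from scratch and correlates the Bartlett scores with f. The seed is arbitrary (it is not the one behind the output above), and the data frame d and the names fa, scores and score_cor are my own additions, so the numbers will differ from those shown here.

```r
set.seed(1)  # arbitrary seed, NOT the one that produced the output above

n <- 100
f <- rnorm(n, 0, 15)  # latent factor scores

# Manifest variables: same slopes and noise sds as in the post
d <- data.frame(
  x1 = 200 * f + rnorm(n, 0, 5),
  x2 =   5 * f + rnorm(n, 0, 5),
  x3 =   2 * f + rnorm(n, 0, 20),
  x4 =   2 * f + rnorm(n, 0, 50),
  x5 =   2 * f + rnorm(n, 0, 5)
)

fa <- factanal(~ x1 + x2 + x3 + x4 + x5, data = d,
               factors = 1, scores = "Bartlett")

scores <- as.vector(fa$scores)

# Correlation between estimated and true factor scores
# (the sign of a factor is arbitrary, so look at the absolute value)
score_cor <- cor(scores, f)
abs(score_cor)

# Mean and variance of the estimated scores
c(mean = mean(scores), var = var(scores))
```

If the factor is recovered well, the absolute correlation should be very close to 1.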

Looks good, and hopefully the theoretical model in my head isn’t too far off.

Next stop, SEM with a latent variable in…