### Factor Analysis and SEM with Tetrachoric and Polychoric Correlations

Tetrachoric and polychoric correlations can be factor-analyzed or used to estimate Structural Equation Models (SEMs) in the same way as Pearson correlations.

#### Factor Analysis

For factor analysis, follow these steps:

1. Construct a matrix of tetra-/polychoric correlation coefficients.
2. As an example, we use the well-known lsat6 data (five items from the LSAT) analyzed by Bock and Lieberman (1970) and others.

Using TetMat

Raw data file: lsat6.txt; command file: input.txt; output file: output.txt; matrix output: matrix.txt

Using Prelis

Raw data file: lsat6.txt; command file: input2.txt; output file: output2.txt; matrix output: matrix2.txt
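To make the first step more concrete, here is a minimal sketch of how a single tetrachoric correlation can be obtained from a 2x2 table of counts. It is not the algorithm used by TetMat or Prelis. The function names (`bvn_density`, `bvn_cdf`, `tetrachoric`) are hypothetical, and the approach (thresholds taken from the margins, then a bisection search for the correlation that reproduces the observed cell proportion) is one simple choice among several. It assumes both items have non-degenerate margins.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    one_minus_r2 = 1.0 - r * r
    expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus_r2)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(one_minus_r2))


def bvn_cdf(h, k, rho, steps=200):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Uses the identity Phi2(h, k; rho) = Phi(h) * Phi(k)
    + integral from 0 to rho of phi2(h, k; r) dr, evaluated with
    Simpson's rule (steps must be even).
    """
    base = _STD_NORMAL.cdf(h) * _STD_NORMAL.cdf(k)
    if rho == 0.0:
        return base
    width = rho / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * bvn_density(h, k, i * width)
    return base + total * width / 3.0


def tetrachoric(n00, n01, n10, n11, tol=1e-8):
    """Tetrachoric correlation for a 2x2 table.

    n_ij is the number of cases scoring i on the first item and j on
    the second. Each margin must contain both 0s and 1s.
    """
    n = n00 + n01 + n10 + n11
    h = _STD_NORMAL.inv_cdf((n00 + n01) / n)  # threshold for item 1
    k = _STD_NORMAL.inv_cdf((n00 + n10) / n)  # threshold for item 2
    target = n00 / n                          # observed P(both items = 0)
    lo, hi = -0.999, 0.999
    # The bivariate normal CDF increases with rho, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bvn_cdf(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Repeating the calculation for every pair of items fills in the correlation matrix. As a sanity check, a table with counts (200, 100, 100, 200) has both thresholds at zero, where the bivariate normal probability is 1/4 + arcsin(rho)/(2*pi), so the result should be close to 0.5.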

3. Supply the correlation matrix to your factor analysis program.

Using SAS

 SAS program: tetra1.sas.txt

In the SAS program above, the tetrachoric correlation matrix is read and stored as a SAS dataset with the type=corr designation.

With Mplus, MicroFact or TESTFACT, this separate step is not necessary, as the same program can estimate the tetra-/polychoric correlations and perform the factor analysis.

4. Perform the factor analysis.

Using SAS

SAS program: tetra2.sas.txt; SAS output: sas_output.txt

This example uses the dataset produced by the preceding SAS program. A common factor model is estimated using unweighted least squares (ULS).

Because each coefficient is estimated separately from a single pair of variables, a matrix of tetra-/polychoric correlations may be non-positive definite (NPD). When this happens, the matrix has one or more negative eigenvalues (usually small in magnitude).

This is not a problem per se, but it means that the matrix is not a proper correlation matrix. If the correlation matrix is NPD, factoring methods whose fit functions rely on the inverse of the matrix or on the logarithm of its determinant, and therefore assume positive definiteness, cannot be used. These methods include Maximum Likelihood (ML) and Generalized Least Squares (GLS) estimation.
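Whether a given matrix is NPD can be checked by inspecting its eigenvalues. The following is a minimal sketch using NumPy. The function name `has_negative_eigenvalues` and the tolerance are assumptions, and the example matrix is made up for illustration (it is not the lsat6 matrix).

```python
import numpy as np


def has_negative_eigenvalues(R, tol=1e-10):
    """Return (eigenvalues in ascending order, True if any is below -tol)."""
    R = np.asarray(R, dtype=float)
    eigvals = np.linalg.eigvalsh(R)  # for symmetric matrices, ascending order
    return eigvals, bool(eigvals[0] < -tol)


# Hypothetical 3x3 "correlation" matrix that is not positive definite:
# each pair of values is admissible on its own, but the three together are not.
R_example = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

eigvals, is_npd = has_negative_eigenvalues(R_example)
print(eigvals, is_npd)
```

The small tolerance guards against flagging eigenvalues that are zero up to rounding error.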

However, factoring methods that do not require matrix inversion can still be used. These include Unweighted Least Squares (ULS or OLS), Principal Factor Analysis (PFA), iterated PFA (called PRINIT in SAS), and Principal Axes Factoring (PAF). In many or most factor analysis applications (including the identification of test or survey subscales), these methods work just as well as ML and GLS. (One of the main advantages of ML estimation, namely the ability to compare nested models via likelihood-ratio difference tests, is questionable anyway with tetrachoric/polychoric correlations.)

Different factor models for the same data, estimated by any factoring method, can be compared using the Normed Fit Index or similar methods.
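As an illustration, the Normed Fit Index compares the misfit of a model with the misfit of a null model in which all variables are uncorrelated: NFI = (F_null - F_model) / F_null. The sketch below plugs in the ULS discrepancy (sum of squared off-diagonal residuals). That choice of discrepancy function is an assumption made here, and the function names are hypothetical.

```python
def uls_discrepancy(R, implied):
    """Sum of squared residuals over the lower triangle (diagonal excluded)."""
    p = len(R)
    return sum(
        (R[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i)
    )


def normed_fit_index(R, implied):
    """NFI = (F_null - F_model) / F_null with the ULS discrepancy.

    The null model implies an identity matrix, so its residuals are
    simply the observed off-diagonal correlations.
    """
    p = len(R)
    identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    f_null = uls_discrepancy(R, identity)
    f_model = uls_discrepancy(R, implied)
    return (f_null - f_model) / f_null
```

Here `implied` is the correlation matrix reproduced by the fitted model (for a factor model, the loadings times their transpose plus the unique variances on the diagonal). A model that reproduces the observed correlations exactly gives an NFI of 1, and a model that does no better than the null model gives 0.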

Note that ULS and iterated PFA will produce the same results (provided the convergence criterion is sufficiently small).
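The reason for this equivalence is that iterated PFA alternates between two least-squares steps: finding the best low-rank fit to the matrix with communalities on the diagonal, and then updating the communalities from the resulting loadings. The following is a minimal NumPy sketch of that loop, not the SAS PRINIT implementation. The function name, the starting values (largest absolute off-diagonal correlation per variable), and the defaults are assumptions. Note that no matrix inversion is involved.

```python
import numpy as np


def iterated_principal_factors(R, n_factors=1, max_iter=2000, tol=1e-10):
    """Iterated principal factor analysis of a correlation matrix R.

    Returns (loadings, communalities). Column signs of the loadings are
    arbitrary, as with any eigenvector-based method.
    """
    R = np.asarray(R, dtype=float)
    off_diag = R - np.diag(np.diag(R))
    # Starting communalities: largest absolute correlation in each row.
    h2 = np.max(np.abs(off_diag), axis=1)
    for _ in range(max_iter):
        reduced = off_diag + np.diag(h2)           # communalities on diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[top], 0.0, None)     # ignore negative roots
        loadings = eigvecs[:, top] * np.sqrt(lam)
        new_h2 = np.sum(loadings ** 2, axis=1)     # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2
```

Because only eigenvalues and eigenvectors of the reduced matrix are needed, the procedure still runs when the input matrix has small negative eigenvalues.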

5. Optional: Smoothing a Matrix Before Factoring

Knol and ten Berge (1989), Knol and Berger (1991) and others suggest that a matrix of tetra-/polychoric correlations be "smoothed" before factor analysis.

Smoothing removes the components of the matrix structure associated with any negative eigenvalues. As these components represent "noise" here, smoothing will potentially lead to more accurate factor estimation. However, smoothing will not make a correlation matrix amenable to ML or GLS estimation if the matrix contains negative eigenvalues.

Further, if a correlation matrix is not NPD to begin with, smoothing will have no effect. With tetra-/polychoric correlations, especially if there are only a few variables, it is possible for the matrix to not be NPD.

If smoothing is required, the Smooth program (currently included with TetMat) performs a version of matrix smoothing similar to that described by Knol and ten Berge (1989).
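To illustrate the general idea of smoothing, the sketch below uses simple eigenvalue clipping: negative eigenvalues are set to zero, the matrix is rebuilt, and it is rescaled to have a unit diagonal. This is a swapped-in simplification, not the least-squares method of Knol and ten Berge (1989) and not the algorithm of the Smooth program. The function name and the `floor` parameter are hypothetical.

```python
import numpy as np


def smooth_by_eigenvalue_clipping(R, floor=0.0):
    """Return a smoothed version of R with no eigenvalue below `floor`.

    Steps: eigendecompose, raise eigenvalues below `floor` to `floor`,
    rebuild the matrix, then rescale so the diagonal is exactly 1.
    """
    R = np.asarray(R, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(R)
    clipped = np.clip(eigvals, floor, None)
    rebuilt = (eigvecs * clipped) @ eigvecs.T
    scale = 1.0 / np.sqrt(np.diag(rebuilt))
    smoothed = rebuilt * np.outer(scale, scale)
    return (smoothed + smoothed.T) / 2.0  # remove rounding asymmetry
```

With the default `floor=0.0`, a matrix that has no negative eigenvalues passes through unchanged, and a smoothed NPD matrix ends up with eigenvalues at zero, so it is still singular.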

#### SEM

For SEM, the same principles apply. If the correlation matrix contains negative eigenvalues, then ML or GLS estimation cannot be used. However, ULS can be used. Different models for the same data can be compared using the Normed Fit Index or comparable methods.

#### References

Bock RD, Lieberman M. Fitting a response model for n dichotomously scored items. Psychometrika, 1970, 35, 179-197.

Knol DL, Berger MP. Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 1991, 26, 457-477.

Knol DL, ten Berge JMF. Least-squares approximation of an improper correlation matrix by a proper one. Psychometrika, 1989, 54, 53-61.