LCA is used in a way analogous to cluster analysis (see FAQ, How does
LCA compare to other statistical methods?). That is, given a sample of
cases (subjects, objects, respondents, patients, etc.) measured on several
variables, one wishes to know if there is a small number of basic groups
into which cases fall. A more precise analogy is between LCA and a type
of cluster analysis called finite mixture estimation (Day, 1969; Titterington,
Smith & Makov, 1985; Wolfe, 1970).
Another application of LCA, more or less specific to medicine, is the evaluation
of diagnostic tests in the absence of a "gold standard." For example, if
one has several tests for detecting presence/absence of a disease, but
no comparison "gold standard" that indicates disease status with certainty,
LCA can be used to provide estimates of diagnostic accuracy (sensitivity,
specificity, proportion of correct diagnoses, etc.) of the different tests.
LCA may also serve simply as a convenient data-reduction tool.
To say this differently, latent classes are defined such that, if one
removes the effect of latent class membership on the data, all that remains
is randomness (understood here as complete independence among measures).
Paul Lazarsfeld (Lazarsfeld & Henry, 1968; see also his earlier papers),
the main originator of LCA, argued that this criterion leads to the most
natural and useful groups.
For some applications, conditional independence may be an inappropriate
assumption. For example, one may have two very similar items, such that
responses on them are probably always associated. For this and certain
related situations, extensions of the latent class model exist.
The model parameters are: (1) the prevalence of each of C case subpopulations
or latent classes (they are called 'latent' because a case's class
membership is not directly observed); and (2) conditional response probabilities--i.e.,
the probabilities, for each combination of latent class, item or variable
(the items or variables are termed the manifest variables), and
response level for the item or variable, that a randomly selected member
of that class will make that response to that item/variable. A conditional
response probability parameter, then, might be the probability that a member
of Latent Class 1 answers 'yes' to Question 1.
Consider a simple medical example with five symptoms (coded 'present'
or 'absent') and two latent classes ('disease present' and 'disease absent').
The model parameters are: (1) the prevalence of cases in the 'disease present'
and 'disease absent' latent classes (but only one of the two prevalences
needs to be estimated, since they must sum to 1.0); and (2) for each symptom
and each latent class, the probability of the symptom being present/absent
for a member of the latent class (once again, for each symptom and latent
class, only the probability of symptom presence or symptom absence needs
to be estimated, since one probability is obtained by subtracting the other
from 1.0).
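To make the structure concrete, here is a minimal Python sketch of this two-class, five-symptom model; all of the numbers are hypothetical. Because symptoms are conditionally independent within a class, the probability of any symptom pattern is a prevalence-weighted mixture of simple products:

```python
# A minimal sketch of the two-class, five-symptom model (all numbers
# hypothetical). prev[c] is the prevalence of latent class c; rho[c][j] is
# the conditional probability that a member of class c shows symptom j.
prev = [0.30, 0.70]                      # 'disease present', 'disease absent'
rho = [
    [0.90, 0.80, 0.75, 0.60, 0.85],      # P(symptom j present | class 0)
    [0.10, 0.15, 0.05, 0.20, 0.10],      # P(symptom j present | class 1)
]

def pattern_prob(pattern):
    """P(pattern), where pattern[j] = 1 if symptom j is present.
    Within a class the symptoms are conditionally independent, so each
    class-conditional probability is a simple product; the overall
    probability is the prevalence-weighted mixture over classes."""
    total = 0.0
    for c in range(2):
        cond = 1.0
        for j, x in enumerate(pattern):
            cond *= rho[c][j] if x == 1 else 1 - rho[c][j]
        total += prev[c] * cond
    return total

print(pattern_prob([1, 1, 1, 0, 1]))
```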
Parameters are estimated by the maximum likelihood (ML) criterion. The
ML estimates are the parameter values under which the observed results are most likely.
Estimation requires iterative computation, but this is fairly trivial for
a computer.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-231.

Rost, J., & Langeheine, R. (1997). A guide through latent structure models for categorical data. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. New York: Waxmann. In fact, this entire book is a good introductory resource. It includes many papers that illustrate applications of LCA in various areas. The papers, written mainly by methodologists, convey the "state of the art" for use of LCA.

Lindsay, B., Clogg, C. C., & Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. Journal of the American Statistical Association, 86, 96-107.

Uebersax, J. S. (1993). Statistical modeling of expert ratings on medical treatment appropriateness. Journal of the American Statistical Association, 88, 421-427. An introduction to probit discrete latent trait models, not covered by either of the above references.
For example, with three dichotomous variables coded 1 and 2, a line of the raw data (the responses of one case) might have the form:

1 2 2
More often, however, one supplies a frequency table. An indexed frequency table lists, along with each observed response pattern, the number of cases with that pattern. Lines in an indexed frequency table would have a form such as:

1 1 1 143
1 1 2  58
1 2 1  22
1 2 2  15
2 1 1  12
2 1 2  32
2 2 1  55
2 2 2 245

The alternative is a full frequency table. With this format, one supplies only the frequencies. However, the frequencies must have a precise form. First, frequencies for all possible rating patterns (even those not observed) must be supplied. Second, frequencies follow a "last variable fastest, second-to-last variable second-fastest, ..., first variable slowest" order. Here "fastest/slowest" refers to the incrementing of rating levels. The level of the fastest variable changes first; after all its levels have been completed, the level of the second-fastest variable increments, etc. The data above are in this form. As a full frequency table, they could be supplied simply as:

143 58 22 15 12 32 55 245
Raw data can be converted to either table format with standard computer
programs, such as SAS PROC FREQ.
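For illustration, the following Python sketch (hypothetical raw data) produces both table formats; it relies on the fact that itertools.product increments its rightmost variable fastest, matching the required ordering:

```python
from collections import Counter
from itertools import product

# Hypothetical raw data: one row per case, three dichotomous variables (1/2).
raw = [(1, 1, 1), (1, 2, 2), (2, 2, 2), (1, 1, 1), (2, 1, 2)]
counts = Counter(raw)

# Indexed frequency table: each observed pattern with its frequency.
for pattern, n in sorted(counts.items()):
    print(*pattern, n)

# Full frequency table: every possible pattern, whether observed or not,
# with the last variable incrementing fastest -- exactly the order in which
# itertools.product generates the patterns.
print([counts.get(p, 0) for p in product((1, 2), repeat=3)])
```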
Most current LCA programs do not allow missing values, so that a case
with missing data on any variable must be excluded.
The second thing needed is software to perform LCA. As of this writing,
no major statistical package (SAS, SPSS, Systat, etc.) includes a module
for LCA. You will therefore need a standalone program. Fortunately, there
are several good programs to choose from, some free. See the FAQ section,
What are some good programs for LCA?
With binary or nominal data, LCA is straightforward. With ordered-category
or Likert-scale data, one may wish to apply certain constraints to response
probability parameters (see FAQ, How are ordered category data handled?).
There is no technical barrier to analyzing models that combine categorical
and continuous data. At least two computer programs, Multimix and Mplus,
allow this.
For information about free and commercial LCA software, please see
the LCA software page.
The oldest forms of LCA used complicated estimation methods based on
matrix manipulation and simultaneous linear equations. A breakthrough
came when Goodman (1974) showed how simple iterative proportional
fitting could be used to find ML parameter values; this method is a type
of EM algorithm.
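For concreteness, here is a bare-bones Python sketch of EM estimation for an unconstrained latent class model with dichotomous items, applied to the example frequency table from earlier. It is not the code of any published program, just the alternating E-step/M-step logic; the names em_lca, prev, and rho are mine:

```python
import numpy as np
from itertools import product

def em_lca(patterns, freqs, n_classes, n_iter=500, seed=0):
    """EM for an unconstrained latent class model with 0/1 items."""
    rng = np.random.default_rng(seed)
    n_pat, n_items = patterns.shape
    prev = np.full(n_classes, 1.0 / n_classes)           # class prevalences
    rho = rng.uniform(0.2, 0.8, (n_classes, n_items))    # P(item = 1 | class)
    for _ in range(n_iter):
        # E step: posterior probability of each class for each pattern.
        like = np.stack([
            np.prod(rho[c] ** patterns * (1 - rho[c]) ** (1 - patterns), axis=1)
            for c in range(n_classes)], axis=1)
        joint = like * prev
        post = joint / joint.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from posterior-weighted counts.
        w = post * freqs[:, None]
        prev = w.sum(axis=0) / freqs.sum()
        rho = (w.T @ patterns) / w.sum(axis=0)[:, None]
    # Loglikelihood of the final solution.
    mix = sum(prev[c] * np.prod(rho[c] ** patterns * (1 - rho[c]) ** (1 - patterns), axis=1)
              for c in range(n_classes))
    return prev, rho, float(np.sum(freqs * np.log(mix)))

# The three-item example data, as a full frequency table.
pats = np.array(list(product((0, 1), repeat=3)))
freqs = np.array([143, 58, 22, 15, 12, 32, 55, 245], float)
print(em_lca(pats, freqs, n_classes=2))
```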
Haberman, working within a loglinear modeling framework, successfully used
the Newton-Raphson method for estimation.
More generally, estimation can be approached as a problem of
multivariate nonlinear optimization. The simplex method, gradient
methods, the Davidon-Fletcher-Powell method, and many other algorithms
(see Press et al., 1989), as implemented, for example, by subroutines in
the IMSL or NAG subroutine libraries, can be used for parameter
estimation. The advantage of approaching the problem as one of
generalized optimization is that it is very easy to apply various
constraints, including structural constraints, to model parameters. Some
subroutines also calculate asymptotic parameter standard errors and
supply output that can be used to test model identifiability. I have
found the STEPIT subroutine (Chandler, 1969) very useful.
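As a sketch of the generalized-optimization approach, the following Python code hands the negative loglikelihood to SciPy's general-purpose minimizer; the setup is the hypothetical two-class, three-item example used earlier. Probabilities are reparameterized through logit/softmax transforms so an unconstrained algorithm can be used, and constraints could be imposed simply by changing how the parameter vector is unpacked:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, softmax

# Hypothetical 2-class, 3-item setup; same example data as above.
patterns = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
freqs = np.array([143, 58, 22, 15, 12, 32, 55, 245], float)
C, K = 2, 3

def neg_loglik(theta):
    # Unpack: softmax keeps prevalences on the simplex, expit keeps the
    # response probabilities in (0, 1), so no explicit bounds are needed.
    prev = softmax(np.append(theta[:C - 1], 0.0))
    rho = expit(theta[C - 1:].reshape(C, K))
    mix = sum(prev[c] * np.prod(rho[c] ** patterns * (1 - rho[c]) ** (1 - patterns), axis=1)
              for c in range(C))
    return -np.sum(freqs * np.log(mix))

x0 = np.random.default_rng(1).normal(scale=0.5, size=(C - 1) + C * K)
res = minimize(neg_loglik, x0, method="BFGS")
print(-res.fun)   # maximized loglikelihood
```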
Recently, several people have successfully used Markov Chain Monte
Carlo (MCMC) and Gibbs sampling to estimate latent class models.
This remains an area of active research on latent class models.
From the conditional response probabilities, the estimated prevalence of each
latent class, and Bayes' theorem, one easily calculates the a posteriori
probability of a case's membership in each class. One may then assign the
case to the latent class with the highest a posteriori probability (modal
assignment), or leave classification "fuzzy"--i.e., view the case as belonging
probabilistically to each latent class to the degree indicated.
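A minimal Python sketch of this calculation, with hypothetical parameter estimates:

```python
import numpy as np

# Hypothetical estimates: prevalences and conditional response probabilities.
prev = np.array([0.3, 0.7])
rho = np.array([[0.9, 0.8, 0.7],    # P(item j = 1 | class 0)
                [0.1, 0.2, 0.3]])   # P(item j = 1 | class 1)

def class_posterior(pattern):
    """P(class | pattern) by Bayes' theorem, items coded 0/1."""
    pattern = np.asarray(pattern)
    cond = np.prod(rho ** pattern * (1 - rho) ** (1 - pattern), axis=1)
    joint = prev * cond               # P(class) * P(pattern | class)
    return joint / joint.sum()

post = class_posterior([1, 1, 0])
print(post, "modal assignment:", post.argmax())
```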
The fit of a latent class model is usually assessed with the likelihood-ratio chi-squared statistic

G² = 2 Σ_s n_s ln(n_s / m_s),   s = 1, ..., S,

where:
s indexes response patterns,
S is the number of observed response patterns,
n_s is the observed frequency of pattern s, and
m_s is the expected frequency of pattern s under the model.
G² has a theoretical chi-squared distribution, with degrees of freedom (df)
equal to (S - 1 - p), where p is the number of estimated model parameters.
Therefore, to assess fit of a given model, one calculates the p-value for
(G², df) from a chi-squared table or computer program (e.g., the PROBCHI
function in SAS). Since this is a goodness-of-fit test, a conservative
critical value, say one that corresponds to p = .10, is appropriate. A model
whose G² exceeds the critical value for the given df is considered not to
fit the data; otherwise the model is considered plausible.
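A small worked Python sketch, using the example data from earlier and hypothetical expected frequencies for a constrained model with five estimated parameters:

```python
import numpy as np
from scipy.stats import chi2

# Observed frequencies (from the example above) and hypothetical expected
# frequencies from some fitted, constrained model with 5 parameters.
observed = np.array([143, 58, 22, 15, 12, 32, 55, 245], float)
expected = np.array([140.2, 60.1, 23.5, 13.8, 13.1, 30.9, 53.2, 247.2])

S, p = len(observed), 5
g2 = 2 * np.sum(observed * np.log(observed / expected))
df = S - 1 - p
print(g2, df, chi2.sf(g2, df))   # p-value from the chi-squared distribution
```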
A complication may arise with large, sparse tables--this is especially
a concern where there are many multi-category variables, such that the
number of observed rating patterns is extremely large. For large sparse
tables, the G² statistic no longer has a theoretical chi-squared
distribution (Agresti & Yang, 1986). Thus statistical assessment by
the method described above is inappropriate. In this case, while it may
not be possible to statistically evaluate a single model, one may obtain
some insight by means of comparing the fit of alternative models, either
with a difference chi-squared test, or with parsimony indices. von Davier
(1997) explored the use of parametric bootstrapping to assess model fit
for large, sparse tables.
Difference Chi-Squared Test
Two latent class models (or two models of some other form, such as two
latent trait models or two loglinear models) for the same data are often
compared via the difference G² statistic. This is calculated
as the difference in the G² statistics for the two models,
with df equal to the difference in the dfs for the two models (or,
alternatively, the difference in their numbers of estimated parameters).
The difference G² statistic again has a theoretical
chi-squared distribution, and critical values and/or p-values can again
be obtained by the usual methods.
Here a significant difference implies that one model fits better than
the other; a nonsignificant difference implies no demonstrable difference
in fit. For this test, a conventional alpha level (e.g.,
p = .05) is appropriate.
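A minimal sketch of the test, with hypothetical fit results for two nested models:

```python
from scipy.stats import chi2

# Hypothetical fit results for two nested models of the same data.
g2_restricted, df_restricted = 18.4, 9   # Model B: extra constraints
g2_full, df_full = 11.2, 6               # Model A: less restrictive

diff_g2 = g2_restricted - g2_full        # 7.2
diff_df = df_restricted - df_full        # 3
print(chi2.sf(diff_g2, diff_df))         # p ~ .066: not significant at .05
```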
Some caveats apply, however, to use of the difference G²
statistic. First, the two models must be "nested": the
parameters of one model are a subset of the parameters of the other.
This usually occurs when, say, Model B is a restricted version of Model
A, constructed by placing fixed-value or equality constraints on some of
Model A's parameters. A significant difference implies that the
additional constraints--or, strictly speaking, the substantive hypotheses
that suggest the constraints--are false.
Second, for the difference G² statistic to have a theoretical
chi-squared distribution, the less restrictive model should fit the data.
Third, for large, sparse tables, the difference G² statistic
again does not have a true chi-squared distribution. Agresti and Yang (1986)
suggested that the difference G² statistic is more robust
to violations of this assumption than the ordinary G² statistic.
Often the magnitude of the difference G² is large enough to
demonstrate a substantial difference between two nested models even without
formal calculation of a p value.
Fourth, the difference G² is not appropriate for comparing
models with different numbers of latent classes--unfortunately so, since
this is often a main interest. Models that differ only in the number of
assumed latent classes are nested, but in a somewhat different way than
other nested models. Certain regularity assumptions required for the
difference G² test to have a theoretical chi-squared distribution
are not met. While some have suggested simple modifications to the difference
G² statistic to adjust for this, this approach is questionable.
Parsimony Indices
Partly due to this, there has been much recent interest in assessing
model fit via so-called information statistics. These statistics are based
mainly on the value of -2 times the loglikelihood of the model, adjusted
for the number of parameters in the model, the sample size, and, potentially,
other factors. The main idea is that, given
two models with equal loglikelihoods, the model with fewer parameters
is better. Appropriately, these measures are called parsimony indices.
Common parsimony indices include the Akaike Information Criterion (AIC),

AIC = -2 ln L + 2p;

the Schwarz Bayesian Criterion (SBC, also called BIC),

SBC = -2 ln L + p ln N;

and the CAIC,

CAIC = -2 ln L + p (ln N + 1),

where ln L is the model loglikelihood, p is the number of estimated parameters, and N is the sample size. For these indices, smaller values indicate a better balance of fit and parsimony. In comparing different models for the same data, then, one prefers models with lower values on these indices.
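A small Python sketch computing all three indices from a model's loglikelihood (the input values are hypothetical):

```python
import numpy as np

def parsimony_indices(loglik, p, n):
    """AIC, SBC (BIC), and CAIC from a model's loglikelihood,
    number of estimated parameters p, and sample size n."""
    aic = -2 * loglik + 2 * p
    sbc = -2 * loglik + p * np.log(n)
    caic = -2 * loglik + p * (np.log(n) + 1)
    return aic, sbc, caic

# Hypothetical values for a fitted model.
print(parsimony_indices(loglik=-1850.3, p=11, n=582))
```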
A more computation-intensive approach relies on bootstrapping, Monte
Carlo, or similar methods (see Aitkin et al., 1981; and especially Langeheine
et al., 1996 and van der Heijden et al., 1997). These methods require no
assumptions about the data such as those required for chi-squared tests.
Other, more "heuristic" methods include use of parsimony indices (AIC,
BIC or CAIC), a "scree"-type test (where one plots model fit against number
of latent classes, and looks for a leveling-off point of the curve), and
examination of parameter estimates (for example, one might reject
models as having too many latent classes if some latent classes are
associated with very small prevalences or have many estimated conditional
probabilities of 1 or 0.)
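The scree-type idea can be sketched as follows, with hypothetical fit results for one- through four-class models:

```python
import numpy as np

# Hypothetical results: {number of classes: (loglikelihood, n_parameters)}.
fits = {1: (-2012.5, 5), 2: (-1870.1, 11), 3: (-1851.8, 17), 4: (-1849.9, 23)}
n = 582
for c, (loglik, p) in fits.items():
    print(c, round(-2 * loglik, 1), round(-2 * loglik + p * np.log(n), 1))
# -2lnL drops sharply from 1 to 2 classes and little thereafter, and BIC is
# smallest at 2 classes: the curve "levels off" at two classes.
```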
In some applications there may be no "right" answer to the question,
How many latent classes are there? For example, in a population
of depressed patients, two latent classes of "Reactive depression" and
"Endogenous depression" may, in one sense, accurately represent the taxonic
structure. However, it may be that there are two subtypes of
Endogenous depression--so that a three-class solution is also in
some sense correct.
LCA is often called a categorical-data analogue to factor analysis.
The precise rationale for this comparison is unclear. Factor analysis is
concerned with the structure of variables (i.e., their correlations), whereas
LCA is more concerned with the structures of cases (i.e., the latent taxonomic
structure). While there is clearly some connection between these two issues,
LCA does seem more strongly related to cluster analysis than to factor
analysis.
Still, there are some methodological similarities between LCA and factor
analysis worth noting. First, both are useful for data reduction. Second,
latent classes, like factors, are unobserved constructs, inferred from
observed data. Third, the problem of determining the number of latent classes
is in certain respects analogous to that of determining the number of factors:
as the number of classes/factors increases, fit of the latent class/factor
model to the observed data improves, but one seeks a balance between fit
to the data and the number of latent classes/factors required.
There are several connections--historical and mathematical--between
LCA and latent trait analysis (LTA; including item response theory (IRT)
and Rasch models). It is common to consider LCA and LTA as two variations
of latent structure analysis. They are united by the assumption that the observed
data structure results from a latent structure. With LCA, the latent variable
that determines data structure is nominal (latent class membership). With
LTA, the latent variable that determines data structure is continuous--a
latent (continuous) trait. With both LCA and LTA, manifest variables are
assumed independent, conditional on values of the latent variable.
"In between" LCA and LTA, as it were, are discrete latent class models
(Heinen, 1996). With these models, the latent variable is discrete, and
unidimensional. There are latent classes, as with LCA, but the classes
are viewed as ordered along a latent continuum, as with LTA. (See FAQ,
What are discrete latent trait models?).
Another related statistical method is latent distribution analysis (LDA;
Mislevy, 1984; Uebersax & Grove, 1993; Qu, Tan & Kutner, 1996).
LDA also includes elements of both LCA and LTA. In LDA, there is a unidimensional,
continuous latent trait. However, relative to this continuum are two or
more separate distributions of cases--corresponding to different latent
classes. For more information about the relationship between LCA, LTA,
discrete latent trait models and LDA, see Uebersax (1997).
Grade-of-Membership (GOM) analysis (Woodbury & Manton, 1982) has
often been used to discover taxonomic structure, mainly in health-related
applications. LCA is similar to, but simpler than, GOM analysis. GOM analysis
views cases as having partial membership (grades of membership) in two
or more latent classes. With LCA, class membership is not known precisely--one
merely knows the probabilities of membership. Thus, with both methods,
class membership is "fuzzy." What distinguishes the two approaches is that
GOM analysis estimates, for each case, parameters that reflect the case's
grade of membership in each latent class--this can be a very large
number of parameters. With LCA, these parameters are not directly estimated;
however, once the other model parameters are estimated, these probabilities
are easily obtained a posteriori by Bayes' theorem. As a result, LCA
requires far fewer estimated parameters.
Connections between LCA and loglinear modeling should also be noted.
Espeland and Handelman (1988) approached LCA as a mixture of loglinear
models. Haberman (1979) and Hagenaars (1988) also approached LCA from the
standpoint of loglinear models.
The problem is like climbing a mountain in the dark. By proceeding
constantly uphill, always taking the steepest slope, you will reach the
top of whatever peak you are already on. However, the highest peak may
actually be across a valley; to reach it, you would need to first go
downhill, and then uphill again. Finding a global maximum can be
difficult for most estimation algorithms, because their strategy is to
move "uphill" at all times.
Local maxima are related to the complexity of the model; they become
more common as the number of latent classes increases. For example,
with say eight dichotomous items and only two or three latent classes,
chances are good that an algorithm will reach the global maximum. With,
say, five latent classes, however, a single run has a good chance of
reaching a local maximum.
To guard against local maxima solutions, one should run the
estimation algorithm several times with different parameter start values
and either (1) verify that the same solution is reached each time, or
(2) if there are differences, choose the best solution. The PanMark and
Latent GOLD programs have options for automatic testing of numerous
starting values.
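A sketch of this strategy in Python, reusing the hypothetical em_lca routine sketched earlier:

```python
# Guarding against local maxima: run the estimator from several random start
# values and keep the solution with the highest loglikelihood. em_lca is the
# hypothetical routine sketched earlier.
def best_of_n_starts(patterns, freqs, n_classes, n_starts=20):
    best = None
    for seed in range(n_starts):
        prev, rho, loglik = em_lca(patterns, freqs, n_classes, seed=seed)
        if best is None or loglik > best[2]:
            best = (prev, rho, loglik)
    return best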
When adequate precautions are taken, local maxima do not pose a
serious obstacle to the effective use of LCA. For more information on
this subject, including specific strategies for avoiding local maxima
solutions, see the separate page on local maxima.
Progress has been made in recent years in methods for
detecting conditional dependence and in relaxing the conditional
independence assumptions of LCA. For more detailed discussion,
including example programs, see the separate page on conditional dependence.
Discrete latent trait models are often compared with unconstrained LCA
to test whether the latent structure is unidimensional. Specifically, one
compares model fit for an unconstrained LCA model with C latent classes
to the fit of a unidimensional discrete latent trait model with C latent
classes for the same data. If the difference G² statistic for
the comparison is nonsignificant, one concludes that the latent class structure
is unidimensional.
Discrete latent trait models are also potentially helpful in problems
of measurement and scaling (Clogg, 1988).
For a basic latent class model, the covariance parameters are assumed
equal to 0, which is the same as assuming conditional independence. One
advantage of the probit latent class model, however, is precisely that
this assumption can be easily relaxed to accommodate various conditional
dependencies among manifest variables. Variances are often fixed to a constant,
say 1.0.
For binary data, with covariances equal to 0 and constant variances,
the probit latent class model is, for most practical purposes, equivalent
to the standard latent class model. However, the probit latent
class model allows useful and plausible structural constraints to be applied
to the latent class model. For example, as mentioned above, various forms
of conditional dependence may be introduced, or a unidimensional (or, say,
two-dimensional) structure imposed on latent classes. The probit LCA model
automatically provides an appropriate constraint system for ordered category
data.
The probit latent class model also provides a unifying framework for
understanding various latent structure models; a number of models, including
latent class analysis, latent trait analysis, and latent distribution analysis,
are subsumed under the model. The model also approaches mixtures of binary
or ordered-category data in precisely the same way as multivariate mixture
estimation with continuous data. Thus it leads directly to mixture estimation
models for mixed-mode measurement--that is, combinations of continuous,
binary, and ordered-category data.
For more discussion on probit latent class models, see Uebersax (in
press), available on the "Some of my papers and programs" page.
One may distinguish two types of nonidentifiability: intrinsic and empirical
nonidentifiability. With intrinsic nonidentifiability, it is the model
design--that is, the number of manifest variables, number of response levels
for each manifest variable, and number of latent classes--that results
in nonidentification; all instances of the same such design are unidentified
(with the possible exception of certain degenerate data structures). With
empirical nonidentifiability, a model may or may not be identified, depending
on the particular values of the observed data. We consider intrinsic nonidentifiability
first.
The most common cause of intrinsic model nonidentifiability is a poorly
specified model. Specifying too complex a model--usually, one with too
many latent classes--can cause the problem. For every new latent class
added to a model, more parameters require estimation; the maximum number
of estimable parameters is limited by the available degrees of freedom
(the number of unique observed rating patterns, minus 1).
For example, with three binary manifest variables, there are 2 x 2 x
2 = 8 possible rating patterns; and, if all rating patterns are observed,
(8 - 1) = 7 total df. An unconstrained two-class model requires exactly
seven independent estimated parameters (1 latent class prevalence, and,
for each latent class, three response probabilities). Because the number
of estimated parameters and total df are the same, this model is "just
identified"; a three class model in this case is nonidentified, however,
as it would require estimation of an additional latent class prevalence
and three additional response probabilities.
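This parameter-counting check is easily mechanized; the following sketch implements the necessary (though not sufficient) condition just described:

```python
# Necessary (not sufficient) identifiability condition for an unconstrained
# model with k dichotomous items and C latent classes.
def enough_df(k, C):
    df_total = 2 ** k - 1        # possible patterns minus 1
    n_params = (C - 1) + C * k   # prevalences plus response probabilities
    return n_params <= df_total

print(enough_df(3, 2))   # True: 7 parameters, 7 df ("just identified")
print(enough_df(3, 3))   # False: 11 parameters, only 7 df
```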
It happens that the basic LCA model is intrinsically unidentified whenever
there are only two manifest variables (regardless of the number of rating
levels; Goodman, 1974). In practice, this is seldom a limitation, as one
can make the model identifiable by adding one or more equality constraints
to the model, as suggested by substantive considerations.
With polytomous data, there are potentially a few specific combinations
of number of manifest variables, levels of the variables, and number of
latent classes where a model is intrinsically unidentified, even though
there appear to be sufficient total df for the number of estimated parameters.
At least one such combination is known (McCutcheon, 1987). Again, instances
such as this can be easily handled with equality constraints.
Empirical nonidentifiability has been examined less and is perhaps less
common. It occurs due to certain accidental structures of the
observed data. For example, observed data might conform perfectly to the
results expected for a two-latent class model. If one specifies a three-class
model, the model will be unidentified. This is not a common occurrence,
but it may be more likely with small sample sizes and sparse tables.
As noted above, if a model is not identified, it can usually be made
so by adding plausible equality constraints. For example, if two manifest
variables have response levels of "low," "medium," and "high," one might
require that the conditional response probability of "low" be the same for
both variables in one or more latent classes. Therefore, the main concern
is not so much nonidentifiability per se, but that a model might be
unidentified without the researcher realizing it. This is a potential
problem because, if a model is unidentified, a researcher may mistakenly
accept the results of LCA as "the" solution when, in fact, it is merely
one of many possible solutions.
Fortunately, it is fairly easy to detect nonidentifiability. The best
method tests the matrix of second partial derivatives of the loglikelihood
with respect to all free, independent estimated model parameters
(the Hessian matrix; van de Pol, Langeheine & de Jong, 1989). If this
matrix is of less than full rank, the model is not identified. A similar
method can be used based on the Jacobian matrix of partial derivatives
(Goodman, 1974; Clogg, 1977).
Typically this test is performed after the algorithm has converged on
a solution. This is slightly inefficient inasmuch as one must first obtain
estimates, only then to find, for a nonidentified model, that the estimates
are meaningless. Compounding this, one characteristic of an unidentified
model is that convergence takes an unusually long time
(in fact, this is one way nonidentifiability can be detected).
A more efficient way to check for intrinsic nonidentifiability is as
follows:

1. Select plausible, arbitrary values for all model parameters.
2. Calculate the expected frequencies implied by these parameter values, and supply these expected frequencies to the program as the "data," using the generating values as start values; the algorithm then converges almost immediately.
3. Apply the program's mathematical test of identifiability to the resulting solution.
The above supposes one is using a program such as PanMark or MLLSA that
features a mathematical test of identifiability. For software without this
feature, other methods can be used to detect nonidentifiability. One method
is to run the estimation algorithm two or more times, using the same data,
but different start values. If the same solution is reached from different
start values, the model is probably identified. Similarly, one can follow
Steps 1 and 2 above, then change the starting values and see whether the
algorithm recovers the values used to generate the first set of expected
frequencies; if so, the model is probably identified.
Derivatives can be calculated either analytically, or numerically--i.e.,
by evaluating how much the log-likelihood changes when adding and/or subtracting
a small value (delta) to/from model parameter values.
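A numerical sketch of the Hessian-based check, using central finite differences; f would be the (negative) loglikelihood as a function of the free parameter vector, and only the rank of the result matters:

```python
import numpy as np

def hessian_rank(f, theta, delta=1e-4):
    """Rank of the numerically estimated Hessian of f at theta."""
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t = np.asarray(theta, float)
            fpp = f(t + delta * (np.eye(n)[i] + np.eye(n)[j]))
            fpm = f(t + delta * (np.eye(n)[i] - np.eye(n)[j]))
            fmp = f(t - delta * (np.eye(n)[i] - np.eye(n)[j]))
            fmm = f(t - delta * (np.eye(n)[i] + np.eye(n)[j]))
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * delta ** 2)
    return np.linalg.matrix_rank(H)

# Less than full rank (rank < len(theta)) signals nonidentification.
```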
Parameter standard errors can also be estimated using the parametric
bootstrap method. This method resamples (constructs multiple
simulated data sets) using the expected frequencies of a given latent
class model. Specifically, for a set of observed data, one first
estimates a latent class model, second, calculates the expected
frequencies given the parameter estimates so obtained, third, constructs
numerous "pseudosamples" from the expected frequencies, and fourth, fits
a latent class model to each pseudosample. The variation of a parameter
estimate across pseudosamples gives an empirical estimate of its
standard error.
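A sketch of this procedure, again reusing the hypothetical em_lca routine:

```python
import numpy as np

def bootstrap_se(patterns, expected, n_classes, n_boot=200, seed=0):
    """Parametric-bootstrap standard errors for the class prevalences.
    expected: expected frequencies from the fitted model. em_lca is the
    hypothetical routine sketched earlier. Label switching across
    pseudosamples is ignored here; real code would need to align classes."""
    rng = np.random.default_rng(seed)
    n = expected.sum()
    draws = []
    for _ in range(n_boot):
        pseudo = rng.multinomial(int(n), expected / n).astype(float)
        prev, rho, loglik = em_lca(patterns, pseudo, n_classes)
        draws.append(prev)
    return np.std(draws, axis=0)
```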
Last updated: 08 July 2009 (new domain)