Latent Class Models for the Analysis of Rater or Test Agreement
Introduction
Latent class models for agreement data have become increasingly popular. The premise of this approach is that cases in a population belong to two or more latent classes; "latent class" simply means that the class membership of a given case is not directly observed. A latent class may be thought of as a case subtype or "genotype." A case's probability of being assigned a given rating level is assumed to depend on the case's latent class. For example, a medical population may consist of two latent classes: disease-negative (normal) cases and disease-positive cases. The probability that a case is diagnosed "positive" by a rater or procedure depends on which latent class the case belongs to. Analysis aims to estimate (1) the proportion of cases in each latent class, and (2) for each latent class, the probability that a member of that latent class will elicit each rating level from a given rater. When the latent class model fits, these parameters can be used to assess rater agreement, provide information about rating accuracy, or be put to other practical uses.
Advantages of the latent class approach are as follows:
Limitations of the approach are as follows:
Since this page was originally published, many new papers have appeared. Much of the discussion has concerned the potential of latent class models to estimate diagnostic accuracy in the absence of a gold standard (for a recent review and extensive bibliography, see Asselineau et al., 2018). This discussion has mostly neglected the latent class/latent trait model proposed by Uebersax & Grove (1993), which relaxes the assumption of conditional independence in a simple, theoretically plausible, and computationally attractive way (i.e., by supposing a single unidimensional latent trait with a mixture-of-normals distribution). This simplification would be most justified when errors amongst raters are uncorrelated conditional on the location of cases relative to an underlying variable like disease severity. Recent papers have also mostly overlooked the fact that latent class models offer important benefits in this context besides supplying diagnostic classifications of individual cases. For example, comparison of nested models can supply information about whether raters differ in rating level thresholds and overall bias (Uebersax & Grove, 1993). Moreover, even when not used to classify individual cases, these methods may supply an appealing alternative to kappa coefficients or other methods for assessing agreement amongst raters or diagnostic tests.
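As a concrete illustration of the two sets of parameters described in the introduction (the latent class proportions and, for each class, each rater's rating probabilities), here is a minimal Python sketch. It is not the software or estimation procedure of any paper cited on this page. It assumes two latent classes, dichotomous ratings, and conditional independence of raters within a class, and it fits the model with the EM algorithm. The function names (simulate, fit_two_class), the simulated data, and the starting values are all hypothetical choices made for illustration.

```python
import math
import random


def simulate(n_cases, prevalence, sens, spec, seed=0):
    """Simulate 0/1 ratings under a two-class, conditionally independent model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_cases):
        positive_class = rng.random() < prevalence
        row = []
        for se, sp in zip(sens, spec):
            # P(rating = 1) depends only on the case's latent class.
            p = se if positive_class else 1.0 - sp
            row.append(1 if rng.random() < p else 0)
        data.append(tuple(row))
    return data


def fit_two_class(data, n_iter=300, tol=1e-8):
    """EM estimates of (1) the class-1 proportion and
    (2) p_pos[c][j] = P(rater j rates positive | latent class c)."""
    n_raters = len(data[0])
    prev = 0.5
    # Illustrative starting values; they label class 1 as the "positive" class.
    p_pos = [[0.2] * n_raters, [0.8] * n_raters]
    last_loglik = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each case belongs to class 1.
        post = []
        loglik = 0.0
        for row in data:
            lik = []
            for c, weight in ((0, 1.0 - prev), (1, prev)):
                value = weight
                for j, x in enumerate(row):
                    value *= p_pos[c][j] if x == 1 else 1.0 - p_pos[c][j]
                lik.append(value)
            total = lik[0] + lik[1]
            post.append(lik[1] / total)
            loglik += math.log(total)
        # M-step: posterior-weighted proportions.
        n1 = sum(post)
        n0 = len(data) - n1
        prev = n1 / len(data)
        for j in range(n_raters):
            p_pos[1][j] = sum(w * row[j] for w, row in zip(post, data)) / n1
            p_pos[0][j] = sum((1.0 - w) * row[j] for w, row in zip(post, data)) / n0
        # Stop when the log-likelihood no longer improves appreciably.
        if loglik - last_loglik < tol:
            break
        last_loglik = loglik
    return prev, p_pos


# Usage with four hypothetical raters and simulated (not real) data.
ratings = simulate(2000, 0.3, [0.9, 0.85, 0.8, 0.9], [0.95, 0.9, 0.9, 0.85])
prevalence_hat, p_hat = fit_two_class(ratings)
print("estimated proportion in class 1:", round(prevalence_hat, 3))
print("estimated P(positive | class 1):", [round(p, 3) for p in p_hat[1]])
print("estimated P(positive | class 0):", [round(p, 3) for p in p_hat[0]])
```

Four raters are used here because, with dichotomous ratings and two latent classes, three raters give a model with as many parameters (7) as free cells in the cross-classification table, leaving no degrees of freedom to test fit. The sketch does nothing to relax conditional independence; models that do so are the subject of several entries in the bibliography.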
For more information on latent class modeling, including FAQs, software information, links, and detailed references, visit the Latent Structure Analysis pages.
Bibliography: Latent Class Models for Agreement
Where to Start
Clogg CC. Latent class models. In: Arminger G, Clogg CC, Sobel ME (Eds.), Handbook of statistical modeling for social and behavioral sciences (Ch. 6; pp. 311–359). New York: Plenum, 1995.
Uebersax JS, Grove WM. Latent class analysis of diagnostic agreement. Statistics in Medicine, 1990, 9, 559–572.

Latent Class Analysis in General
Clogg CC. Latent class models. In: Arminger G, Clogg CC, Sobel ME (Eds.), Handbook of statistical modeling for social and behavioral sciences (Ch. 6; pp. 311–359). New York: Plenum, 1995.
Goodman LA. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 1974, 61, 215–231.
Heinen T. Latent class and discrete latent trait models: Similarities and differences. Thousand Oaks, California: Sage, 1996.
Lazarsfeld PF, Henry NW. Latent structure analysis. Boston: Houghton Mifflin, 1968.
McCutcheon AC. Latent class analysis. Beverly Hills: Sage Publications, 1987.

Latent Class Models for Agreement Data
Albert PS. Random effects modeling approaches for estimating ROC curves from repeated ordinal tests without a gold standard. Biometrics, 2007, 63, 593–602.
Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics, 2004, 60(2), 427–435.
Albert PS, Dodd LE. On estimating diagnostic accuracy from studies with multiple raters and partial gold standard evaluation. J Am Stat Assoc. 2008 Mar 1;103(481):61–73.
Albert PS, McShane LM, Shih JH, Network TUCIBT. Latent class modeling approaches for assessing diagnostic error without a gold standard: with applications to p53 immunohistochemical assay in bladder tumors. Biometrics, 2001, 57(2), 610–619.
Asselineau J, Paye A, Bessède E, Perez P, Proust-Lima C. Different latent class models were used and evaluated for assessing the accuracy of campylobacter diagnostic tests: overcoming imperfect reference standards? Epidemiology and Infection, 2018, 146, 1556–1564. doi: 10.1017/S0950268818001723
Chu H, Chen S, Louis TA. Random effects models in a meta-analysis of the accuracy of two diagnostic tests without a gold standard. J Am Stat Assoc. 2009 Jun 1;104(486):512–523.
Dawid AP, Skene AM. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 1979, 28, 20–28.
Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple tests. Biometrics 2001; 57:158–167.
Dillon WR, Mulani N. A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 1984, 19, 438–458.
Espeland MA, Handelman SL. Using latent class models to characterize and assess relative error in discrete measurements. Biometrics, 1989, 45, 587–599.
Goetghebeur E, Liinev J, Boelaert M, der Stuyft PV. Diagnostic test analyses in search of their gold standard: latent class analyses with random effects. Statistical Methods in Medical Research 2000; 9:231–248.
Hadgu A, Qu Y. A biomedical application of latent class models with random effects. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1998, 47(4), 603–616.
Hui SL, Zhou XH. Evaluation of diagnostic tests without gold standards. Statistical Methods in Medical Research, 1998, 7, 354–370.
Menten J, Boelaert M, Lesaffre E. Bayesian latent class models with conditionally dependent diagnostic tests: A case study. Statist. Med. 2008; 27:4469–4488.
Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press: Oxford, U.K., 2003.
Pepe MS, Janes H. Insights into latent class analysis of diagnostic test performance. Biostatistics 2007 8(2):474–484; doi:10.1093/biostatistics/kxl038
Qu Y, Tan M, Kutner MH. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics, 1996, 52, 797–810.
Toft N, Jørgensen E, Højsgaard S. Diagnosing diagnostic tests: evaluating the assumptions underlying the estimation of sensitivity and specificity in the absence of a gold standard. Prev Vet Med. 2005 Apr;68(1):19–33.
Uebersax JS. Validity inferences from interobserver agreement. Psychological Bulletin, 1988, 104, 405–416.
Uebersax JS. A review of modeling approaches for the analysis of observer agreement. Investigative Radiology, 1992, 17, 738–743.
Uebersax JS. Statistical modeling of expert ratings on medical treatment appropriateness. Journal of the American Statistical Association, 1993, 88, 421–427.
Uebersax JS. Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models. Applied Psychological Measurement, 1999, 23, 283–297.
Uebersax JS, Grove WM. Latent class analysis of diagnostic agreement. Statistics in Medicine, 1990, 9, 559–572.
Uebersax JS, Grove WM. A latent trait finite mixture model for the analysis of rating agreement. Biometrics, 1993, 49, 823–835.
Walter SD, Irwig LM. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. Journal of Clinical Epidemiology, 1988, 41, 923–937.
Xu H, Craig BA. A probit latent class model with general correlation structures for evaluating accuracy of diagnostic tests. Biometrics, 2009, 65(4), 1145–1155.
Xu H, Black M, Craig BA. Evaluating accuracy of diagnostic tests with intermediate results in the absence of a gold standard. Statistics in Medicine, 2013, 32(15), 2571–2584.

Software for Latent Class Analysis
Clogg CC. Unrestricted and restricted maximum likelihood latent structure analysis: a manual for users. Working paper 1977-09, Pennsylvania State University, Population Issues Research Center.
van de Pol F, Langeheine R, de Jong W. PANMARK user manual. Netherlands Central Bureau of Statistics, Voorburg, The Netherlands, 1989.
Vermunt J. LEM users manual. Department of Methodology, Tilburg University, Tilburg, The Netherlands, 1998.
Vermunt JK, Magidson J. Latent GOLD User's Guide. Belmont, Mass.: Statistical Innovations, Inc., 2000.

This page maintained by John Uebersax PhD
rev: 18 Jan 2019 (updated bibliography)