Latent Trait Analysis and Item Response Theory (IRT) Models

This page links to several other pages that discuss Latent Trait Analysis. These pages are not as extensive as the Latent Class Analysis web pages. They are not a comprehensive guide to latent trait modeling, but only an accretion of pages and files that deal with specific aspects of the subject. Still, they do contain basic information on latent trait models and may help to introduce the subject to the newcomer.

Introduction

Latent Trait Analysis (LTA), a form of latent structure analysis (Lazarsfeld & Henry, 1968), is used for the analysis of categorical data. The simplest way to understand it is that LTA is a form of factor analysis for binary (dichotomous) or ordered-category data. In the area of educational testing and psychological measurement, latent trait analysis is termed Item Response Theory (IRT). There is so much overlap between LTA and IRT that these terms are basically interchangeable.

What LTA Can Do

Using LTA, one can reduce a set of many binary or ordered-category variables to a small set of factors--with LTA, these factors are called latent traits. Just as with factor analysis, this can be done for data reduction, data exploration, or theory confirmation.

Many applications of latent trait and item response modeling involve one-factor models, also called unidimensional latent trait models. In this context LTA/IRT models are very flexible and can be used in many specialized ways. Their power and flexibility derive from the fact that they are formalized probability models--they include a "theory" that relates the unobserved (latent) construct(s) of interest--educational attainment, disease severity, program effect, etc.--to the observed (manifest) variables that are actually measured.
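To make the idea of a formalized probability model concrete, the standard unidimensional model for binary items can be written as below. The notation is an assumption introduced here for illustration and does not come from this page: x_j is the 0/1 response to item j, theta is the latent trait, P_j(theta) is the item response function for item j, and g(theta) is the population distribution of the trait. Responses are assumed independent given theta.

```latex
% Probability of a response pattern given the latent trait (local independence)
P(x_1,\dots,x_J \mid \theta) = \prod_{j=1}^{J} P_j(\theta)^{x_j}\,\bigl[1 - P_j(\theta)\bigr]^{1 - x_j}

% Marginal probability of the observed pattern, averaging over the trait distribution
P(x_1,\dots,x_J) = \int P(x_1,\dots,x_J \mid \theta)\, g(\theta)\, d\theta
```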

With educational testing in particular, these methods have proven very effective. Many well-known tests of academic ability and attainment, such as the SAT, the GRE and the LSAT, are analyzed and, to some extent, constructed using unidimensional latent trait models. In this context, latent trait models let one:

Precisely measure the difficulty or easiness of each item.
Determine the association of each item with the construct being measured.
Determine which items are biased in the sense of having different meanings or measurement characteristics in different subpopulations.
Design a test with the fewest items necessary to measure the construct with requisite accuracy.
Measure test accuracy at different levels of respondent ability.
Design an "adaptive test"--one where answers to preceding items determine which items are subsequently administered, again with the aim of producing the shortest overall test.

The above pertain mainly to educational testing, but they illustrate the range and power of LTA/IRT models.

Basic Readings

The two essential texts on the subject remain:

Lord FM, Novick MR. Statistical theories of mental test scores. Reading, Massachusetts: Addison-Wesley, 1968.

Two more recent books are:

Dayton CM. Latent class scaling analysis. (Quantitative Applications in the Social Sciences, Vol. 126.) Newbury Park, California: Sage Publications, May 1999.

A good introduction, suitable for nonstatisticians, is:

Safrit MJ, Cohen AS, Costa MG. Item response theory and the measurement of motor behavior. Research Quarterly For Exercise and Sport, 1989, 60, 325-335.

An important article in the development of latent trait models is:

Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 1981, 46, 443-459.

Finally, one book I found helpful in learning these methods is:

Hulin CL, Drasgow F, Parsons CK. Item response theory. Homewood, Illinois: Dow Jones-Irwin, 1983.

Additional references are found on the other specific web pages here.

Issues

There are two main variants of latent trait models. One consists of Gaussian, or so-called "normal ogive," models ("ogive" refers to the characteristic S-shape of an item response function). These models derive from the assumption of normally-distributed measurement error.

The other main variation consists of logistic-ogive and Rasch models. These models derive from somewhat different theoretical assumptions than Gaussian latent trait models.

In practice, it often does not make much difference which variation one pursues. However, one should attempt to choose a model according to how well the theoretical assumptions fit the application.
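The practical similarity of the two variants can be checked numerically. The sketch below compares a normal ogive item response function with a logistic one after rescaling the logistic by the conventional constant 1.702; with that constant the two curves are expected to differ by less than 0.01 at every trait level. The parameter names a (discrimination) and b (difficulty) and the function names are assumptions introduced for illustration. A Rasch model corresponds to a logistic function in which all items share the same discrimination.

```python
import math

D = 1.702  # conventional scaling constant aligning the logistic with the normal ogive

def normal_ogive(theta, a, b):
    """Gaussian item response function: standard normal CDF of a * (theta - b)."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(theta, a, b, scale=D):
    """Logistic item response function, rescaled to approximate the normal ogive."""
    return 1.0 / (1.0 + math.exp(-scale * a * (theta - b)))

# Hypothetical item with discrimination 1 and difficulty 0.
a, b = 1.0, 0.0
grid = [-4.0 + 0.01 * k for k in range(801)]  # trait values from -4 to 4
max_gap = max(abs(normal_ogive(t, a, b) - logistic(t, a, b)) for t in grid)
print(max_gap)
```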

The Gaussian models tend to be underutilized and less familiar than the logistic/Rasch latent trait models. Partly this is due to a mistaken perception that the former are computationally more difficult than the latter. Also, at this point, there is a certain amount of "inertia" to overcome; many textbooks on IRT models, for example, almost exclusively discuss the logistic/Rasch models.

My own work has mainly been in the area of Gaussian latent trait models. I see these as especially advantageous for at least two reasons:

The assumption of normally distributed measurement error is very plausible in many applications; and
The Gaussian latent trait model can be estimated via what is termed the "heuristic method" with standard software, such as SAS or LISREL.
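As an illustration of why standard factor-analysis software can serve here, the sketch below assumes that the heuristic method means factor-analyzing tetrachoric correlations among the items, which is an interpretation not spelled out on this page. Under that assumption, and with a standardized latent response that is scored 1 when it exceeds the item threshold, a one-factor loading and threshold convert to normal ogive parameters by a well-known relation. The function name and the example values are hypothetical.

```python
import math

def loading_to_normal_ogive(loading, threshold):
    """Convert a factor loading and threshold to normal ogive parameters.

    Assumes a standardized latent response y* = loading * theta + error,
    with error variance 1 - loading**2, and an observed response of 1
    when y* exceeds the threshold.
    """
    if not 0.0 < abs(loading) < 1.0:
        raise ValueError("loading must be nonzero and strictly between -1 and 1")
    discrimination = loading / math.sqrt(1.0 - loading ** 2)
    difficulty = threshold / loading
    return discrimination, difficulty

# Hypothetical item: loading 0.6, threshold 0.3.
a, b = loading_to_normal_ogive(0.6, 0.3)
print(a, b)
```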

Because of my greater familiarity with the Gaussian models, they are given most attention on these pages. However, I encourage people to learn about logistic/Rasch models as well. Some good basic discussions may be found in:

Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response theory. Newbury Park: Sage, 1991.

There are also variations of latent trait/IRT models that use nonparametric item response functions, though these are not discussed here.

An important issue concerns the choice between latent trait models and more classical methods. This question often arises in the context of developing a scale in survey, behavioral, or health research. In general, my advice is this: it is not necessary to pursue latent trait models in every application. If one merely wishes to develop and administer a scale consisting of a few (3-20) items that all measure the same trait, it often is not necessary to apply latent trait models. In general, scale scores produced by latent trait modeling correlate extremely highly (e.g., r = .99) with corresponding scores produced merely by summing respondents' scores on the individual items of a scale.
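The close agreement between sum scores and latent trait scores can be explored with a small simulation. This is only a sketch under assumptions not stated in the text: a two-parameter logistic model, 10 hypothetical items with made-up parameters, 500 simulated respondents, item parameters treated as known rather than estimated, and latent trait scores computed as posterior means on a coarse grid with a standard normal prior. The exact correlation will depend on these choices.

```python
import math
import random

def irf(theta, a, b):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, items, grid):
    """Posterior mean of theta given the responses, standard normal prior."""
    num = den = 0.0
    for t in grid:
        weight = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) density
        for x, (a, b) in zip(responses, items):
            p = irf(t, a, b)
            weight *= p if x else 1.0 - p
        num += t * weight
        den += weight
    return num / den

def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)

rng = random.Random(0)  # fixed seed so the run is repeatable
items = [(rng.uniform(0.8, 1.6), rng.uniform(-1.5, 1.5)) for _ in range(10)]
grid = [-4.0 + 0.1 * k for k in range(81)]

sum_scores, trait_scores = [], []
for _ in range(500):
    theta = rng.gauss(0.0, 1.0)
    responses = [1 if rng.random() < irf(theta, a, b) else 0 for a, b in items]
    sum_scores.append(sum(responses))
    trait_scores.append(eap_score(responses, items, grid))

r = pearson(sum_scores, trait_scores)
print(round(r, 3))
```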

Latent trait and IRT models are best used when any of the following are true:

There is an assumed multidimensional structure to the set of items, and one wishes to understand the multidimensional structure.
There is a large number of items (say 50 or more).
There is some special feature of the research that calls for a formal probability model--such as the need to investigate item bias, or to develop an adaptive test.

Specific Pages and Files

The following are pages and files on this web site that pertain to various aspects of latent trait modeling.

Revised: 30 June 2000