Published in: The Patient - Patient-Centered Outcomes Research 1/2014

01-03-2014 | Practical Application

An Introduction to Item Response Theory for Patient-Reported Outcome Measurement

Authors: Tam H. Nguyen, Hae-Ra Han, Miyong T. Kim, Kitty S. Chan

Abstract

The growing emphasis on patient-centered care has accelerated the demand for high-quality data from patient-reported outcome (PRO) measures. Traditionally, the development and validation of these measures have been guided by classical test theory. However, item response theory (IRT), an alternative measurement framework, offers promise for addressing practical measurement problems found in health-related research that have been difficult to solve through classical methods. This paper introduces foundational concepts in IRT, as well as commonly used models and their assumptions. Existing data on a combined sample (n = 636) of Korean American and Vietnamese American adults who responded to the High Blood Pressure Health Literacy Scale and the Patient Health Questionnaire-9 are used to exemplify typical applications of IRT. These examples illustrate how IRT can be used to improve the development, refinement, and evaluation of PRO measures. Greater use of methods based on this framework can increase the accuracy and efficiency with which PROs are measured.
Metadata
Title
An Introduction to Item Response Theory for Patient-Reported Outcome Measurement
Authors
Tam H. Nguyen
Hae-Ra Han
Miyong T. Kim
Kitty S. Chan
Publication date
01-03-2014
Publisher
Springer International Publishing
Published in
The Patient - Patient-Centered Outcomes Research / Issue 1/2014
Print ISSN: 1178-1653
Electronic ISSN: 1178-1661
DOI
https://doi.org/10.1007/s40271-013-0041-0
