Quality of Life Research

Open Access 01-04-2018 | Commentary

Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes

Authors: Niels Smits, Muirne C. S. Paap, Jan R. Böhnke

Published in: Quality of Life Research | Issue 4/2018

Abstract

Purpose

Multidimensional item response theory and computerized adaptive testing (CAT) are increasingly used in mental health, quality of life (QoL), and patient-reported outcome measurement. Although multidimensional assessment techniques hold promise, they are more challenging to apply than unidimensional ones. The authors comment on minimal standards for developing multidimensional CATs.

Methods

Prompted by pioneering papers published in QLR, the authors reflect on existing guidance and discussions from different psychometric communities, including guidelines developed for unidimensional CATs in the PROMIS project.

Results

The commentary focuses on two key topics: (1) the design, evaluation, and calibration of multidimensional item banks and (2) how to study the efficiency and precision of a multidimensional item bank. The authors suggest that the development of a carefully designed and calibrated item bank encompasses a construction phase and a psychometric phase. With respect to efficiency and precision, item banks should be large enough to provide adequate precision over the full range of the latent constructs. Therefore, CAT performance should be studied as a function of the latent constructs and with reference to relevant benchmarks. Solutions are also suggested for simulation studies using real data, which often yield overly optimistic evaluations of an item bank’s efficiency and precision.
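To make the idea of studying precision as a function of the latent constructs concrete, the following is a minimal sketch, not the authors' procedure. It assumes a compensatory two-dimensional 2PL model, a bivariate standard normal prior with correlation 0.5, and a made-up eight-item bank (ITEM_BANK and all parameter values are hypothetical). It uses the information of the whole bank rather than a simulated adaptive administration, so it only approximates the best precision the bank could offer at each trait level.

```python
import math

# Hypothetical two-dimensional item bank (illustrative values only).
# Each item is (a1, a2, d) under a compensatory multidimensional 2PL model:
# P(endorse) = logistic(a1 * theta1 + a2 * theta2 + d).
ITEM_BANK = [
    (1.2, 0.0, 1.0), (1.5, 0.0, 0.0), (1.0, 0.0, -1.0),
    (0.0, 1.3, 1.0), (0.0, 1.6, 0.0), (0.0, 1.1, -1.0),
    (0.8, 0.7, 0.5), (0.9, 0.6, -0.5),
]


def prob(theta, item):
    """Probability of endorsing an item at the trait vector theta."""
    a1, a2, d = item
    z = a1 * theta[0] + a2 * theta[1] + d
    return 1.0 / (1.0 + math.exp(-z))


def information(theta, items, prior_corr=0.5):
    """Fisher information of the items plus the prior precision matrix.

    Returns the three distinct entries (i11, i12, i22) of the symmetric
    2 x 2 matrix. The prior is an assumed bivariate standard normal.
    """
    det_prior = 1.0 - prior_corr ** 2
    i11 = 1.0 / det_prior
    i22 = 1.0 / det_prior
    i12 = -prior_corr / det_prior
    for item in items:
        a1, a2, _ = item
        p = prob(theta, item)
        w = p * (1.0 - p)  # item variance at theta
        i11 += w * a1 * a1
        i12 += w * a1 * a2
        i22 += w * a2 * a2
    return i11, i12, i22


def standard_errors(theta, items, prior_corr=0.5):
    """Approximate posterior SEs: square roots of the diagonal of the
    inverse information matrix."""
    i11, i12, i22 = information(theta, items, prior_corr)
    det = i11 * i22 - i12 ** 2
    return math.sqrt(i22 / det), math.sqrt(i11 / det)


def se_profile(items, grid=(-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0)):
    """SEs along the first dimension, holding the second dimension at 0."""
    return {t: standard_errors((t, 0.0), items) for t in grid}


if __name__ == "__main__":
    for t, (se1, se2) in se_profile(ITEM_BANK).items():
        print(f"theta1={t:+.1f}  SE1={se1:.3f}  SE2={se2:.3f}")
```

Running the script prints one row per grid point. With this illustrative bank, the standard error on the first dimension is larger at the extremes than at the centre, which is the kind of pattern that would be compared against a chosen benchmark.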

Discussion

Multidimensional CAT applications are promising but complex statistical assessment tools that require detailed theoretical frameworks and methodological scrutiny when their appropriateness for practical applications is tested. The authors advise researchers to evaluate item banks with a broad set of methods, describe their choices in detail, and substantiate their validation approach.

Title: Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes
Authors: Niels Smits, Muirne C. S. Paap, Jan R. Böhnke
Publication date: 01-04-2018
Publisher: Springer International Publishing
Published in: Quality of Life Research / Issue 4/2018
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI: https://doi.org/10.1007/s11136-018-1821-8