Published in: Quality of Life Research 7/2016

Open Access 01-07-2016

On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: a simulation study

Author: Niels Smits

Abstract

Purpose

To increase the precision of estimated item parameters of item response theory models for patient-reported outcomes, general population samples are often enriched with samples of clinical respondents. Calibration studies provide little information on how this sampling scheme is incorporated into model estimation. This small simulation study illustrates the impact on item and person parameters of ignoring the oversampling of clinical respondents.

Method

Simulations were performed using two scenarios. Under the first, it was assumed that regular and clinical respondents form two distinct distributions; under the second, it was assumed that they form a single distribution. A synthetic item bank with quasi-trait characteristics was created, and item scores were generated from this bank for samples with varying percentages of clinical respondents. Proper approaches for dealing with the clinical sample (a multi-group model for Scenario 1, sample weights for Scenario 2) were contrasted with an improper approach (ignoring the oversampling), using correlations and differences between true and estimated parameters.
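The data-generating step described above can be sketched as follows. This is a minimal illustration and not the author's code: it assumes a graded response model with four response categories, and the item parameters, the clinical distribution (mean 1.5, SD 1.0), the 30% clinical share, and the 10% population share used for the weights are all hypothetical values chosen for the example. The helper names `simulate_grm`, `draw_sample`, and `sampling_weights` are likewise invented here.

```python
import numpy as np

def simulate_grm(theta, a, b, rng):
    """Draw graded response model item scores.

    theta: (n,) latent trait values
    a:     (J,) discrimination parameters
    b:     (J, K-1) category thresholds, increasing within each item
    Returns an (n, J) integer array with scores in 0..K-1.
    """
    # Cumulative probabilities P(X >= k | theta), shape (n, J, K-1).
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_ge = 1.0 / (1.0 + np.exp(-z))
    # One uniform draw per person-item pair; the score is the number of
    # cumulative probabilities that exceed the draw.
    u = rng.random((theta.size, a.size, 1))
    return (u < p_ge).sum(axis=2)

def draw_sample(n, pct_clinical, rng, clin_mean=1.5, clin_sd=1.0):
    """Scenario 1 style sample: two distinct trait distributions.

    The clinical mean and SD are assumptions made for illustration.
    """
    n_clin = int(round(n * pct_clinical))
    n_gen = n - n_clin
    theta = np.concatenate([
        rng.normal(0.0, 1.0, n_gen),            # general population
        rng.normal(clin_mean, clin_sd, n_clin),  # clinical respondents
    ])
    group = np.concatenate([np.zeros(n_gen, dtype=int),
                            np.ones(n_clin, dtype=int)])
    return theta, group

def sampling_weights(group, pop_share_clinical):
    """Weight = population share / sample share of each respondent's group."""
    sample_share = group.mean()
    return np.where(group == 1,
                    pop_share_clinical / sample_share,
                    (1.0 - pop_share_clinical) / (1.0 - sample_share))

rng = np.random.default_rng(1)
a = rng.uniform(1.0, 3.0, size=10)                        # hypothetical
b = np.sort(rng.normal(1.0, 1.0, size=(10, 3)), axis=1)   # hypothetical
theta, group = draw_sample(1000, 0.30, rng)
scores = simulate_grm(theta, a, b, rng)
weights = sampling_weights(group, pop_share_clinical=0.10)
```

The weights sum to the sample size, so clinical respondents count for their assumed population share rather than their share of the sample. Fitting the model (with a multi-group specification or with the weights) is left out of the sketch, since it depends on the estimation software used.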

Results

Under the first scenario, ignoring the sampling scheme resulted in overestimation of both item and person parameters, with bias decreasing as the percentage of clinical respondents increased. Under the second, location and person parameters were underestimated, with bias growing as the percentage of clinical respondents increased. Under both scenarios, the standard error of the latent trait estimate was generally underestimated.

Conclusion

Ignoring the addition of extra clinical respondents leads to bias in item and person parameters, which may in turn produce biased norms and unreliable computerized adaptive testing (CAT) scores. Researchers are urged to provide more information on how clinical samples are incorporated into model estimation.
Footnotes
1
Under both scenarios, the estimated location parameter values can be roughly predicted through a linear transformation: \(\text{estimated parameter} = \bar{\theta}^{*} + \sigma^{*} \times \text{true parameter}\), where \(\bar{\theta}^{*}\) and \(\sigma^{*}\) are the overall mean and overall standard deviation of the population distribution, respectively.
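A small worked example of the transformation in footnote 1, offered as a sketch only. It assumes that the "overall" mean and standard deviation are those of a two-component normal mixture of general and clinical respondents; the mixture parameters (clinical mean 1.5, both SDs 1.0, 50% clinical) and the function names are hypothetical and are not taken from the paper.

```python
import math

def mixture_moments(p_clin, mu_gen=0.0, sd_gen=1.0, mu_clin=1.5, sd_clin=1.0):
    """Mean and SD of a two-component mixture (assumed parameter values)."""
    mean = (1.0 - p_clin) * mu_gen + p_clin * mu_clin
    # Second raw moment of the mixture, then variance = E[x^2] - mean^2.
    second = ((1.0 - p_clin) * (sd_gen**2 + mu_gen**2)
              + p_clin * (sd_clin**2 + mu_clin**2))
    return mean, math.sqrt(second - mean**2)

def predicted_location(true_b, mean_star, sd_star):
    """Footnote 1 as stated: estimated = mean* + sd* x true."""
    return mean_star + sd_star * true_b

mean_star, sd_star = mixture_moments(0.5)
predicted = predicted_location(1.0, mean_star, sd_star)
```

With these assumed values the overall mean is 0.75 and the overall SD is 1.25, so a true location of 1.0 maps to a predicted estimate of 2.0. With no clinical respondents (`p_clin=0.0`) the moments are 0 and 1 and the transformation leaves the parameter unchanged.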
 
Metadata
Title
On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: a simulation study
Author
Niels Smits
Publication date
01-07-2016
Publisher
Springer International Publishing
Published in
Quality of Life Research / Issue 7/2016
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-015-1199-9
