Published in: BMC Medical Research Methodology 1/2016

Open Access 01-12-2016 | Software

www.common-metrics.org: a web application to estimate scores from different patient-reported outcome measures on a common scale

Authors: H. Felix Fischer, Matthias Rose

Abstract

Background

Recently, a growing number of Item Response Theory (IRT) models have been published that allow estimation of a common latent variable from data collected with different Patient-Reported Outcome measures (PROs). When using data from different PROs, direct estimation of the latent variable has some advantages over the use of sum score conversion tables. However, fitting such models with contemporary IRT software requires substantial proficiency in psychometrics. We developed a web application (http://www.common-metrics.org) that makes estimation of latent variable scores easier by using IRT models that calibrate different measures on instrument-independent scales.

Results

Currently, the application allows estimation using six different IRT models for Depression, Anxiety, and Physical Function. Based on published item parameters, users of the application can obtain latent trait estimates using expected a posteriori (EAP) estimation for sum scores as well as for specific response patterns, and using Bayes modal (MAP), weighted likelihood (WLE), and maximum likelihood (ML) estimation, with a choice of three different prior distributions. The obtained estimates can be downloaded and analyzed using standard statistical software.
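To illustrate the idea behind EAP estimation for a specific response pattern, the following is a minimal sketch in Python. It is not the application's code. It assumes a graded response model, a normal prior, and a simple grid approximation of the posterior. The item parameters in `ITEMS` are hypothetical values chosen for illustration and are not taken from any published calibration. The function names `grm_category_probs` and `eap_estimate` are likewise made up for this example.

```python
import numpy as np


def grm_category_probs(theta, a, b):
    """Category probabilities under a graded response model (assumed model).

    theta: 1-D grid of latent trait values
    a:     item discrimination
    b:     ordered thresholds (length K-1 for K response categories)
    Returns an array of shape (len(theta), K).
    """
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    upper = np.hstack([np.ones((theta.size, 1)), p_star])
    lower = np.hstack([p_star, np.zeros((theta.size, 1))])
    return upper - lower


def eap_estimate(responses, items, prior_mean=0.0, prior_sd=1.0, n_quad=121):
    """EAP estimate and posterior SD for one response pattern.

    responses: category indices (0-based); None marks a missing response
    items:     list of (a, b) tuples, one per item
    """
    # Evaluate the posterior on a grid spanning +/- 6 prior SDs
    theta = np.linspace(prior_mean - 6 * prior_sd, prior_mean + 6 * prior_sd, n_quad)
    prior = np.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
    likelihood = np.ones_like(theta)
    for resp, (a, b) in zip(responses, items):
        if resp is None:
            continue  # missing items simply contribute no information
        likelihood *= grm_category_probs(theta, a, b)[:, resp]
    posterior = prior * likelihood
    posterior /= posterior.sum()
    eap = float(np.sum(theta * posterior))
    sd = float(np.sqrt(np.sum((theta - eap) ** 2 * posterior)))
    return eap, sd


# Hypothetical item parameters (illustration only, not published values)
ITEMS = [
    (2.0, [-0.5, 0.5, 1.5]),
    (1.5, [0.0, 1.0, 2.0]),
    (2.5, [-1.0, 0.2, 1.2]),
]

if __name__ == "__main__":
    print(eap_estimate([1, 2, 1], ITEMS))
    print(eap_estimate([1, None, 1], ITEMS))  # one item missing
```

Because skipped items drop out of the likelihood, the same function scores complete and incomplete response patterns. The estimate is on the latent (standardized) scale of the assumed prior, not on the metric of any one instrument.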

Conclusions

This application enhances the usability of IRT modeling for researchers by allowing comparison of latent trait estimates across different PROs, such as the Patient Health Questionnaire Depression (PHQ-9) and Anxiety (GAD-7) scales, the Center for Epidemiologic Studies Depression Scale (CES-D), the Beck Depression Inventory (BDI), PROMIS Anxiety and Depression Short Forms, and others. Advantages of this approach include comparability of data collected with different measures and tolerance of missing values. The validity of the underlying models needs to be investigated in the future.
Metadata
Title
www.common-metrics.org: a web application to estimate scores from different patient-reported outcome measures on a common scale
Authors
H. Felix Fischer
Matthias Rose
Publication date
01-12-2016
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2016
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-016-0241-0
