Published in: Journal of General Internal Medicine 1/2012

Open Access 01-06-2012 | Original Research

Chapter 9: Options for Summarizing Medical Test Performance in the Absence of a “Gold Standard”

Authors: Thomas A. Trikalinos, MD, Cynthia M. Balion, PhD

Abstract

The classical paradigm for evaluating test performance compares the results of an index test with those of a reference test. When the reference test does not mirror the “truth” adequately (i.e., it is an “imperfect” reference standard), the typical (“naïve”) estimates of sensitivity and specificity are biased. One has at least four options when performing a systematic review of test performance when the reference standard is “imperfect”: (a) to forgo the classical paradigm and assess the index test’s ability to predict patient-relevant outcomes instead of test accuracy (i.e., treat the index test as a predictive instrument); (b) to assess whether the results of the two tests (index and reference) agree or disagree (i.e., treat them as two alternative measurement methods); (c) to calculate “naïve” estimates of the index test’s sensitivity and specificity from each study included in the review and discuss in which direction they are biased; (d) to adjust the “naïve” estimates of sensitivity and specificity of the index test mathematically to account for the imperfect reference standard. We discuss these options and illustrate some of them through examples.
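
To make options (c) and (d) concrete, the following is a minimal sketch in Python, not the authors' own procedure. It computes the “naïve” sensitivity and specificity from a single 2×2 table and then applies a classical algebraic correction of the kind associated with Gart and Buck. The correction rests on two assumptions that are not stated in the abstract: the errors of the index and reference tests are conditionally independent given true disease status, and the sensitivity and specificity of the reference standard are known. The class name, function names, and the counts in the example table are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of index test results against the reference standard."""
    a: float  # index positive, reference positive
    b: float  # index positive, reference negative
    c: float  # index negative, reference positive
    d: float  # index negative, reference negative

    @property
    def n(self) -> float:
        return self.a + self.b + self.c + self.d


def naive_estimates(t: TwoByTwo) -> tuple[float, float]:
    """Sensitivity and specificity computed as if the reference were error-free."""
    sens = t.a / (t.a + t.c)
    spec = t.d / (t.b + t.d)
    return sens, spec


def adjusted_estimates(
    t: TwoByTwo, ref_sens: float, ref_spec: float
) -> tuple[float, float, float]:
    """Correct the naive estimates for a reference with known error rates.

    Assumes (illustrative assumptions, not taken from the abstract):
      * index and reference errors are conditionally independent;
      * ref_sens and ref_spec are known without error.
    Returns (sensitivity, specificity, prevalence) of the index test setting.
    """
    youden = ref_sens + ref_spec - 1.0
    if youden <= 0:
        raise ValueError("reference standard must be better than chance")

    ref_pos = t.a + t.c
    # True prevalence implied by the observed reference-positive fraction.
    prevalence = (ref_pos / t.n + ref_spec - 1.0) / youden
    if not 0.0 < prevalence < 1.0:
        raise ValueError("implied prevalence is outside (0, 1); inputs are inconsistent")

    sens = ((t.a + t.b) * ref_spec - t.b) / (t.n * (ref_spec - 1.0) + ref_pos)
    spec = ((t.c + t.d) * ref_sens - t.c) / (t.n * ref_sens - ref_pos)
    return sens, spec, prevalence


# Hypothetical table: counts expected for 1000 people when the true prevalence
# is 0.20, the index test has sensitivity 0.90 and specificity 0.80, and the
# reference has sensitivity 0.95 and specificity 0.90.
table = TwoByTwo(a=187, b=153, c=83, d=577)

naive_sens, naive_spec = naive_estimates(table)
adj_sens, adj_spec, adj_prev = adjusted_estimates(table, ref_sens=0.95, ref_spec=0.90)

print(f"naive:    sensitivity={naive_sens:.3f}, specificity={naive_spec:.3f}")
print(f"adjusted: sensitivity={adj_sens:.3f}, specificity={adj_spec:.3f}, "
      f"prevalence={adj_prev:.3f}")
```

In this constructed table the naïve sensitivity (about 0.69) falls well below the value used to generate the counts (0.90), whereas the adjusted estimates recover the generating values. That recovery holds only because the table was built under the same assumptions the correction makes; when test errors are correlated, or the reference standard's error rates are uncertain, the adjusted values can themselves be biased.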
Metadata
Title
Chapter 9: Options for Summarizing Medical Test Performance in the Absence of a “Gold Standard”
Authors
Thomas A. Trikalinos, MD
Cynthia M. Balion, PhD
Publication date
01-06-2012
Publisher
Springer-Verlag
Published in
Journal of General Internal Medicine / Issue Special Issue 1/2012
Print ISSN: 0884-8734
Electronic ISSN: 1525-1497
DOI
https://doi.org/10.1007/s11606-012-2031-7
