Published in: European Radiology 4/2015

Open Access 01-04-2015 | Gastrointestinal

Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: A discussion and proposal for an alternative approach

Authors: Steve Halligan, Douglas G. Altman, Susan Mallett


Abstract

Objectives

The objectives are to describe the disadvantages of the area under the receiver operating characteristic curve (ROC AUC) to measure diagnostic test performance and to propose an alternative based on net benefit.

Methods

We use a narrative review supplemented by data from a study of computer-assisted detection for CT colonography.

Results

We identified several problems with ROC AUC. Readers' confidence scores were highly non-normal, with a bimodal distribution. Consequently, ROC curves were heavily extrapolated, and AUC depended mostly on regions of the curve without patient data. AUC also depended on the method used for curve fitting. ROC AUC does not account for prevalence or for the different misclassification costs arising from false-negative and false-positive diagnoses. A change in ROC AUC has little direct meaning for clinicians. We propose an alternative analysis using net benefit, calculated from the change in sensitivity and specificity at clinically relevant thresholds. Net benefit incorporates estimates of prevalence and misclassification costs, and it is clinically interpretable because it reflects changes in correct and incorrect diagnoses when a new diagnostic test is introduced.
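As a rough illustration of how a net benefit calculation can combine these ingredients, the Python sketch below uses one commonly used weighted form, expressed per patient with disease. The abstract does not state the formula, so the expression itself, the function name `net_benefit`, the weight `w`, and all numbers are assumptions chosen for illustration rather than the authors' exact method or data.

```python
def net_benefit(delta_sens, delta_spec, prevalence, w):
    """Weighted net benefit of a new test relative to an old one.

    Assumed form (not taken from the abstract):
        delta_sens + delta_spec * ((1 - prevalence) / prevalence) / w

    delta_sens, delta_spec: change in sensitivity and specificity
        (new test minus old test) at the chosen clinical threshold.
    prevalence: proportion of patients with disease, strictly between 0 and 1.
    w: assumed number of false positives judged as costly as one
        false negative (must be positive).
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if w <= 0:
        raise ValueError("w must be positive")
    # Number of patients without disease per patient with disease.
    non_diseased_per_diseased = (1 - prevalence) / prevalence
    return delta_sens + delta_spec * non_diseased_per_diseased / w


# Hypothetical inputs: sensitivity rises by 10 points, specificity falls by
# 2 points, prevalence is 20%, and one missed diagnosis is weighted as
# equal to four false positives.
example = net_benefit(delta_sens=0.10, delta_spec=-0.02, prevalence=0.20, w=4)
print(f"net benefit per diseased patient: {example:.3f}")
```

Under these assumed inputs the result is positive, meaning the gain in detected disease outweighs the weighted increase in false positives. A different prevalence or weight can reverse the sign, which is the sensitivity to clinical context that ROC AUC does not capture.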

Conclusions

ROC AUC is most useful in the early stages of test assessment, whereas methods based on net benefit are more useful for assessing radiological tests where the clinical context is known. Net benefit is more useful for assessing clinical impact.

Key points

The area under the receiver operating characteristic curve (ROC AUC) measures diagnostic accuracy.
Confidence scores used to build ROC curves may be difficult to assign.
False-positive and false-negative diagnoses have different misclassification costs.
Excessive ROC curve extrapolation is undesirable.
Net benefit methods may provide more meaningful and clinically interpretable results than ROC AUC.
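The point about extrapolation can be made concrete with a small Python sketch. It builds an empirical ROC curve from hypothetical bimodal confidence scores (most readings are either 0 or at least 90) and measures how much of the trapezoidal AUC comes from the final segment, which joins the last observed operating point to the corner (1, 1) and contains no intermediate patient data. The scores, function names, and the use of the trapezoidal rule are illustrative assumptions, not data or methods from the study.

```python
def roc_points(scores_diseased, scores_healthy):
    """Empirical ROC operating points as (FPR, TPR), one per observed score."""
    thresholds = sorted(set(scores_diseased) | set(scores_healthy), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A case is called positive when its score is at or above the threshold.
        tpr = sum(s >= t for s in scores_diseased) / len(scores_diseased)
        fpr = sum(s >= t for s in scores_healthy) / len(scores_healthy)
        points.append((fpr, tpr))
    return points  # the lowest threshold always yields (1.0, 1.0)


def trapezoid_area(points):
    """Area under straight lines joining consecutive (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def final_segment_share(points):
    """Fraction of the AUC contributed by the last line segment."""
    return trapezoid_area(points[-2:]) / trapezoid_area(points)


# Hypothetical bimodal scores on a 0-100 confidence scale.
diseased = [0, 0, 0, 0, 90, 90, 95, 100, 100, 100]
healthy = [0] * 17 + [90, 95, 100]

pts = roc_points(diseased, healthy)
print("operating points:", pts)
print(f"AUC: {trapezoid_area(pts):.3f}")
print(f"share of AUC from final segment: {final_segment_share(pts):.3f}")
```

With these made-up scores the observed operating points all lie at false-positive rates of 0.15 or below, yet roughly 93% of the area comes from the straight line drawn from the last of them to (1, 1). A fitted parametric curve would fill that region differently, which is one way the AUC can depend on the fitting method.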
Metadata
Title
Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: A discussion and proposal for an alternative approach
Authors
Steve Halligan
Douglas G. Altman
Susan Mallett
Publication date
01-04-2015
Publisher
Springer Berlin Heidelberg
Published in
European Radiology / Issue 4/2015
Print ISSN: 0938-7994
Electronic ISSN: 1432-1084
DOI
https://doi.org/10.1007/s00330-014-3487-0
