Published in: European Radiology 12/2020

01-12-2020 | Prostate Cancer | Imaging Informatics and Artificial Intelligence

Using decision curve analysis to benchmark performance of a magnetic resonance imaging–based deep learning model for prostate cancer risk assessment

Authors: Dominik Deniffel, Nabila Abraham, Khashayar Namdar, Xin Dong, Emmanuel Salinas, Laurent Milot, Farzad Khalvati, Masoom A. Haider


Abstract

Objectives

To benchmark the performance of a calibrated 3D convolutional neural network (CNN) applied to multiparametric MRI (mpMRI) for risk assessment of clinically significant prostate cancer (csPCa) using decision curve analysis (DCA).

Methods

We retrospectively analyzed 499 patients who had positive mpMRI (PI-RADSv2 ≥ 3) and underwent MRI-targeted biopsy. The training cohort comprised 449 men, including a calibration set of 50 men. The biopsy decision strategies compared were: biopsy based on risk estimates from the CNN (original and calibrated); biopsy only in men with PI-RADSv2 ≥ 4; and biopsy in men with PI-RADSv2 ≥ 4 plus men with PI-RADSv2 3 and PSA density (PSAd) ≥ 0.15 ng/ml/ml. Discrimination, calibration, and clinical usefulness in the unseen test cohort (n = 50) were assessed using the C-statistic, calibration plots, and DCA, respectively.
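The biopsy decision strategies described above can be read as simple per-patient rules. The following is a minimal sketch in Python, not the authors' code: the `Patient` class, its field names, and the function names are hypothetical, and the CNN risk is assumed to be a probability between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are illustrative assumptions, not taken from the paper.
    pirads: int      # PI-RADSv2 category (3, 4 or 5 in a PI-RADSv2 >= 3 cohort)
    psad: float      # PSA density in ng/ml/ml
    cnn_risk: float  # CNN-estimated probability of csPCa, in [0, 1]


def biopsy_pirads4(p: Patient) -> bool:
    """Biopsy only men with PI-RADSv2 >= 4."""
    return p.pirads >= 4


def biopsy_combined(p: Patient) -> bool:
    """Biopsy PI-RADSv2 >= 4, plus PI-RADSv2 3 with PSAd >= 0.15 ng/ml/ml."""
    return p.pirads >= 4 or (p.pirads == 3 and p.psad >= 0.15)


def biopsy_cnn(p: Patient, threshold: float) -> bool:
    """Biopsy when the CNN risk estimate reaches the chosen risk threshold."""
    return p.cnn_risk >= threshold
```

For example, a man with PI-RADSv2 3 and PSAd 0.20 ng/ml/ml would be spared biopsy under the PI-RADSv2 ≥ 4 rule but biopsied under the combined rule.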

Results

The calibrated CNN achieved moderate calibration (Hosmer-Lemeshow calibration test, p = 0.41) and good discrimination (C = 0.85). DCA revealed consistently higher net benefit and net reduction in biopsies for the calibrated CNN compared with the original CNN, PI-RADSv2 ≥ 4, and the combined strategy of PI-RADSv2 and PSAd. Original CNN predictions were severely miscalibrated (p < 0.0001), resulting in net harm compared with a ‘biopsy all patients’ strategy. At risk thresholds ≥ 10%, the calibrated CNN and the combined strategy reduced the number of biopsies by an estimated 201 and 55 men, respectively, per 1000 men at risk, without missing csPCa, whereas the original CNN and PI-RADSv2 ≥ 4 did not achieve a net reduction in biopsies.
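To make the DCA quantities concrete, the sketch below computes net benefit and net reduction in biopsies per 1000 men using the standard decision curve formulas: net benefit = TP/n − FP/n × pt/(1 − pt), and net reduction = (net benefit of the model − net benefit of ‘biopsy all’) / (pt/(1 − pt)) × 1000, where pt is the risk threshold. The function names and the toy data are illustrative assumptions; they are not the study data and will not reproduce the reported figures.

```python
def net_benefit(y_true, risks, threshold):
    """Net benefit of biopsying every man whose risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risks) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight of one false positive
    return tp / n - (fp / n) * odds


def net_benefit_biopsy_all(y_true, threshold):
    """Net benefit of the reference strategy: biopsy all men."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def net_reduction_per_1000(y_true, risks, threshold):
    """Biopsies avoided per 1000 men, net of missed cancers, vs. 'biopsy all'."""
    odds = threshold / (1.0 - threshold)
    gain = net_benefit(y_true, risks, threshold) - net_benefit_biopsy_all(y_true, threshold)
    return gain / odds * 1000.0


# Toy cohort (made up): 3 men with csPCa, 7 without.
y_toy = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risk_toy = [0.9, 0.8, 0.6, 0.4, 0.3, 0.05, 0.05, 0.05, 0.05, 0.05]
reduction_toy = net_reduction_per_1000(y_toy, risk_toy, threshold=0.10)
```

In this toy cohort, five of ten men fall below the 10% threshold and none of them has csPCa, so the net reduction works out to about 500 biopsies per 1000 men. A model whose net benefit falls below that of ‘biopsy all’ yields a negative value, which corresponds to the net harm described for the uncalibrated CNN.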

Conclusions

DCA revealed that our calibrated 3D-CNN resulted in fewer unnecessary biopsies compared with using PI-RADSv2 alone or in combination with PSAd. CNN calibration is important in achieving clinical utility.

Key Points

• A 3D deep learning model applied to multiparametric MRI may help to prevent unnecessary prostate biopsies in patients eligible for MRI-targeted biopsy.
• Owing to miscalibration, original risk estimates by the deep learning model require prior calibration to enable clinical utility.
• Decision curve analysis confirmed a net benefit of using our calibrated deep learning model for biopsy decisions compared with alternative strategies, including PI-RADSv2 alone and in combination with prostate-specific antigen density.
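
The abstract does not state which calibration method was used. As one common option, the sketch below applies Platt scaling (logistic recalibration): it fits a slope `a` and intercept `b` on a held-out calibration set so that sigmoid(a × logit(p) + b) better matches observed outcomes. The function names, hyperparameters, and toy data are assumptions for illustration only.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p / (1.0 - p))


def fit_platt(raw_probs, labels, lr=0.1, steps=5000):
    """Fit a, b by gradient descent on the mean log-loss over the calibration set."""
    xs = [_logit(p) for p in raw_probs]
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = _sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(raw_probs, a, b):
    """Map raw model probabilities to recalibrated probabilities."""
    return [_sigmoid(a * _logit(p) + b) for p in raw_probs]


# Toy calibration set (made up): an overconfident model, 4 of 10 men with csPCa.
raw_toy = [0.99, 0.98, 0.97, 0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50]
labels_toy = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
a_toy, b_toy = fit_platt(raw_toy, labels_toy)
calibrated_toy = apply_platt(raw_toy, a_toy, b_toy)
```

After fitting, the mean calibrated risk in the toy set should lie close to the observed prevalence of 0.4, whereas the raw estimates average above 0.8. Fitting on a separate calibration set and evaluating on unseen test data, as in the study design, avoids judging the recalibration on the data used to fit it.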
Metadata
Publication date
01-12-2020
Publisher
Springer Berlin Heidelberg
Published in
European Radiology / Issue 12/2020
Print ISSN: 0938-7994
Electronic ISSN: 1432-1084
DOI
https://doi.org/10.1007/s00330-020-07030-1
