Open Access 09-02-2024 | Computed Tomography | Original Paper

Inconsistency between Human Observation and Deep Learning Models: Assessing Validity of Postmortem Computed Tomography Diagnosis of Drowning

Authors: Yuwen Zeng, Xiaoyong Zhang, Jiaoyang Wang, Akihito Usui, Kei Ichiji, Ivo Bukovsky, Shuoyan Chou, Masato Funayama, Noriyasu Homma

Published in: Journal of Imaging Informatics in Medicine

Abstract

Diagnosing drowning at autopsy is a complicated process, even with the assistance of autopsy imaging and on-site information about where the body was found. Previous studies have developed high-performing deep learning (DL) models for drowning diagnosis. However, the validity of these models was not assessed, raising doubts about whether the learned features accurately represent the medical findings observed by human experts. In this paper, we assessed the medical validity of DL models that had achieved high classification performance for drowning diagnosis. This retrospective study included autopsy cases aged 8–91 years who underwent postmortem computed tomography between 2012 and 2021 (153 drowning and 160 non-drowning cases). We first trained three DL models from a previous work and generated saliency maps that highlight important features in the input. To assess the validity of the models, pixel-level annotations were created by four radiological technologists and quantitatively compared with the saliency maps. All three models demonstrated high classification performance, with areas under the receiver operating characteristic curves of 0.94, 0.97, and 0.98, respectively. However, the assessment revealed an unexpected inconsistency between the annotations and the models' saliency maps: the saliency maps of the three models contained around 30%, 40%, and 80% irrelevant areas, respectively, suggesting that the predictions of the DL models might be unreliable. This result underscores the need for careful assessment of DL tools, even those with high classification performance.
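To make the idea of "irrelevant areas" concrete, the following is a minimal sketch of one way a saliency map could be compared with a pixel-level annotation. The abstract does not state the exact metric, so everything here is an assumption for illustration: the function name `irrelevant_fraction`, the fixed threshold of 0.5, and the toy 4x4 arrays are hypothetical and are not taken from the paper.

```python
def irrelevant_fraction(saliency, annotation, threshold=0.5):
    """Share of salient pixels that lie outside the expert annotation.

    saliency   : 2D list of floats in [0, 1] (hypothetical normalized map)
    annotation : 2D list of 0/1 values (hypothetical expert mask)
    threshold  : assumed cut-off for calling a pixel "salient"
    """
    if len(saliency) != len(annotation):
        raise ValueError("saliency and annotation must have the same shape")

    salient = 0   # pixels the model highlights
    outside = 0   # highlighted pixels not marked by the annotator
    for sal_row, ann_row in zip(saliency, annotation):
        if len(sal_row) != len(ann_row):
            raise ValueError("saliency and annotation must have the same shape")
        for s, a in zip(sal_row, ann_row):
            if s >= threshold:
                salient += 1
                if not a:
                    outside += 1

    # If nothing is salient there is no irrelevant area to report.
    return outside / salient if salient else 0.0


# Toy example: 5 pixels are salient, 2 of them fall outside the annotation.
saliency = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.2, 0.0],
    [0.1, 0.1, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
annotation = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
frac = irrelevant_fraction(saliency, annotation)
print(f"irrelevant fraction: {frac:.2f}")
```

Under these assumptions, a model can classify well and still score a high irrelevant fraction, which is the kind of mismatch the study reports. A real evaluation would also need a principled thresholding step and a way to combine annotations from several annotators, neither of which is sketched here.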
Metadata
Publisher
Springer International Publishing
Published in
Journal of Imaging Informatics in Medicine
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-024-00974-6