BMC Medical Informatics and Decision Making

Open Access 01-12-2019 | Autopsy | Research article

Automatically determining cause of death from verbal autopsy narratives

Authors: Serena Jeblee, Mireille Gomes, Prabhat Jha, Frank Rudzicz, Graeme Hirst

Published in: BMC Medical Informatics and Decision Making | Issue 1/2019

Abstract

Background

A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person’s death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determine CoD categories from VA free-text narratives alone.

Methods

After preprocessing and spelling correction, our method extracts word frequency counts from the narratives and uses them as input to four different machine learning classifiers: naïve Bayes, random forest, support vector machines, and a neural network.
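The step from narrative text to word-frequency features to a classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a hand-written multinomial naïve Bayes with add-one smoothing, and the function names (`word_counts`, `train_naive_bayes`, `predict`) and the toy narratives are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def word_counts(narrative):
    """Lowercase a narrative and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", narrative.lower()))


def train_naive_bayes(narratives, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    class_word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(narratives, labels):
        counts = word_counts(text)
        class_word_counts[label].update(counts)
        vocabulary.update(counts)
    return class_docs, class_word_counts, vocabulary


def predict(model, narrative):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_docs, class_word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    counts = word_counts(narrative)
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(class_word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word, freq in counts.items():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            p = (class_word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += freq * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy narratives invented for illustration (not taken from any VA dataset).
train_texts = [
    "he had chest pain and sweating before he collapsed",
    "sudden chest pain and breathlessness at night",
    "she had high fever with chills and vomiting",
    "fever and chills for one week after the rains",
]
train_labels = ["cardiovascular", "cardiovascular", "infection", "infection"]

model = train_naive_bayes(train_texts, train_labels)
prediction = predict(model, "severe chest pain and sweating")
print(prediction)
```

The same count features could be passed to any of the other classifier families named above; naïve Bayes is used here only because it is short enough to write out in full.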

Results

For individual CoD classification, our best classifier achieves a sensitivity of 0.770 for adult deaths for 15 CoD categories (as compared to the current best reported sensitivity of 0.57), and 0.662 with 48 WHO categories. When predicting the CoD distribution at the population level, our best classifier achieves 0.962 cause-specific mortality fraction accuracy for 15 categories and 0.908 for 48 categories, which is on par with leading CoD distribution estimation methods.
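The two kinds of metric reported here can be illustrated with a short sketch. Per-cause sensitivity is computed at the individual level. Cause-specific mortality fraction (CSMF) accuracy compares predicted and true cause distributions, using the standard definition 1 − Σ|true − predicted| / (2 · (1 − min true fraction)). The function names and toy labels are invented for this example, and the cause list is assumed to be the union of the labels that appear in the data.

```python
from collections import Counter


def sensitivity_per_cause(true_causes, predicted_causes):
    """Fraction of deaths from each true cause that were assigned that cause."""
    totals = Counter(true_causes)
    hits = Counter(t for t, p in zip(true_causes, predicted_causes) if t == p)
    return {cause: hits[cause] / n for cause, n in totals.items()}


def csmf_accuracy(true_causes, predicted_causes):
    """CSMF accuracy over the causes observed in either label list.

    Assumes at least two causes occur, so the denominator is non-zero.
    """
    n = len(true_causes)
    true_counts = Counter(true_causes)
    pred_counts = Counter(predicted_causes)
    causes = set(true_counts) | set(pred_counts)
    true_frac = {c: true_counts[c] / n for c in causes}
    pred_frac = {c: pred_counts[c] / n for c in causes}
    abs_error = sum(abs(true_frac[c] - pred_frac[c]) for c in causes)
    return 1.0 - abs_error / (2.0 * (1.0 - min(true_frac.values())))


# Toy labels invented for illustration: 10 deaths across 3 causes.
true_causes = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c"]
predicted_causes = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]

sensitivities = sensitivity_per_cause(true_causes, predicted_causes)
csmf = csmf_accuracy(true_causes, predicted_causes)
print(sensitivities)
print(round(csmf, 3))
```

In this toy case two of ten deaths are misclassified, yet the predicted distribution is close to the true one because the errors partly offset each other. This is why population-level scores can exceed individual-level ones.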

Conclusions

Our narrative-based machine learning classifier performs as well as classifiers based on structured data at the individual level. Moreover, our method demonstrates that VA narratives provide important information that can be used by a machine learning system for automated CoD classification. Unlike the structured questionnaire-based methods, this method can be applied to any verbal autopsy dataset, regardless of the collection process or country of origin.

Danso et al. [18] also lowercased the text in their dataset but removed punctuation and did not remove stopwords or perform spelling correction.

We use the implementation of the Porter Stemmer provided in NLTK [33].

We use scikit-learn’s SelectKBest module with the f_classif function [30].
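To show what this kind of selection computes, the sketch below ranks features by the one-way ANOVA F statistic (the score that `f_classif` is based on) and keeps the top k. It is a hand-written stand-in for illustration, not scikit-learn's implementation. The function names `anova_f_score` and `select_k_best`, the feature names, and the toy counts are assumptions made for this example.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one feature, grouped by class label.

    Assumes at least two distinct labels.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if within == 0:
        # Perfect separation (or a constant feature): avoid dividing by zero.
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_k_best(rows, labels, feature_names, k):
    """Return the names of the k features with the highest F scores."""
    scores = {
        name: anova_f_score([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    return sorted(feature_names, key=lambda name: scores[name], reverse=True)[:k]


# Toy word-count matrix invented for illustration: one row per narrative.
feature_names = ["chest", "fever", "and"]
rows = [
    [2, 0, 1],
    [1, 0, 1],
    [0, 3, 1],
    [0, 2, 2],
]
labels = ["cardiovascular", "cardiovascular", "infection", "infection"]

selected = select_k_best(rows, labels, feature_names, k=2)
print(selected)
```

Words whose counts differ strongly between cause categories receive high scores, while words that are equally common everywhere, such as "and" in this toy matrix, are dropped.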

For optimization we use the hyperopt Python library [34].

References

1.

United Nations, Department of Economic and Social Affairs, Population Division. World Population Prospects: The 2012 revision. United Nations, Department of Economic and Social Affairs, Population Division; 2013.

2.

Jha P. Reliable direct measurement of causes of death in low- and middle-income countries. BMC Med. 2014; 12:19.

3.

Aleksandrowicz L, Malhotra V, Dikshit R, Gupta PC, Kumar R, Sheth J, Rathi SK, et al. Performance criteria for verbal autopsy-based systems to estimate national causes of death: Development and application to the Indian Million Death Study. BMC Med. 2014; 12:21.

4.

Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJ. Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metrics. 2011; 9(32):1–13.

5.

Ram U, Dikshit R, Jha P. Level of evidence of verbal autopsy–Authors’ reply. Lancet Glob Health. 2016; 4(6):e368–e9.

6.

Berkley JA, Lowe BS, Mwangi I, Williams T, Bauni E, Mwarumba S, et al. Bacteremia among children admitted to a rural hospital in Kenya. New Engl J Med. 2005; 352(1):39–47.

7.

Desai N, Aleksandrowicz L, Miasnikof P, Lu Y, Leitao J, Byass P, et al. Performance of four computer-coded verbal autopsy methods for cause of death assignment compared with physician coding on 24,000 deaths in low- and middle-income countries. BMC Med. 2014; 12:20.

8.

King C, Zamawe C, Banda M, Bar-Zeev N, Beard J, Bird J, et al. The quality and diagnostic value of open narratives in verbal autopsy: A mixed-methods analysis of partnered interviews from Malawi. BMC Med Res Methodol. 2016; 16:13.

9.

Gajalakshmi V, Peto R. Commentary: Verbal autopsy procedure for adult deaths. Int J Epidemiol. 2006; 35(3):748–50.

10.

Murray CJ, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD. Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metrics. 2011; 9:28. Erratum [11].

11.

Murray CJ, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD. Erratum To: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metrics. 2014; 12:7.

12.

Flaxman AD, Serina PT, Hernandez B, Murray CJ, Riley I, Lopez AD. Measuring causes of death in populations: a new metric that corrects cause-specific mortality fractions for chance. Popul Health Metrics. 2015; 13:28.

13.

Boulle A, Chandramohan D, Weller P. A case study of using artificial neural networks for classifying cause of death from verbal autopsy. Int J Epidemiol. 2001; 30(3):515–20.

14.

Byass P, Chandramohan D, Clark S, D’Ambruoso L, Fottrell E, Graham W, et al. Strengthening standardised interpretation of verbal autopsy data: The new InterVA-4 tool. Glob Health Action. 2012; 5:19281.

15.

McCormick TH, Li ZR, Calvert C, Crampin AC, Kahn K, Clark S. Probabilistic cause-of-death assignment using verbal autopsies. J Am Stat Assoc. 2016; 111(15):1036–49.

16.

James SL, Flaxman AD, Murray CJ. Performance of the Tariff Method: Validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metrics. 2011; 9(1):31–47.

17.

Miasnikof P, Giannakeas V, Gomes M, Aleksandrowicz L, Shestopaloff AY, Alam D, et al. Naïve Bayes classifiers for verbal autopsies: Comparison to physician-based classification for 21,000 child and adult deaths. BMC Med. 2015; 13(1):286–94.

18.

Danso S, Atwell E, Johnson O. A comparative study of machine learning methods for verbal autopsy text classification. Int J Comput Sci Issues. 2013; 10(6):1–10.

19.

Danso S, Atwell E, Johnson O. Linguistic and Statistically Derived Features for Cause of Death Prediction from Verbal Autopsy Text. In: Linguistic Processing and Knowledge in the Web. Springer; 2013. p. 47–60.

20.

Nichols EK, Byass P, Chandramohan D, Clark SJ, Flaxman AD, Jakob R, et al. The WHO 2016 verbal autopsy instrument: An international standard suitable for automated analysis by InterVA, InSilicoVA, and Tariff 2.0. PLOS Med. 2018; 15(1):1–9.

21.

King G, Lu Y. Verbal autopsy methods with multiple causes of death. Stat Sci. 2008; 23(1):78–91.

22.

World Health Organization. The 2012 WHO Verbal Autopsy Instrument. Geneva: World Health Organization; 2012.

23.

Serina P, Riley I, Stewart A, James SL, Flaxman AD, Lozano R, et al. Improving performance of the Tariff Method for assigning causes of death to verbal autopsies. BMC Med. 2015; 13(1):291.

24.

Population Health Metrics Research Consortium (PHMRC). Population Health Metrics Research Consortium Gold Standard Verbal Autopsy Data 2005-2011. 2013. http://ghdx.healthdata.org/record/population-health-metrics-research-consortium-gold-standard-verbal-autopsy-data-2005-2011. Accessed 1 Nov 2018.

25.

Gomes M, Begum R, Sati P, Dikshit R, Gupta PC, Kumar R, et al. Nationwide Mortality Studies to Quantify Causes of Death: Relevant Lessons from India’s Million Death Study. Health Aff. 2017; 36(11):1887–95.

26.

Gomes M, Kumar D, Budukh A, et al. Computer versus Physician Coding of Cause of Deaths using Verbal Autopsies: a randomised trial of 9374 deaths in four districts of India. BMC Medicine. In press.

27.

World Health Organization. International statistical classifications of diseases and related health problems. 10th rev. vol. 1. Geneva: World Health Organization; 2008.

28.

Kahn K, Collinson M, Gómez-Olivé F, Mokoena O, Twine R, Mee P, et al. Profile: Agincourt health and socio-demographic surveillance system. Int J Epidemiol. 2012; 41(4):988–1001.

29.

Kelly R. PyEnchant. 2015. http://pythonhosted.org/pyenchant/. Accessed 1 Sept 2017.

30.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12(Oct):2825–30.

31.

Chollet F. Keras. GitHub. 2015. https://github.com/fchollet/keras. Accessed 1 Sept 2017.

32.

Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints. 2016:abs/1605.02688. http://arxiv.org/abs/1605.02688.

33.

Bird S, Klein E, Loper E. Natural Language Processing with Python. O’Reilly Media; 2009. p. 1–504.

34.

Bergstra J, Yamins D, Cox DD. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013); 2013. p. 115–23.

Title: Automatically determining cause of death from verbal autopsy narratives
Authors: Serena Jeblee
Mireille Gomes
Prabhat Jha
Frank Rudzicz
Graeme Hirst
Publication date: 01-12-2019
Publisher: BioMed Central
Keyword: Autopsy
Published in: BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-0841-9
