Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Autopsy | Research article

Automatically determining cause of death from verbal autopsy narratives

Authors: Serena Jeblee, Mireille Gomes, Prabhat Jha, Frank Rudzicz, Graeme Hirst


Abstract

Background

A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person’s death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determine CoD categories from VA free-text narratives alone.

Methods

After preprocessing and spelling correction, our method extracts word frequency counts from the narratives and uses them as input to four different machine learning classifiers: naïve Bayes, random forest, support vector machines, and a neural network.
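As an illustration of the last two steps (word frequency counts feeding a classifier), the sketch below builds a bag-of-words count for each narrative and trains a multinomial naïve Bayes classifier on those counts, using only the Python standard library. It is a minimal sketch under stated assumptions, not the authors' implementation: the regex tokenizer, the add-one smoothing (`alpha=1.0`), the category names `cardiac`/`respiratory`, and the four toy narratives are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def word_counts(narrative):
    """Bag of words: lowercase the narrative and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", narrative.lower()))


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # hypothetical smoothing constant

    def fit(self, narratives, labels):
        self.doc_counts = Counter(labels)              # narratives per category
        self.class_word_freqs = defaultdict(Counter)   # word frequencies per category
        self.vocab = set()
        for text, label in zip(narratives, labels):
            counts = word_counts(text)
            self.class_word_freqs[label].update(counts)
            self.vocab.update(counts)
        return self

    def log_posterior(self, text, label):
        """Unnormalised log posterior: log prior plus summed word log-likelihoods."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        freqs = self.class_word_freqs[label]
        denom = sum(freqs.values()) + self.alpha * len(self.vocab)
        for word, count in word_counts(text).items():
            if word in self.vocab:  # words never seen in training are skipped
                score += count * math.log((freqs[word] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.doc_counts, key=lambda label: self.log_posterior(text, label))


# Hypothetical toy narratives and CoD labels.
train_narratives = [
    "patient had chest pain and collapsed suddenly",
    "sudden chest pain, sweating, then died",
    "fever and cough for two weeks, difficulty breathing",
    "high fever, cough with blood, weight loss",
]
train_labels = ["cardiac", "cardiac", "respiratory", "respiratory"]

model = MultinomialNaiveBayes().fit(train_narratives, train_labels)
print(model.predict("severe chest pain before death"))  # cardiac
```

Words unseen in training (here `severe`, `before`, `death`) are skipped, so this prediction rests on `chest` and `pain`.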

Results

For individual CoD classification, our best classifier achieves a sensitivity of .770 for adult deaths with 15 CoD categories (compared with the current best reported sensitivity of .57), and .662 with 48 WHO categories. When predicting the CoD distribution at the population level, our best classifier achieves a cause-specific mortality fraction (CSMF) accuracy of .962 for 15 categories and .908 for 48 categories, which is on par with leading CoD distribution estimation methods.
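The two evaluation measures can be made concrete with a short sketch. Sensitivity for a category is the fraction of deaths truly in that category that the classifier assigns to it; how per-category values are averaged into one figure is not stated in this abstract, so the sketch stops at the per-category value. Cause-specific mortality fraction (CSMF) accuracy follows the definition of Murray et al. [10]: one minus the total absolute error between true and predicted CSMFs, divided by its maximum possible value, 2(1 - min_j CSMF_j(true)). The function names and toy labels are hypothetical, and taking the category list to be the union of observed labels is an assumption.

```python
from collections import Counter


def sensitivity(true_causes, predicted_causes, category):
    """Share of deaths truly in `category` that were also predicted as `category`."""
    in_category = [p for t, p in zip(true_causes, predicted_causes) if t == category]
    if not in_category:
        return 0.0
    return sum(1 for p in in_category if p == category) / len(in_category)


def csmf(causes, categories):
    """Cause-specific mortality fraction: the share of deaths in each category."""
    counts = Counter(causes)
    return {c: counts[c] / len(causes) for c in categories}


def csmf_accuracy(true_causes, predicted_causes):
    """CSMF accuracy (Murray et al.) over the union of observed categories."""
    categories = set(true_causes) | set(predicted_causes)
    true_f = csmf(true_causes, categories)
    pred_f = csmf(predicted_causes, categories)
    total_abs_error = sum(abs(true_f[c] - pred_f[c]) for c in categories)
    # 2 * (1 - smallest true CSMF) is the largest total absolute error possible.
    return 1 - total_abs_error / (2 * (1 - min(true_f.values())))


# Hypothetical labels for six deaths.
true_causes = ["A", "A", "A", "B", "B", "C"]
predicted_causes = ["A", "A", "B", "B", "B", "C"]
print(sensitivity(true_causes, predicted_causes, "B"))         # 1.0
print(round(csmf_accuracy(true_causes, predicted_causes), 3))  # 0.8
```

Note that CSMF accuracy only compares category totals: one misassigned death (an A predicted as B) costs sensitivity for A but shifts both the A and B fractions.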

Conclusions

Our narrative-based machine learning classifier performs as well as classifiers based on structured data at the individual level. Moreover, our method demonstrates that VA narratives provide important information that can be used by a machine learning system for automated CoD classification. Unlike the structured questionnaire-based methods, this method can be applied to any verbal autopsy dataset, regardless of the collection process or country of origin.
Footnotes
1
Danso et al. [18] also lowercased the text in their dataset but removed punctuation and did not remove stopwords or perform spelling correction.
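The preprocessing contrasts drawn in this footnote (lowercasing, stopword removal, spelling correction) can be sketched with the standard library. This is a hypothetical stand-in: the tiny `STOPWORDS` and `VOCABULARY` lists are invented for illustration, and `difflib.get_close_matches` with a 0.8 similarity cutoff is a swapped-in spelling corrector, not necessarily the method the authors used.

```python
import difflib
import re

# Hypothetical, deliberately tiny lists; a real system would use a full
# stopword list and a domain vocabulary built from the narratives.
STOPWORDS = {"the", "and", "a", "of", "was", "he", "she", "for", "with", "had"}
VOCABULARY = ["fever", "cough", "vomiting", "diarrhoea", "chest", "pain", "breathing"]


def correct_spelling(token):
    """Replace a token with its closest vocabulary word, if one is close enough."""
    if token in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def preprocess(narrative):
    """Lowercase, keep alphabetic tokens, drop stopwords, then spell-correct."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    return [correct_spelling(t) for t in tokens if t not in STOPWORDS]


print(preprocess("He had feverr and vomitting for 3 days"))
# ['fever', 'vomiting', 'days']
```

Tokens with no sufficiently close vocabulary entry (here `days`) pass through unchanged; the regex also drops digits such as `3`.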
 
2
We use the implementation of the Porter Stemmer provided in NLTK [33].
 
3
We use scikit-learn’s SelectKBest module with the f_classif function [30].
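scikit-learn's `f_classif` scores each feature with the one-way ANOVA F statistic, F = [SSB / (k - 1)] / [SSW / (n - k)] for n samples in k classes, and `SelectKBest` keeps the highest-scoring features. The stdlib reimplementation below is only meant to show what is being ranked; it is not scikit-learn's code, returning 0 instead of NaN for a constant feature is a simplification, and the toy count matrix, its column meanings, and `k=2` are hypothetical.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature's values across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(matrix, labels, k):
    """Return the column indices of the k features with the highest F scores."""
    n_features = len(matrix[0])
    scores = [anova_f([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical word-count matrix: rows are narratives, columns are the
# words ["chest", "fever", "the"].
X = [
    [2, 0, 1],
    [1, 0, 2],
    [0, 3, 1],
    [0, 2, 2],
]
y = ["cardiac", "cardiac", "respiratory", "respiratory"]
print(select_k_best(X, y, k=2))  # [1, 0]
```

Here the `fever` column (F = 25) outranks `chest` (F = 9), while the uninformative `the` column scores 0 and is dropped.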
 
4
For hyperparameter optimization we use the hyperopt Python library [34].
 
Literature
1. Department of Economic and Social Affairs Population Division, United Nations. World Population Prospects: The 2012 revision. United Nations, Department of Economic and Social Affairs, Population Division; 2013.
2. Jha P. Reliable direct measurement of causes of death in low- and middle-income countries. BMC Med. 2014; 12:19.
3. Aleksandrowicz L, Malhotra V, Dikshit R, Prakash C Gupta RK, Sheth J, Rathi SK, et al. Performance criteria for verbal autopsy-based systems to estimate national causes of death: Development and application to the Indian Million Death Study. BMC Med. 2014; 12:21.
4. Lozano R, Lopez AD, Atkinson C, Naghavi M, Flaxman AD, Murray CJ. Performance of physician-certified verbal autopsies: multisite validation study using clinical diagnostic gold standards. Popul Health Metrics. 2011; 9(32):1–13.
5. Ram U, Dikshit R, Jha P. Level of evidence of verbal autopsy–Authors’ reply. Lancet Glob Health. 2016; 4(6):e368–e9.
6. Berkley JA, Lowe BS, Mwangi I, Williams T, Bauni E, Mwarumba S, et al. Bacteremia among children admitted to a rural hospital in Kenya. New Engl J Med. 2005; 352(1):39–47.
7. Desai N, Aleksandrowicz L, Miasnikof P, Lu Y, Leitao J, Byass P, et al. Performance of four computer-coded verbal autopsy methods for cause of death assignment compared with physician coding on 24,000 deaths in low- and middle-income countries. BMC Med. 2014; 12:20.
8. King C, Zamawe C, Banda M, Bar-Zeev N, Beard J, Bird J, et al. The quality and diagnostic value of open narratives in verbal autopsy: A mixed-methods analysis of partnered interviews from Malawi. BMC Med Res Methodol. 2016; 16:13.
9. Gajalakshmi V, Peto R. Commentary: Verbal autopsy procedure for adult deaths. Int J Epidemiol. 2006; 35(3):748–50.
10. Murray CJ, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD. Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metrics. 2011; 9:28. Erratum [11].
11. Murray CJ, Lozano R, Flaxman AD, Vahdatpour A, Lopez AD. Erratum To: Robust metrics for assessing the performance of different verbal autopsy cause assignment methods in validation studies. Popul Health Metrics. 2014; 12:7.
12. Flaxman AD, Serina PT, Hernandez B, Murray CJ, Riley I, Lopez AD. Measuring causes of death in populations: a new metric that corrects cause-specific mortality fractions for chance. Popul Health Metrics. 2015; 13:28.
13. Boulle A, Chandramohan D, Weller P. A case study of using artificial neural networks for classifying cause of death from verbal autopsy. Int J Epidemiol. 2001; 30(3):515–20.
14. Byass P, Chandramohan D, Clark S, D’Ambruoso L, Fottrell E, Graham W, et al. Strengthening standardised interpretation of verbal autopsy data: The new InterVA-4 tool. Glob Health Action. 2012; 5:19281.
15. McCormick TH, Li ZR, Calvert C, Crampin AC, Kahn K, Clark S. Probabilistic cause-of-death assignment using verbal autopsies. J Am Stat Assoc. 2016; 111(15):1036–49.
16. James SL, Flaxman AD, Murray CJ. Performance of the Tariff Method: Validation of a simple additive algorithm for analysis of verbal autopsies. Popul Health Metrics. 2011; 9(1):31–47.
17. Miasnikof P, Giannakeas V, Gomes M, Aleksandrowicz L, Shestopaloff AY, Alam D, et al. Naïve Bayes classifiers for verbal autopsies: Comparison to physician-based classification for 21,000 child and adult deaths. BMC Med. 2015; 13(1):286–94.
18. Danso S, Atwell E, Johnson O. A comparative study of machine learning methods for verbal autopsy text classification. Int J Comput Sci Issues. 2013; 10(6):1–10.
19. Danso S, Atwell E, Johnson O. Linguistic and Statistically Derived Features for Cause of Death Prediction from Verbal Autopsy Text. Linguistic Processing and Knowledge in the Web. Springer: 2013. p. 47–60.
20. Nichols EK, Byass P, Chandramohan D, Clark SJ, Flaxman AD, Jakob R, et al. The WHO 2016 verbal autopsy instrument: An international standard suitable for automated analysis by InterVA, InSilicoVA, and Tariff 2.0. PLOS Med. 2018; 15(1):1–9.
21. King G, Lu Y. Verbal autopsy methods with multiple causes of death. Stat Sci. 2008; 23(1):78–91.
22. World Health Organization. The 2012 WHO Verbal Autopsy Instrument. Geneva: World Health Organization; 2012.
23. Serina P, Riley I, Stewart A, James SL, Flaxman AD, Lozano R, et al. Improving performance of the Tariff Method for assigning causes of death to verbal autopsies. BMC Med. 2015; 13(1):291.
24. Population Health Metrics Research Consortium (PHMRC). Population Health Metrics Research Consortium Gold Standard Verbal Autopsy Data 2005-2011. 2013. http://ghdx.healthdata.org/record/population-health-metrics-research-consortium-gold-standard-verbal-autopsy-data-2005-2011. Accessed 1 Nov 2018.
25. Gomes M, Begum R, Sati P, Dikshit R, Gupta PC, Kumar R, et al. Nationwide Mortality Studies to Quantify Causes of Death: Relevant Lessons from India’s Million Death Study. Health Aff. 2017; 36(11):1887–95.
26. Gomes M, Kumar D, Budukh A, et al. Computer versus Physician Coding of Cause of Deaths using Verbal Autopsies: a randomised trial of 9374 deaths in four districts of India. BMC Medicine. In press.
27. World Health Organization. International statistical classifications of diseases and related health problems. 10th rev. vol. 1. Geneva: World Health Organization; 2008.
28. Kahn K, Collinson M, Gómez-Olivé F, Mokoena O, Twine R, Mee P, et al. Profile: Agincourt health and socio-demographic surveillance system. Int J Epidemiol. 2012; 41(4):988–1001.
30. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12(Oct):2825–30.
33. Bird S, Klein E, Loper E. Natural Language Processing with Python. O’Reilly Media; 2009. p. 1–504.
34. Bergstra J, Yamins D, Cox DD. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013); 2013. p. 115–23.
Metadata
Publication date
01-12-2019
Publisher
BioMed Central
Keyword
Autopsy
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-0841-9
