Published in: BMC Medicine 1/2015

Open Access 01-12-2015 | Research article

Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths

Authors: Pierre Miasnikof, Vasily Giannakeas, Mireille Gomes, Lukasz Aleksandrowicz, Alexander Y. Shestopaloff, Dewan Alam, Stephen Tollman, Akram Samarikhalaj, Prabhat Jha

Abstract

Background

Verbal autopsies (VA) are increasingly used in low- and middle-income countries, where most deaths occur at home without medical attention and home deaths differ substantially from hospital deaths. Hence, there is no plausible “standard” against which VAs for home deaths may be validated. Previous studies have reported contradictory results on the performance of automated methods compared with physician-based classification of causes of death (CODs). We sought to compare the performance of the classic naive Bayes classifier (NBC) with that of existing automated classifiers, using physician-based classification as the reference.

Methods

We compared the performance of NBC, an open-source Tariff Method (OTM), and InterVA-4 on three datasets covering about 21,000 child and adult deaths: the ongoing Million Death Study in India, and health and demographic surveillance sites in Agincourt, South Africa and Matlab, Bangladesh. We applied several training and testing splits of the data to quantify the sensitivity and specificity compared to physician coding for individual CODs and to test the cause-specific mortality fractions at the population level.
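
To make the evaluation concrete, the following is a minimal sketch of a naive Bayes classifier over binary symptom indicators, together with per-cause sensitivity and specificity measured against physician coding. It is an illustration only and not the authors' implementation: the Bernoulli likelihood, the Laplace smoothing constant `alpha`, the function names, and the toy records are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def train_nbc(records, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model (hypothetical helper).

    records: list of equal-length lists of 0/1 symptom indicators.
    labels:  physician-assigned cause for each record.
    alpha:   Laplace smoothing constant (an assumption, must be > 0).
    """
    n_features = len(records[0])
    cause_counts = Counter(labels)
    # symptom_counts[cause][j] = deaths from `cause` reporting symptom j
    symptom_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(records, labels):
        for j, present in enumerate(x):
            symptom_counts[y][j] += present
    total = len(labels)
    model = {}
    for cause, n in cause_counts.items():
        log_prior = math.log(n / total)
        # Smoothed estimate of P(symptom j present | cause)
        probs = [(symptom_counts[cause][j] + alpha) / (n + 2 * alpha)
                 for j in range(n_features)]
        model[cause] = (log_prior, probs)
    return model


def predict_nbc(model, x):
    """Return the cause with the highest posterior log-score for record x."""
    best_cause, best_score = None, -math.inf
    for cause, (log_prior, probs) in model.items():
        score = log_prior
        for present, p in zip(x, probs):
            score += math.log(p if present else 1.0 - p)
        if score > best_score:
            best_cause, best_score = cause, score
    return best_cause


def sensitivity(reference, predicted, cause):
    """Share of deaths physician-coded as `cause` that the classifier also assigns to it."""
    tp = sum(1 for r, p in zip(reference, predicted) if r == cause and p == cause)
    fn = sum(1 for r, p in zip(reference, predicted) if r == cause and p != cause)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(reference, predicted, cause):
    """Share of deaths not physician-coded as `cause` that the classifier also keeps out of it."""
    tn = sum(1 for r, p in zip(reference, predicted) if r != cause and p != cause)
    fp = sum(1 for r, p in zip(reference, predicted) if r != cause and p == cause)
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Toy training data: three hypothetical symptoms, two hypothetical causes.
train_x = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
train_y = ["pneumonia", "pneumonia", "injury", "injury"]
model = train_nbc(train_x, train_y)

test_x = [[1, 0, 0], [0, 0, 1]]
test_y = ["pneumonia", "injury"]
predicted = [predict_nbc(model, x) for x in test_x]
print(predicted, sensitivity(test_y, predicted, "pneumonia"))
```

Training on one dataset and predicting on another corresponds to the non-local setting described above; splitting a single dataset into training and testing portions corresponds to the local one.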

Results

The NBC achieved comparable sensitivity (median 0.51, range 0.48-0.58) to OTM (median 0.50, range 0.41-0.51), with InterVA-4 having lower sensitivity (median 0.43, range 0.36-0.47) in all three datasets, across all CODs. Consistency of CODs was comparable for NBC and InterVA-4 but lower for OTM. NBC and OTM achieved better performance when using a local rather than a non-local training dataset. At the population level, NBC scored the highest cause-specific mortality fraction accuracy across the datasets (median 0.88, range 0.87-0.93), followed by InterVA-4 (median 0.66, range 0.62-0.73) and OTM (median 0.57, range 0.42-0.58).
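
The abstract does not spell out how cause-specific mortality fraction (CSMF) accuracy is computed. The sketch below assumes the widely used definition from the verbal autopsy metrics literature (Murray and colleagues, 2011): one minus the summed absolute CSMF error, divided by the largest error possible, which is twice one minus the smallest reference fraction. The function names and the toy cause labels are hypothetical.

```python
from collections import Counter


def csmf(causes, cause_list):
    """Fraction of deaths assigned to each cause in cause_list."""
    counts = Counter(causes)
    n = len(causes)
    return {c: counts.get(c, 0) / n for c in cause_list}


def csmf_accuracy(reference, predicted, cause_list=None):
    """CSMF accuracy under the assumed definition:
    1 - sum_j |ref_j - pred_j| / (2 * (1 - min_j ref_j)).
    """
    if cause_list is None:
        cause_list = sorted(set(reference) | set(predicted))
    ref_f = csmf(reference, cause_list)
    pred_f = csmf(predicted, cause_list)
    abs_error = sum(abs(ref_f[c] - pred_f[c]) for c in cause_list)
    worst_case = 2.0 * (1.0 - min(ref_f.values()))
    return 1.0 - abs_error / worst_case


# Toy example: physician coding splits 10 deaths 5/5, the classifier 6/4.
reference = ["a"] * 5 + ["b"] * 5
predicted = ["a"] * 6 + ["b"] * 4
print(round(csmf_accuracy(reference, predicted), 2))  # prints 0.8
```

Because the metric compares only population-level fractions, individual misclassifications that cancel out leave it unchanged. This is one way a classifier can score well at the population level while its sensitivity for individual deaths stays modest.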

Conclusions

NBC outperforms current similar COD classifiers at the population level. Nevertheless, no current automated classifier adequately replicates physician classification for individual CODs. There is a need for further research on automated classifiers using local training and test data in diverse settings prior to recommending any replacement of physician-based classification of verbal autopsies.
Metadata
Publisher: BioMed Central
Electronic ISSN: 1741-7015
DOI: https://doi.org/10.1186/s12916-015-0521-2
