Open Access 01-12-2015 | Research article

Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths

Authors: Pierre Miasnikof, Vasily Giannakeas, Mireille Gomes, Lukasz Aleksandrowicz, Alexander Y. Shestopaloff, Dewan Alam, Stephen Tollman, Akram Samarikhalaj, Prabhat Jha

Published in: BMC Medicine | Issue 1/2015

Abstract

Background

Verbal autopsies (VAs) are increasingly used in low- and middle-income countries, where most deaths occur at home without medical attention and where home deaths differ substantially from hospital deaths. Hence, there is no plausible “standard” against which VAs for home deaths may be validated. Previous studies have reported contradictory results on the performance of automated methods relative to physician-based classification of causes of death (CODs). We sought to compare the performance of the classic naive Bayes classifier (NBC) with that of existing automated classifiers, using physician-based classification as the reference.
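The abstract names the classic naive Bayes classifier without spelling it out. A naive Bayes classifier assigns the COD that maximises P(COD) multiplied by the product of P(symptom_j | COD) over all symptoms, treating symptoms as independent given the cause. The sketch below is a minimal illustration, not the authors' implementation. It assumes that each VA record is reduced to a set of reported symptoms (binary present/absent indicators), that Laplace smoothing with alpha = 1 is used, and that the functions `train_nbc`/`predict_nbc` and all symptom and cause labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbc(records, causes, alpha=1.0):
    """Fit a Bernoulli naive Bayes model from labelled VA records.

    records: list of sets of reported symptoms (hypothetical encoding).
    causes:  list of reference CODs, one per record (e.g. physician-coded).
    alpha:   Laplace smoothing constant (an assumption, not from the paper).
    """
    symptoms = sorted(set().union(*records))
    cause_counts = Counter(causes)
    present = defaultdict(Counter)  # present[cod][symptom] = records reporting it
    for rec, cod in zip(records, causes):
        present[cod].update(rec)
    n = len(causes)
    model = {}
    for cod, n_cod in cause_counts.items():
        # Smoothed P(symptom present | COD); never exactly 0 or 1 when alpha > 0.
        cond = {s: (present[cod][s] + alpha) / (n_cod + 2 * alpha) for s in symptoms}
        model[cod] = (n_cod / n, cond)  # (prior P(COD), conditionals)
    return model


def predict_nbc(model, record):
    """Assign the COD maximising log P(COD) + sum_j log P(x_j | COD)."""
    best_cod, best_score = None, -math.inf
    for cod, (prior, cond) in model.items():
        score = math.log(prior)
        for s, p in cond.items():
            # Present symptoms contribute p, absent ones 1 - p;
            # symptoms never seen in training are ignored.
            score += math.log(p if s in record else 1.0 - p)
        if score > best_score:
            best_cod, best_score = cod, score
    return best_cod


# Hypothetical toy training data (not from the study datasets).
train_records = [{"fever", "cough"}, {"fever", "cough"}, {"chest_pain"}, {"chest_pain", "cough"}]
train_causes = ["pneumonia", "pneumonia", "IHD", "IHD"]
model = train_nbc(train_records, train_causes)
print(predict_nbc(model, {"fever", "cough"}))  # pneumonia
print(predict_nbc(model, {"chest_pain"}))      # IHD
```

Working in log space avoids numerical underflow when a questionnaire has many symptoms. Whether absent symptoms should contribute, as they do here, is a modelling choice that this sketch assumes.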

Methods

We compared the performance of NBC, an open-source Tariff Method (OTM), and InterVA-4 on three datasets covering about 21,000 child and adult deaths: the ongoing Million Death Study in India and the health and demographic surveillance sites in Agincourt, South Africa, and Matlab, Bangladesh. We applied several training and testing splits of the data to quantify sensitivity and specificity for individual CODs relative to physician coding, and to assess the accuracy of cause-specific mortality fractions at the population level.
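Individual-level evaluation against physician coding can be sketched as a one-vs-rest comparison for each COD. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where physician-assigned CODs are treated as the reference. This is a minimal illustration under stated assumptions: the 75/25 split ratio, the fixed seed, the helper names `split_indices` and `sensitivity_specificity`, and the toy labels are all hypothetical, since the abstract does not give the study's split proportions.

```python
import random


def split_indices(n, train_frac=0.75, seed=0):
    """Randomly split record indices 0..n-1 into training and test sets.

    The 75/25 ratio and fixed seed are illustrative assumptions; the
    paper's actual split proportions are not given in the abstract.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]


def sensitivity_specificity(reference, predicted, cause):
    """One-vs-rest sensitivity and specificity for a single COD.

    reference: physician-assigned CODs, treated as the reference standard.
    predicted: CODs from an automated classifier, in the same order.
    Returns nan for a rate whose denominator is zero.
    """
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == cause:
            if pred == cause:
                tp += 1
            else:
                fn += 1
        elif pred == cause:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical test-set labels (not study data).
physician = ["stroke", "stroke", "TB", "TB", "TB", "injury"]
automated = ["stroke", "TB", "TB", "TB", "stroke", "injury"]
print(sensitivity_specificity(physician, automated, "stroke"))  # (0.5, 0.75)
train_idx, test_idx = split_indices(len(physician))
```

Calling `split_indices` with different seeds gives repeated random splits, which is one plausible way to obtain "several training and testing splits". A local versus non-local comparison would instead draw the training and test sets from different sites.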

Results

NBC achieved sensitivity (median 0.51, range 0.48–0.58) comparable to that of OTM (median 0.50, range 0.41–0.51), whereas InterVA-4 had lower sensitivity (median 0.43, range 0.36–0.47) in all three datasets, across all CODs. Consistency of CODs was comparable for NBC and InterVA-4 but lower for OTM. NBC and OTM performed better when trained on a local rather than a non-local dataset. At the population level, NBC achieved the highest cause-specific mortality fraction accuracy across the datasets (median 0.88, range 0.87–0.93), followed by InterVA-4 (median 0.66, range 0.62–0.73) and OTM (median 0.57, range 0.42–0.58).
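Population-level CSMF accuracy compares the distribution of assigned causes rather than individual assignments. The sketch below uses the form proposed by Murray et al. (2011): 1 minus the sum over causes j of |true_j − predicted_j|, divided by 2 × (1 − min_j true_j). It treats physician-assigned CODs as the true fractions. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def csmf(labels, causes):
    """Cause-specific mortality fractions: share of deaths assigned to each COD."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in causes}


def csmf_accuracy(reference, predicted):
    """CSMF accuracy in the form proposed by Murray et al. (2011):

        1 - sum_j |CSMF_true_j - CSMF_pred_j| / (2 * (1 - min_j CSMF_true_j))

    reference: physician-assigned CODs; predicted: automated CODs.
    Undefined (division by zero) if every reference death has the same COD
    and no other cause is predicted.
    """
    causes = sorted(set(reference) | set(predicted))
    true = csmf(reference, causes)
    pred = csmf(predicted, causes)
    total_abs_error = sum(abs(true[c] - pred[c]) for c in causes)
    return 1.0 - total_abs_error / (2.0 * (1.0 - min(true.values())))


# Hypothetical labels (not study data).
physician = ["stroke", "stroke", "TB", "TB", "TB", "injury"]
automated = ["stroke", "TB", "TB", "TB", "stroke", "injury"]
# Two individual misclassifications cancel out at the population level.
print(csmf_accuracy(physician, automated))             # 1.0
print(round(csmf_accuracy(physician, ["TB"] * 6), 3))  # 0.4
```

The first example shows how a classifier can reach high CSMF accuracy while misassigning individual deaths. This is consistent with the gap the results report between population-level accuracy and individual-level sensitivity.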

Conclusions

NBC outperforms similar current COD classifiers at the population level. Nevertheless, no current automated classifier adequately replicates physician classification for individual CODs. Further research on automated classifiers, using local training and test data in diverse settings, is needed before any replacement of physician-based classification of verbal autopsies can be recommended.

Publication date: 01-12-2015
Publisher: BioMed Central
Electronic ISSN: 1741-7015
DOI: https://doi.org/10.1186/s12916-015-0521-2
