Published in: BMC Infectious Diseases 1/2014

Open Access 01-12-2014 | Research article

A Dirichlet process model for classifying and forecasting epidemic curves

Authors: Elaine O Nsoesie, Scotland C Leman, Madhav V Marathe

Abstract

Background

A forecast is an attempt to quantitatively estimate a future event or to assign probabilities to a future occurrence. Forecasting stochastic processes such as epidemics is challenging because several biological, behavioral, and environmental factors influence the number of cases observed at each point during an epidemic. Nevertheless, accurate epidemic forecasts would support the timely and effective implementation of public health interventions. In this study, we introduce a Dirichlet process (DP) model for classifying and forecasting influenza epidemic curves.

Methods

The DP model is a nonparametric Bayesian approach that matches current influenza activity to simulated and historical patterns, identifies epidemic curves different from those observed in the past, and predicts the expected epidemic peak time. The method was validated using simulated influenza epidemics from an individual-based model, and its accuracy was compared to that of Random Forest (RF), a tree-based classification technique that has been shown to achieve high accuracy in the early prediction of epidemic curves. We also applied the method to forecasting influenza outbreaks in the United States from 1997 to 2013 using influenza-like illness (ILI) data from the Centers for Disease Control and Prevention (CDC).
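
To illustrate why a DP model can flag an epidemic curve as unlike anything seen before, the following minimal Python sketch computes the prior assignment weights of the Chinese restaurant process, a standard representation of the Dirichlet process. It is not the authors' implementation: the function name, the class labels, and the concentration value `alpha` are hypothetical, and the abstract does not describe the likelihood or the inference scheme used in the study. In a full model, these prior weights would be combined with a likelihood of the observed curve under each class.

```python
from collections import Counter

# Sentinel key for the "previously unseen class" outcome (hypothetical name).
NEW_CLASS = "<new>"


def crp_assignment_probs(labels, alpha):
    """Chinese restaurant process prior for the next observation.

    labels: class labels of the curves already assigned.
    alpha:  DP concentration parameter (assumed value, must be > 0).

    An existing class k with n_k members gets weight n_k / (n + alpha);
    a brand-new class gets weight alpha / (n + alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    counts = Counter(labels)
    n = len(labels)
    probs = {k: c / (n + alpha) for k, c in counts.items()}
    probs[NEW_CLASS] = alpha / (n + alpha)
    return probs


if __name__ == "__main__":
    # Hypothetical labels for six previously classified epidemic curves.
    seen = ["R=1.4", "R=1.4", "R=1.4", "R=1.6", "R=1.6", "R=1.9"]
    for label, p in crp_assignment_probs(seen, alpha=1.0).items():
        print(f"{label}: {p:.3f}")
```

Because the weight on a new class is always positive, a curve that fits none of the existing classes well can still be assigned to its own class, which a classifier restricted to a fixed set of training labels cannot do.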

Results

We made the following observations. First, the DP model performed as well as RF in identifying several of the simulated epidemics. Second, the DP model correctly forecasted the peak time several days in advance for most of the simulated epidemics. Third, as expected, the accuracy of identifying epidemics different from those already observed improved with additional data. Fourth, both methods classified epidemics with higher reproduction numbers (R) more accurately than epidemics with lower R values. Lastly, in classifying seasonal influenza epidemics based on ILI data from the CDC, the two methods performed comparably.

Conclusions

Although RF requires less computational time than the DP model, it is fully supervised, which implies that epidemic curves different from those previously observed will always be misclassified. In contrast, the DP model can be unsupervised, semi-supervised, or fully supervised. Since each method has its own merits, an approach that uses both RF and the DP model could be beneficial.
Metadata
Title
A Dirichlet process model for classifying and forecasting epidemic curves
Authors
Elaine O Nsoesie
Scotland C Leman
Madhav V Marathe
Publication date
01-12-2014
Publisher
BioMed Central
Published in
BMC Infectious Diseases / Issue 1/2014
Electronic ISSN: 1471-2334
DOI
https://doi.org/10.1186/1471-2334-14-12