Published in: BMC Infectious Diseases 1/2017

Open Access 01-12-2017 | Research article

Using electronic health records and Internet search information for accurate influenza forecasting

Authors: Shihao Yang, Mauricio Santillana, John S. Brownstein, Josh Gray, Stewart Richardson, S. C. Kou

Abstract

Background

Accurate influenza activity forecasting helps public health officials prepare and allocate resources for unusual influenza activity. Traditional flu surveillance systems, such as the Centers for Disease Control and Prevention’s (CDC) influenza-like illness reports, lag behind real time by one to two weeks, whereas information contained in cloud-based electronic health records (EHR) and in Internet users’ search activity is typically available in near real time. We present a method that combines the information from these two data sources with historical flu activity to produce national flu forecasts for the United States up to four weeks ahead of the publication of CDC’s flu reports.

Methods

We extend ARGO, a method originally designed to track flu using Google searches, to combine information from EHR and Internet searches with historical flu activity. Our regularized multivariate regression model dynamically selects the most appropriate variables for flu prediction every week. The model is assessed for the flu seasons within the time period 2013–2016 using multiple metrics, including root mean squared error (RMSE).
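As an illustration of how a regularized regression can reselect its variables every week, the following is a minimal sketch, not the authors' implementation. It assumes an L1 (lasso) penalty fitted by coordinate descent and a fixed-length rolling training window; the abstract specifies neither. The function names, the penalty value, the 52-week window, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning zero inside the band."""
    return np.sign(value) * np.maximum(np.abs(value) - threshold, 0.0)

def fit_lasso(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)||y - b0 - X b||^2 + penalty * ||b||_1 by coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # centring absorbs the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    coef = np.zeros(p)
    residual = yc.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                       # constant column carries no signal
            residual += Xc[:, j] * coef[j]     # remove predictor j's current contribution
            rho = Xc[:, j] @ residual / n
            coef[j] = soft_threshold(rho, penalty) / col_sq[j]
            residual -= Xc[:, j] * coef[j]     # add the updated contribution back
    intercept = y_mean - x_mean @ coef
    return intercept, coef

def rolling_forecast(X, y, window, penalty):
    """Refit on the most recent `window` weeks, then predict the following week."""
    predictions, selected = [], []
    for t in range(window, len(y)):
        intercept, coef = fit_lasso(X[t - window:t], y[t - window:t], penalty)
        predictions.append(intercept + X[t] @ coef)
        selected.append(np.flatnonzero(coef))  # variables kept for this week
    return np.array(predictions), selected

def rmse(estimates, truth):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(estimates) - np.asarray(truth)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in data (hypothetical): six candidate predictors, of which
# only the first two drive the target. Real inputs would be lagged flu
# activity, EHR-derived rates, and search-term frequencies.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=120)

predictions, selected = rolling_forecast(X, y, window=52, penalty=0.05)
print("out-of-sample RMSE:", rmse(predictions, y[52:]))
print("variables selected in the final week:", selected[-1])
```

Because the model is refitted on each window, the set of nonzero coefficients can change from week to week, which is one way to realise the dynamic variable selection described above.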

Results

Our method reduces the RMSE of the publicly available alternative method (Healthmap flutrends) by 33%, 20%, 17%, and 21% for the four time horizons: real-time, one, two, and three weeks ahead, respectively. Such accuracy improvements are statistically significant at the 5% level. Our real-time estimates correctly identified the peak timing and magnitude of the studied flu seasons.
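To make the reported percentages concrete, the sketch below shows one common way to express an RMSE reduction relative to a baseline method. The numbers are invented for illustration and are not the study's data; the function names are hypothetical.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_rmse_reduction(candidate, baseline, truth):
    """Fraction by which the candidate's RMSE falls below the baseline's RMSE."""
    return 1.0 - rmse(candidate, truth) / rmse(baseline, truth)

# Hypothetical weekly flu activity levels (percent) and two sets of estimates.
truth = np.array([2.0, 3.0, 4.0, 3.0])
baseline = truth + 0.3    # baseline misses by 0.3 every week
candidate = truth - 0.2   # candidate misses by 0.2 every week

reduction = relative_rmse_reduction(candidate, baseline, truth)
print(f"RMSE reduction relative to baseline: {reduction:.1%}")
```

A positive value means the candidate has the lower error; whether the difference is statistically significant requires a separate test that this sketch does not cover.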

Conclusions

Our method significantly reduces the prediction error when compared to historical publicly available Internet-based prediction systems, demonstrating that (1) the method used to combine data sources is as important as data quality, and (2) effectively extracting information from cloud-based EHR and Internet search activity leads to accurate flu forecasts.
Metadata
Title
Using electronic health records and Internet search information for accurate influenza forecasting
Authors
Shihao Yang
Mauricio Santillana
John S. Brownstein
Josh Gray
Stewart Richardson
S. C. Kou
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Infectious Diseases / Issue 1/2017
Electronic ISSN: 1471-2334
DOI
https://doi.org/10.1186/s12879-017-2424-7
