Published in: BMC Medical Informatics and Decision Making 1/2023

Open Access 01-12-2023 | Lyme Disease | Research

Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter

Authors: Srikanth Boligarla, Elda Kokoè Elolo Laison, Jiaxin Li, Raja Mahadevan, Austen Ng, Yangming Lin, Mamadou Yamar Thioub, Bruce Huang, Mohamed Hamza Ibrahim, Bouchra Nasri


Abstract

Background

Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than 90% of all vector-borne disease cases in North America.

Objective

In this paper, self-reported tweets were analyzed to predict potential Lyme disease cases and to assess incidence rates in the US.
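Incidence rates in disease surveillance are commonly expressed per 100,000 residents. The sketch below assumes that convention. It applies the same normalization to classified tweet counts to obtain a comparable "tweet rate." The helper name, the 100,000 scale, and the county figures are illustrative assumptions, not values or code from the paper.

```python
def rate_per_100k(count: int, population: int, scale: int = 100_000) -> float:
    """Return `count` normalized per `scale` residents (hypothetical helper).

    Used here both for reported Lyme disease cases (incidence rate) and for
    classified Lyme disease tweets (tweet rate), so that counties of very
    different sizes can be compared on the same footing.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return count * scale / population


# Made-up county figures: (reported cases, classified tweets, population)
counties = {
    "County A": (120, 30, 60_000),
    "County B": (45, 90, 900_000),
}

for name, (cases, tweets, pop) in counties.items():
    print(f"{name}: incidence {rate_per_100k(cases, pop):.1f}, "
          f"tweets {rate_per_100k(tweets, pop):.1f} per 100,000")
```

In this invented example, County A has the higher incidence rate while County B has the higher tweet rate. This is the kind of mismatch the Conclusions describe at the county level.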

Methods

The study was conducted in three stages. (1) Approximately 1.3 million tweets were collected and preprocessed to extract the most relevant geolocated Lyme disease tweets. A subset of these tweets was semi-automatically labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remainder was labelled manually, yielding a curated labelled dataset of 77,500 tweets. (2) This labelled dataset was used to train, validate, and test various combinations of natural language processing (NLP) text representation and word embedding methods and prominent machine learning (ML) classification models to identify potential Lyme disease tweets. Examples include TF-IDF with logistic regression, Word2vec with XGBoost, and BERTweet. (3) Lastly, the presence of spatio-temporal patterns across the US over a 10-year period was examined.
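The keyword-based semi-automatic labelling in stage (1) can be sketched as a rule that assigns labels only to unambiguous tweets and defers everything else to manual labelling. The abstract does not give the actual keyword set, so the patterns below are hypothetical placeholders. The function names are also illustrative.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical keyword patterns for illustration only; the paper's actual
# "precise keywords" are not reproduced here.
RELEVANT_PATTERNS = [
    r"\bdiagnosed with lyme\b",
    r"\bi have lyme disease\b",
    r"\btick bite\b",
]
IRRELEVANT_PATTERNS = [
    r"\blyme regis\b",  # seaside town in England, not the disease
    r"\blyme park\b",   # estate in Cheshire, England, not the disease
]


def keyword_label(tweet: str) -> Optional[int]:
    """Return 1 (relevant), 0 (irrelevant), or None (needs manual labelling)."""
    text = tweet.lower()
    relevant = any(re.search(p, text) for p in RELEVANT_PATTERNS)
    irrelevant = any(re.search(p, text) for p in IRRELEVANT_PATTERNS)
    if relevant and not irrelevant:
        return 1
    if irrelevant and not relevant:
        return 0
    return None  # no keyword hit, or conflicting hits: defer to a human


def split_for_labelling(tweets: List[str]) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Partition tweets into auto-labelled (tweet, label) pairs and a manual queue."""
    auto: List[Tuple[str, int]] = []
    manual: List[str] = []
    for tweet in tweets:
        label = keyword_label(tweet)
        if label is None:
            manual.append(tweet)
        else:
            auto.append((tweet, label))
    return auto, manual


tweets = [
    "Just got diagnosed with Lyme after months of fatigue",
    "Lovely walk along the beach at Lyme Regis today",
    "Anyone know a good Lyme specialist?",
]
auto, manual = split_for_labelling(tweets)
print(auto)    # the first two tweets, labelled 1 and 0
print(manual)  # the third tweet, left for manual labelling
```

Tweets with no keyword match, or with matches on both lists, go to manual review. This keeps the automatically labelled portion high-precision and mirrors the split between keyword-labelled and manually labelled tweets described above.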

Results

Preliminary results showed that BERTweet outperformed all other tested NLP classifiers in identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score (90%). The results also revealed a consistent pattern over time: the West and Northeast regions of the US had higher tweet rates than other regions.
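Accuracy and F1-score for a binary relevant/irrelevant classifier both follow from the confusion counts. The abstract does not say whether the reported F1 is binary or macro-averaged. This minimal sketch assumes binary F1 for the relevant class (label 1) and uses invented labels rather than the paper's test set.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (1 = relevant)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, predicted relevant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, predicted relevant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, correctly rejected
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels for ten tweets (not the paper's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in metrics.items()})  # every metric rounds to 0.8 here
```

Reporting F1 alongside accuracy guards against a classifier that looks accurate only because it favours the majority class.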

Conclusions

We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several key findings emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease case counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter played an essential role in raising public awareness of Lyme disease. Third, counties with a high incidence rate did not necessarily have a high tweet rate, and vice versa. Fourth, BERTweet can serve as a reliable NLP classifier for detecting relevant Lyme disease tweets.
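The first finding rests on a correlation between classified tweet counts and Lyme disease case counts. The abstract does not name the coefficient used. The sketch below assumes Pearson's r computed over two aligned time series, and the yearly values are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance numerator and the two standard-deviation numerators;
    # the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sd_x * sd_y)


# Invented yearly series, aligned by year (not the paper's data).
classified_tweets = [310, 420, 390, 560, 610]
reported_cases = [27_000, 33_000, 30_000, 36_000, 38_000]
print(round(pearson_r(classified_tweets, reported_cases), 3))
```

Pearson's r captures only linear association between aligned periods. On Python 3.10 and later, `statistics.correlation` computes the same coefficient. A rank-based measure such as Spearman's rho is a common alternative when the series differ greatly in scale or contain outliers.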
Metadata
Publication date: 01-12-2023
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making, Issue 1/2023
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-023-02315-z
