
BMC Medical Research Methodology


Open Access 01-12-2021 | COVID-19 | Research

Need of care in interpreting Google Trends-based COVID-19 infodemiological study results: potential risk of false-positivity

Authors: Kenichiro Sato, Tatsuo Mano, Atsushi Iwata, Tatsushi Toda

Published in: BMC Medical Research Methodology | Issue 1/2021

Abstract

Background

Google Trends (GT) is being used as an epidemiological tool to study coronavirus disease (COVID-19) by identifying search-trend keywords that are predictive of the COVID-19 epidemiological burden. However, many earlier GT-based studies contain potential statistical fallacies: they measure the correlation between non-stationary time series without adjusting for multiple comparisons or for the confounding effect of media coverage, which raises concerns about an increased risk of false-positive results. In this study, we aimed to validate the results of earlier GT-based COVID-19 studies using statistically more appropriate methods.
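The concern about correlating non-stationary series can be illustrated with a small simulation. The sketch below is not taken from the study: it assumes synthetic Gaussian random walks as stand-ins for weekly search and case series, and the function names, the 60-week length, and the 0.5 threshold are all hypothetical choices. It counts how often two completely independent series show a "strong" Pearson correlation, first in levels and then after first differencing.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def random_walk(rng, n):
    """Non-stationary series: cumulative sum of independent Gaussian steps."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk

def diff(series):
    """First difference, which turns a random walk back into white noise."""
    return [b - a for a, b in zip(series, series[1:])]

def share_strongly_correlated(n_pairs=500, n_weeks=60, threshold=0.5, seed=0):
    """Share of independent pairs with |r| above threshold (levels, differences)."""
    rng = random.Random(seed)
    hits_levels = hits_diffs = 0
    for _ in range(n_pairs):
        # The two walks are generated independently, so any correlation is spurious.
        x, y = random_walk(rng, n_weeks), random_walk(rng, n_weeks)
        if abs(pearson(x, y)) > threshold:
            hits_levels += 1
        if abs(pearson(diff(x), diff(y))) > threshold:
            hits_diffs += 1
    return hits_levels / n_pairs, hits_diffs / n_pairs

levels, diffs = share_strongly_correlated()
print(f"share with |r| > 0.5 in levels: {levels:.2f}, after differencing: {diffs:.2f}")
```

In runs of this kind, a sizeable share of unrelated random-walk pairs exceed the threshold in levels, while almost none do after differencing. This is the mechanism behind the false-positive risk described above.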

Methods

We extracted the relative GT search volume for keywords associated with COVID-19 symptoms and evaluated whether each keyword Granger-caused weekly COVID-19 positivity in eight English-speaking countries and Japan. In addition, the impact of media coverage on keywords with significant Granger-causality was further evaluated using Japanese regional data.
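For readers unfamiliar with Granger-causality, the following minimal sketch shows the core idea with a single lag: a series x is said to Granger-cause y if adding past values of x to an autoregression of y reduces the residual error more than chance would explain. This is an illustration under simplifying assumptions, not the authors' pipeline: the data are synthetic, the lag is fixed at one week, no stationarity check or lag selection is performed, and the names `searches`, `cases`, `ols_rss`, and `granger_f` are hypothetical. Only the F statistic is computed; a p-value would additionally require the F distribution.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of an ordinary least squares fit (normal equations)."""
    k = len(rows[0])
    # Build the augmented normal-equation system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(cause, effect):
    """F statistic for 'cause Granger-causes effect' using one lag."""
    y = effect[1:]
    # Restricted model: effect_t ~ const + effect_{t-1}
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    # Unrestricted model adds the lagged candidate cause.
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]]
                    for t in range(1, len(effect))]
    rss_r, rss_u = ols_rss(restricted, y), ols_rss(unrestricted, y)
    dof = len(y) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)

# Synthetic example: searches lead cases by one week, but not the reverse.
rng = random.Random(1)
n = 300
searches = [rng.gauss(0.0, 1.0) for _ in range(n)]
cases = [0.0]
for t in range(1, n):
    cases.append(0.5 * cases[t - 1] + 0.8 * searches[t - 1] + rng.gauss(0.0, 1.0))

f_forward = granger_f(searches, cases)
f_reverse = granger_f(cases, searches)
print(f"F(searches -> cases) = {f_forward:.1f}, F(cases -> searches) = {f_reverse:.1f}")
```

Because the test conditions on the outcome's own past, it is less prone than a plain correlation to flagging two series merely because both trend over time. A real analysis would also need stationarity checks, a data-driven lag length, and correction for testing many keywords.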

Results

Our Granger causality-based approach substantially reduced (by up to approximately one-third) the number of keywords identified as having a significant temporal relationship with the COVID-19 trend, compared with the number identified by Pearson or Spearman’s rank correlation-based approaches. “Sense of smell” and “loss of smell” were the most reliable GT keywords across all the evaluated countries; however, after adjustment for their media coverage, these keyword trends did not Granger-cause the COVID-19 positivity trends (in Japan).

Conclusions

Our results suggest that some of the search keywords reported as candidate predictive measures in earlier GT-based COVID-19 studies may be unreliable; therefore, caution is necessary when interpreting published GT-based study results.



Title: Need of care in interpreting Google Trends-based COVID-19 infodemiological study results: potential risk of false-positivity
Authors: Kenichiro Sato, Tatsuo Mano, Atsushi Iwata, Tatsushi Toda
Publication date: 01-12-2021
Publisher: BioMed Central
Keyword: COVID-19
Published in: BMC Medical Research Methodology / Issue 1/2021
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/s12874-021-01338-2

