Published in: Acta Diabetologica 10/2019

01-10-2019 | Original Article

Can internet search engine queries be used to diagnose diabetes? Analysis of archival search data

Authors: Irit Hochberg, Deeb Daoud, Naim Shehadeh, Elad Yom-Tov



Abstract

Aims

Diabetes is often diagnosed late. This study aimed to assess the possibility of detecting diabetes earlier from search data, using predictive models trained on large-scale data.

Methods

We extracted all English-language queries submitted to Bing by people in the USA over a 1-year period and identified queries that mentioned symptoms of diabetes. We compared the ability of four prediction models (linear regression, logistic regression, decision tree and random forest) to distinguish between users who stated that they had been diagnosed with diabetes and users who did not mention diabetes or diabetes drugs but queried about at least one of the symptoms.
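To make the modelling setup concrete, the following is a minimal sketch of one of the four compared models, logistic regression, applied to per-user symptom indicators. It is not the authors' pipeline: the symptom vocabulary, the simple substring matching in `featurize`, the helper names, the gradient-descent settings (`lr`, `epochs`) and the toy users are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL symptom vocabulary for illustration only; the study's actual
# symptom list is not given in the abstract.
SYMPTOMS = [
    "frequent urination",
    "excessive thirst",
    "blurred vision",
    "unexplained weight loss",
    "fatigue",
]


def featurize(queries: Sequence[str]) -> List[float]:
    """One binary feature per symptom: 1.0 if any of the user's queries mentions it."""
    lowered = [q.lower() for q in queries]
    return [1.0 if any(s in q for q in lowered) else 0.0 for s in SYMPTOMS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that the user belongs to the diabetes class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on the mean cross-entropy loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of cross-entropy w.r.t. the logit is (prediction - label).
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


if __name__ == "__main__":
    # Toy, made-up users: (queries, label), where 1 = later reports a diagnosis.
    users = [
        (["excessive thirst at night", "frequent urination causes"], 1),
        (["blurred vision and excessive thirst", "unexplained weight loss"], 1),
        (["fatigue after lunch"], 0),
        (["fatigue remedies", "blurred vision screen"], 0),
    ]
    X = [featurize(q) for q, _ in users]
    y = [label for _, label in users]
    w, b = train_logreg(X, y)
    for (_, label), x in zip(users, X):
        print(label, round(predict_proba(w, b, x), 3))
```

Binary presence features keep the example small; a real system would likely need richer signals (query counts, timing, wording variants), and the other compared models (linear regression, decision tree, random forest) could be trained on the same feature vectors.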

Results

We identified 11,050 “new diabetes users”, who stated that they had been diagnosed with diabetes, and approximately 11.5 million “control users”, who queried about symptoms without querying for terms related to diabetes. The logistic regression and random forest models both distinguished between the two populations with an area under the receiver operating characteristic curve (AUC) of 0.92, which corresponds to a positive predictive value of 56% at a false-positive rate of 1%. The model could identify patients up to 240 days before they mentioned being diagnosed.
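The link between AUC, false-positive rate and positive predictive value can be illustrated with a small sketch. It assumes a classifier that outputs one score per user; the function names and toy scores are hypothetical. `auc_rank` uses the standard reading of AUC as the probability that a randomly chosen positive outscores a randomly chosen negative, `operating_point` picks the lowest threshold whose false-positive rate stays within a bound, and `ppv_from_rates` applies Bayes' rule. Because PPV depends on the share of positives in the evaluated population, the same true- and false-positive rates can yield very different PPVs; the abstract does not state the class balance behind the 56% figure, so this sketch does not attempt to reproduce it.

```python
from typing import Optional, Sequence, Tuple


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive scores higher than a random
    negative (ties count as 1/2). O(P*N), adequate for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_from_rates(tpr: float, fpr: float, prevalence: float) -> float:
    """Bayes' rule: PPV = TPR*pi / (TPR*pi + FPR*(1 - pi)), pi = share of positives."""
    tp = tpr * prevalence
    fp = fpr * (1.0 - prevalence)
    return tp / (tp + fp)


def operating_point(scores: Sequence[float], labels: Sequence[int],
                    max_fpr: float) -> Optional[Tuple[float, float, float, float]]:
    """Lowest score threshold whose empirical FPR stays <= max_fpr.
    Returns (threshold, tpr, fpr, ppv), or None if even the top score exceeds max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    # Lowering the threshold only adds predicted positives, so FPR never decreases.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / n_neg > max_fpr:
            break
        best = (t, tp / n_pos, fp / n_neg, tp / (tp + fp))
    return best
```

For example, with toy scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.3]` and labels `[1, 1, 0, 1, 0, 0]`, `auc_rank` gives 8/9, and `operating_point(scores, labels, 0.0)` returns threshold 0.8 with TPR 2/3, FPR 0 and PPV 1.0.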

Conclusions

Some patients with undiagnosed diabetes can be detected accurately from their symptom queries to a search engine. Such earlier diagnosis could be clinically meaningful, especially in type 1 diabetes. The ability of search engines to serve as a population-wide screening tool could potentially be improved by using additional data provided by users.
Metadata
Title
Can internet search engine queries be used to diagnose diabetes? Analysis of archival search data
Authors
Irit Hochberg
Deeb Daoud
Naim Shehadeh
Elad Yom-Tov
Publication date
01-10-2019
Publisher
Springer Milan
Published in
Acta Diabetologica / Issue 10/2019
Print ISSN: 0940-5429
Electronic ISSN: 1432-5233
DOI
https://doi.org/10.1007/s00592-019-01350-5
