Abstract
Since the advent of Web 2.0 and social media, anyone with an Internet connection can publish content online, including uncertain or fake information, a problem that has recently attracted significant attention. In this study, we address the challenge of uncertain online health information by automating systematic approaches borrowed from evidence-based medicine. Our proposed algorithm, MedFact, recommends trusted medical information within health-related social media discussions and empowers online users to make informed decisions about the credibility of online health information. MedFact automatically extracts relevant keywords from online discussions and queries trusted medical literature with the aim of embedding related factual information into the discussion. Our retrieval model takes into account layperson terminology and the hierarchy of evidence. MedFact thus departs from the consensus-based approaches to determining credibility that are popular in social media, such as the “wisdom of the crowd”, binary “Like” votes and ratings, and replaces these subjective metrics with objective ones. We also present preliminary work towards a granular veracity score that uses supervised machine learning to compare statements in uncertain social media text with statements in trusted medical text. We evaluate our proposed algorithm on several data sets from existing health social media involving both patient and medic discussions, with promising results, and offer suggestions for ongoing improvements and future research.
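The pipeline outlined in the abstract (keyword extraction, layperson-term handling, evidence-aware retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation. Simple frequency counting stands in for TextRank, the `LAY_TO_CLINICAL` dictionary is a hypothetical stand-in for a consumer health vocabulary, the `EVIDENCE_LEVEL` weights are illustrative only, and an in-memory list replaces queries to a trusted literature source.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "my", "i", "for", "of",
             "to", "with", "can", "me", "that", "heard", "help", "after"}

# Toy stand-in for a consumer health vocabulary: lay term -> clinical term.
LAY_TO_CLINICAL = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}

# Illustrative evidence weights (higher = stronger evidence); assumed values.
EVIDENCE_LEVEL = {
    "systematic review": 4,
    "randomized controlled trial": 3,
    "cohort study": 2,
    "case report": 1,
}


def normalize_terms(text):
    """Lower-case the text and replace lay phrases with clinical terms."""
    lowered = text.lower()
    for lay, clinical in LAY_TO_CLINICAL.items():
        lowered = lowered.replace(lay, clinical)
    return lowered


def extract_keywords(text, top_n=5):
    """Return the most frequent non-stop-word tokens (a TextRank stand-in)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def rank_documents(keywords, documents):
    """Keep documents sharing a keyword; sort by evidence level, then overlap."""
    wanted = set(keywords)
    scored = []
    for doc in documents:
        words = set(re.findall(r"[a-z]+", doc["title"].lower()))
        overlap = len(words & wanted)
        if overlap:
            level = EVIDENCE_LEVEL.get(doc["type"], 0)
            scored.append((level, overlap, doc["title"]))
    scored.sort(reverse=True)
    return [title for _, _, title in scored]


if __name__ == "__main__":
    post = ("I heard aspirin can help after a heart attack. "
            "Is aspirin safe with high blood pressure?")
    library = [
        {"title": "Aspirin for hypertension: a case report",
         "type": "case report"},
        {"title": "Aspirin after myocardial infarction: systematic review",
         "type": "systematic review"},
        {"title": "Knee surgery outcomes", "type": "cohort study"},
    ]
    keywords = extract_keywords(normalize_terms(post))
    for title in rank_documents(keywords, library):
        print(title)
```

Sorting on evidence level before keyword overlap is one possible design choice; it places a systematic review ahead of a case report even when both match the discussion.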
Notes
1. The GenSim Python API includes an implementation of the TextRank algorithm [21]: https://radimrehurek.com/gensim/summarization/keywords.html.
2. SNOMED CT data set available from the U.S. National Library of Medicine (NLM).
3. CHV data set available from the Consumer Health Vocabulary Initiative.
4. SEW historical data set available via the PIKES home page.
5. The TRIP database is accessible programmatically via web services that were most kindly made available to the authors by Jon Brassey, the TRIP database creator.
6. POS tagging uses the Penn Treebank tag set. All steps in this particular pipeline are programmed with the NLTK Python library: http://nltk.org.
7. Sentiment analysis is performed using the TextBlob Python library.
8. The spaCy Python library is used for generating dependency trees: https://spacy.io.
9. We implement a shallow CNN with the ConText tool.
10. Health Stack Exchange’s beta web site: https://health.stackexchange.com.
11. Data set curated from the Stack Exchange Data Dump from the Internet Archive.
12. QuackWatch web site: http://quackwatch.org.
13. DocCheck web site: http://doccheck.com.
References
Kata, A.: Anti-vaccine activists, Web 2.0, and the postmodern paradigm - an overview of tactics and tropes used online by the anti-vaccination movement. Vaccine 30(25), 3778–3789 (2012)
Rippen, H., Risk, A.: e-Health code of ethics (May 24). J. Med. Internet Res. 2(2) (2000)
Greenhalgh, T.: How to Read a Paper: The Basics of Evidence-Based Medicine. Wiley, Chichester (2010)
Ackley, B.J.: Evidence-Based Nursing Care Guidelines: Medical-Surgical Interventions. Elsevier Health Sciences, St. Louis (2008)
Child, J.: Trust - the fundamental bond in global collaboration. Organ. Dyn. 29(4), 274–288 (2001)
Varlamis, I., Eirinaki, M., Louta, M.: A study on social network metrics and their application in trust networks. In: Proceedings of the IEEE International Conference on Advances in Social Networks Analysis and Mining, pp. 168–175 (2010)
Abdaoui, A., Azé, J., Bringay, S., Poncelet, P.: Collaborative content-based method for estimating user reputation in online forums. In: Wang, J., Cellary, W., Wang, D., Wang, H., Chen, S.-C., Li, T., Zhang, Y. (eds.) WISE 2015. LNCS, vol. 9419, pp. 292–299. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26187-4_26
Grant, S., Betts, B.: Encouraging user behaviour with achievements: an empirical study. In: IEEE International Working Conference on Mining Software Repositories (MSR), pp. 65–68 (2013)
Aljazzaf, Z.M.: Trust-Based Service Selection. Ph.D. thesis. University of Western Ontario (2011)
Park, M.: HealthTrust: Assessing the Trustworthiness of Healthcare Information on the Internet. Ph.D. thesis. University of Kansas (2013)
Aphinyanaphongs, Y., Aliferis, C., et al.: Text categorization models for identifying unproven cancer treatments on the web. In: World Congress on Medical Informatics (MedInfo), p. 968. IOS Press (2007)
Oliphant, T.: “I am making my decision on the basis of my experience”: constructing authoritative knowledge about treatments for depression. Can. J. Inf. Libr. Sci. 33(3–4), 215–232 (2009)
Stephens, G.J., Silbert, L.J., Hasson, U.: Speaker-listener neural coupling underlies successful communication. Proc. Natl. Acad. Sci. 107(32), 14425–14430 (2010)
Nyhan, B., Reifler, J., Richey, S., Freed, G.L.: Effective messages in vaccine promotion: a randomized trial. Pediatrics 133(4) (2014)
Nyhan, B., Reifler, J.: When corrections fail: the persistence of political misperceptions. Polit. Behav. 32(2), 303–330 (2010)
Plous, S.: The Psychology of Judgment and Decision Making. McGraw-Hill, New York (1993)
Dunning, D.: The Dunning-Kruger effect: on being ignorant of one’s own ignorance. Adv. Exp. Soc. Psychol. 44, 247 (2011)
Proctor, R., Schiebinger, L.L.: Agnotology: The Making and Unmaking of Ignorance. Stanford University Press, Stanford (2008)
Henderson, J.: Expert and lay knowledge: a sociological perspective. Nutr. Diet. 67(1), 4–5 (2010)
Straus, S.E., Richardson, S.W., Glasziou, P., Haynes, B.R.: Evidence-Based Medicine: How to Practice and Teach EBM. Elsevier/Churchill Livingstone, New York (2005)
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: EMNLP, vol. 4, pp. 404–411 (2004)
Cornet, R., de Keizer, N.: Forty years of SNOMED: a literature review. BMC Med. Inform. Decis. Mak. 8(1), S2 (2008)
Smith, C., Stavri, P.: Consumer health vocabulary. In: Consumer Health Informatics, pp. 122–128 (2005)
Corcoglioniti, F., Rospocher, M., Aprosio, A.P.: Extracting knowledge from text with PIKES. In: International Semantic Web Conference (2015)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. arXiv (2013)
Brassey, J.: TRIP database: identifying high quality medical literature from a range of sources. New Rev. Inf. Netw. 11(2), 229–234 (2005)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Pang, B., Lee, L., et al.: Opinion mining and sentiment analysis. Found. Trends® Inf. Retr. 2(1–2), 1–135 (2008)
De Marneffe, M.C., Manning, C.D.: Stanford Typed Dependencies Manual. Technical report, Stanford University (2008)
Johnson, R., Zhang, T.: Effective use of word order for text categorization with convolutional neural networks. In: North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT) (2015)
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)
Acknowledgement
We thank the Alberta Machine Intelligence Institute (Amii) for funding this research.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Samuel, H., Zaïane, O. (2018). MedFact: Towards Improving Veracity of Medical Information in Social Media Using Applied Machine Learning. In: Bagheri, E., Cheung, J. (eds) Advances in Artificial Intelligence. Canadian AI 2018. Lecture Notes in Computer Science, vol 10832. Springer, Cham. https://doi.org/10.1007/978-3-319-89656-4_9
DOI: https://doi.org/10.1007/978-3-319-89656-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89655-7
Online ISBN: 978-3-319-89656-4
eBook Packages: Computer Science, Computer Science (R0)