MedFact: Towards Improving Veracity of Medical Information in Social Media Using Applied Machine Learning

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2018)

Abstract

Since the advent of Web 2.0 and social media, anyone with an Internet connection can create content online, including uncertain or fake information, a problem that has attracted significant attention recently. In this study, we address the challenge of uncertain online health information by automating systematic approaches borrowed from evidence-based medicine. Our proposed algorithm, MedFact, recommends trusted medical information within health-related social media discussions and empowers online users to make informed decisions about the credibility of online health information. MedFact automatically extracts relevant keywords from online discussions and queries trusted medical literature with the aim of embedding related factual information into the discussion. Our retrieval model takes into account layperson terminology and the hierarchy of evidence. Consequently, MedFact departs from the consensus-based approaches popular in social media, which determine credibility using the “wisdom of the crowd”, binary “Like” votes, and ratings. In place of these subjective metrics, MedFact introduces objective ones. We also present preliminary work towards a granular veracity score that uses supervised machine learning to compare statements in uncertain social media text with statements in trusted medical text. We evaluate our proposed algorithm on various data sets from existing health social media involving both patient and medic discussions, with promising results and suggestions for ongoing improvements and future research.
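
As an illustration only, the retrieval idea described in the abstract (extract keywords, map layperson terms to clinical terms, then prefer stronger evidence) can be sketched in a few lines of Python. This is not the authors' implementation. The frequency-based `extract_keywords` is a simple stand-in for TextRank, and `STOPWORDS`, `LAY_TO_CLINICAL`, `EVIDENCE_RANK`, the document fields, and all function names are hypothetical assumptions introduced for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "and", "had", "has", "have", "does", "did", "last",
             "for", "with", "that", "this", "can", "are", "was", "you", "not"}

# Hypothetical lay-to-clinical map; a real system might draw on CHV or SNOMED CT.
LAY_TO_CLINICAL = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
    "flu": "influenza",
}

# Assumed hierarchy of evidence, strongest first (lower rank = stronger).
EVIDENCE_RANK = {
    "systematic review": 0,
    "randomized controlled trial": 1,
    "cohort study": 2,
    "case report": 3,
    "expert opinion": 4,
}


def extract_keywords(text, top_n=5):
    """Return the most frequent non-stopword tokens (stand-in for TextRank)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def normalize_terms(text, keywords):
    """Swap lay phrases found in the text for their clinical equivalents."""
    lowered = text.lower()
    matched = [lay for lay in LAY_TO_CLINICAL
               if re.search(r"\b" + re.escape(lay) + r"\b", lowered)]
    # Drop keywords that are already covered by a matched lay phrase.
    covered = {word for lay in matched for word in lay.split()}
    terms = [LAY_TO_CLINICAL[lay] for lay in matched]
    terms += [k for k in keywords if k not in covered]
    return terms


def rank_documents(docs, terms):
    """Keep documents mentioning any term (substring match); order them by
    evidence level first, then by number of matching terms."""
    scored = []
    for doc in docs:
        body = doc["text"].lower()
        hits = sum(1 for t in terms if t in body)
        if hits:
            rank = EVIDENCE_RANK.get(doc["type"], len(EVIDENCE_RANK))
            scored.append((rank, -hits, doc["title"]))
    return [title for _, _, title in sorted(scored)]


post = ("My uncle had a heart attack last year. "
        "Does aspirin prevent a second heart attack?")
docs = [
    {"title": "Case report A", "type": "case report",
     "text": "Aspirin use after myocardial infarction in one patient."},
    {"title": "Review B", "type": "systematic review",
     "text": "Aspirin for secondary prevention of myocardial infarction."},
    {"title": "Trial C", "type": "randomized controlled trial",
     "text": "Statins and hypertension outcomes."},
]
terms = normalize_terms(post, extract_keywords(post))
print(rank_documents(docs, terms))  # ['Review B', 'Case report A']
```

Sorting on the evidence level before the match count reflects the design choice hinted at in the abstract: a weaker match in a systematic review is surfaced ahead of a stronger match in a case report. Whether MedFact weighs the two this way is not stated here, so the ordering is an assumption of the sketch.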

Notes

  1. The GenSim Python API includes an implementation of the TextRank algorithm [21], https://radimrehurek.com/gensim/summarization/keywords.html.

  2. SNOMED CT data set available from the U.S. National Library of Medicine (NLM), https://nlm.nih.gov/healthit/snomedct.

  3. CHV data set available from the Consumer Health Vocabulary Initiative, http://consumerhealthvocab.org.

  4. SEW historical data set available via the PIKES home page, http://pikes.fbk.eu/eval-sew.html.

  5. The TRIP database is accessible programmatically via web services that were most kindly made available to the authors by Jon Brassey, the TRIP database creator, https://tripdatabase.com/addtrip.

  6. POS tagging is done using the Penn Treebank tag set; all steps in this particular pipeline are programmed with the NLTK Python library, http://nltk.org.

  7. Sentiment analysis is performed using the TextBlob Python library, http://textblob.readthedocs.io.

  8. The spaCy Python library is used for generating dependency trees, https://spacy.io.

  9. We implement a shallow CNN with the ConText tool, https://github.com/riejohnson/ConText.

  10. Health Stack Exchange’s beta web site, https://health.stackexchange.com.

  11. Data set curated from the Stack Exchange Data Dump from the Internet Archive, https://archive.org/details/stackexchange.

  12. QuackWatch web site, http://quackwatch.org.

  13. DocCheck web site, http://doccheck.com.

References

  1. Kata, A.: Anti-vaccine activists, web 2.0, and the postmodern paradigm-an overview of tactics and tropes used online by the anti-vaccination movement. Vaccine 30(25), 3778–3789 (2012)

  2. Rippen, H., Risk, A.: e-Health code of ethics (May 24). J. Med. Internet Res. 2(2) (2000)

  3. Greenhalgh, T.: How to Read a Paper: The Basics of Evidence-Based Medicine. Wiley, Chichester (2010)

  4. Ackley, B.J.: Evidence-Based Nursing Care Guidelines: Medical-Surgical Interventions. Elsevier Health Sciences, St. Louis (2008)

  5. Child, J.: Trust-the fundamental bond in global collaboration. Organ. Dyn. 29(4), 274–288 (2001)

  6. Varlamis, I., Eirinaki, M., Louta, M.: A study on social network metrics and their application in trust networks. In: Proceedings of the IEEE International Conference on Advances in Social Networks Analysis and Mining, pp. 168–175 (2010)

  7. Abdaoui, A., Azé, J., Bringay, S., Poncelet, P.: Collaborative content-based method for estimating user reputation in online forums. In: Wang, J., Cellary, W., Wang, D., Wang, H., Chen, S.-C., Li, T., Zhang, Y. (eds.) WISE 2015. LNCS, vol. 9419, pp. 292–299. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26187-4_26

  8. Grant, S., Betts, B.: Encouraging user behaviour with achievements: an empirical study. In: IEEE International Working Conference on Mining Software Repositories (MSR), pp. 65–68 (2013)

  9. Aljazzaf, Z.M.: Trust-Based Service Selection. Ph.D. thesis. University of Western Ontario (2011)

  10. Park, M.: HealthTrust: Assessing the Trustworthiness of Healthcare Information on the Internet. Ph.D. thesis. University of Kansas (2013)

  11. Aphinyanaphongs, Y., Aliferis, C., et al.: Text categorization models for identifying unproven cancer treatments on the web. In: World Congress on Medical Informatics (MedInfo), p. 968. IOS Press (2007)

  12. Oliphant, T.: “I am making my decision on the basis of my experience”: constructing authoritative knowledge about treatments for depression. Can. J. Inf. Libr. Sci. 33(3–4), 215–232 (2009)

  13. Stephens, G.J., Silbert, L.J., Hasson, U.: Speaker-listener neural coupling underlies successful communication. Proc. Natl. Acad. Sci. 107(32), 14425–14430 (2010)

  14. Nyhan, B., Reifler, J., Richey, S., Freed, G.L.: Effective messages in vaccine promotion: a randomized trial. Pediatrics 133(4) (2014)

  15. Nyhan, B., Reifler, J.: When corrections fail: the persistence of political misperceptions. Polit. Behav. 32(2), 303–330 (2010)

  16. Plous, S.: The Psychology of Judgment and Decision Making. McGraw-Hill, New York (1993)

  17. Dunning, D.: The Dunning-Kruger effect: on being ignorant of one’s own ignorance. Adv. Exp. Soc. Psychol. 44, 247 (2011)

  18. Proctor, R., Schiebinger, L.L.: Agnotology: The Making and Unmaking of Ignorance. Stanford University Press, Stanford (2008)

  19. Henderson, J.: Expert and lay knowledge: a sociological perspective. Nutr. Diet. 67(1), 4–5 (2010)

  20. Straus, S.E., Richardson, S.W., Glasziou, P., Haynes, B.R.: Evidence-Based Medicine: How to Practice and Teach EBM. Elsevier/Churchill Livingstone, New York (2005)

  21. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: EMNLP, vol. 4, pp. 404–411 (2004)

  22. Cornet, R., de Keizer, N.: Forty years of SNOMED: a literature review. BMC Med. Inform. Decis. Mak. 8(1), S2 (2008)

  23. Smith, C., Stavri, P.: Consumer health vocabulary. In: Consumer Health Informatics, pp. 122–128 (2005)

  24. Corcoglioniti, F., Rospocher, M., Aprosio, A.P.: Extracting knowledge from text with PIKES. In: International Semantic Web Conference (2015)

  25. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. arXiv (2013)

  26. Brassey, J.: TRIP database: identifying high quality medical literature from a range of sources. New Rev. Inf. Netw. 11(2), 229–234 (2005)

  27. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

  28. Pang, B., Lee, L., et al.: Opinion mining and sentiment analysis. Found. Trends® Inf. Retr. 2(1–2), 1–135 (2008)

  29. De Marneffe, M.C., Manning, C.D.: Stanford Typed Dependencies Manual. Technical report, Stanford University (2008)

  30. Johnson, R., Zhang, T.: Effective use of word order for text categorization with convolutional neural networks. In: North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT) (2015)

  31. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)

Acknowledgement

We thank the Alberta Machine Intelligence Institute (Amii) for funding this research.

Author information

Corresponding author

Correspondence to Hamman Samuel.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Samuel, H., Zaïane, O. (2018). MedFact: Towards Improving Veracity of Medical Information in Social Media Using Applied Machine Learning. In: Bagheri, E., Cheung, J. (eds) Advances in Artificial Intelligence. Canadian AI 2018. Lecture Notes in Computer Science, vol 10832. Springer, Cham. https://doi.org/10.1007/978-3-319-89656-4_9

  • DOI: https://doi.org/10.1007/978-3-319-89656-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89655-7

  • Online ISBN: 978-3-319-89656-4

  • eBook Packages: Computer Science, Computer Science (R0)
