Published in: BMC Medical Informatics and Decision Making 1/2024

Open Access 01-12-2024 | Research

A patient safety knowledge graph supporting vaccine product development

Authors: Andrew M. Simms, Anshul Kanakia, Muhammad Sipra, Bhaskar Dutta, Noel Southall

Abstract

Background

Knowledge graphs are well suited to modeling complex, unstructured, multi-source data and to facilitating their analysis. During the COVID-19 pandemic, adverse event data were integrated into a knowledge graph to support vaccine safety surveillance and to respond nimbly to urgent health authority questions. Here, we provide details of this post-marketing safety system using public data sources. In addition to the challenges posed by varied data representations, adverse event reporting on the COVID-19 vaccines generated an unprecedented volume of data, an order of magnitude larger than the adverse event data for all previous vaccines. The Patient Safety Knowledge Graph (PSKG) is a robust data store built to accommodate this volume of adverse event data and to harmonize primary surveillance data sources.

Methods

We designed a semantic model to represent key safety concepts. We built an extract-transform-load (ETL) data pipeline that parses and imports primary public data sources, aligns key elements such as vaccine names, integrates the Medical Dictionary for Regulatory Activities (MedDRA), and applies quality metrics. PSKG is deployed in a Neo4j graph database and made available via a web interface and Application Programming Interfaces (APIs).
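
As an illustration of the alignment step only, the following minimal Python sketch maps differently spelled vaccine names from separate sources onto one canonical label. It is not the authors' pipeline: the alias table, the canonical labels, and the function names normalize and align_vaccine_name are all hypothetical and chosen for this example.

```python
import re
from typing import Optional

# Hypothetical alias table (an assumption for illustration, not from the paper):
# keys are normalized source strings, values are canonical labels.
CANONICAL_VACCINES = {
    "pfizer biontech": "COVID-19 vaccine (Pfizer-BioNTech)",
    "comirnaty": "COVID-19 vaccine (Pfizer-BioNTech)",
    "moderna": "COVID-19 vaccine (Moderna)",
    "spikevax": "COVID-19 vaccine (Moderna)",
}


def normalize(raw: str) -> str:
    """Lowercase the name and turn every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()


def align_vaccine_name(raw: str) -> Optional[str]:
    """Return the canonical label for a raw source name, or None if unmapped."""
    return CANONICAL_VACCINES.get(normalize(raw))
```

Returning None for unmapped names, rather than guessing, lets a quality check count and review names that could not be aligned.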

Results

We import and align adverse event data and vaccine exposure data from 250 countries on a weekly basis, producing a graph with 4,340,980 nodes and 30,544,475 edges as of July 1, 2022. PSKG is used for ad-hoc analyses and periodic reporting for several widely available COVID-19 vaccines. Analysis code using the knowledge graph is 80% shorter than an equivalent implementation written entirely in Python, and runs over 200 times faster.

Conclusions

Organizing safety data into a concise model of nodes, properties, and edge relationships has greatly simplified analysis code by removing complex parsing and transformation algorithms from individual analyses and instead managing these centrally. The adoption of the knowledge graph transformed how the team answers key scientific and medical questions. Whereas previously an analysis would involve aggregating and transforming primary datasets from scratch to answer a specific question, the team can now iterate easily and respond as quickly as requests evolve (e.g., “Produce vaccine-X safety profile for adverse event-Y by country instead of age-range”).
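
To make the example request above concrete, here is a minimal Python sketch of why a centrally harmonized model makes such a change cheap: once reports share one structure, switching from age range to country is a change of one argument. The records, field names, and the safety_profile function are hypothetical stand-ins for this illustration. They are not PSKG data, its schema, or its API.

```python
from collections import Counter

# Hypothetical, already-harmonized report records (illustrative values only).
REPORTS = [
    {"vaccine": "vaccine-X", "event": "event-Y", "country": "US", "age_range": "18-39"},
    {"vaccine": "vaccine-X", "event": "event-Y", "country": "US", "age_range": "40-64"},
    {"vaccine": "vaccine-X", "event": "event-Y", "country": "DE", "age_range": "18-39"},
    {"vaccine": "vaccine-X", "event": "event-Z", "country": "US", "age_range": "18-39"},
    {"vaccine": "vaccine-W", "event": "event-Y", "country": "DE", "age_range": "65+"},
]


def safety_profile(reports, vaccine, event, group_by):
    """Count reports of one event for one vaccine, grouped by the given field."""
    counts = Counter(
        r[group_by]
        for r in reports
        if r["vaccine"] == vaccine and r["event"] == event
    )
    return dict(counts)


# The same question asked along two different dimensions.
by_age = safety_profile(REPORTS, "vaccine-X", "event-Y", "age_range")
by_country = safety_profile(REPORTS, "vaccine-X", "event-Y", "country")
```

In a graph database the grouping field would typically be a property or a neighboring node reached by an edge, but the idea of changing only the grouping dimension is the same.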
Metadata
Title
A patient safety knowledge graph supporting vaccine product development
Authors
Andrew M. Simms
Anshul Kanakia
Muhammad Sipra
Bhaskar Dutta
Noel Southall
Publication date
01-12-2024
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2024
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-023-02409-8
