Published in: BMC Medical Informatics and Decision Making 1/2018

Open Access 01-12-2018 | Software

CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital

Authors: Richard Jackson, Ismail Kartoglu, Clive Stringer, Genevieve Gorrell, Angus Roberts, Xingyi Song, Honghan Wu, Asha Agrawal, Kenneth Lui, Tudor Groza, Damian Lewsley, Doug Northwood, Amos Folarin, Robert Stewart, Richard Dobson


Abstract

Background

Traditional health information systems are generally designed to support clinical data collection at the point of care. However, as the modern information economy expands in scope and permeates the healthcare domain, healthcare organisations face increasing pressure to offer information systems that meet the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emerging requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time): addressing deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low-cost architecture for information retrieval and extraction from structured and unstructured data within King’s College Hospital, how we managed the associated governance concerns, and the use cases and cost-saving opportunities that such components present.
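Full-text information retrieval over unstructured clinical notes is commonly built on an inverted index, which maps each term to the set of documents that contain it. The abstract does not say how CogStack implements retrieval, so the following Python sketch is only an assumed, minimal illustration of the general technique, not the authors' implementation. The toy documents and the helper names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus standing in for free-text clinical notes (not real patient data).
DOCS = {
    "doc1": "Patient reports chest pain and shortness of breath.",
    "doc2": "No chest pain today. Mild headache.",
    "doc3": "Shortness of breath improved after treatment.",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every query token (Boolean AND)."""
    # .get avoids inserting empty entries into the defaultdict for unknown tokens.
    postings = [index.get(token, set()) for token in tokenize(query)]
    if not postings:
        return set()
    # set.intersection returns a new set, so the index itself is not modified.
    return set.intersection(*postings)


index = build_index(DOCS)
print(sorted(search(index, "chest pain")))        # ['doc1', 'doc2']
print(sorted(search(index, "shortness breath")))  # ['doc1', 'doc3']
```

Note that keyword retrieval alone returns `doc2` for "chest pain" even though the finding is negated there; this is one reason retrieval is usually paired with information extraction steps such as negation detection.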

Results

To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King’s College London. On generated data designed to simulate real-world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall.
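Precision is the fraction of masked spans that really were identifiers, TP / (TP + FP), and recall is the fraction of true identifiers that were masked, TP / (TP + FN). The abstract does not report the underlying counts, so the Python sketch below uses purely illustrative numbers that are not the paper's evaluation data; the function name `precision_recall` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall for a de-identification run.

    tp: identifiers correctly masked
    fp: non-identifier text wrongly masked
    fn: identifiers the algorithm missed
    By assumption, a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only; NOT taken from the paper.
p, r = precision_recall(tp=90, fp=10, fn=5)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.95
```

For de-identification, recall is usually the more safety-critical figure, since every false negative is an identifier left visible in the text.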

Conclusion

We describe a toolkit that we believe is of great value to the healthcare community in the UK and beyond. It is the only open-source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as this provide a crucial foundation for the genomic revolution in medicine.
Metadata
Publisher
BioMed Central
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-018-0623-9
