Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Subarachnoid Hemorrhage | Research article

A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records

Authors: Emily Wheater, Grant Mair, Cathie Sudlow, Beatrice Alex, Claire Grover, William Whiteley

Abstract

Background

Manual coding of phenotypes in brain radiology reports is time-consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging phenotypes in radiology reports of scans performed in routine clinical practice in the UK National Health Service (NHS).

Methods

We used the anonymized text of brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, then evaluated it further on reports from the regional hospital.
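The abstract does not spell out the rules themselves. As a rough illustration of what a rule-based phenotype tagger can look like, here is a minimal Python sketch that matches lexicon patterns sentence by sentence and discards mentions preceded by a negation cue in the same sentence. The pattern lists, `NEGATION_CUES`, `tag_report` and `report_phenotypes` are hypothetical names and assumptions made for illustration; they are not the published algorithm.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical phenotype lexicon: a few illustrative surface forms per label.
# These patterns are assumptions for illustration, not the published rule set.
PHENOTYPE_PATTERNS = {
    "ischaemic_stroke": [r"\binfarcts?\b", r"\bischa?emic stroke\b"],
    "haemorrhagic_stroke": [r"\bintracerebral ha?emorrhage\b", r"\bha?emorrhagic stroke\b"],
    "tumour": [r"\btumou?rs?\b", r"\bmeningiomas?\b", r"\bgliomas?\b"],
    "small_vessel_disease": [r"\bsmall vessel disease\b", r"\bleukoaraiosis\b"],
    "atrophy": [r"\batrophy\b", r"\bvolume loss\b"],
}

# Hypothetical negation cues; a cue earlier in the same sentence negates a match.
NEGATION_CUES = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)


@dataclass
class Finding:
    label: str      # phenotype label from PHENOTYPE_PATTERNS
    text: str       # matched surface string
    negated: bool   # True if a negation cue precedes the match in its sentence


def split_sentences(report: str) -> list[str]:
    """Naive split on sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def tag_report(report: str) -> list[Finding]:
    """Return every lexicon match in the report, flagged as negated or not."""
    findings = []
    for sentence in split_sentences(report):
        for label, patterns in PHENOTYPE_PATTERNS.items():
            for pattern in patterns:
                for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                    prefix = sentence[: match.start()]
                    negated = NEGATION_CUES.search(prefix) is not None
                    findings.append(Finding(label, match.group(0), negated))
    return findings


def report_phenotypes(report: str) -> set[str]:
    """Report-level labels: phenotypes with at least one non-negated mention."""
    return {f.label for f in tag_report(report) if not f.negated}


if __name__ == "__main__":
    example = ("Old infarct in the left basal ganglia. "
               "No intracerebral haemorrhage. Generalised atrophy.")
    print(sorted(report_phenotypes(example)))
```

Treating any cue earlier in the sentence as negating keeps the sketch short but misfires on phrasing such as "no change in the old infarct"; scope-aware approaches in the NegEx family are the usual refinement when a rule set like this is tuned against annotated reports.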

Results

The agreement between expert readers was excellent (Cohen’s κ = 0.93) in both datasets. In the final test dataset (n = 700) of unseen regional hospital reports, the algorithm had very good performance for a report of any ischaemic stroke [sensitivity 89% (95% CI: 81–94); positive predictive value (PPV) 85% (95% CI: 76–90); specificity 100% (95% CI: 99–100)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80–99); PPV 72% (95% CI: 55–84); specificity 100% (95% CI: 99–100)]; brain tumours [sensitivity 96% (95% CI: 87–99); PPV 84% (95% CI: 73–91); specificity 100% (95% CI: 99–100)]; and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, atrophy (n = 28,757, 26%), small vessel disease (n = 15,015, 14%) and old, deep ischaemic strokes (n = 10,636, 10%) were the commonest findings.
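The performance figures above are per-phenotype proportions from a 2×2 table of algorithm output against expert annotation, each with a 95% confidence interval, alongside Cohen's κ for agreement between the two readers. The Python sketch below shows how these quantities are conventionally computed. It assumes the Wilson score interval for the CIs (the abstract does not name the interval method), and the counts in the usage example are hypothetical toy values, not study data.

```python
from __future__ import annotations

import math
from collections import Counter

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[float, float, float]]:
    """(estimate, lower, upper) for sensitivity, PPV and specificity from a 2x2 table."""
    results = {}
    for name, num, den in (
        ("sensitivity", tp, tp + fn),   # detected / all truly positive reports
        ("ppv", tp, tp + fp),           # truly positive / all algorithm-positive reports
        ("specificity", tn, tn + fp),   # correctly negative / all truly negative reports
    ):
        lower, upper = wilson_ci(num, den)
        results[name] = (num / den, lower, upper)
    return results


def cohens_kappa(reader_a: list, reader_b: list) -> float:
    """Cohen's kappa for two readers assigning categorical labels to the same items."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: product of each reader's marginal frequency, summed over labels.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use a single identical label")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the study.
    for name, (est, lower, upper) in diagnostic_metrics(tp=90, fp=10, fn=10, tn=890).items():
        print(f"{name}: {est:.0%} (95% CI {lower:.0%} to {upper:.0%})")
    print(round(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]), 2))
```

The Wilson interval is used here because it stays inside [0, 1] and behaves sensibly near 0% and 100%, which matters for specificities that round to 100%. The NHS Tayside frequencies rest on the same proportion arithmetic: for example, 28,757 / 110,695 ≈ 0.26, the 26% reported for atrophy.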

Conclusions

An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.
Metadata
Publication date: 01-12-2019
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-0908-7
