Published in: BMC Medical Informatics and Decision Making 5/2019

Open Access 01-12-2019 | Lung Cancer | Research

Natural language processing for populating lung cancer clinical research data

Authors: Liwei Wang, Lei Luo, Yanshan Wang, Jason Wampfler, Ping Yang, Hongfang Liu



Abstract

Background

Lung cancer is the second most common cancer in both men and women. The wide adoption of electronic health records (EHRs) offers the potential to accelerate cohort-related epidemiological studies using informatics approaches. Because manual extraction from large volumes of text is time-consuming and labor-intensive, efforts have emerged to extract information about lung cancer patients from text automatically using natural language processing (NLP), an artificial intelligence technique.

Methods

In this study, we used an existing cohort of 2311 lung cancer patients for whom stage, histology, tumor grade, and therapies (chemotherapy, radiotherapy, and surgery) had been manually ascertained. We developed and evaluated an NLP system to extract these variables automatically for the same patients from clinical narratives, including clinical notes, pathology reports, and surgery reports.

Results

Evaluation showed promising results: recall for stage, histology, tumor grade, and therapies was 89%, 98%, 78%, and 100%, respectively, and precision was 70%, 88%, 90%, and 100%, respectively.
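
For readers less familiar with these metrics, the following is a minimal sketch of how precision and recall can be computed when extracted values are compared against manually ascertained ones. The exact-match, per-patient scoring rule, the function names, and the toy patient data are assumptions made for illustration; the abstract does not describe the evaluation protocol. The F1 score shown is derived here from the reported precision and recall and is not a figure reported in the abstract.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    gold: Dict[str, Optional[str]],
    pred: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Score extracted values against manually ascertained values.

    Assumption: a prediction counts as correct only if it exactly
    matches the gold value for the same patient. None means "no value".
    """
    true_pos = sum(
        1 for pid, value in pred.items()
        if value is not None and gold.get(pid) == value
    )
    n_pred = sum(1 for value in pred.values() if value is not None)
    n_gold = sum(1 for value in gold.values() if value is not None)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical toy data (not from the study): stage per patient.
toy_gold = {"p1": "IIIA", "p2": "IV", "p3": "IB", "p4": None}
toy_pred = {"p1": "IIIA", "p2": "IIIB", "p3": "IB", "p4": "IA"}
toy_precision, toy_recall = precision_recall(toy_gold, toy_pred)

# Precision and recall as reported in the Results above.
reported = {
    "stage": (0.70, 0.89),
    "histology": (0.88, 0.98),
    "tumor grade": (0.90, 0.78),
    "therapies": (1.00, 1.00),
}
derived_f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}

if __name__ == "__main__":
    print(f"toy precision={toy_precision:.2f}, recall={toy_recall:.2f}")
    for name, value in derived_f1.items():
        print(f"{name}: derived F1 = {value:.3f}")
```

Read this way, the lower precision for stage means the system proposed more incorrect stage values relative to its correct ones, whereas the lower recall for tumor grade means more manually recorded grades were missed.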

Conclusion

This study demonstrated the feasibility and accuracy of automatically extracting pre-defined information from clinical narratives for lung cancer research.
Metadata
Title
Natural language processing for populating lung cancer clinical research data
Authors
Liwei Wang
Lei Luo
Yanshan Wang
Jason Wampfler
Ping Yang
Hongfang Liu
Publication date
01-12-2019
Publisher
BioMed Central
DOI
https://doi.org/10.1186/s12911-019-0931-8
