Published in: BMC Medical Informatics and Decision Making 1/2018

Open Access 01-12-2018 | Research article

Automated extraction of Biomarker information from pathology reports

Authors: Jeongeun Lee, Hyun-Je Song, Eunsil Yoon, Seong-Bae Park, Sung-Hye Park, Jeong-Wook Seo, Peom Park, Jinwook Choi

Abstract

Background

Pathology reports are written in free-text form, which precludes efficient data gathering. We aimed to overcome this limitation by designing an automated system for extracting biomarker profiles from accumulated pathology reports.

Methods

We designed a new data model for representing biomarker knowledge. The automated system parses immunohistochemistry reports in units of a “slide paragraph”, defined as the set of immunohistochemistry findings obtained for the same tissue slide. Immunohistochemistry reports are parsed using a context-free grammar, whereas surgical pathology reports are parsed using a tree-like structure. The performance of the approach was validated on manually annotated pathology reports of 100 randomly selected patients managed at Seoul National University Hospital.
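To make the “slide paragraph” unit and grammar-based parsing more concrete, here is a minimal Python sketch. The report layout (a bracketed slide identifier followed by `name: result` pairs), the toy grammar, and the names `SlideParagraph`, `tokenize`, `Parser`, and `parse_report` are all hypothetical illustrations. They are not the grammar or report format used by the authors.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammar (BNF style), NOT the grammar used in the study:
#   report    -> paragraph*
#   paragraph -> HEADER NEWLINE* finding ((COMMA NEWLINE* | NEWLINE+) finding)*
#   finding   -> TEXT COLON TEXT        (biomarker name ':' test result)
TOKEN_RE = re.compile(r"""
    (?P<HEADER>\[[^\]]+\])     # slide identifier, e.g. [A1]
  | (?P<COLON>:)
  | (?P<COMMA>,)
  | (?P<NEWLINE>\n)
  | (?P<TEXT>[^\[\]:,\n]+)     # free text: a biomarker name or a result
""", re.VERBOSE)


@dataclass
class SlideParagraph:
    """All immunohistochemistry findings reported for one tissue slide."""
    slide_id: str
    findings: dict[str, str] = field(default_factory=dict)


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "TEXT":
            value = value.strip()
            if not value:  # drop whitespace-only fragments
                continue
        tokens.append((kind, value))
    return tokens


class Parser:
    """Recursive-descent parser over the toy grammar above."""

    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind: str) -> str:
        if self.peek() != kind:
            raise ValueError(f"expected {kind} at token {self.pos}, got {self.peek()}")
        self.pos += 1
        return self.tokens[self.pos - 1][1]

    def skip_newlines(self) -> None:
        while self.peek() == "NEWLINE":
            self.pos += 1

    def report(self) -> list[SlideParagraph]:
        paragraphs = []
        self.skip_newlines()
        while self.peek() == "HEADER":
            paragraphs.append(self.paragraph())
            self.skip_newlines()
        if self.peek() is not None:
            raise ValueError(f"unexpected {self.peek()} at token {self.pos}")
        return paragraphs

    def paragraph(self) -> SlideParagraph:
        # Strip the brackets from the header to get the slide identifier.
        para = SlideParagraph(slide_id=self.expect("HEADER")[1:-1])
        self.skip_newlines()
        self.finding(para)
        while True:
            if self.peek() == "COMMA":
                self.pos += 1
                self.skip_newlines()
            elif self.peek() == "NEWLINE":
                self.skip_newlines()
                if self.peek() != "TEXT":  # next slide header or end of report
                    return para
            else:
                return para
            self.finding(para)

    def finding(self, para: SlideParagraph) -> None:
        name = self.expect("TEXT")
        self.expect("COLON")
        para.findings[name] = self.expect("TEXT")


def parse_report(text: str) -> list[SlideParagraph]:
    return Parser(tokenize(text)).report()
```

For example, `parse_report("[A1]\nER: positive (90%), PR: negative\n[B2]\nHER2: 2+")` yields two `SlideParagraph` objects. One is for slide `A1` with findings `{"ER": "positive (90%)", "PR": "negative"}`, and the other is for slide `B2` with `{"HER2": "2+"}`. Grouping results per slide keeps each test result attached to the slide it came from. A malformed line such as `ER positive` raises `ValueError` rather than being silently mis-assigned.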

Results

High F-scores were obtained for parsing biomarker names and the corresponding test results (0.999 and 0.998, respectively) from the immunohistochemistry reports, whereas performance was relatively poor for parsing surgical pathology findings. Nevertheless, applying the proposed approach to our single-center dataset revealed information on 221 unique biomarkers, which represents a richer result than the biomarker profiles obtained from the published literature. Owing to its data representation model, the proposed approach can associate the biomarker profile extracted from an immunohistochemistry report with the corresponding pathology findings listed in one or more surgical pathology reports. Term variations are resolved by normalizing each term to its preferred term, identified through expanded dictionary look-up and text-similarity-based search.
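The normalization step can be sketched as a two-stage look-up. The first stage is an exact match against a dictionary expanded with known spelling variants. The second is a text-similarity search for anything the dictionary misses. The dictionary entries, the `cutoff` of 0.8, and the names `PREFERRED_TERMS`, `canonical`, and `normalize` are hypothetical. Python's `difflib` similarity ratio stands in for whatever similarity measure the authors actually used.

```python
import difflib
import re
from typing import Optional

# Hypothetical expanded dictionary: preferred term -> known spelling variants.
# Entries are illustrative only, not the dictionary used in the study.
PREFERRED_TERMS = {
    "estrogen receptor": ["ER", "estrogen receptor", "oestrogen receptor"],
    "progesterone receptor": ["PR", "PgR", "progesterone receptor"],
    "Ki-67": ["Ki-67", "Ki67", "MIB-1"],
    "HER2": ["HER2", "HER-2", "HER2/neu", "c-erbB-2", "ERBB2"],
}


def canonical(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


# Flatten the dictionary into a variant -> preferred-term look-up table.
VARIANT_TO_PREFERRED = {
    canonical(variant): preferred
    for preferred, variants in PREFERRED_TERMS.items()
    for variant in variants
}


def normalize(term: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw biomarker mention to its preferred term, or None if unknown."""
    key = canonical(term)
    # Step 1: exact look-up in the expanded dictionary.
    if key in VARIANT_TO_PREFERRED:
        return VARIANT_TO_PREFERRED[key]
    # Step 2: fall back to the most similar known variant (catches typos).
    # difflib's ratio is a stand-in similarity measure, chosen for illustration.
    matches = difflib.get_close_matches(
        key, list(VARIANT_TO_PREFERRED), n=1, cutoff=cutoff
    )
    return VARIANT_TO_PREFERRED[matches[0]] if matches else None
```

With these assumptions, `normalize("HER-2")` returns `"HER2"` via the dictionary. `normalize("progesteron receptor")` reaches `"progesterone receptor"` only through the similarity fallback. An unrelated term such as `"cytokeratin"` returns `None`. Raising `cutoff` reduces false merges between similar-looking biomarker names, but more misspellings then go unresolved.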

Conclusions

Our proposed approach for biomarker data extraction addresses key limitations regarding data representation and can handle reports prepared in the clinical setting, which often contain incomplete sentences, typographical errors, and inconsistent formatting.
Metadata
Publication date: 01-12-2018
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2018
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-018-0609-7
