
Open Access 01-12-2022 | Tinnitus | Research

Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT)

Authors: Jia Li, Yucong Lin, Pengfei Zhao, Wenjuan Liu, Linkun Cai, Jing Sun, Lei Zhao, Zhenghan Yang, Hong Song, Han Lv, Zhenchang Wang

Published in: BMC Medical Informatics and Decision Making | Issue 1/2022


Abstract

Background

Given the increasing number of people suffering from tinnitus, accurately categorizing patients with actionable radiology reports is valuable for supporting clinical decision making. However, this process requires experienced physicians and considerable manual effort. Natural language processing (NLP) has shown great potential in large-scale analysis of medical text, yet its application to domain-specific analysis of radiology reports remains limited.

Objective

The aim of this study is to propose a novel approach for classifying actionable radiology reports of tinnitus patients using models based on bidirectional encoder representations from transformers (BERT), and to evaluate the benefits of in-domain pre-training (IDPT) along with a sequence adaptation strategy.

Methods

A total of 5864 temporal bone computed tomography (CT) reports were labeled by two experienced radiologists into three categories: (1) normal findings without notable lesions; (2) notable lesions uncorrelated with tinnitus; and (3) at least one lesion considered a potential cause of tinnitus. We then constructed a framework consisting of deep learning (DL) neural networks and self-supervised BERT models. A tinnitus domain-specific corpus was used to pre-train the BERT model and further improve its embedding weights. In addition, we evaluated multiple max-sequence-length settings in BERT to reduce computational cost. Finally, we identified the most promising approach by comparing F1-scores and AUC values across all models.
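The abstract does not include implementation details; as a rough sketch of how such a pipeline could be assembled, the following (assuming the Hugging Face transformers and datasets libraries, and a hypothetical tinnitus_reports.txt corpus file) first continues masked-language-model pre-training of a Chinese BERT on the domain corpus (IDPT) and then fine-tunes the resulting checkpoint as a three-class classifier.

```python
# Minimal sketch (not the authors' code): in-domain pre-training (IDPT) of a
# Chinese BERT on a tinnitus report corpus, followed by fine-tuning for the
# three report categories. File names and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertForSequenceClassification,
                          BertTokenizerFast, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")

# Step 1: IDPT -- continue masked-language-model training on the domain corpus.
corpus = load_dataset("text", data_files={"train": "tinnitus_reports.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])
mlm_model = BertForMaskedLM.from_pretrained("bert-base-chinese")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
Trainer(model=mlm_model,
        args=TrainingArguments(output_dir="idpt-bert", num_train_epochs=3),
        train_dataset=tokenized,
        data_collator=collator).train()
mlm_model.save_pretrained("idpt-bert")
tokenizer.save_pretrained("idpt-bert")

# Step 2: fine-tune the IDPT checkpoint as a 3-class classifier
# (0 = normal, 1 = lesion unrelated to tinnitus, 2 = potential cause of tinnitus).
classifier = BertForSequenceClassification.from_pretrained("idpt-bert",
                                                           num_labels=3)
# A labeled dataset of (report text, label) pairs would then be tokenized in
# the same way and trained with another Trainer instance.
```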

Results

In the first experiment, the fine-tuned BERT model achieved a more promising result (AUC 0.868, F1 0.760) than the Word2Vec-based models (AUC 0.767, F1 0.733) on the validation data. In the second experiment, the BERT in-domain pre-training model (AUC 0.948, F1 0.841) performed significantly better than the baseline BERT model (AUC 0.868, F1 0.760). Additionally, among the BERT fine-tuning variants, Mengzi achieved the highest AUC of 0.878 (F1 0.764). Finally, a BERT max sequence length of 128 tokens achieved an AUC of 0.866 (F1 0.736), nearly equal to that of 512 tokens (AUC 0.868, F1 0.760).
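For context on how the two max-sequence-length settings could be compared, a minimal evaluation sketch follows; it assumes macro-averaged, one-vs-rest metrics computed with scikit-learn, since the exact averaging scheme is not specified in the abstract.

```python
# Illustrative sketch (assumed macro averaging, not stated in the abstract):
# computing the AUC and F1 values used to compare max-sequence-length settings.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

def report_metrics(y_true, y_prob):
    """y_true: class ids in {0, 1, 2}; y_prob: (n, 3) predicted probabilities."""
    auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
    f1 = f1_score(y_true, y_prob.argmax(axis=1), average="macro")
    return auc, f1

# Toy example; in practice y_prob would come from running the fine-tuned model
# on the validation set once with max_length=128 and once with max_length=512.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_prob = np.array([[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7],
                   [0.2, 0.3, 0.5], [0.3, 0.5, 0.2], [0.6, 0.3, 0.1]])
print(report_metrics(y_true, y_prob))
```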

Conclusion

We developed a reliable BERT-based framework for tinnitus diagnosis from Chinese radiology reports, along with a sequence adaptation strategy that reduces computational resources while maintaining accuracy. The findings could provide a reference for NLP development on Chinese radiology reports.
Metadata
Publication date
01-12-2022
Publisher
BioMed Central
Keyword
Tinnitus
Published in
BMC Medical Informatics and Decision Making / Issue 1/2022
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-022-01946-y
