Open Access 01-12-2019 | Research article

Detection of medical text semantic similarity based on convolutional neural network

Authors: Tao Zheng, Yimei Gao, Fei Wang, Chenhao Fan, Xingzhi Fu, Mei Li, Ya Zhang, Shaodian Zhang, Handong Ma

Published in: BMC Medical Informatics and Decision Making | Issue 1/2019

Abstract

Background

Imaging examinations, such as ultrasonography, magnetic resonance imaging and computed tomography scans, play key roles in healthcare settings. To assess and improve the quality of imaging diagnosis, reviewers must manually find and compare pre-existing imaging and pathology reports in electronic medical records (EMRs) that cover overlapping exam body sites. Retrieving those reports is time-consuming. In this paper, we propose a convolutional neural network (CNN) based method that makes better use of the semantic information contained in report texts to accelerate the retrieval process.

Methods

We included 16,354 imaging and pathology report-pairs from 1926 patients who were admitted to Shanghai Tongren Hospital and had ultrasonic examinations between 1st May 2017 and 31st July 2017. We adapted the CNN model to calculate the similarities among the report-pairs in order to identify target report-pairs with overlapping body sites, and compared its performance with that of six conventional models: keyword mapping, latent semantic analysis (LSA), latent Dirichlet allocation (LDA), Doc2Vec, Siamese long short-term memory (LSTM) and a model based on named entity recognition (NER). We also used a graph embedding method to enhance the word representation by capturing semantic relation information from medical ontologies. Additionally, we used the LIME algorithm to identify which features (or words) are decisive for the prediction results, which improved the interpretability of the model.
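
The idea of scoring a report-pair with a convolutional encoder can be illustrated with a minimal sketch. This is not the authors' model: the weights are random and untrained, and the embedding size, filter width, number of filters, cosine comparison and English example tokens are all assumptions made for illustration. It only shows the general pattern of embedding tokens, convolving over token windows, max-pooling over time and comparing the two encodings.

```python
import math
import random


def embed(tokens, table, dim, rng):
    """Look up (or lazily create) a random vector for each token."""
    vectors = []
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(table[tok])
    return vectors


def conv_max_pool(vectors, filters, width):
    """Apply each filter to every window of `width` tokens, then ReLU and max-pool."""
    if not vectors:
        return [0.0] * len(filters)
    dim = len(vectors[0])
    # Pad short inputs with zero vectors so that at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    features = []
    for filt in filters:  # each filter is a flat list of width * dim weights
        best = 0.0  # starting at zero acts as the ReLU floor
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window))
            best = max(best, activation)
        features.append(best)
    return features


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def report_similarity(tokens_a, tokens_b, seed=0, dim=8, n_filters=16, width=2):
    """Encode two tokenized reports with shared random filters and compare them."""
    rng = random.Random(seed)
    table = {}  # shared embedding table (hypothetical, untrained)
    filters = [[rng.uniform(-1.0, 1.0) for _ in range(width * dim)]
               for _ in range(n_filters)]
    enc_a = conv_max_pool(embed(tokens_a, table, dim, rng), filters, width)
    enc_b = conv_max_pool(embed(tokens_b, table, dim, rng), filters, width)
    return cosine(enc_a, enc_b)


if __name__ == "__main__":
    imaging = ["thyroid", "nodule", "left", "lobe"]
    pathology = ["thyroid", "left", "lobe", "papillary", "carcinoma"]
    print(report_similarity(imaging, pathology))
```

In a trained system the embedding table and filters would be learned from labelled report-pairs, and the score would be thresholded to decide whether two reports share a body site.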

Results

Experimental results showed that our CNN model significantly outperformed all the conventional models on area under the receiver operating characteristic curve (AUROC), precision, recall and F1-score on our test dataset. The AUROC of our CNN models improved by approximately 3–7%. The AUROC of the CNN model with graph-embedding and ontology-based medical concept vectors was 0.8% higher than that of the model with randomly initialized vectors and 1.5% higher than that of the model with pre-trained word vectors.
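
For readers unfamiliar with the reported metrics, the sketch below shows one standard way to compute precision, recall, F1-score and AUROC from similarity scores and binary labels. The 0.5 decision threshold and the toy labels and scores are assumptions for illustration only; they are not taken from the study, and the abstract does not state how the authors computed these values.

```python
def precision_recall_f1(labels, scores, threshold=0.5):
    """Precision, recall and F1 when scores >= threshold are predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = report-pair shares a body site, 0 = it does not.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
    print(precision_recall_f1(labels, scores))
    print(auroc(labels, scores))
```

The pairwise formulation of AUROC is quadratic in the number of examples, which is fine for a sketch but would normally be replaced by a rank-based computation on a dataset of this size.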

Conclusion

Our study demonstrates that a CNN model with pre-trained medical concept vectors could accurately identify target report-pairs with overlapping body sites and potentially accelerate the retrieval process for imaging diagnosis quality measurement.

Title: Detection of medical text semantic similarity based on convolutional neural network
Authors: Tao Zheng, Yimei Gao, Fei Wang, Chenhao Fan, Xingzhi Fu, Mei Li, Ya Zhang, Shaodian Zhang, Handong Ma
Publication date: 01-12-2019
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-0880-2
