Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Opioids | Research article

DLI-IT: a deep learning approach to drug label identification through image and text embedding

Authors: Xiangwen Liu, Joe Meehan, Weida Tong, Leihong Wu, Xiaowei Xu, Joshua Xu



Abstract

Background

Drug labels, or package inserts, play a significant role in every operation from production through drug distribution channels to the end consumer. Images of the label, also called the Display Panel, can be used to identify illegal, illicit, unapproved, and potentially dangerous drugs. Because manual investigation is time-consuming and labor-intensive, an artificial intelligence-based deep learning model is needed for fast and accurate drug identification.

Methods

In addition to image-based identification technology, we take advantage of the rich text information on the pharmaceutical package inserts shown in drug label images. In this study, we developed the Drug Label Identification through Image and Text embedding model (DLI-IT), which models text-based patterns in historical data to detect suspicious drugs. In DLI-IT, we first trained a Connectionist Text Proposal Network (CTPN) to crop each raw image into sub-images containing text regions. The text in each cropped sub-image is recognized independently with the Tesseract OCR engine, and the results are combined into one document per raw image. Finally, we applied universal sentence embedding to transform these documents into vectors and retrieved the reference images most similar to each test image by cosine similarity.
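The document-assembly and retrieval steps of this pipeline can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the OCR strings are taken as given (in DLI-IT they would come from Tesseract run on CTPN crops), and because the Universal Sentence Encoder is a pretrained neural model, a plain bag-of-words count vector is swapped in as a stand-in so the example runs with the standard library alone. The function names `build_document`, `embed`, `cosine`, and `rank_references`, and the sample label texts, are hypothetical.

```python
import math
import re
from collections import Counter


def build_document(sub_image_texts):
    """Combine the OCR text of each cropped sub-image into one document."""
    # Skip empty OCR results and collapse internal whitespace in each piece.
    parts = [" ".join(text.split()) for text in sub_image_texts if text.strip()]
    return " ".join(parts)


def embed(document):
    """Hypothetical stand-in for a sentence encoder: lower-cased token counts."""
    return Counter(re.findall(r"[a-z0-9]+", document.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(count * v.get(token, 0) for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_u * norm_v)


def rank_references(test_doc, reference_docs, k=3):
    """Return the k reference labels most similar to test_doc, best first."""
    query = embed(test_doc)
    scored = [(name, cosine(query, embed(doc))) for name, doc in reference_docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up label texts for illustration only.
references = {
    "label_A": "Oxycodone Hydrochloride Tablets USP 5 mg Rx only 100 Tablets",
    "label_B": "Ibuprofen Tablets 200 mg Pain reliever Fever reducer",
}
ocr_pieces = ["OXYCODONE HCl", "", "Tablets USP   5 mg", "100 Tablets"]
test_doc = build_document(ocr_pieces)
print(rank_references(test_doc, references, k=1))  # label_A ranks first (cosine ~0.82)
```

Using a real sentence encoder would replace `embed` with a call that returns dense vectors (and `cosine` would then operate on those arrays), while the ranking step stays the same.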

Results

We trained the DLI-IT model on 1749 opioid and 2365 non-opioid drug label images. The model was then tested on 300 external opioid drug label images, on which it achieved a precision of up to 88% in drug label identification, outperforming previous image-based or text-based identification methods with an improvement of up to 35%.
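The abstract does not state how precision is computed. A minimal sketch, assuming it means top-1 precision: the share of test images whose highest-ranked reference is the correct drug label, which is what precision@1 reduces to when each test image has exactly one correct reference. The full paper may define the metric differently. The function name `top1_precision` and the toy data are hypothetical.

```python
def top1_precision(ranked_predictions, true_labels):
    """Fraction of test images whose top-ranked reference label is the true label."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per test image")
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if ranked and ranked[0] == truth  # an empty ranking counts as a miss
    )
    return hits / len(true_labels)


# Toy data: 4 test images, each with a ranked list of candidate reference labels.
ranked = [["A", "B"], ["B", "A"], ["C"], []]
truth = ["A", "A", "C", "B"]
print(top1_precision(ranked, truth))  # 2 of 4 correct -> 0.5
```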

Conclusion

In conclusion, by combining image and text embedding analysis within a deep learning framework, our DLI-IT approach achieved competitive performance in drug label identification.
Metadata
Publication date
01-12-2020
Publisher
BioMed Central
Keywords
Opioids
Published in
BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-020-1078-3
