ABSTRACT
Medical coding, or classification, is the process of transforming the information contained in patient medical records into standard predefined medical codes. Several internationally accepted coding conventions exist for diagnoses and medical procedures; in the United States, however, the Ninth Revision of the ICD (ICD-9) provides the standard for coding clinical records. Accurate medical coding matters because hospitals use the codes for insurance billing. Because a patient can be assigned several ICD-9 codes after discharge, the coding problem can be viewed as a multi-label classification problem. In this paper, we introduce a multi-label large-margin classifier that automatically learns the underlying inter-code structure and allows the controlled incorporation of prior knowledge about medical code relationships. Besides learning and refining the code relationships, our classifier uses this shared information to improve its performance. Experiments on a publicly available dataset of clinical free text and its associated medical codes show that the proposed multi-label classifier outperforms related multi-label models on this problem.
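To make the multi-label framing concrete, the following is a minimal sketch, not the classifier proposed in the paper. It assumes a hypothetical toy set of discharge records, encodes each record's ICD-9 codes as a binary indicator vector, and counts how often pairs of codes appear together as a crude stand-in for inter-code relationships. The data and all function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy data: each record is the set of ICD-9 codes
# assigned to one patient at discharge (values are illustrative only).
records = [
    {"428.0", "401.9"},
    {"428.0", "401.9", "250.00"},
    {"250.00"},
    {"486"},
]


def build_label_index(records):
    """Map every distinct code to a column position (sorted for determinism)."""
    codes = sorted({code for record in records for code in record})
    return {code: position for position, code in enumerate(codes)}


def to_indicator_matrix(records, index):
    """Encode each record as a 0/1 vector with one entry per code."""
    matrix = []
    for record in records:
        row = [0] * len(index)
        for code in record:
            row[index[code]] = 1
        matrix.append(row)
    return matrix


def cooccurrence(records, index):
    """Count how many records contain each pair of distinct codes."""
    size = len(index)
    counts = [[0] * size for _ in range(size)]
    for record in records:
        for first, second in combinations(sorted(record), 2):
            i, j = index[first], index[second]
            counts[i][j] += 1
            counts[j][i] += 1
    return counts


index = build_label_index(records)
Y = to_indicator_matrix(records, index)  # multi-label targets, one row per record
C = cooccurrence(records, index)         # symmetric pairwise co-occurrence counts
```

Each row of `Y` can carry several ones, which is what distinguishes multi-label from ordinary multi-class classification. A matrix such as `C` is one simple form that prior knowledge about code relationships could take; the paper's model learns its own inter-code structure rather than relying on raw counts.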
Index Terms
- Medical coding classification by leveraging inter-code relationships