Published in: BMC Medical Informatics and Decision Making 1/2015

Open Access 01-12-2015 | Research article

PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction

Authors: Lawrence WC Chan, Ying Liu, Tao Chan, Helen KW Law, SC Cesar Wong, Andy PH Yeung, KF Lo, SW Yeung, KY Kwok, William YL Chan, Thomas YH Lau, Chi-Ren Shyu

Abstract

Background

Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems gives physicians evidence to support diagnosis and examination referral for suspected cases. Clinical terms in EHRs represent high-level conceptual information, and a similarity measure based on these terms reflects the chance of inter-patient disease co-occurrence. The assumption that all clinical terms are equally relevant to a disease is unrealistic and reduces prediction accuracy. Here we propose a term weighting approach, supported by the PubMed search engine, to address this issue.

Methods

We collected and studied 112 abdominal computed tomography imaging examination reports from four hospitals in Hong Kong. Clinical terms, namely the image findings related to hepatocellular carcinoma (HCC), were extracted from the reports. Using two systematic PubMed search methods, generic and specific term weightings were established by estimating the conditional probabilities of clinical terms given HCC. Each report was characterized by an ontological feature vector, giving 6216 vector pairs in total. We optimized the modified direction cosine (mDC) with respect to a regularization constant embedded in the feature vector. Equal, generic and specific term weighting approaches were applied to measure the similarity of each pair, and their performance in predicting inter-patient co-occurrence of HCC diagnoses was compared using receiver operating characteristic (ROC) analysis.
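The pairwise comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: the weighted cosine is a generic stand-in for the mDC (whose exact form and regularization constant are not given in this abstract), the hit-count ratio is only one plausible way to estimate a conditional probability from search results, and all function names are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt


def term_weight(hits_term_and_disease, hits_disease):
    """Assumed estimate of P(term | disease) from two search hit counts."""
    if hits_disease <= 0:
        raise ValueError("hits_disease must be positive")
    return hits_term_and_disease / hits_disease


def weighted_cosine(x, y, w):
    """Cosine similarity of x and y with each term scaled by a non-negative weight.

    Generic weighted cosine, used here as a stand-in for the paper's mDC.
    """
    dot = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    norm_x = sqrt(sum(wi * xi * xi for xi, wi in zip(x, w)))
    norm_y = sqrt(sum(wi * yi * yi for yi, wi in zip(y, w)))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a vector with no weighted terms is similar to nothing
    return dot / (norm_x * norm_y)


def pairwise_similarities(vectors, w):
    """Score every unordered pair of report vectors."""
    return {
        (i, j): weighted_cosine(vectors[i], vectors[j], w)
        for i, j in combinations(range(len(vectors)), 2)
    }


# 112 reports yield C(112, 2) unordered pairs, matching the count in the text.
n_pairs = comb(112, 2)
```

Setting every weight to 1 reduces `weighted_cosine` to the ordinary cosine, which corresponds to the equal term weighting baseline in this sketch.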

Results

The areas under the ROC curves (AUROCs) of similarity scores based on equal, generic and specific term weighting approaches were 0.735, 0.728 and 0.743 respectively (p < 0.01). In comparison with equal term weighting, the performance was significantly improved by specific term weighting (p < 0.01) but not by generic term weighting. The clinical terms “dysplastic nodule”, “nodule of liver” and “equal density (isodense) lesion” were found to be the top three image findings associated with HCC in PubMed.
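For readers unfamiliar with the metric, an AUROC can be read as the probability that a randomly chosen positive pair receives a higher similarity score than a randomly chosen negative pair. The sketch below computes it directly from that definition. It is illustrative only: the function name is hypothetical, and treating a pair as positive when both patients share the diagnosis is an assumption about how pairs are labelled.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

This brute-force form is quadratic in the number of scored pairs. It is fine for a few thousand pairs, but a rank-based computation would be preferable for much larger sets.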

Conclusions

Our findings suggest that applying the optimized similarity measure with specific term weighting to EHRs can significantly improve the accuracy of predicting inter-patient co-occurrence of a diagnosis, compared with equal and generic term weighting approaches.
Metadata
Publication date
01-12-2015
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2015
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-015-0166-2
