Published in: BMC Medical Informatics and Decision Making 1/2013

Open Access 01-04-2013 | Proceedings

A method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data

Authors: Angelo Restificar, Ioannis Korkontzelos, Sophia Ananiadou

Published in: BMC Medical Informatics and Decision Making | Special Issue 1/2013


Abstract

Background

We consider the user task of designing clinical trial protocols and propose a method that discovers and outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document $d$ in our collection $D$ is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents $D'$, $|D'| \ll |D|$, that a user has initially identified as relevant, e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from $D$, $D \supset D'$, by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. Appropriateness is measured by the degree to which the criteria are consistent with the user-supplied sample documents $D'$.

Method

We propose a novel three-step method called LDALR which views documents as a mixture of latent topics. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA). Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score.
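
The third step, scoring, can be illustrated with a minimal sketch. It assumes the first two steps have already produced their outputs: a list of topic proportions inferred from the sample documents (step 1) and, for each candidate criterion, the probability that it belongs to each topic (step 2). The function names, the example criteria, and all numeric values below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def expected_value_score(topic_proportions: List[float],
                         topic_probs: List[float]) -> float:
    """Probability-weighted sum of topic proportions for one criterion.

    topic_proportions[k]: proportion of topic k in the sample documents
                          (assumed to come from LDA).
    topic_probs[k]:       probability that the criterion belongs to topic k
                          (assumed to come from a logistic regression model).
    """
    if len(topic_proportions) != len(topic_probs):
        raise ValueError("one probability is required per topic")
    return sum(p * theta for p, theta in zip(topic_probs, topic_proportions))


def rank_criteria(topic_proportions: List[float],
                  candidates: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (criterion, score) pairs sorted from highest to lowest score."""
    scored = [(text, expected_value_score(topic_proportions, probs))
              for text, probs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: three topics, the first one dominant in the samples.
theta = [0.6, 0.3, 0.1]
candidates = {
    "age 18 years or older": [0.9, 0.2, 0.1],
    "no prior chemotherapy": [0.1, 0.8, 0.3],
    "able to attend weekly visits": [0.1, 0.1, 0.9],
}
ranking = rank_criteria(theta, candidates)
```

In this toy input the first criterion ranks highest because it most probably belongs to the topic that dominates the samples, which matches the intuition stated above.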

Results

In our experiments, LDALR performed 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments using LDALR, we were able to automatically construct eligibility criteria that are, on average, 75% and 70% similar (for inclusion and exclusion criteria, respectively) to the correct eligibility criteria.

Conclusions

We have proposed LDALR, a practical method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data. Results from our experiments suggest that LDALR models can be used to effectively find appropriate eligibility criteria from a large repository of clinical trial protocols.
Metadata
Title
A method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data
Authors
Angelo Restificar
Ioannis Korkontzelos
Sophia Ananiadou
Publication date
01-04-2013
Publisher
BioMed Central
DOI
https://doi.org/10.1186/1472-6947-13-S1-S6
