BMC Medical Informatics and Decision Making

Open Access 01-04-2013 | Proceedings

A method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data

Authors: Angelo Restificar, Ioannis Korkontzelos, Sophia Ananiadou

Published in: BMC Medical Informatics and Decision Making | Special Issue 1/2013

Abstract

Background

We consider the user task of designing clinical trial protocols and propose a method that discovers and outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'| ≪ |D|, that a user has initially identified as relevant, e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D, D ⊃ D', by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. Appropriateness is measured by the degree to which the criteria are consistent with the user-supplied sample documents D'.

Method

We propose a novel three-step method called LDALR which views documents as a mixture of latent topics. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA). Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score.
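The third step, scoring, can be illustrated with a minimal sketch. It assumes the first two steps have already produced their outputs: a list of topic proportions inferred from the sample documents by LDA, and, for each candidate criterion, one probability per topic from the logistic regression models. The function names, the criterion labels, and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple


def expected_score(topic_probs: List[float], topic_proportions: List[float]) -> float:
    """Return the probability-weighted sum of the sample topic proportions.

    topic_probs[k]       -- probability that the criterion belongs to topic k
                            (assumed to come from a per-topic logistic regression model)
    topic_proportions[k] -- proportion of topic k in the sample documents
                            (assumed to come from LDA)
    """
    if len(topic_probs) != len(topic_proportions):
        raise ValueError("one probability is needed per topic")
    return sum(p * theta for p, theta in zip(topic_probs, topic_proportions))


def rank_criteria(
    candidates: Dict[str, List[float]], topic_proportions: List[float]
) -> List[Tuple[str, float]]:
    """Score every candidate criterion and sort from highest to lowest score."""
    scored = [
        (text, expected_score(probs, topic_proportions))
        for text, probs in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: three latent topics, with topic 0 dominant in the samples.
sample_topic_proportions = [0.7, 0.2, 0.1]
candidate_topic_probs = {
    "criterion A": [0.90, 0.05, 0.05],  # mostly the dominant topic
    "criterion B": [0.10, 0.10, 0.80],  # mostly a minor topic
    "criterion C": [0.30, 0.60, 0.10],
}

ranking = rank_criteria(candidate_topic_probs, sample_topic_proportions)
for text, score in ranking:
    print(f"{text}: {score:.3f}")
```

In this toy example the criterion that most likely belongs to the dominant sample topic ranks first, which matches the intuition stated above. The sketch does not require the per-topic probabilities to sum to one, since separate models per topic need not produce a normalized distribution.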

Results

Our experiments show that LDALR performs 8 and 9 times better, for inclusion and exclusion criteria respectively, than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments using LDALR, we were able to automatically construct eligibility criteria that are on average 75% and 70% similar, for inclusion and exclusion criteria respectively, to the correct eligibility criteria.

Conclusions

We have proposed LDALR, a practical method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data. Results from our experiments suggest that LDALR models can be used to effectively find appropriate eligibility criteria from a large repository of clinical trial protocols.

Title: A method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data
Authors: Angelo Restificar, Ioannis Korkontzelos, Sophia Ananiadou
Publication date: 01-04-2013
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making | Special Issue 1/2013
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/1472-6947-13-S1-S6
