BMC Medical Informatics and Decision Making

Open Access 01-12-2019 | Research article

Latent Dirichlet Allocation in predicting clinical trial terminations

Authors: Simon Geletta, Lendie Follett, Marcia Laugerman

Published in: BMC Medical Informatics and Decision Making | Issue 1/2019

Abstract

Background

This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns within research narrative documents that distinguish studies that complete successfully from those that terminate. Recent research has reported that at least 10% of all studies funded by major research funding agencies terminate without yielding useful results. Because studies funded by major agencies are carefully planned and rigorously vetted through peer review, we found it striking that study terminations are this prevalent. Moreover, our review of the literature suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap by seeking to identify the factors that contribute to study failures.

Method

We used data from the ClinicalTrials.gov repository, from which we extracted both structured data (study characteristics) and unstructured data (the narrative descriptions of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with either terminated or completed trials. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study termination with random forest modeling. We fit two distinct models: one using only structured data as predictors, and another using both the structured data and the 25 text topics derived from the unstructured data.
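To make the topic-derivation step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which inference method or software was used, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four toy documents are all hypothetical, and the toy uses 2 topics where the study derived 25.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper).

    docs is a list of token lists. Returns one list of topic
    probabilities per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical, pre-tokenised study descriptions (not data from the paper).
docs = [
    "enrollment slow recruitment funding sponsor".split(),
    "sponsor funding recruitment enrollment slow".split(),
    "dose efficacy placebo randomized response".split(),
    "placebo randomized dose response efficacy".split(),
]
theta = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` is one document's set of topic probabilities. In a pipeline like the one described above, those columns would sit alongside the structured study characteristics as predictors for the random forest.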

Results

In this paper, we demonstrate the interpretive and predictive value of LDA for predicting clinical trial failure. The results also show that the combined modeling approach yields robust predictions, in terms of both sensitivity and specificity, relative to a model that uses the structured data alone.
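For readers less familiar with the two metrics, here is a short sketch of how sensitivity and specificity could be computed when "terminated" is treated as the positive class. The function name, the choice of positive class, and the labels below are hypothetical illustrations, not results from the study. The sketch assumes both classes appear in the true labels.

```python
def sensitivity_specificity(y_true, y_pred, positive="terminated"):
    """Return (sensitivity, specificity) for a binary outcome.

    Assumes y_true contains at least one positive and one negative case.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # share of terminated trials correctly flagged
    specificity = tn / (tn + fp)  # share of completed trials correctly cleared
    return sensitivity, specificity


# Hypothetical labels for ten trials (illustration only).
y_true = ["terminated"] * 4 + ["completed"] * 6
y_pred = (["terminated"] * 3 + ["completed"]
          + ["completed"] * 5 + ["terminated"])

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Comparing the two models then amounts to computing this pair of numbers for each model's predictions on the same held-out trials.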

Conclusions

Our study demonstrated that topic modeling with LDA significantly raises the utility of unstructured data for predicting the completion vs. termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.

Title: Latent Dirichlet Allocation in predicting clinical trial terminations
Authors: Simon Geletta, Lendie Follett, Marcia Laugerman
Publication date: 01-12-2019
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-0973-y
