BMC Medical Informatics and Decision Making

Open Access 01-12-2019 | Research article

Machine learning to help researchers evaluate biases in clinical trials: a prospective, randomized user study

Authors: Frank Soboczenski, Thomas A. Trikalinos, Joël Kuiper, Randolph G. Bias, Byron C. Wallace, Iain J. Marshall

Published in: BMC Medical Informatics and Decision Making | Issue 1/2019

Abstract

Objective

Assessing risks of bias in randomized controlled trials (RCTs) is an important but laborious task when conducting systematic reviews. RobotReviewer (RR), an open-source machine learning (ML) system, semi-automates bias assessments. We conducted a user study of RobotReviewer, evaluating time saved and usability of the tool.

Materials and methods

Systematic reviewers applied the Cochrane Risk of Bias tool to four randomly selected RCT articles. For each bias domain in the Cochrane tool (Version 1), reviewers judged whether an RCT was at low or high/unclear risk of bias, and highlighted the article text justifying their decision. For a random two of the four articles, the process was semi-automated: users were provided with ML-suggested bias judgments and text highlights, which they could amend if necessary. We measured the time taken for the task, evaluated the ML suggestions, assessed usability via the System Usability Scale (SUS), and collected qualitative feedback.
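For readers unfamiliar with the System Usability Scale, the sketch below shows the standard scoring rule for a single completed questionnaire: ten items rated 1 to 5, alternating positive and negative wording, rescaled to 0–100. The function name `sus_score` is hypothetical and this is not the study's own analysis code, only an illustration of how a per-participant SUS score is usually derived.

```python
def sus_score(responses):
    """Score one SUS questionnaire (ten items, each rated 1-5).

    Standard rule: odd-numbered items are positively worded and
    contribute (rating - 1); even-numbered items are negatively worded
    and contribute (5 - rating). The sum (0-40) is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:          # items 1, 3, 5, 7, 9
            total += rating - 1
        else:                       # items 2, 4, 6, 8, 10
            total += 5 - rating
    return total * 2.5


# A neutral answer (3) on every item gives the midpoint of the scale.
print(sus_score([3] * 10))
```

A study-level figure such as a mean SUS score is then the average of these per-participant scores.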

Results

For 41 volunteers, semi-automation was quicker than manual assessment (mean 755 vs. 824 s; relative time 0.75, 95% CI 0.62–0.92). Reviewers accepted 301/328 (91%) of the ML Risk of Bias (RoB) judgments and 202/328 (62%) of the text highlights without change. Overall, ML-suggested text highlights had a recall of 0.90 (SD 0.14) and a precision of 0.87 (SD 0.21) with respect to the users’ final versions. Reviewers assigned the system a mean SUS score of 77.7, corresponding to a rating between “good” and “excellent”.
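To make the precision and recall figures concrete, here is a minimal sketch of how a suggested highlight can be compared with a reviewer's final highlight. The abstract does not state the unit of comparison, so this example assumes each highlight is represented as a set of token positions; the function name `highlight_precision_recall` and the handling of empty highlights are likewise assumptions, not the authors' method.

```python
def highlight_precision_recall(suggested, final):
    """Compare ML-suggested highlight positions with the user's final ones.

    Both arguments are iterables of token positions (an assumed
    representation). Precision is the share of suggested positions the
    user kept; recall is the share of the user's final positions that
    the suggestion already covered.
    """
    suggested, final = set(suggested), set(final)
    overlap = len(suggested & final)
    # Assumed convention: an empty set yields 0.0 rather than an error.
    precision = overlap / len(suggested) if suggested else 0.0
    recall = overlap / len(final) if final else 0.0
    return precision, recall


# The user trimmed the suggestion from tokens 0-9 down to tokens 0-4.
precision, recall = highlight_precision_recall(range(10), range(5))
print(precision, recall)
```

In this example the user removed half of the suggested text, so precision drops to 0.5 while recall stays at 1.0 because nothing the user wanted was missing.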

Conclusions

Semi-automation (where humans validate machine learning suggestions) can improve the efficiency of evidence synthesis. Our system was rated highly usable, and expedited bias assessment of RCTs.

https://github.com/ijmarshall/robotreviewer

http://community.cochrane.org/help/data-management-tools/cochrane-crowd

https://eppi.ioe.ac.uk/cms/

http://www.srsm.org

www.robotreviewer.net

https://www.youtube.com/watch?v=0xwwze83sBs

We provide a Jupyter Notebook with the code for all analyses, together with the source data collected, as a supplement to this article: https://github.com/h21k/RRUX

This feedback might result from participants expecting the tool to highlight only relevant text. In fact, the system is intentionally designed to over-retrieve rationales, always retrieving the top three most likely rationales (to increase the likelihood that at least one will be relevant). However, the trade-off of this approach is that clearly irrelevant text may be highlighted (for example, where only one piece of relevant text appears in the article, or due to inaccuracies of prediction on a particular document).
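The over-retrieval behaviour described above can be illustrated with a short sketch. It assumes each candidate sentence already has a relevance score from some model; the function name `top_k_rationales`, the example sentences, and the scores are all hypothetical, and this is not RobotReviewer's actual implementation.

```python
def top_k_rationales(sentences, scores, k=3):
    """Return the k highest-scoring sentences with their scores.

    The selection is unconditional: k sentences are returned even when
    only one of them has a high score, which is the trade-off noted in
    the text.
    """
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    # Rank sentence indices by descending score; ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in ranked[:k]]


# Hypothetical article with a single clearly relevant sentence.
sentences = [
    "Patients were recruited from two clinics.",
    "Allocation used a computer-generated random sequence.",
    "The mean age was 54 years.",
    "Funding was provided by a research council.",
]
scores = [0.10, 0.90, 0.40, 0.05]

for sentence, score in top_k_rationales(sentences, scores):
    print(f"{score:.2f}  {sentence}")
```

Only the second sentence is relevant here, yet two low-scoring sentences are still highlighted because the cut-off is a fixed count rather than a score threshold.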

https://nbviewer.jupyter.org/github/h21k/RRUX/blob/master/Analysis.ipynb

https://github.com/h21k/robotreviewer3/blob/ux/robotreviewer/templates/ux.html

Title: Machine learning to help researchers evaluate biases in clinical trials: a prospective, randomized user study
Authors: Frank Soboczenski, Thomas A. Trikalinos, Joël Kuiper, Randolph G. Bias, Byron C. Wallace, Iain J. Marshall
Publication date: 01-12-2019
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-019-0814-z
