Open Access 01-12-2019 | Methodology

Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error

Authors: Alexandra Bannach-Brown, Piotr Przybyła, James Thomas, Andrew S. C. Rice, Sophia Ananiadou, Jing Liao, Malcolm Robert Macleod

Published in: Systematic Reviews | Issue 1/2019

Abstract

Background

Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing, broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required to carry out this step of a systematic review.

Methods

We applied ML approaches at the citation screening stage of a broad systematic review of animal models of depression. We tested two independently developed ML approaches, which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity while maximising specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of this preclinical systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human-screened records, to identify potential errors made during the human screening process (error analysis).
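The evaluation target described above (at least 95% sensitivity, then as much specificity as possible) can be illustrated with a minimal Python sketch. It assumes each record has a human decision (1 = include, 0 = exclude) and a numeric inclusion score from some classifier; the function names and toy data are hypothetical and this is not the authors' code. It computes sensitivity, specificity and accuracy and picks the score cut-off that keeps sensitivity at or above the target.

```python
from typing import Dict, Optional, Sequence, Tuple


def confusion_counts(labels: Sequence[int], preds: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn); 1 = include, 0 = exclude."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy; assumes both classes are present."""
    tp, fp, tn, fn = confusion_counts(labels, preds)
    return {
        "sensitivity": tp / (tp + fn),  # share of relevant records kept
        "specificity": tn / (tn + fp),  # share of irrelevant records screened out
        "accuracy": (tp + tn) / len(labels),
    }


def choose_threshold(labels: Sequence[int], scores: Sequence[float],
                     target_sensitivity: float = 0.95) -> Optional[Tuple[float, Dict[str, float]]]:
    """Highest score cut-off whose sensitivity still meets the target.

    Raising the cut-off shrinks the predicted-include set, so sensitivity can
    only fall and specificity can only rise; the last cut-off that still meets
    the target therefore has the best specificity among those that do.
    """
    best = None
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        metrics = screening_metrics(labels, preds)
        if metrics["sensitivity"] >= target_sensitivity:
            best = (t, metrics)
        else:
            break  # sensitivity cannot recover at higher cut-offs
    return best


# Hypothetical toy scores for eight human-screened records.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.4, 0.3, 0.2, 0.1]
print(choose_threshold(labels, scores))
# → (0.35, {'sensitivity': 1.0, 'specificity': 0.6, 'accuracy': 0.75})
```

With only three relevant records, any sensitivity target above 2/3 forces all three to be kept, so the cut-off settles at the lowest-scoring relevant record (0.35), and the two irrelevant records scoring above it are the price paid in specificity.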

Results

ML approaches reached 98.7% sensitivity after learning from a training set of 5749 records with an inclusion prevalence of 13.2%. The highest specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified by using the inclusion likelihood assigned by the ML model to highlight discrepancies. Training the ML algorithm on the corrected dataset improved its specificity without compromising sensitivity. Error analysis correction led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm.
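The error-analysis idea can be sketched in Python under stated assumptions: every human-screened record is scored by a model trained only on the other folds, so its score is independent of its own human label, and records whose label and score disagree are queued for re-screening. The abstract does not specify the classifiers, feature sets, number of folds or how discrepancies were ranked, so the multinomial naive Bayes stand-in, k = 5, the 0.5 cut-off, the toy data and all function names here are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A fitted model: per-class token counts, per-class document counts, vocabulary size.
Model = Tuple[Dict[int, Counter], Counter, int]


def train_nb(docs: Sequence[List[str]], labels: Sequence[int]) -> Model:
    """Fit a multinomial naive Bayes model (a stand-in, not the paper's classifiers)."""
    token_counts = {0: Counter(), 1: Counter()}
    for tokens, y in zip(docs, labels):
        token_counts[y].update(tokens)
    vocab_size = len(set(token_counts[0]) | set(token_counts[1]))
    return token_counts, Counter(labels), vocab_size


def include_probability(model: Model, tokens: List[str]) -> float:
    """Posterior probability of 'include' for one record, with add-one smoothing."""
    token_counts, doc_counts, vocab_size = model
    n_docs = doc_counts[0] + doc_counts[1]
    log_post = {}
    for y in (0, 1):
        total = sum(token_counts[y].values())
        lp = math.log((doc_counts[y] + 1) / (n_docs + 2))  # smoothed class prior
        for tok in tokens:
            lp += math.log((token_counts[y][tok] + 1) / (total + vocab_size))
        log_post[y] = lp
    top = max(log_post.values())  # subtract the max before exp to avoid underflow
    weights = {y: math.exp(lp - top) for y, lp in log_post.items()}
    return weights[1] / (weights[0] + weights[1])


def out_of_fold_scores(docs, labels, k=5, seed=0) -> List[float]:
    """Score each record with a model trained on the other k-1 folds only."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    scores = [0.0] * len(docs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            scores[i] = include_probability(model, docs[i])
    return scores


def flag_discrepancies(labels, scores, cutoff=0.5) -> List[Tuple[int, int, float]]:
    """Return (index, human_label, score) where the human decision and the
    out-of-fold score disagree, most discordant first."""
    flagged = [(i, y, s) for i, (y, s) in enumerate(zip(labels, scores))
               if (y == 1 and s < cutoff) or (y == 0 and s >= cutoff)]
    return sorted(flagged, key=lambda r: abs(r[2] - r[1]), reverse=True)


# Hypothetical toy data: the last record looks like an include but was excluded.
include_doc = ["rat", "depression", "forced", "swim", "test"]
exclude_doc = ["human", "patients", "clinical", "trial", "survey"]
docs = [include_doc] * 8 + [exclude_doc] * 8 + [include_doc]
labels = [1] * 8 + [0] * 8 + [0]
flagged = flag_discrepancies(labels, out_of_fold_scores(docs, labels))
print([i for i, _, _ in flagged])  # → [16]
```

Holding each record out of its own training fold is what makes the score usable for error analysis: a model that had seen the record's human label would tend to reproduce that label, including a mistaken one.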

Conclusions

This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. These findings need to be confirmed in other reviews with different levels of inclusion prevalence, but they represent a promising approach to integrating human decisions and automation in systematic review methodology.
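One reason inclusion prevalence matters: even if sensitivity and specificity stay fixed, precision (the proportion of ML-included records that are truly relevant) depends on the prevalence $p$ through Bayes' rule. The numbers below are purely illustrative: they combine the abstract's best sensitivity (98.7%) and best specificity (86%), which may not come from the same model or threshold, with the training-set prevalence of 13.2% and an assumed lower prevalence of 5%.

```latex
\begin{aligned}
\mathrm{precision} &= \frac{\mathrm{sens}\cdot p}{\mathrm{sens}\cdot p + (1-\mathrm{spec})(1-p)} \\
p = 0.132:\ \mathrm{precision} &= \frac{0.987 \times 0.132}{0.987 \times 0.132 + 0.14 \times 0.868} \approx \frac{0.1303}{0.2518} \approx 0.52 \\
p = 0.05:\ \mathrm{precision} &= \frac{0.987 \times 0.05}{0.987 \times 0.05 + 0.14 \times 0.95} \approx \frac{0.0494}{0.1824} \approx 0.27
\end{aligned}
```

In this illustration, the same sensitivity and specificity yield roughly half the precision when prevalence drops from 13.2% to 5%.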

Publication date: 01-12-2019
Publisher: BioMed Central
Electronic ISSN: 2046-4053
DOI: https://doi.org/10.1186/s13643-019-0942-7
