Published in: Systematic Reviews 1/2019

Open Access 01-12-2019 | Methodology

Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error

Authors: Alexandra Bannach-Brown, Piotr Przybyła, James Thomas, Andrew S. C. Rice, Sophia Ananiadou, Jing Liao, Malcolm Robert Macleod


Abstract

Background

We outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review.

Methods

We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human screened records, to identify potential errors made during the human screening process (error analysis).
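The performance measures and the 95% sensitivity target described above can be illustrated with a minimal sketch. It assumes each record already has a human label (1 = include, 0 = exclude) and an ML inclusion score between 0 and 1; the function names `confusion_counts`, `metrics` and `pick_threshold` are hypothetical and are not taken from the paper. The sketch picks the highest score cut-off that still meets the sensitivity target, which is one simple way to maximise specificity under that constraint.

```python
from typing import Dict, List, Tuple


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); records scoring >= threshold are predicted 'include'."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(labels: List[int], scores: List[float],
            threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy at a given score threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(labels),
    }


def pick_threshold(labels: List[int], scores: List[float],
                   target_sensitivity: float = 0.95) -> float:
    """Highest threshold whose sensitivity meets the target.

    Sensitivity can only fall and specificity can only rise as the
    threshold increases, so the highest qualifying threshold gives the
    best specificity available at the target sensitivity.
    """
    for threshold in sorted(set(scores), reverse=True):
        if metrics(labels, scores, threshold)["sensitivity"] >= target_sensitivity:
            return threshold
    # Fallback (e.g. no included records): predict 'include' for everything.
    return min(scores)
```

In practice the threshold would be chosen on the validation set and then applied unchanged to the remaining unseen records.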

Results

ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest level of specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified using the assigned inclusion likelihood from the ML model to highlight discrepancies. Training the ML algorithm on the corrected dataset improved the specificity of the algorithm without compromising sensitivity. Correcting the errors found in the error analysis led to a 3% improvement in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm.
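The error analysis can be sketched as follows, under loud assumptions: the abstract does not specify the classifier, the number of folds or the cut-offs used to flag discrepancies, so this example substitutes a tiny naive Bayes model, a hypothetical `k=5` and hypothetical cut-offs of 0.1 and 0.9. It only illustrates the general idea of scoring each human-screened record with a model that never saw that record, then flagging records where a confident machine score contradicts the human decision.

```python
import math
from collections import Counter
from typing import List


def train_nb(texts: List[str], labels: List[int]):
    """Fit a small multinomial naive Bayes model (illustrative stand-in classifier)."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab


def inclusion_score(model, text: str) -> float:
    """Return P(include | text) using add-one smoothing."""
    counts, n_docs, vocab = model
    log_odds = math.log((n_docs[1] + 1) / (n_docs[0] + 1))
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    for token in text.lower().split():
        if token in vocab:
            log_odds += math.log((counts[1][token] + 1) / totals[1])
            log_odds -= math.log((counts[0][token] + 1) / totals[0])
    log_odds = max(-50.0, min(50.0, log_odds))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-log_odds))


def out_of_fold_scores(texts: List[str], labels: List[int], k: int = 5) -> List[float]:
    """Score every record with a model trained on the other k-1 folds."""
    scores = [0.0] * len(texts)
    for fold in range(k):
        train_idx = [i for i in range(len(texts)) if i % k != fold]
        test_idx = [i for i in range(len(texts)) if i % k == fold]
        model = train_nb([texts[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in test_idx:
            scores[i] = inclusion_score(model, texts[i])
    return scores


def flag_discrepancies(labels: List[int], scores: List[float],
                       low: float = 0.1, high: float = 0.9) -> List[int]:
    """Indices where the human label contradicts a confident machine score."""
    return [i for i, (label, score) in enumerate(zip(labels, scores))
            if (label == 1 and score <= low) or (label == 0 and score >= high)]
```

Flagged records would then go back to a human reviewer for a second look; the score alone is not treated as the corrected decision.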

Conclusions

This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising approach to integrating human decisions and automation in systematic review methodology.
Metadata
Title
Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error
Authors
Alexandra Bannach-Brown
Piotr Przybyła
James Thomas
Andrew S. C. Rice
Sophia Ananiadou
Jing Liao
Malcolm Robert Macleod
Publication date
01-12-2019
Publisher
BioMed Central
Published in
Systematic Reviews / Issue 1/2019
Electronic ISSN: 2046-4053
DOI
https://doi.org/10.1186/s13643-019-0942-7
