Skip to main content
Top
Published in: Systematic Reviews 1/2019

Open Access 01-12-2019 | Methodology

Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy

Authors: Christopher R. Norman, Mariska M. G. Leeflang, Raphaël Porcher, Aurélie Névéol

Published in: Systematic Reviews | Issue 1/2019

Login to get access

Abstract

Background

The large and increasing number of new studies published each year is making literature identification in systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to the conventional, manual study identification to mitigate the cost, but previous literature has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is a need to also evaluate whether screening prioritization methods leads to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of one screening prioritization method based on active learning on sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy.

Methods

We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and re-run 400 meta-analyses based on a least 3 studies. We compared screening prioritization (with technological assistance) and screening in randomized order (standard practice without technology assistance). We examined if the screening could have been stopped before identifying all relevant studies while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates.

Results

The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07 to 100%). No systematic review would have required screening more than 2308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average 70% recall, the estimation error would have been 1.3% on average, compared to an average 2% estimation error expected when replicating summary estimate calculations.

Conclusion

Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified a sufficient number of studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify a sufficient number of studies that the meta-analyses were accurate within a 2% limit even with exhaustive manual screening, i.e., using current practice.
Literature
1.
go back to reference Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JP, Mavergames C, Gruen RL. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014; 11(2):1001603.CrossRef Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JP, Mavergames C, Gruen RL. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014; 11(2):1001603.CrossRef
2.
go back to reference Beynon R, Leeflang MM, McDonald S, Eisinga A, Mitchell RL, Whiting P, Glanville JM. Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE. Cochrane Database Syst Rev. 2013; 9(9):1–34. Beynon R, Leeflang MM, McDonald S, Eisinga A, Mitchell RL, Whiting P, Glanville JM. Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE. Cochrane Database Syst Rev. 2013; 9(9):1–34.
3.
go back to reference Leeflang MM, Scholten RJ, Rutjes AW, Reitsma JB, Bossuyt PM. Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies. J Clin Epidemiol. 2006; 59(3):234–40.PubMedCrossRef Leeflang MM, Scholten RJ, Rutjes AW, Reitsma JB, Bossuyt PM. Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies. J Clin Epidemiol. 2006; 59(3):234–40.PubMedCrossRef
5.
go back to reference Petersen H, Poon J, Poon SK, Loy C. Increased workload for systematic review literature searches of diagnostic tests compared with treatments: challenges and opportunities. JMIR Med Inform. 2014; 2(1):11.CrossRef Petersen H, Poon J, Poon SK, Loy C. Increased workload for systematic review literature searches of diagnostic tests compared with treatments: challenges and opportunities. JMIR Med Inform. 2014; 2(1):11.CrossRef
6.
go back to reference Kanoulas E, Li D, Azzopardi L, Spijker R. Overview of the CLEF technologically assisted reviews in empirical medicine. In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017. CEUR Workshop Proceedings. Padua: CEUR-WS.org: 2017. Kanoulas E, Li D, Azzopardi L, Spijker R. Overview of the CLEF technologically assisted reviews in empirical medicine. In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017. CEUR Workshop Proceedings. Padua: CEUR-WS.org: 2017.
7.
go back to reference Kanoulas E, Li D, Azzopardi L, Spijker R. Overview of the CLEF technologically assisted reviews in empirical medicine 2018. In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings. Padua: Conference and Labs of the Evaluation Forum: 2018. Kanoulas E, Li D, Azzopardi L, Spijker R. Overview of the CLEF technologically assisted reviews in empirical medicine 2018. In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings. Padua: Conference and Labs of the Evaluation Forum: 2018.
13.
go back to reference Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005; 58(10):982–90.PubMedCrossRef Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005; 58(10):982–90.PubMedCrossRef
14.
go back to reference Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001; 20(19):2865–84.PubMedCrossRef Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001; 20(19):2865–84.PubMedCrossRef
15.
go back to reference Gargon E, Gurung B, Medley N, Altman DG, Blazeby JM, Clarke M, Williamson PR. Choosing important health outcomes for comparative effectiveness research: a systematic review. PloS ONE. 2014; 9(6):99111.CrossRef Gargon E, Gurung B, Medley N, Altman DG, Blazeby JM, Clarke M, Williamson PR. Choosing important health outcomes for comparative effectiveness research: a systematic review. PloS ONE. 2014; 9(6):99111.CrossRef
16.
go back to reference Booth A. How much searching is enough? comprehensive versus optimal retrieval for technology assessments. Int J Technol Assess Health Care. 2010; 26(4):431–5.PubMedCrossRef Booth A. How much searching is enough? comprehensive versus optimal retrieval for technology assessments. Int J Technol Assess Health Care. 2010; 26(4):431–5.PubMedCrossRef
17.
18.
go back to reference Egger M, Smith GD. Bias in location and selection of studies. BMJ Br Med J. 1998; 316(7124):61.CrossRef Egger M, Smith GD. Bias in location and selection of studies. BMJ Br Med J. 1998; 316(7124):61.CrossRef
19.
go back to reference Kung J, Chiappelli F, Cajulis OO, Avezova R, Kossan G, Chew L, Maida CA. From systematic reviews to clinical recommendations for evidence-based health care: validation of revised assessment of multiple systematic reviews (r-amstar) for grading of clinical relevance. Open Dent J. 2010; 4:84.PubMedPubMedCentral Kung J, Chiappelli F, Cajulis OO, Avezova R, Kossan G, Chew L, Maida CA. From systematic reviews to clinical recommendations for evidence-based health care: validation of revised assessment of multiple systematic reviews (r-amstar) for grading of clinical relevance. Open Dent J. 2010; 4:84.PubMedPubMedCentral
20.
go back to reference Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjansson E, Grimshaw J, Henry DA, Boers M. Amstar is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009; 62(10):1013–20.PubMedCrossRef Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjansson E, Grimshaw J, Henry DA, Boers M. Amstar is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009; 62(10):1013–20.PubMedCrossRef
21.
go back to reference Tricco AC, Antony J, Zarin W, Strifler L, Ghassemi M, Ivory J, Perrier L, Hutton B, Moher D, Straus SE. A scoping review of rapid review methods. BMC Med. 2015; 13(1):224.PubMedPubMedCentralCrossRef Tricco AC, Antony J, Zarin W, Strifler L, Ghassemi M, Ivory J, Perrier L, Hutton B, Moher D, Straus SE. A scoping review of rapid review methods. BMC Med. 2015; 13(1):224.PubMedPubMedCentralCrossRef
22.
go back to reference Marshall IJ, Marshall R, Wallace BC, Brassey J, Thomas J. Rapid reviews may produce different results to systematic reviews: a meta-epidemiological study. J Clin Epidemiol. 2019; 109:30–41.PubMedPubMedCentralCrossRef Marshall IJ, Marshall R, Wallace BC, Brassey J, Thomas J. Rapid reviews may produce different results to systematic reviews: a meta-epidemiological study. J Clin Epidemiol. 2019; 109:30–41.PubMedPubMedCentralCrossRef
23.
go back to reference Nussbaumer-Streit B, Klerings I, Wagner G, Heise TL, Dobrescu AI, Armijo-Olivo S, Stratil JM, Persad E, Lhachimi SK, Van Noord MG, et al.Abbreviated literature searches were viable alternatives to comprehensive searches: a meta-epidemiological study. J Clin Epidemiol. 2018; 102:1–11.PubMedCrossRef Nussbaumer-Streit B, Klerings I, Wagner G, Heise TL, Dobrescu AI, Armijo-Olivo S, Stratil JM, Persad E, Lhachimi SK, Van Noord MG, et al.Abbreviated literature searches were viable alternatives to comprehensive searches: a meta-epidemiological study. J Clin Epidemiol. 2018; 102:1–11.PubMedCrossRef
24.
go back to reference Halladay CW, Trikalinos TA, Schmid IT, Schmid CH, Dahabreh IJ. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. J Clin Epidemiol. 2015; 68(9):1076–84.PubMedCrossRef Halladay CW, Trikalinos TA, Schmid IT, Schmid CH, Dahabreh IJ. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. J Clin Epidemiol. 2015; 68(9):1076–84.PubMedCrossRef
25.
go back to reference Sampson M, Barrowman NJ, Moher D, Klassen TP, Platt R, John PDS, Viola R, Raina P, et al.Should meta-analysts search embase in addition to medline?. J Clin Epidemiol. 2003; 56(10):943–55.PubMedCrossRef Sampson M, Barrowman NJ, Moher D, Klassen TP, Platt R, John PDS, Viola R, Raina P, et al.Should meta-analysts search embase in addition to medline?. J Clin Epidemiol. 2003; 56(10):943–55.PubMedCrossRef
26.
go back to reference Egger M, Juni P, Bartlett C, Holenstein F, Sterne J, et al. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? empirical study. Health Technol Assess. 2003; 7(1):1–76.PubMed Egger M, Juni P, Bartlett C, Holenstein F, Sterne J, et al. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? empirical study. Health Technol Assess. 2003; 7(1):1–76.PubMed
27.
go back to reference Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. Grey literature in systematic reviews: a cross-sectional study of the contribution of non-english reports, unpublished studies and dissertations to the results of meta-analyses in child-relevant reviews. BMC Med Res Methodol. 2017; 17(1):64.PubMedPubMedCentralCrossRef Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. Grey literature in systematic reviews: a cross-sectional study of the contribution of non-english reports, unpublished studies and dissertations to the results of meta-analyses in child-relevant reviews. BMC Med Res Methodol. 2017; 17(1):64.PubMedPubMedCentralCrossRef
28.
go back to reference Booth A. Over 85% of included studies in systematic reviews are on MEDLINE. J Clin Epidemiol. 2016; 79:165–6.PubMedCrossRef Booth A. Over 85% of included studies in systematic reviews are on MEDLINE. J Clin Epidemiol. 2016; 79:165–6.PubMedCrossRef
29.
go back to reference Aphinyanaphongs Y, Tsamardinos I, Statnikov A, Hardin D, Aliferis CF. Text categorization models for high-quality article retrieval in internal medicine. J Am Med Inform Assoc. 2005; 12(2):207–16.PubMedPubMedCentralCrossRef Aphinyanaphongs Y, Tsamardinos I, Statnikov A, Hardin D, Aliferis CF. Text categorization models for high-quality article retrieval in internal medicine. J Am Med Inform Assoc. 2005; 12(2):207–16.PubMedPubMedCentralCrossRef
30.
go back to reference Dobrokhotov PB, Goutte C, Veuthey A-L, Gaussier E. Assisting medical annotation in swiss-prot using statistical classifiers. Int J Med Inform. 2005; 74(2-4):317–24.PubMedCrossRef Dobrokhotov PB, Goutte C, Veuthey A-L, Gaussier E. Assisting medical annotation in swiss-prot using statistical classifiers. Int J Med Inform. 2005; 74(2-4):317–24.PubMedCrossRef
33.
go back to reference Wallace BC, Small K, Brodley CE, Lau J, Trikalinos Ta. Deploying an interactive machine learning system in an evidence-based practice center. Proceedings of the 2nd ACM SIGHIT symposium on International health informatics - IHI ’12, 819. 2012. https://doi.org/10.1145/2110363.2110464. Wallace BC, Small K, Brodley CE, Lau J, Trikalinos Ta. Deploying an interactive machine learning system in an evidence-based practice center. Proceedings of the 2nd ACM SIGHIT symposium on International health informatics - IHI ’12, 819. 2012. https://​doi.​org/​10.​1145/​2110363.​2110464.
34.
go back to reference Howard BE, Phillips J, Miller K, Tandon A, Mav D, Shah MR, Holmgren S, Pelch KE, Walker V, Rooney AA, et al.Swift-review: a text-mining workbench for systematic review. Syst Rev. 2016; 5(1):87.PubMedPubMedCentralCrossRef Howard BE, Phillips J, Miller K, Tandon A, Mav D, Shah MR, Holmgren S, Pelch KE, Walker V, Rooney AA, et al.Swift-review: a text-mining workbench for systematic review. Syst Rev. 2016; 5(1):87.PubMedPubMedCentralCrossRef
35.
go back to reference Przybyła P, Brockmeier AJ, Kontonatsios G, Le Pogam M-A, McNaught J, von Elm E, Nolan K, Ananiadou S. Prioritising references for systematic reviews with robotanalyst: a user study. Res Synth Methods. 2018; 9(3):470–88.PubMedPubMedCentral Przybyła P, Brockmeier AJ, Kontonatsios G, Le Pogam M-A, McNaught J, von Elm E, Nolan K, Ananiadou S. Prioritising references for systematic reviews with robotanalyst: a user study. Res Synth Methods. 2018; 9(3):470–88.PubMedPubMedCentral
36.
go back to reference Cormack GV, Grossman MR. Technology-assisted review in empirical medicine: waterloo participation in clef ehealth 2017. In: CLEF (Working Notes). Padua : Conference and Labs of the Evaluation Forum: 2017. Cormack GV, Grossman MR. Technology-assisted review in empirical medicine: waterloo participation in clef ehealth 2017. In: CLEF (Working Notes). Padua : Conference and Labs of the Evaluation Forum: 2017.
37.
go back to reference Olorisade BK, de Quincey E, Brereton P, Andras P. A critical analysis of studies that address the use of text mining for citation screening in systematic reviews. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. New York: ACM: 2016. p. 14. Olorisade BK, de Quincey E, Brereton P, Andras P. A critical analysis of studies that address the use of text mining for citation screening in systematic reviews. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. New York: ACM: 2016. p. 14.
38.
go back to reference Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association (JAMIA). 2011; 18 No 5:540–543. Oxford University Press.CrossRef Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association (JAMIA). 2011; 18 No 5:540–543. Oxford University Press.CrossRef
39.
40.
go back to reference Bannach-Brown A, Przybyła P, Thomas J, Rice AS, Ananiadou S, Liao J, Macleod MR. Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Syst Rev. 2019; 8(1):23.PubMedPubMedCentralCrossRef Bannach-Brown A, Przybyła P, Thomas J, Rice AS, Ananiadou S, Liao J, Macleod MR. Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Syst Rev. 2019; 8(1):23.PubMedPubMedCentralCrossRef
41.
go back to reference Lerner I, Créquit P, Ravaud P, Atal I. Automatic screening using word embeddings achieved high sensitivity and workload reduction for updating living network meta-analyses. J Clin Epidemiol. 2019; 108:86–94.PubMedCrossRef Lerner I, Créquit P, Ravaud P, Atal I. Automatic screening using word embeddings achieved high sensitivity and workload reduction for updating living network meta-analyses. J Clin Epidemiol. 2019; 108:86–94.PubMedCrossRef
42.
go back to reference Norman C, Leeflang M, Névéol A. Data extraction and synthesis in systematic reviews of diagnostic test accuracy: a corpus for automating and evaluating the process. In: AMIA Annual Symposium Proceedings, vol. 2018. Bethesda, Maryland: American Medical Informatics Association: 2018. p. 817. Norman C, Leeflang M, Névéol A. Data extraction and synthesis in systematic reviews of diagnostic test accuracy: a corpus for automating and evaluating the process. In: AMIA Annual Symposium Proceedings, vol. 2018. Bethesda, Maryland: American Medical Informatics Association: 2018. p. 817.
44.
go back to reference Norman C, Leeflang M, Névéol A. Limsi@ clef ehealth 2018 task 2: Technology assisted reviews by stacking active and static learning. CLEF (Working Notes). 2018; 2125:1–13. Padua: Conference and Labs of the Evaluation Forum. Norman C, Leeflang M, Névéol A. Limsi@ clef ehealth 2018 task 2: Technology assisted reviews by stacking active and static learning. CLEF (Working Notes). 2018; 2125:1–13. Padua: Conference and Labs of the Evaluation Forum.
45.
go back to reference Cormack GV, Grossman MR. Engineering quality and reliability in technology-assisted review. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM: 2016. p. 75–84. Cormack GV, Grossman MR. Engineering quality and reliability in technology-assisted review. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM: 2016. p. 75–84.
46.
go back to reference Cormack GV, Grossman MR. Autonomy and reliability of continuous active learning for technology-assisted review. arXiv preprint. 2015; 1504.06868:1–19. arXiv. Cormack GV, Grossman MR. Autonomy and reliability of continuous active learning for technology-assisted review. arXiv preprint. 2015; 1504.06868:1–19. arXiv.
47.
go back to reference Thorlund K, Imberger G, Johnston BC, Walsh M, Awad T, Thabane L, Gluud C, Devereaux P, Wetterslev J. Evolution of heterogeneity (I2) estimates and their 95% confidence intervals in large meta-analyses. PloS ONE. 2012; 7(7):39471.CrossRef Thorlund K, Imberger G, Johnston BC, Walsh M, Awad T, Thabane L, Gluud C, Devereaux P, Wetterslev J. Evolution of heterogeneity (I2) estimates and their 95% confidence intervals in large meta-analyses. PloS ONE. 2012; 7(7):39471.CrossRef
48.
go back to reference Cohen AM, Ambert K, McDonagh M. Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Med Inform Decis Making. 2012; 12(1):33.CrossRef Cohen AM, Ambert K, McDonagh M. Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Med Inform Decis Making. 2012; 12(1):33.CrossRef
49.
go back to reference Satopää V, Albrecht J, Irwin D, Raghavan B. Finding a. In: 2011 31st International Conference on Distributed Computing Systems Workshops. Piscataway: IEEE: 2011. p. 166–71. Satopää V, Albrecht J, Irwin D, Raghavan B. Finding a. In: 2011 31st International Conference on Distributed Computing Systems Workshops. Piscataway: IEEE: 2011. p. 166–71.
50.
go back to reference Tran V-T, Porcher R, Tran V-C, Ravaud P. Predicting data saturation in qualitative surveys with mathematical models from ecological research. J Clin Epidemiol. 2017; 82:71–8.PubMedCrossRef Tran V-T, Porcher R, Tran V-C, Ravaud P. Predicting data saturation in qualitative surveys with mathematical models from ecological research. J Clin Epidemiol. 2017; 82:71–8.PubMedCrossRef
51.
go back to reference Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005; 21(15):3301–7.PubMedCrossRef Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005; 21(15):3301–7.PubMedCrossRef
52.
go back to reference Jadad AR, Cook DJ, Jones A, Klassen TP, Tugwell P, Moher M, Moher D. Methodology and reports of systematic reviews and meta-analyses: a comparison of cochrane reviews with articles published in paper-based journals. JAMA. 1998; 280(3):278–80.PubMedCrossRef Jadad AR, Cook DJ, Jones A, Klassen TP, Tugwell P, Moher M, Moher D. Methodology and reports of systematic reviews and meta-analyses: a comparison of cochrane reviews with articles published in paper-based journals. JAMA. 1998; 280(3):278–80.PubMedCrossRef
Metadata
Title
Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy
Authors
Christopher R. Norman
Mariska M. G. Leeflang
Raphaël Porcher
Aurélie Névéol
Publication date
01-12-2019
Publisher
BioMed Central
Published in
Systematic Reviews / Issue 1/2019
Electronic ISSN: 2046-4053
DOI
https://doi.org/10.1186/s13643-019-1162-x

Other articles of this Issue 1/2019

Systematic Reviews 1/2019 Go to the issue