Published in: Systematic Reviews 1/2022

Open Access 01-12-2022 | Artificial Intelligence | Research

Reducing systematic review burden using Deduklick: a novel, automated, reliable, and explainable deduplication algorithm to foster medical research

Authors: Nikolay Borissov, Quentin Haas, Beatrice Minder, Doris Kopp-Heim, Marc von Gernler, Heidrun Janka, Douglas Teodoro, Poorya Amini


Abstract

Background

Identifying and removing reference duplicates when conducting systematic reviews (SRs) remain a major, time-consuming issue for authors who manually check for duplicates using built-in features in citation managers. To address issues related to manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.

Methods

Deduklick’s deduplication is a multistep algorithm that normalizes the data, calculates a similarity score, and identifies unique and duplicate references based on metadata fields such as title, authors, journal, DOI, year, issue, volume, and page number range. We measured Deduklick’s capacity to accurately detect duplicates and compared it with the information specialists’ standard, manual duplicate removal process using EndNote on eight existing heterogeneous datasets. Using a sensitivity analysis, we manually cross-compared the efficiency and noise of both methods.
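
The three steps named above (normalization, similarity scoring, and classification into unique and duplicate references) can be illustrated with a minimal sketch. This is not Deduklick's implementation, which the abstract does not disclose. The field weights, the 0.9 threshold, the DOI shortcut, and the use of Python's difflib string matcher in place of the natural language processing component are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical weights: the abstract lists the metadata fields but not how they are weighted.
FIELD_WEIGHTS = {"title": 0.5, "authors": 0.2, "journal": 0.1, "year": 0.1, "pages": 0.1}


def normalize(value: str) -> str:
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: dict, b: dict) -> float:
    """Weighted mean of per-field string similarity over fields present in both records."""
    # Assumption of this sketch: an identical, non-empty DOI is treated as decisive.
    doi_a, doi_b = a.get("doi", "").lower(), b.get("doi", "").lower()
    if doi_a and doi_a == doi_b:
        return 1.0
    score, total = 0.0, 0.0
    for field, weight in FIELD_WEIGHTS.items():
        x, y = normalize(a.get(field, "")), normalize(b.get(field, ""))
        if x and y:
            score += weight * SequenceMatcher(None, x, y).ratio()
            total += weight
    return score / total if total else 0.0


def deduplicate(records: list[dict], threshold: float = 0.9) -> tuple[list[dict], list[dict]]:
    """Keep the first occurrence of each reference; flag later near-matches as duplicates."""
    unique, duplicates = [], []
    for record in records:
        if any(similarity(record, kept) >= threshold for kept in unique):
            duplicates.append(record)
        else:
            unique.append(record)
    return unique, duplicates
```

Comparing each record against every retained record is quadratic in the number of references, which is acceptable for a sketch but would likely need blocking or indexing for large search exports.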

Discussion

Deduklick achieved average recall of 99.51%, average precision of 100.00%, and average F1 score of 99.75%. In contrast, the manual deduplication process achieved average recall of 88.65%, average precision of 99.95%, and average F1 score of 91.98%. Deduklick thus matched or exceeded expert-level performance on duplicate removal. It also preserved high metadata quality and drastically reduced time spent on analysis. Deduklick represents an efficient, transparent, ergonomic, and time-saving solution for identifying and removing duplicates in SR searches. Deduklick could therefore simplify SR production and offer important advantages to scientists, including saving time, increasing accuracy, reducing costs, and contributing to high-quality SRs.
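
For readers less familiar with the reported metrics, the sketch below shows the standard definitions of precision, recall, and F1 score in terms of duplicate-detection counts. The function name and the example counts are hypothetical and are not taken from the study's datasets.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from duplicate-detection counts.

    tp: true duplicates correctly flagged
    fp: unique references wrongly flagged as duplicates
    fn: true duplicates that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one dataset: 90 duplicates found, none wrongly flagged, 10 missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=0, fn=10)
```

Because the figures above are averages over eight datasets, the average F1 score need not equal the F1 score computed from the average precision and average recall.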
Metadata
Title
Reducing systematic review burden using Deduklick: a novel, automated, reliable, and explainable deduplication algorithm to foster medical research
Authors
Nikolay Borissov
Quentin Haas
Beatrice Minder
Doris Kopp-Heim
Marc von Gernler
Heidrun Janka
Douglas Teodoro
Poorya Amini
Publication date
01-12-2022
Publisher
BioMed Central
Published in
Systematic Reviews / Issue 1/2022
Electronic ISSN: 2046-4053
DOI
https://doi.org/10.1186/s13643-022-02045-9
