Published in: Systematic Reviews 1/2015

Open Access 01-12-2015 | Research

Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module

Authors: John Rathbone, Matt Carter, Tammy Hoffmann, Paul Glasziou


Abstract

Background

A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple times. Although reference management software uses algorithms to remove duplicate records, this is only partially successful, and the remaining duplicates must be removed manually, a time-consuming task that wastes resources. We sought to evaluate the effectiveness of a newly developed deduplication program against EndNote.

Methods

A literature search yielding 1,988 citations was manually inspected, and duplicate citations were identified and coded to create a benchmark dataset. The Systematic Review Assistant-Deduplication Module (SRA-DM) was iteratively developed and tested using the benchmark dataset and compared with EndNote’s default one-step auto-deduplication process, which matches on ‘author’, ‘year’, and ‘title’. The accuracy of deduplication was reported as sensitivity and specificity. Further validation tests on three additional benchmarked literature searches, comprising a total of 4,563 citations, were performed to determine the reliability of the SRA-DM algorithm.
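As a rough illustration of how a deduplication tool can be scored against a manually coded benchmark, the sketch below flags duplicates with a simple exact match on normalized author, year, and title, then computes sensitivity and specificity. It is not the SRA-DM algorithm or EndNote's actual matching logic. The `Citation` class, the `match_key` normalization, and the four toy records are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    # Hypothetical minimal record; real exports carry many more fields.
    author: str
    year: str
    title: str


def match_key(c):
    """Assumed matching key: lowercased, whitespace-collapsed author/year/title."""
    def norm(s):
        return " ".join(s.lower().split())
    return (norm(c.author), norm(c.year), norm(c.title))


def flag_duplicates(citations):
    """Return one boolean per record: True if its key was already seen."""
    seen = set()
    flags = []
    for c in citations:
        key = match_key(c)
        flags.append(key in seen)
        seen.add(key)
    return flags


def sensitivity_specificity(predicted, benchmark):
    """Score predicted duplicate flags against benchmark duplicate flags."""
    pairs = list(zip(predicted, benchmark))
    tp = sum(1 for p, b in pairs if p and b)          # duplicates correctly flagged
    fn = sum(1 for p, b in pairs if not p and b)      # duplicates missed
    tn = sum(1 for p, b in pairs if not p and not b)  # unique records kept
    fp = sum(1 for p, b in pairs if p and not b)      # unique records wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data, invented for illustration only (not from the study).
records = [
    Citation("Smith J", "2010", "Aspirin for headache"),
    Citation("SMITH J", "2010", "Aspirin  for  headache"),   # same after normalization
    Citation("Smith, John", "2010", "Aspirin for headache."),  # true duplicate, different formatting
    Citation("Lee K", "2012", "Statins in primary care"),
]
benchmark = [False, True, True, False]  # manual coding: which records are duplicates

predicted = flag_duplicates(records)
sens, spec = sensitivity_specificity(predicted, benchmark)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case the exact-match key misses the third record because the author format and trailing period differ, so sensitivity falls while specificity stays perfect. That is the kind of gap the manually coded benchmark is meant to expose.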

Results

The sensitivity (84%) and specificity (100%) of SRA-DM were superior to those of EndNote (sensitivity 51%, specificity 99.83%). Validation testing on three additional biomedical literature searches demonstrated that SRA-DM consistently achieved higher sensitivity than EndNote (90% vs 63%, 84% vs 73%, and 84% vs 64%). Furthermore, the specificity of SRA-DM was 100%, whereas the specificity of EndNote was imperfect (average 99.75%), with some unique records wrongly assigned as duplicates. Overall, there was a 42.86% increase in the number of duplicate records detected with SRA-DM compared with EndNote auto-deduplication.

Conclusions

The Systematic Review Assistant-Deduplication Module offers users a reliable program to remove duplicate records with greater sensitivity and specificity than EndNote. This application will save researchers and information specialists time and avoid research waste. The deduplication program is freely available online.
Metadata
Title
Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module
Authors
John Rathbone
Matt Carter
Tammy Hoffmann
Paul Glasziou
Publication date
01-12-2015
Publisher
BioMed Central
Published in
Systematic Reviews / Issue 1/2015
Electronic ISSN: 2046-4053
DOI
https://doi.org/10.1186/2046-4053-4-6
