Published in: Systematic Reviews 1/2015

Open Access 01-12-2015 | Research

Automating data extraction in systematic reviews: a systematic review

Authors: Siddhartha R. Jonnalagadda, Pawan Goyal, Mark D. Huffman



Abstract

Background

Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

Methods

We systematically searched PubMed, IEEEXplore, and ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations from included reports.

Results

Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, various researchers had attempted to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70 %.
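
As an illustration of the metric named above, the F-score is conventionally computed as the harmonic mean of sensitivity (recall) and positive predictive value (precision). The following is a minimal sketch, not code from the reviewed reports; the function names and the example numbers are hypothetical and chosen only to show the arithmetic.

```python
def f_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and positive predictive value (precision).

    Both inputs are proportions in [0, 1]. Returns 0.0 when both are zero,
    a common convention to avoid division by zero.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= ppv <= 1.0):
        raise ValueError("sensitivity and ppv must be proportions in [0, 1]")
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def f_score_from_counts(tp: int, fp: int, fn: int) -> float:
    """Same metric from raw counts of true positives, false positives, false negatives."""
    if tp == 0:
        return 0.0
    sensitivity = tp / (tp + fn)  # share of true elements that were extracted
    ppv = tp / (tp + fp)          # share of extracted elements that were correct
    return f_score(sensitivity, ppv)


# Hypothetical extractor: sensitivity 0.80, positive predictive value 0.62.
example = f_score(0.80, 0.62)
print(f"F-score: {example:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system with high sensitivity but poor positive predictive value (or the reverse) scores lower than a simple average would suggest.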

Conclusions

We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1–7) of data elements. Biomedical natural language processing techniques have not yet been fully utilized to automate, even partially, the data extraction step of systematic reviews.
Metadata
Title
Automating data extraction in systematic reviews: a systematic review
Authors
Siddhartha R. Jonnalagadda
Pawan Goyal
Mark D. Huffman
Publication date
01-12-2015
Publisher
BioMed Central
Published in
Systematic Reviews / Issue 1/2015
Electronic ISSN: 2046-4053
DOI
https://doi.org/10.1186/s13643-015-0066-7
