Published in: BMC Medical Informatics and Decision Making 1/2009

Open Access 01-12-2009 | Research article

Sentence retrieval for abstracts of randomized controlled trials

Author: Grace Y Chung

Abstract

Background

The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research, but this is becoming increasingly difficult as the number of published articles grows. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized clinical trials (RCTs) are the most reliable source of evidence documenting the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.

Method

Sentences referring to Intervention, Participants and Outcome Measures are automatically categorized using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems. This extends a previous approach for labeling the sentences of an abstract with general categories associated with scientific argumentation or rhetorical roles: Aim, Method, Results and Conclusion. The methods are tested on several corpora of RCT abstracts. First, structured abstracts with headings that specifically indicate Intervention, Participants and Outcome Measures are used. In addition, a manually annotated corpus of structured and unstructured abstracts is prepared for testing a classifier that identifies sentences belonging to each category.
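
To make the sequence-labeling idea concrete, the following is a minimal sketch of how a linear-chain model can assign rhetorical roles to the sentences of an abstract. It is not the paper's implementation: the cue words, the feature names and every weight below are hypothetical and set by hand, whereas a real CRF would learn its weights from annotated abstracts. Only the decoding step (Viterbi search over per-sentence feature scores plus label-to-label transition scores) is shown.

```python
from typing import Dict, List, Tuple

LABELS = ["Aim", "Method", "Results", "Conclusion"]

# Hypothetical cue words mapped to feature names (illustration only).
CUES = {
    "aim": "cue:aim",
    "randomized": "cue:randomized",
    "significant": "cue:significant",
    "conclude": "cue:conclude",
}

# Hand-set weights for (feature, label) pairs; a trained CRF would learn these.
EMISSION: Dict[Tuple[str, str], float] = {
    ("cue:aim", "Aim"): 2.0,
    ("cue:randomized", "Method"): 2.0,
    ("cue:significant", "Results"): 2.0,
    ("cue:conclude", "Conclusion"): 2.0,
    ("pos:first", "Aim"): 1.0,
    ("pos:last", "Conclusion"): 1.0,
}

# Hand-set weights for (previous label, current label) pairs.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("Aim", "Method"): 1.0,
    ("Method", "Method"): 0.5,
    ("Method", "Results"): 1.0,
    ("Results", "Results"): 0.5,
    ("Results", "Conclusion"): 1.0,
}


def features(sentences: List[str], i: int) -> List[str]:
    """Cue-word and position features for sentence i."""
    words = [w.strip(".,;:()") for w in sentences[i].lower().split()]
    feats = [CUES[w] for w in words if w in CUES]
    if i == 0:
        feats.append("pos:first")
    if i == len(sentences) - 1:
        feats.append("pos:last")
    return feats


def viterbi(sentences: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the whole abstract."""
    if not sentences:
        return []

    def emit(i: int, label: str) -> float:
        return sum(EMISSION.get((f, label), 0.0) for f in features(sentences, i))

    score = [{y: emit(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(sentences)):
        row, ptr = {}, {}
        for y in LABELS:
            # Best previous label for reaching label y at sentence i.
            prev = max(LABELS, key=lambda p: score[-1][p] + TRANSITION.get((p, y), 0.0))
            row[y] = score[-1][prev] + TRANSITION.get((prev, y), 0.0) + emit(i, y)
            ptr[y] = prev
        score.append(row)
        back.append(ptr)

    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


abstract = [
    "The aim of this trial was to compare two wound dressings.",
    "Eighty patients were allocated to one of two groups.",
    "Healing time differed by a significant margin.",
    "We conclude that the new dressing is preferable.",
]
print(viterbi(abstract))  # ['Aim', 'Method', 'Results', 'Conclusion']
```

The second sentence contains no cue word, yet it is labeled Method because the transition scores favor the order Aim, Method, Results. This use of neighboring labels is what distinguishes a sequence model from a classifier that treats each sentence in isolation.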

Results

Using CRFs, sentences can be labeled for the four rhetorical roles with F-scores of 0.93–0.98, outperforming Support Vector Machines. Furthermore, sentences can be automatically labeled for Intervention, Participants and Outcome Measures in both unstructured abstracts and structured abstracts whose section headings do not specifically indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences, respectively.
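
For readers unfamiliar with the metric, the sketch below shows one common way to compute a per-label F-score from gold and predicted sentence labels. It assumes the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name `f_scores` and the example labels are hypothetical and are not data from the study.

```python
from typing import Dict, List


def f_scores(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Per-label F1, computed from sentence-level gold and predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Toy labels for five sentences (made up for illustration).
gold = ["Intervention", "Participants", "Outcome", "Outcome", "Other"]
pred = ["Intervention", "Intervention", "Outcome", "Other", "Other"]
for label, f1 in f_scores(gold, pred).items():
    print(f"{label}: {f1:.2f}")
```

In this toy case Intervention has perfect recall but a precision of 0.5, because one Participants sentence was mislabeled as Intervention, so its F1 comes to 0.67.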

Conclusion

Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstracts. This is promising: automatically labeled sentences could form concise summaries and support information retrieval and finer-grained extraction.
Metadata

Title: Sentence retrieval for abstracts of randomized controlled trials
Author: Grace Y Chung
Publication date: 01-12-2009
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making, Issue 1/2009
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/1472-6947-9-10
