BMC Medical Informatics and Decision Making

Open Access 01-12-2009 | Research article

Sentence retrieval for abstracts of randomized controlled trials

Author: Grace Y Chung

Published in: BMC Medical Informatics and Decision Making | Issue 1/2009

Abstract

Background

The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research, but this is becoming increasingly difficult as the number of published articles grows. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized clinical trials (RCTs) are the most reliable source of evidence on the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.

Method

Sentences referring to Intervention, Participants and Outcome Measures are automatically categorized using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems. This extends a previous approach that labels the sentences of an abstract with general categories associated with scientific argumentation, or rhetorical roles: Aim, Method, Results and Conclusion. The methods are tested on several corpora of RCT abstracts. First, structured abstracts whose headings explicitly indicate Intervention, Participant and Outcome Measures are used. Second, a manually annotated corpus of structured and unstructured abstracts is prepared for testing a classifier that identifies the sentences belonging to each category.
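
As a rough illustration of why a sequence model such as a linear-chain CRF suits this task, the sketch below shows Viterbi decoding over the sentences of one abstract. Each sentence has a score for each rhetorical role, and a transition score rewards or penalizes one role following another, so the best overall sequence can override a locally preferred label. The `viterbi` helper, the per-sentence scores and the penalty for moving "backwards" (for example from Results to Method) are all hypothetical stand-ins for learned CRF weights; feature extraction, training and the paper's actual toolkit are not shown.

```python
from typing import Dict, List, Tuple

# Rhetorical roles from the abstract; the order is used only for the hypothetical transitions.
LABELS = ["Aim", "Method", "Results", "Conclusion"]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str] = LABELS,
) -> List[str]:
    """Highest-scoring label sequence under a linear-chain model.

    emissions[t][y]: score of label y for sentence t (log space).
    transitions[(p, y)]: score for label p followed by label y (default 0.0).
    """
    if not emissions:
        return []
    best = [dict(emissions[0])]  # best[t][y]: top score of any path ending in y at t
    back: List[Dict[str, str]] = [{}]  # back-pointers to the best previous label
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: best[t - 1][p] + transitions.get((p, y), 0.0))
            best[t][y] = best[t - 1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            back[t][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical transition scores: penalize moving to an earlier role.
ORDER = {y: i for i, y in enumerate(LABELS)}
transitions = {
    (p, y): (-1.0 if ORDER[y] < ORDER[p] else 0.0) for p in LABELS for y in LABELS
}

# Hypothetical per-sentence scores for a five-sentence abstract.
emissions = [
    {"Aim": 2.0, "Method": 0.5, "Results": 0.0, "Conclusion": 0.2},
    {"Aim": 0.3, "Method": 1.5, "Results": 0.4, "Conclusion": 0.0},
    {"Aim": 0.0, "Method": 0.2, "Results": 1.6, "Conclusion": 0.3},
    {"Aim": 0.0, "Method": 1.0, "Results": 0.9, "Conclusion": 0.1},
    {"Aim": 0.1, "Method": 0.0, "Results": 0.5, "Conclusion": 1.8},
]

greedy = [max(e, key=e.get) for e in emissions]
print(greedy)  # ['Aim', 'Method', 'Results', 'Method', 'Conclusion']
print(viterbi(emissions, transitions))  # ['Aim', 'Method', 'Results', 'Results', 'Conclusion']
```

Labeling each sentence independently picks Method for the fourth sentence, while the sequence decoder keeps it as Results because a Results-to-Method transition is penalized. Decoding only needs unnormalized scores; a CRF's normalization term matters for training and for probabilities, not for finding the best sequence.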

Results

Using CRFs, sentences can be labeled for the four rhetorical roles with F-scores of 0.93–0.98, outperforming Support Vector Machines (SVMs). Furthermore, sentences can be automatically labeled for Intervention, Participant and Outcome Measures in unstructured abstracts and in structured abstracts whose section headings do not specifically indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences, respectively.
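
The F-scores above are the harmonic mean of precision P and recall R, F = 2PR/(P + R). A minimal sketch of per-label scoring follows, assuming each sentence carries exactly one gold and one predicted label; the abstract does not state the exact evaluation protocol, and the `per_label_prf` helper, the "Other" label and the toy data are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_label_prf(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F-score per label, assuming one label per sentence."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label was wrong
            fn[g] += 1  # the gold label was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f)
    return scores


# Toy data (not from the paper): one label per sentence.
gold = ["Other", "Intervention", "Intervention", "Outcome", "Other", "Outcome"]
pred = ["Other", "Intervention", "Other", "Outcome", "Intervention", "Outcome"]

for label, (p, r, f) in per_label_prf(gold, pred).items():
    print(f"{label}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

Whether the reported figures are per label, micro-averaged or macro-averaged is not stated in the abstract; this sketch reports per-label scores only.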

Conclusion

Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports. This is promising because automatically labeled sentences could form concise summaries and assist information retrieval and finer-grained extraction.

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/9/10/prepub

Title: Sentence retrieval for abstracts of randomized controlled trials
Author: Grace Y Chung
Publication date: 01-12-2009
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2009
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/1472-6947-9-10
