Published in: Journal of Translational Medicine 1/2016

Open Access 01-12-2016 | Research

Classifying publications from the clinical and translational science award program along the translational research spectrum: a machine learning approach

Authors: Alisa Surkis, Janice A. Hogle, Deborah DiazGranados, Joe D. Hunt, Paul E. Mazmanian, Emily Connors, Kate Westaby, Elizabeth C. Whipple, Trisha Adamus, Meridith Mueller, Yindalon Aphinyanaphongs

Abstract

Background

Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications.

Methods

Based on collaboratively developed definitions, we created a detailed checklist for categories along the translational spectrum from T0 to T4. We applied the checklist to CTSA-linked publications to construct a set of coded publications for training machine learning-based text classifiers to sort publications into these categories. The training sets combined the T1/T2 and T3/T4 categories because these publication types were infrequent relative to T0 publications. We then compared classifier performance across different algorithms and feature sets and applied the classifiers to all publications in PubMed indexed to CTSA grants. To validate the classifiers, we manually classified the 100 top-scoring articles from each classifier.
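
The abstract does not name the algorithms or feature sets that were compared. As a minimal sketch, assuming a bag-of-words representation and a multinomial Naive Bayes classifier (a common text-classification baseline, chosen here purely for illustration), the Python below shows how one binary classifier (for example, T0 versus all other categories, an assumed one-versus-rest setup) could be trained on coded publications, used to score uncoded publications, and used to pull the top-scoring articles for manual validation. The tokenizer, `NaiveBayesScorer`, `top_k`, and the toy data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesScorer:
    """Binary multinomial Naive Bayes with add-one (Laplace) smoothing.

    score() returns the log-odds that a document belongs to the positive
    class, e.g. "T0" versus "not T0" in a one-versus-rest setup.
    Both classes must be present in the training data.
    """

    def fit(self, texts, labels):
        self.counts = {True: Counter(), False: Counter()}
        n_docs = Counter()
        for text, label in zip(texts, labels):
            label = bool(label)
            n_docs[label] += 1
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        n = n_docs[True] + n_docs[False]
        self.log_prior = {c: math.log(n_docs[c] / n) for c in (True, False)}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed estimate of log P(token | class c).
        return math.log((self.counts[c][token] + 1) / (self.totals[c] + len(self.vocab)))

    def score(self, text):
        s = self.log_prior[True] - self.log_prior[False]
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are ignored
                s += self._log_likelihood(tok, True) - self._log_likelihood(tok, False)
        return s


def top_k(scorer, docs, k=100):
    """Return the IDs of the k highest-scoring documents for manual review."""
    return sorted(docs, key=lambda doc_id: scorer.score(docs[doc_id]), reverse=True)[:k]


# Toy usage with hypothetical labels (True = T0, False = not T0).
train_texts = [
    "mouse model of receptor signaling in vitro",
    "cell line assay of protein binding",
    "community intervention improved screening rates",
    "randomized trial of patients in primary care",
]
train_labels = [True, True, False, False]
clf = NaiveBayesScorer().fit(train_texts, train_labels)
pool = {
    "a": "protein signaling in mouse cells",
    "b": "screening intervention in community clinics",
}
print(top_k(clf, pool, k=1))
```

Ranking by a continuous score, rather than by a hard yes/no label, is what makes a "top 100" validation sample possible: the same scores feed both the manual check and an ROC analysis.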

Results

The definitions and checklist facilitated classification and resulted in good inter-rater reliability when coding publications for the training set. The classifiers performed very well as measured by the area under the receiver operating characteristic (ROC) curve (AUC): 0.94 for the T0 classifier, 0.84 for T1/T2, and 0.92 for T3/T4.
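
The abstract reports AUC values and "good inter-rater reliability" but does not state which reliability statistic was used. The sketch below assumes Cohen's kappa for two coders as an illustration, and computes AUC with the standard rank-based (Mann-Whitney U) identity. The function names and example inputs are hypothetical.

```python
from collections import Counter


def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    AUC equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one; ties count as 1/2.
    This pairwise loop is O(P * N) and meant for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two coders who each assign one category per item."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each coder's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
print(cohens_kappa(["T0", "T0", "T1", "T1"], ["T0", "T0", "T1", "T0"]))
```

The two calls print 0.75 and 0.5. Read through the rank identity and assuming a one-versus-rest setup, an AUC of 0.94 for the T0 classifier means a randomly chosen T0 publication would outscore a randomly chosen non-T0 publication about 94% of the time.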

Conclusions

The combination of definitions agreed upon by five CTSA hubs, a checklist that promotes more uniform interpretation of those definitions, and algorithms that perform well in classifying publications along the translational spectrum provides a basis for establishing and applying uniform definitions of translational research categories. The classification algorithms enable publication analyses that would not be feasible with manual classification, such as assessing the distribution and trends of publications across the CTSA network and comparing publication categories and their citations to assess knowledge transfer across the translational research spectrum.
Metadata
Title
Classifying publications from the clinical and translational science award program along the translational research spectrum: a machine learning approach
Authors
Alisa Surkis
Janice A. Hogle
Deborah DiazGranados
Joe D. Hunt
Paul E. Mazmanian
Emily Connors
Kate Westaby
Elizabeth C. Whipple
Trisha Adamus
Meridith Mueller
Yindalon Aphinyanaphongs
Publication date
01-12-2016
Publisher
BioMed Central
Published in
Journal of Translational Medicine / Issue 1/2016
Electronic ISSN: 1479-5876
DOI
https://doi.org/10.1186/s12967-016-0992-8