Published in: BMC Medical Informatics and Decision Making 1/2007

Open Access 01-12-2007 | Research article

An automatic method to generate domain-specific investigator networks using PubMed abstracts

Authors: Wei Yu, Ajay Yesupriya, Anja Wulf, Junfeng Qu, Marta Gwinn, Muin J Khoury

Abstract

Background

Collaboration among investigators has become critical to scientific research. Such collaboration ranges from ad hoc partnerships established through personal contacts to formal consortia established by funding agencies. Continued growth in online resources for scientific research and communication has promoted the development of highly networked research communities. Extending these networks globally requires identifying additional investigators in a given domain, profiling their research interests, and collecting current contact information. We present a novel strategy for building investigator networks dynamically and producing detailed investigator profiles using data available in PubMed abstracts.

Results

We developed a novel strategy to obtain detailed investigator information by automatically parsing the affiliation string in PubMed records, using a published literature database in human genome epidemiology (HuGE Pub Lit) as a test case. Our parsing strategy extracted country information from 92.1% of the affiliation strings in a random sample of PubMed records and from 97.0% of those in HuGE records, with accuracies of 94.0% and 91.0%, respectively. Institution information was parsed from 91.3% of the general PubMed records (accuracy 86.8%) and from 94.2% of the HuGE PubMed records (accuracy 87.0%). We demonstrated how the approach supports dynamic creation of investigator networks by building a prototype information system around HuGE Pub Lit, a large database of PubMed abstracts relevant to human genome epidemiology, indexed using PubMed Medical Subject Headings (MeSH) converted to Unified Medical Language System (UMLS) concepts. The method identified 70–90% of the investigators/collaborators in three different human genetics fields; it also identified 9 of 10 genetics investigators in the PREBIC network, an existing preterm birth research network.
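The abstract does not say how affiliation strings are parsed. As a rough illustration only, the sketch below assumes a simple comma-split heuristic: the last comma-separated segment is matched against a small country list (or a US state plus ZIP code pattern), and the first segment containing an institution keyword is taken as the institution. The keyword lists, the function names `parse_affiliation` and `coverage_and_accuracy`, and the definitions of coverage (share of strings with an extracted value) and accuracy (share of extracted values that match a manually checked label) are hypothetical assumptions, not the authors' method.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny gazetteers for illustration only.
COUNTRIES = {"USA", "United States", "UK", "United Kingdom", "Canada",
             "Germany", "France", "Italy", "Japan", "China", "Netherlands"}
INSTITUTION_WORDS = ("University", "Institute", "Hospital", "Centre",
                     "Center", "College", "School", "Laboratory")
US_STATE_ZIP_RE = re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?$")  # e.g. "GA 30322"
EMAIL_RE = re.compile(r"\S+@\S+")


def parse_affiliation(affiliation: str) -> dict:
    """Guess institution and country from one PubMed affiliation string."""
    # Drop e-mail addresses and trailing periods, then split on commas.
    text = EMAIL_RE.sub("", affiliation).strip().rstrip(".")
    parts = [p.strip() for p in text.split(",") if p.strip()]

    country: Optional[str] = None
    if parts:
        last = parts[-1].rstrip(".")
        if last in COUNTRIES:
            country = last
        elif US_STATE_ZIP_RE.search(last):
            country = "USA"  # US addresses often end with "ST 12345"

    # Take the first segment that contains an institution keyword.
    institution: Optional[str] = None
    for part in parts:
        if any(word in part for word in INSTITUTION_WORDS):
            institution = part
            break
    return {"institution": institution, "country": country}


def coverage_and_accuracy(predicted: List[Optional[str]],
                          gold: List[str]) -> Tuple[float, float]:
    """Coverage: share of strings with a prediction.
    Accuracy: share of those predictions equal to the gold label (assumed definitions)."""
    extracted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    coverage = len(extracted) / len(predicted) if predicted else 0.0
    correct = sum(1 for p, g in extracted if p == g)
    accuracy = correct / len(extracted) if extracted else 0.0
    return coverage, accuracy
```

For example, `parse_affiliation("Department of Epidemiology, University of Example, Atlanta, GA 30322, USA. someone@example.edu")` returns `{"institution": "University of Example", "country": "USA"}`. Real affiliation strings are far messier (several institutions per author, non-English names, missing countries), so a production parser would need much larger gazetteers and more rules than this sketch.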

Conclusion

We created a web-based prototype that builds domain-specific investigator networks by combining detailed investigator profiles, generated accurately and automatically from PubMed abstracts, with robust standard vocabularies. This approach could be applied in other biomedical fields to establish domain-specific investigator networks efficiently.
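The abstract describes building domain-specific networks from investigator profiles and concept-indexed abstracts but not the underlying data structures. A minimal sketch, assuming each record already carries a list of author names and a list of concept identifiers, is a weighted co-authorship graph restricted to records tagged with at least one domain concept; `recall` mirrors the kind of check reported for PREBIC (members found divided by members listed). The record layout, the `CUI_*`-style placeholder identifiers, and both function names are hypothetical, and author names are matched as plain strings with no name disambiguation, which a real system would need.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_network(records: Iterable[Dict], domain_concepts: Set[str]
                  ) -> Tuple[Counter, Counter]:
    """Count papers per author and co-authorship links within one domain.

    Each record is assumed to be a dict with an "authors" list and a
    "concepts" list of concept identifiers (e.g. UMLS concepts mapped from MeSH).
    """
    papers: Counter = Counter()
    edges: Counter = Counter()
    for rec in records:
        if not domain_concepts & set(rec["concepts"]):
            continue  # record is not indexed with any domain concept
        authors = sorted(set(rec["authors"]))  # dedupe; sort so pairs are canonical
        papers.update(authors)
        for a, b in combinations(authors, 2):
            edges[(a, b)] += 1  # edge weight = number of shared papers
    return papers, edges


def recall(network_authors: Iterable[str], reference_members: List[str]) -> float:
    """Share of a known member list (e.g. an existing consortium) found in the network."""
    reference = set(reference_members)
    if not reference:
        return 0.0
    return len(reference & set(network_authors)) / len(reference)
```

For instance, two in-domain records that both list "Lee K" and "Smith J" give that pair an edge weight of 2, and a reference list of 10 members of which 9 appear in the network gives a recall of 0.9.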
Metadata
Title
An automatic method to generate domain-specific investigator networks using PubMed abstracts
Authors
Wei Yu
Ajay Yesupriya
Anja Wulf
Junfeng Qu
Marta Gwinn
Muin J Khoury
Publication date
01-12-2007
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2007
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-7-17