Published in: BMC Cancer 1/2004

Open Access 01-12-2004 | Database

Tumor taxonomy for the developmental lineage classification of neoplasms

Author: Jules J Berman



Abstract

Background

The new "Developmental lineage classification of neoplasms" was described in a prior publication. The classification is simple (the entire hierarchy is described with just 39 classifiers), comprehensive (providing a place for every tumor of man), and consistent with recent attempts to characterize tumors by cytogenetic and molecular features. A taxonomy is a list of the instances that populate a classification. The taxonomy of neoplasia attempts to list every known term for every known tumor of man.

Methods

The taxonomy provides each concept with a unique code and groups synonymous terms under the same concept. A Perl script validated successive drafts of the taxonomy ensuring that: 1) each term occurs only once in the taxonomy; 2) each term occurs in only one tumor class; 3) each concept code occurs in one and only one hierarchical position in the classification; and 4) the file containing the classification and taxonomy is a well-formed XML (eXtensible Markup Language) document.
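
The four checks listed above can be illustrated with a short sketch. This is not the author's Perl script: it is written in Python, and the XML layout it expects (nested `class` elements with a `name` attribute, containing `concept` elements with a `code` attribute and `term` children) is a hypothetical layout chosen for illustration, as is the function name `validate_taxonomy`. Comparing terms case-insensitively is also an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def validate_taxonomy(xml_text):
    """Return a list of problems; an empty list means all four checks passed.

    Assumes a hypothetical layout: <class name="..."> elements nest to form
    the hierarchy and hold <concept code="..."> elements with <term> children.
    """
    try:
        # Check 4: the parser raises ParseError if the document is not well-formed.
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    term_count = defaultdict(int)        # term -> number of occurrences
    term_classes = defaultdict(set)      # term -> class paths where it appears
    code_positions = defaultdict(list)   # code -> one class path per occurrence

    def walk(node, path):
        for child in node:
            if child.tag == "class":
                walk(child, path + (child.get("name"),))
            elif child.tag == "concept":
                code_positions[child.get("code", "")].append(path)
                for term in child.findall("term"):
                    # Assumption: terms are compared case-insensitively.
                    text = (term.text or "").strip().lower()
                    term_count[text] += 1
                    term_classes[text].add(path)

    walk(root, ())

    problems = []
    # Check 1: each term occurs only once in the taxonomy.
    for term, n in sorted(term_count.items()):
        if n > 1:
            problems.append(f"term occurs {n} times: {term!r}")
    # Check 2: each term occurs in only one tumor class.
    for term, classes in sorted(term_classes.items()):
        if len(classes) > 1:
            problems.append(f"term in {len(classes)} classes: {term!r}")
    # Check 3: each concept code occupies exactly one hierarchical position.
    for code, positions in sorted(code_positions.items()):
        if len(positions) > 1:
            problems.append(f"code at {len(positions)} positions: {code}")
    return problems
```

Calling `validate_taxonomy` on the text of a draft file and confirming that it returns an empty list would correspond to a draft passing all four checks. Because the parse happens first, a malformed file is reported once rather than producing misleading term or code errors.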

Results

The taxonomy currently contains 122,632 different terms encompassing 5,376 neoplasm concepts. Each concept has, on average, 23 synonyms. The taxonomy populates "The developmental lineage classification of neoplasms," and is available as an XML file, currently 9+ Megabytes in length. A representation of the classification/taxonomy listing each term followed by its code, followed by its full ancestry, is available as a flat-file, 19+ Megabytes in length.
The taxonomy is the largest nomenclature of neoplasms, with more than twice the number of neoplasm names found in other medical nomenclatures, including the 2004 version of the Unified Medical Language System, the Systematized Nomenclature of Medicine Clinical Terminology, the National Cancer Institute's Thesaurus, and the International Classification of Diseases Oncology version.
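
The reported average can be checked against the two counts given in the Results, reading it as terms per concept. This reading is an assumption: the abstract does not say whether the preferred term is counted among the synonyms.

```latex
\[
\frac{122{,}632 \text{ terms}}{5{,}376 \text{ concepts}} \approx 22.8 \approx 23 \text{ terms per concept}
\]
```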

Conclusions

This manuscript describes a comprehensive taxonomy of neoplasia that collects synonymous terms under a unique code number and assigns each tumor to a single class within the tumor hierarchy. The entire classification and taxonomy are available as open access files (in XML and flat-file formats) with this article.
Metadata
Title
Tumor taxonomy for the developmental lineage classification of neoplasms
Author
Jules J Berman
Publication date
01-12-2004
Publisher
BioMed Central
Published in
BMC Cancer / Issue 1/2004
Electronic ISSN: 1471-2407
DOI
https://doi.org/10.1186/1471-2407-4-88
