Published: 01-12-2012

Understanding the Limits of Large Datasets

Authors: Catherine M. Sanders, Sidney L. Saltzstein, Matthew M. Schultzel, Duy H. Nguyen, Helen Shi Stafford, Georgia Robins Sadler

Published in: Journal of Cancer Education | Issue 4/2012

Abstract

Many health professionals use large datasets to answer behavioral, translational, or clinical questions. Understanding the impact of missing data in large databases, such as disease registries, can help users avoid erroneous interpretations of these data. Using the California Cancer Registry, the authors selected seven common cancers, seven sociodemographic and clinical variables, and the top three reporting sources as examples of the types of data that would be deemed critical to most studies. Gender had no missing data, followed by age (<0.1 % missing), ethnicity (1.7 %), stage (9.8 %), differentiation (39.1 %), and birthplace (41.1 %). Reports from hospitals and clinics had the lowest percentages of missing data. Users of large datasets should anticipate the limitations imposed by missing data to prevent methodological flaws and misinterpretations of research findings. Knowing which data may be missing from large datasets, and how much, can help prevent errors in research conclusions while better guiding treatment modalities and public health policies and programs.
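As a rough illustration of the kind of per-variable missingness summary the abstract reports, the following Python sketch computes the percentage of records with no value for each variable. It is not the authors' method: the `percent_missing` function, the field names, and the four toy records are hypothetical and are not drawn from the California Cancer Registry. The sketch also assumes that a missing value is stored as `None` or an empty string, which may not match how a real registry codes unknown values.

```python
from typing import Dict, List, Optional

# Hypothetical record type: one dict per case, None or "" marks a missing value.
Record = Dict[str, Optional[str]]


def percent_missing(records: List[Record], variables: List[str]) -> Dict[str, float]:
    """Return, for each variable, the percentage of records lacking a value."""
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)
    result: Dict[str, float] = {}
    for var in variables:
        # A value counts as missing if the key is absent, None, or an empty string.
        missing = sum(1 for r in records if r.get(var) in (None, ""))
        result[var] = 100.0 * missing / total
    return result


# Toy data for illustration only; these are not registry records.
toy_records: List[Record] = [
    {"gender": "F", "stage": "II", "birthplace": None},
    {"gender": "M", "stage": None, "birthplace": None},
    {"gender": "F", "stage": "I", "birthplace": "California"},
    {"gender": "M", "stage": "III", "birthplace": ""},
]

summary = percent_missing(toy_records, ["gender", "stage", "birthplace"])

# List variables from most to least complete, as the abstract does.
for var, pct in sorted(summary.items(), key=lambda kv: kv[1]):
    print(f"{var}: {pct:.1f} % missing")
```

Running a summary of this kind before any analysis shows which variables, such as birthplace in the abstract's figures, are too incomplete to support conclusions without further handling.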

Title: Understanding the Limits of Large Datasets
Authors: Catherine M. Sanders, Sidney L. Saltzstein, Matthew M. Schultzel, Duy H. Nguyen, Helen Shi Stafford, Georgia Robins Sadler
Publication date: 01-12-2012
Publisher: Springer-Verlag
Published in: Journal of Cancer Education / Issue 4/2012
Print ISSN: 0885-8195
Electronic ISSN: 1543-0154
DOI: https://doi.org/10.1007/s13187-012-0383-7