Published in: BMC Medical Informatics and Decision Making 1/2012

Open Access 01-12-2012 | Research article

Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records

Authors: Ronan Ryan, Sally Vernon, Gill Lawrence, Sue Wilson

Abstract

Background

Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and conduct epidemiological studies. In common with other important demographic and clinical data, it is often incompletely recorded. This paper presents a method for imputing missing data on the ethnicity of cancer patients, developed for a regional cancer registry in the UK.

Methods

Routine records from cancer screening services, name recognition software (Nam Pehchan and Onomap), 2001 national Census data, and multiple imputation were used to predict ethnicity for the 23% of cases in which it was still missing following linkage with self-reported ethnicity from inpatient hospital records.
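
As a rough illustration of the multiple imputation idea only, the sketch below draws each missing ethnicity from the distribution observed among complete cases with the same predictor value, repeats this several times, and averages the resulting estimates. The record layout, the function names (`impute_once`, `pooled_proportion`) and the toy data are assumptions made for this example and are not taken from the paper; a full analysis would normally also draw the model parameters and pool variances.

```python
import random
from collections import Counter, defaultdict


def impute_once(records, rng):
    """Fill each missing ethnicity with one random draw.

    `records` is a list of (predictor, ethnicity) pairs, with ethnicity
    set to None when missing. Draws come from the ethnicity distribution
    observed in complete cases sharing the same predictor value
    (hypothetical layout, for illustration only).
    """
    by_stratum = defaultdict(Counter)
    overall = Counter()
    for predictor, ethnicity in records:
        if ethnicity is not None:
            by_stratum[predictor][ethnicity] += 1
            overall[ethnicity] += 1

    completed = []
    for predictor, ethnicity in records:
        if ethnicity is None:
            # Fall back to the overall distribution for unseen strata.
            counts = by_stratum[predictor] or overall
            ethnicity = rng.choices(
                list(counts), weights=list(counts.values())
            )[0]
        completed.append((predictor, ethnicity))
    return completed


def pooled_proportion(records, group, m=5, seed=0):
    """Average the proportion in `group` over m imputed datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(m):
        completed = impute_once(records, rng)
        in_group = sum(1 for _, eth in completed if eth == group)
        estimates.append(in_group / len(completed))
    return sum(estimates) / m


# Toy data: predictor is a name-software flag, ethnicity may be missing.
records = [
    ("sa_name", "South Asian"),
    ("sa_name", "South Asian"),
    ("sa_name", None),
    ("other_name", "White"),
    ("other_name", None),
]
estimate = pooled_proportion(records, "South Asian")
```

Averaging over several imputed datasets, rather than filling each gap once, is what lets the uncertainty about the missing values carry through to the final estimate.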

Results

The name recognition programs were good predictors of ethnicity for South Asian cancer cases when compared with data on ethnicity derived from hospital inpatient records, especially when combined (sensitivity 90.5%; specificity 99.9%; PPV 93.3%). Onomap was a poor predictor of ethnicity for other minority ethnic groups (sensitivity 4.4% for Black cases and 0.0% for Chinese/Other ethnic groups). Area-based data derived from the national Census were also a poor predictor of non-White ethnicity (sensitivity: South Asian 7.4%; Black 2.3%; Chinese/Other 0.0%; Mixed 0.0%).
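
For readers unfamiliar with the measures quoted above, the following minimal sketch shows how sensitivity, specificity and PPV can be computed for one ethnic group when a predictor is compared against a reference classification. The helper names, the toy labels, and the "flag if either tool flags" combination rule are assumptions for illustration; the abstract does not state how the two programs were combined.

```python
def binary_metrics(reference, predicted, group):
    """Sensitivity, specificity and PPV for `group`.

    `reference` is treated as the truth (for example, hospital records)
    and `predicted` as the output of the method under evaluation.
    """
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if pred == group and ref == group:
            tp += 1
        elif pred == group:
            fp += 1
        elif ref == group:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-cases correctly not flagged
        "ppv": ratio(tp, tp + fp),          # flagged cases that are true
    }


def combine_either(pred_a, pred_b, group, fallback="Other"):
    """Assumed combination rule: assign `group` if either tool does."""
    return [
        group if group in (a, b) else fallback
        for a, b in zip(pred_a, pred_b)
    ]


# Toy labels, invented for this example.
reference = ["South Asian", "South Asian", "White",
             "White", "Black", "South Asian"]
tool_a = ["South Asian", "Other", "Other", "Other", "Other", "Other"]
tool_b = ["Other", "South Asian", "Other", "South Asian", "Other", "Other"]

combined = combine_either(tool_a, tool_b, "South Asian")
metrics = binary_metrics(reference, combined, "South Asian")
```

A rule of this kind tends to raise sensitivity at some cost to specificity, because a case is flagged when either source flags it.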

Conclusions

Currently, neither method for assigning individuals to an ethnic group (name recognition or the ethnic distribution of the area of residence) performs well across all ethnic groups. We recommend further development of name recognition applications and the identification of additional methods for predicting ethnicity to improve their precision and accuracy for comparisons of health outcomes. However, real improvements can only come from better recording of ethnicity by health services.
Metadata
Title
Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
Authors
Ronan Ryan
Sally Vernon
Gill Lawrence
Sue Wilson
Publication date
01-12-2012
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2012
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-12-3
