Published in: BMC Medical Research Methodology 1/2018

Open Access 01-12-2018 | Research article

Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania

Authors: Christopher T. Rentsch, Katie Harron, Mark Urassa, Jim Todd, Georges Reniers, Basia Zaba

Abstract

Background

Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can affect the bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania.

Methods

Semi-automatic point-of-contact interactive record linkage was used to establish gold standard links between community-based HIV surveillance data and medical records at clinics serving the surveillance population. Automated probabilistic record linkage was used to create analytic datasets at minimum, low, medium, and high match score thresholds. Cox proportional hazards regression models were used to compare HIV care registration rates by testing modality (sero-survey vs. clinic) in each analytic dataset. We assessed linkage quality using three approaches: quantifying linkage errors, comparing characteristics between linked and unlinked data, and evaluating bias and precision of regression estimates.
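To make the idea of a match score threshold concrete, the following is a minimal sketch of probabilistic linkage scoring in the common Fellegi-Sunter style. It is not the authors' algorithm: the abstract does not describe the scoring details, and the field names, m- and u-probabilities, example records, and threshold values are all hypothetical.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the study):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.92, 0.04),
    "sex": (0.98, 0.50),
    "birth_year": (0.85, 0.03),
}


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement or disagreement weights over all fields."""
    score = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            score += math.log2(m / u)              # agreement weight (positive)
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def link_at_threshold(pairs, threshold):
    """Keep only candidate pairs whose match score reaches the threshold."""
    return [(a, b) for a, b in pairs if match_score(a, b) >= threshold]


# Hypothetical surveillance record and two candidate clinic records.
survey = {"first_name": "A", "last_name": "B", "sex": "F", "birth_year": 1985}
clinic_same = dict(survey)
clinic_close = {"first_name": "A", "last_name": "C", "sex": "F", "birth_year": 1985}

pairs = [(survey, clinic_same), (survey, clinic_close)]
links_low = link_at_threshold(pairs, threshold=5.0)    # hypothetical low threshold
links_high = link_at_threshold(pairs, threshold=12.0)  # hypothetical high threshold
```

Raising the threshold keeps only pairs that agree on more fields, which tends to reduce false matches at the cost of more missed matches. That trade-off is what the four analytic datasets are designed to expose.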

Results

Between 2014 and 2017, 405 individuals with gold standard links were newly diagnosed with HIV in sero-surveys (n = 263) and clinics (n = 142). Automated probabilistic linkage correctly identified 233 individuals (positive predictive value [PPV] = 65%) at the low threshold and 95 individuals (PPV = 90%) at the high threshold. Significant differences were found between linked and unlinked records in primary exposure and outcome variables and for adjusting covariates at every threshold. As expected, differences attenuated with increasing threshold. Testing modality was significantly associated with time to registration in the gold standard data (adjusted hazard ratio [HR] 4.98 for clinic-based testing, 95% confidence interval [CI] 3.34, 7.42). Increasing false matches weakened the association (HR 2.76 at minimum match score threshold, 95% CI 1.73, 4.41). Increasing missed matches (i.e., increasing match score threshold and positive predictive value of the linkage algorithm) was strongly correlated with a reduction in the precision of the coefficient estimate (R² = 0.97; p = 0.03).
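The reported figures can be unpacked with a little arithmetic. The sketch below uses only numbers stated in the abstract; the helper names are hypothetical. It assumes that the PPV is the share of all links made that are correct, that the 405 gold standard individuals are the appropriate denominator for the share of true matches found, and that the confidence intervals are Wald intervals on the log hazard scale. Because the PPVs are rounded, the implied link counts are approximate.

```python
import math


def total_links_from_ppv(true_links, ppv):
    """PPV = true links / all links made, so all links = true links / PPV."""
    return true_links / ppv


def se_log_hr(ci_lower, ci_upper, z=1.96):
    """Standard error of log(HR) recovered from a 95% Wald confidence interval."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)


# Implied number of links made at each threshold (approximate, PPVs are rounded).
links_low = total_links_from_ppv(233, 0.65)   # roughly 358 links
links_high = total_links_from_ppv(95, 0.90)   # roughly 106 links

# Share of the 405 gold standard individuals correctly linked
# (assumes 405 is the right denominator).
found_low = 233 / 405    # a little under 58%
found_high = 95 / 405    # a little under 24%

# Standard errors of the log hazard ratio implied by the reported intervals.
se_gold = se_log_hr(3.34, 7.42)   # gold standard data, roughly 0.20
se_min = se_log_hr(1.73, 4.41)    # minimum threshold, roughly 0.24

# Attenuation of the association on the log scale: log(2.76) / log(4.98).
attenuation = math.log(2.76) / math.log(4.98)   # roughly 0.63
```

On the log scale, the estimate at the minimum threshold is about 63% of the gold standard estimate, which is one way to quantify how far false matches pulled the hazard ratio toward the null.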

Conclusions

As in studies with lower levels of linkage error, false matches in this setting reduced the magnitude of the association, while missed matches reduced precision. Adjusting for these biases could yield more robust results from data with considerable linkage errors.
Metadata
Title
Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
Authors
Christopher T. Rentsch
Katie Harron
Mark Urassa
Jim Todd
Georges Reniers
Basia Zaba
Publication date
01-12-2018
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2018
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-018-0632-5