Published in: European Journal of Epidemiology 1/2019

Open Access 01-01-2019 | DATA RESOURCES

Approach to record linkage of primary care data from Clinical Practice Research Datalink to other health-related patient data: overview and implications

Authors: Shivani Padmanabhan, Lucy Carty, Ellen Cameron, Rebecca E. Ghosh, Rachael Williams, Helen Strongman

Abstract

Record linkage is increasingly used to expand the information available for public health research. An understanding of record linkage methods and their strengths and limitations is important for robust analysis and interpretation of linked data. Here, we describe the approach used by Clinical Practice Research Datalink (CPRD) to link primary care data to other patient-level datasets, and the potential implications of this approach for CPRD data analysis. General practice electronic health record software providers separately submit de-identified data to CPRD and patient identifiers to NHS Digital, excluding patients who have opted out of contributing data. Data custodians for external datasets also send patient identifiers to NHS Digital. NHS Digital uses the identifiers to link the datasets using an 8-stage deterministic methodology. CPRD subsequently receives a de-identified linked cohort file and provides researchers with anonymised linked data and metadata detailing the linkage process. This methodology has been used to generate routine primary care linked datasets, including data from Hospital Episode Statistics, the Office for National Statistics and the National Cancer Registration and Analysis Service. In June 2018, 10.6 million (M) patients from 411 English general practices were included in record linkage. Of these, 9.1M (86%) were of research quality, of whom 8.0M (88%) had a valid NHS number and were eligible for linkage in the CPRD standard linked dataset release. Linking CPRD data to other sources improves the range and validity of research studies. This manuscript, together with metadata generated on match strength and linkage eligibility, can be used to inform study design and explore potential linkage-related selection and misclassification biases.
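
As an illustration of what a stepwise deterministic linkage of this kind involves, the following Python sketch tries a sequence of exact-match rules, from strictest to loosest, and records the step at which each pair of records first agrees. The rules, field names and function names are hypothetical and chosen only for illustration; they are not the eight steps used by NHS Digital, which the abstract does not enumerate.

```python
from typing import Optional

# Hypothetical match rules, ordered from strictest to loosest.
# Illustrative only: these are NOT the eight NHS Digital steps.
RULES = [
    ("nhs_number", "sex", "dob", "postcode"),
    ("nhs_number", "sex", "dob"),
    ("nhs_number", "postcode"),
    ("sex", "dob", "postcode"),
]


def match_step(a: dict, b: dict) -> Optional[int]:
    """Return the 1-based index of the first rule on which both records
    agree exactly, or None if no rule is satisfied."""
    for step, fields in enumerate(RULES, start=1):
        # A missing identifier never counts as agreement.
        if all(a.get(f) is not None and a.get(f) == b.get(f) for f in fields):
            return step
    return None


def link(primary: list, external: list) -> dict:
    """Map each primary care record id to (external id, match step),
    keeping the candidate matched at the strictest (lowest) step."""
    links = {}
    for p in primary:
        best = None
        for e in external:
            step = match_step(p, e)
            if step is not None and (best is None or step < best[1]):
                best = (e["id"], step)
        if best is not None:
            links[p["id"]] = best
    return links


if __name__ == "__main__":
    primary = [
        {"id": "p1", "nhs_number": "111", "sex": "F",
         "dob": "1980-01-01", "postcode": "AB1 2CD"},
        {"id": "p2", "nhs_number": None, "sex": "M",
         "dob": "1975-05-05", "postcode": "ZZ9 9ZZ"},
    ]
    external = [
        {"id": "e1", "nhs_number": "111", "sex": "F",
         "dob": "1980-01-01", "postcode": "XY9 8ZW"},
        {"id": "e2", "nhs_number": None, "sex": "M",
         "dob": "1975-05-05", "postcode": "ZZ9 9ZZ"},
    ]
    print(link(primary, external))
```

Keeping the step number alongside each link is what allows a match-strength indicator to be passed on as metadata, so that an analyst could, for example, restrict a sensitivity analysis to links made at the stricter steps.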
Metadata
Title
Approach to record linkage of primary care data from Clinical Practice Research Datalink to other health-related patient data: overview and implications
Authors
Shivani Padmanabhan
Lucy Carty
Ellen Cameron
Rebecca E. Ghosh
Rachael Williams
Helen Strongman
Publication date
01-01-2019
Publisher
Springer Netherlands
Published in
European Journal of Epidemiology / Issue 1/2019
Print ISSN: 0393-2990
Electronic ISSN: 1573-7284
DOI
https://doi.org/10.1007/s10654-018-0442-4
