Published in: BMC Medical Informatics and Decision Making 1/2009

Open Access 01-12-2009 | Research article

The SAIL databank: linking multiple health and social care datasets

Authors: Ronan A Lyons, Kerina H Jones, Gareth John, Caroline J Brooks, Jean-Philippe Verplancke, David V Ford, Ginevra Brown, Ken Leake

Abstract

Background

Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic patient record systems have the potential to revolutionise service delivery and research. To achieve this, however, the ability to link data at the individual record level must be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets; over 500 million records from multiple health and social care service providers have been loaded to date, and further growth is in progress.

Methods

With the infrastructure of the databank established, the aim of this work was to develop and implement an accurate matching process that assigns a unique Anonymous Linking Field (ALF) to person-based records, making the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. First, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Second, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and to identify the optimum matching technique.
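
The abstract does not describe the internals of MACRAL, so the following is only a minimal Python sketch of the general idea of matching a record to a register. It assumes a deterministic pass on NHS number followed by a Fellegi-Sunter style probabilistic pass, and it reads the 50% threshold as a match probability, which is also an assumption. The `Person` fields, the m/u agreement probabilities and the prior are hypothetical values chosen for illustration; none of them come from the paper, and the original algorithm was written in SQL rather than Python.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Person:
    """Hypothetical demographic record; field names are illustrative only."""
    nhs_number: Optional[str]
    surname: str
    first_name: str
    dob: str       # ISO date string, e.g. "1970-01-01"
    sex: str
    postcode: str


# Hypothetical (m, u) pairs: m = P(field agrees | same person),
# u = P(field agrees | different people). Not taken from the paper.
M_U = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "dob": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.85, 0.001),
}


def match_probability(a: Person, b: Person, prior: float = 0.001) -> float:
    """Combine per-field agreement into a posterior match probability."""
    odds = prior / (1 - prior)
    for name, (m, u) in M_U.items():
        same = getattr(a, name).strip().upper() == getattr(b, name).strip().upper()
        # Agreement multiplies the odds by m/u, disagreement by (1-m)/(1-u).
        odds *= (m / u) if same else ((1 - m) / (1 - u))
    return odds / (1 + odds)


def link(record: Person, register: Sequence[Person],
         threshold: float = 0.5) -> Optional[Person]:
    """Return the register entry the record links to, or None."""
    # Pass 1: deterministic match on NHS number when one is present.
    if record.nhs_number:
        for candidate in register:
            if candidate.nhs_number == record.nhs_number:
                return candidate
    # Pass 2: probabilistic match on demographics, accepted at the threshold.
    best = max(register, key=lambda c: match_probability(record, c), default=None)
    if best is not None and match_probability(record, best) >= threshold:
        return best
    return None
```

In a setting like the one described, the entry returned by `link` would be the point at which a linking field such as the ALF is attached to the incoming record; how the ALF itself is generated is not stated in the abstract and is not modelled here.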

Results

Validation of the NHS number as an identifier yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, with error rates < 0.2%. A range of techniques for matching datasets to the NHSAR was applied; the optimum technique gave sensitivity values of 99.9% for a GP dataset from primary care, 99.3% for a PEDW dataset from secondary care and 95.2% for the PARIS database from social care.
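
For readers less familiar with these measures, the sketch below shows how sensitivity and specificity are conventionally computed from counts of true and false links. The function name and the counts in the usage note are hypothetical and are not the study's data; the abstract does not define how its error rate was calculated, so that measure is left out.

```python
def linkage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity and specificity from link/non-link counts.

    tp: true pairs that were linked      fn: true pairs that were missed
    tn: non-pairs correctly left apart   fp: non-pairs wrongly linked
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true pair and one true non-pair")
    return {
        "sensitivity": tp / (tp + fn),  # share of true pairs that were found
        "specificity": tn / (tn + fp),  # share of non-pairs correctly rejected
    }
```

With illustrative counts, `linkage_metrics(tp=90, fp=1, tn=99, fn=10)` gives a sensitivity of 0.90 and a specificity of 0.99.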

Conclusion

With the infrastructure in place, the reliable matching process developed here enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.
Metadata
Title
The SAIL databank: linking multiple health and social care datasets
Authors
Ronan A Lyons
Kerina H Jones
Gareth John
Caroline J Brooks
Jean-Philippe Verplancke
David V Ford
Ginevra Brown
Ken Leake
Publication date
01-12-2009
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2009
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/1472-6947-9-3