Published in: BMC Medical Informatics and Decision Making 1/2017

Open Access 01-12-2017 | Research article

Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil

Authors: Enny S Paixão, Katie Harron, Kleydson Andrade, Maria Glória Teixeira, Rosemeire L. Fiaccone, Maria da Conceição N. Costa, Laura C. Rodrigues

Abstract

Background

With individual-level information increasingly available across different electronic datasets, record linkage has become an efficient and important research tool. High-quality linkage is essential for producing robust results. The objective of this study was to describe the process of preparing and linking national Brazilian datasets, and to compare the accuracy of different linkage methods for assessing the risk of stillbirth due to dengue in pregnancy.

Methods

We linked mothers and stillbirths in two routinely collected datasets from Brazil for 2009–2010: notifications of infectious diseases (SINAN) for dengue in pregnancy, and mortality records (SIM) for stillbirths. Since there was no unique identifier, we used probabilistic linkage based on maternal name, age and municipality. We compared two probabilistic approaches, each with two thresholds: 1) a bespoke linkage algorithm; 2) a standard linkage software package widely used in Brazil (ReclinkIII). We also used manual review to identify further links. Sensitivity and positive predictive value (PPV) were estimated using a subset of gold-standard data created through manual review. We examined the characteristics of false-matches and missed-matches to identify any sources of bias.
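As an illustration only, the following minimal sketch shows how a probabilistic match weight over maternal name, age and municipality might be computed and compared against thresholds. It is not the study's bespoke algorithm and not ReclinkIII: the m- and u-probabilities, the name-similarity cut-off, the one-year age tolerance, the threshold values and the function names `agrees`, `match_weight` and `classify` are all assumptions chosen for the example.

```python
import math
from difflib import SequenceMatcher

# Hypothetical m- and u-probabilities per field (NOT taken from the study):
# m = P(field agrees | records belong to the same mother)
# u = P(field agrees | records belong to different mothers)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "age": (0.90, 0.05),
    "municipality": (0.98, 0.02),
}

def agrees(field, a, b):
    """Return True if two field values are treated as agreeing."""
    if field == "name":
        # Approximate string comparison; the 0.85 cut-off is an assumption.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= 0.85
    if field == "age":
        return abs(a - b) <= 1  # assumed tolerance of one year
    return a == b

def match_weight(rec_a, rec_b):
    """Sum log2 agreement or disagreement weights across the fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if agrees(field, rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def classify(weight, upper=10.0, lower=0.0):
    """Map a weight to link, manual review, or non-link (assumed cut-offs)."""
    if weight >= upper:
        return "link"
    if weight >= lower:
        return "review"
    return "non-link"

# Made-up records for demonstration.
notification = {"name": "Maria da Silva Santos", "age": 27, "municipality": "Salvador"}
death_record = {"name": "Maria da Silva Santo", "age": 27, "municipality": "Salvador"}
print(classify(match_weight(notification, death_record)))
```

In this sketch, raising `upper` plays the role of a more conservative threshold (fewer false-matches, more missed-matches), and pairs falling between the two cut-offs are the ones that would go to manual review.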

Results

From records of 678,999 dengue cases and 62,373 stillbirths, the gold-standard linkage identified 191 cases. The bespoke linkage algorithm with a conservative threshold produced 131 links, with sensitivity = 64.4% (68 missed-matches) and PPV = 92.5% (8 false-matches). Manual review of uncertain links identified an additional 37 links, increasing sensitivity to 83.7%. The bespoke algorithm with a relaxed threshold identified 132 true matches (sensitivity = 69.1%), but introduced 61 false-matches (PPV = 68.4%). ReclinkIII produced lower sensitivity and PPV than the bespoke linkage algorithm. Linkage error was not associated with any recorded study variables.
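The sensitivity and PPV figures follow directly from counts of true matches, false-matches and the gold-standard total. The short sketch below reproduces the reported relaxed-threshold figures from the counts given above (132 true matches, 61 false-matches, 191 gold-standard cases); the function names are illustrative and not taken from the study.

```python
def sensitivity(true_matches, gold_standard_total):
    """Proportion of gold-standard matches that the linkage found."""
    return true_matches / gold_standard_total

def ppv(true_matches, false_matches):
    """Proportion of identified links that are true matches."""
    return true_matches / (true_matches + false_matches)

# Counts reported for the bespoke algorithm with the relaxed threshold.
relaxed_sens = sensitivity(132, 191)
relaxed_ppv = ppv(132, 61)

print(f"sensitivity = {relaxed_sens:.1%}")  # sensitivity = 69.1%
print(f"PPV = {relaxed_ppv:.1%}")           # PPV = 68.4%
```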

Conclusion

Despite a lack of unique identifiers for linking mothers and stillbirths, we demonstrate a high standard of linkage of large routine databases from a middle-income country. Probabilistic linkage and manual review were essential for accurately identifying cases for a case-control study, but this approach may not be feasible for larger databases or for linkage of more common outcomes.
Metadata
Publisher
BioMed Central
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-017-0506-5
