Published in: BMC Medical Informatics and Decision Making 1/2020

Open Access 01-12-2020 | Research article

Examining the quality of record linkage process using nationwide Brazilian administrative databases to build a large birth cohort

Authors: Daniela Almeida, David Gorender, Maria Yury Ichihara, Samila Sena, Luan Menezes, George C. G. Barbosa, Rosimeire L. Fiaccone, Enny S. Paixão, Robespierre Pita, Mauricio L. Barreto


Abstract

Background

Research using linked routine population-based data collected for non-research purposes has increased in recent years because such data are a rich and detailed source of information. The objective of this study is to present an approach to preparing and linking data from administrative sources in a middle-income country, to estimate linkage quality, and to identify potential sources of bias by comparing linked and non-linked individuals.

Methods

We linked two Brazilian administrative datasets covering the period 2001 to 2015, using maternal attributes (name, age, date of birth, and municipality of residence): the live birth information system and the 100 Million Brazilian Cohort (created using administrative records from over 114 million individuals whose families applied for social assistance via the Unified Register for Social Programmes). Linkage was performed with CIDACS-RL, a linkage tool developed in-house. We then estimated the proportion of highly probable links and examined the characteristics of missed matches to identify any potential source of bias.
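To illustrate the general kind of linkage described above, the following is a minimal sketch that blocks candidate pairs on municipality and scores them on maternal attributes. It is not CIDACS-RL: the field names, the weights, the threshold, and the use of Python's `difflib.SequenceMatcher` as the string comparator are all assumptions made for illustration, and age is assumed to be recorded at a comparable reference date in both sources.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical weights for each maternal attribute; not taken from CIDACS-RL.
WEIGHTS = {"name": 0.6, "dob": 0.3, "age": 0.1}


def _normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two names (stand-in string comparator)."""
    return SequenceMatcher(None, _normalise(a), _normalise(b)).ratio()


def score(birth, cohort):
    """Weighted agreement score over name, date of birth, and age."""
    s = WEIGHTS["name"] * name_similarity(birth["mother_name"], cohort["name"])
    # A missing attribute adds nothing to the score.
    if birth.get("mother_dob") and birth["mother_dob"] == cohort.get("dob"):
        s += WEIGHTS["dob"]
    if birth.get("mother_age") is not None and birth["mother_age"] == cohort.get("age"):
        s += WEIGHTS["age"]
    return s


def link(births, cohort_records, threshold=0.85):
    """Return (birth_id, cohort_id, score) for each best candidate at or above threshold."""
    # Blocking: only compare records that share a municipality code.
    blocks = defaultdict(list)
    for rec in cohort_records:
        blocks[rec["municipality"]].append(rec)

    links = []
    for b in births:
        candidates = blocks.get(b["municipality"], [])
        if not candidates:
            continue
        best = max(candidates, key=lambda c: score(b, c))
        best_score = score(b, best)
        if best_score >= threshold:
            links.append((b["id"], best["id"], round(best_score, 3)))
    return links
```

Births whose best candidate falls below the threshold are left unlinked; in this sketch they correspond to the records whose characteristics would then be examined for bias.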

Results

A total of 27,699,891 live births were submitted to linkage with maternal information recorded in the baseline of the 100 Million Brazilian Cohort dataset; of those, 16,447,414 (59.4%) children were found registered in the 100 Million Brazilian Cohort dataset. The proportion of highly probable links ranged from 39.3% in 2001 to 82.1% in 2014. A substantial improvement in linkage was observed after the introduction of the maternal date of birth attribute in 2011. Our analyses indicated a slightly higher proportion of missing data among missed matches, and a higher proportion of people living in urban areas and self-declared as Caucasian among linked pairs compared with non-linked sets.
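The overall linked proportion follows directly from the two counts reported above. The short check below reproduces it; the variable names are illustrative.

```python
# Counts reported in the Results paragraph.
total_births = 27_699_891
linked_births = 16_447_414

linked_proportion = linked_births / total_births
non_linked_births = total_births - linked_births

print(f"Linked: {linked_proportion:.1%}")          # Linked: 59.4%
print(f"Non-linked births: {non_linked_births:,}")  # Non-linked births: 11,252,477
```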

Discussion

We demonstrated that CIDACS-RL is capable of performing high-quality linkage even with a limited number of common attributes, using indexation as a blocking strategy in large routine databases from a middle-income country. However, residual (non-linked) records occurred more often among people living under worse conditions. The results presented in this study reinforce the need to evaluate linkage quality and, when necessary, to take linkage error into account in the analysis of any generated dataset.
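One simple way to look for the kind of bias discussed above is to compare the distribution of an attribute, including its share of missing values, between linked and non-linked records. The sketch below is an assumption about how such a check could be written, not the authors' analysis; the record layout, the `linked` flag, and the function names are hypothetical.

```python
from collections import Counter


def distribution(records, attribute):
    """Share of each value of `attribute`; None is counted as 'missing'."""
    counts = Counter(
        "missing" if r.get(attribute) is None else r[attribute] for r in records
    )
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def compare_groups(records, attribute, linked_flag="linked"):
    """Distribution of `attribute` among linked and non-linked records."""
    linked = [r for r in records if r[linked_flag]]
    non_linked = [r for r in records if not r[linked_flag]]
    return {
        "linked": distribution(linked, attribute),
        "non_linked": distribution(non_linked, attribute),
    }
```

Large differences between the two groups for an attribute such as area of residence would suggest that the linked dataset under-represents part of the population.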
Metadata
Title
Examining the quality of record linkage process using nationwide Brazilian administrative databases to build a large birth cohort
Authors
Daniela Almeida
David Gorender
Maria Yury Ichihara
Samila Sena
Luan Menezes
George C. G. Barbosa
Rosimeire L. Fiaccone
Enny S. Paixão
Robespierre Pita
Mauricio L. Barreto
Publication date
01-12-2020
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-020-01192-0
