Published in: BMC Medical Informatics and Decision Making 1/2017

Open Access 01-12-2017 | Technical advance

Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

Authors: Kassaye Yitbarek Yigzaw, Antonis Michalas, Johan Gustav Bellika


Abstract

Background

Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.

Methods

We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.

Results

The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

Conclusions

The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
Footnotes
1
Commutative encryption is a form of encryption in which the order of the consecutive encryption and decryption of a value with different cryptographic keys does not affect the final result and no two values have the same encrypted value [69].
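
The commutativity property described in this footnote can be illustrated with a modular-exponentiation cipher in the spirit of the scheme cited as [69]. This is only a minimal sketch: the prime `P`, the function names, and the sample value are assumptions chosen for illustration, the parameters are not vetted for real security, and nothing here is taken from the authors' implementation. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Assumption: a toy modulus (2**127 - 1 is a Mersenne prime), chosen only for illustration.
P = 2**127 - 1

def keygen(p=P):
    """Pick a random exponent that is invertible modulo p - 1."""
    while True:
        k = secrets.randbelow(p - 2) + 2  # k in [2, p - 1]
        if math.gcd(k, p - 1) == 1:
            return k

def encrypt(x, k, p=P):
    """E_k(x) = x^k mod p, for x in [1, p - 1]."""
    return pow(x, k, p)

def decrypt(c, k, p=P):
    """Invert E_k by raising to k^-1 mod (p - 1)."""
    d = pow(k, -1, p - 1)
    return pow(c, d, p)

key_a, key_b = keygen(), keygen()
value = 123456789

# Encrypting under both keys gives the same result in either order.
ab = encrypt(encrypt(value, key_a), key_b)
ba = encrypt(encrypt(value, key_b), key_a)

# The layers can also be removed in either order.
recovered = decrypt(decrypt(ab, key_a), key_b)
```

Because exponentiation with an invertible exponent is a bijection on the nonzero residues modulo `P`, two distinct values never share a ciphertext under the same key, which matches the second property stated in the footnote.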
 
2
The protocols differ from those proposed in [70, 71] for probabilistic record linkage. In Schnell et al.’s protocol [70], each identifier of a record is encoded as a separate Bloom filter, whereas in Durham et al.’s protocol [71], the whole set of identifiers of a record is encoded as a single Bloom filter to avoid frequency-based cryptanalysis.
 
3
Secret sharing is a method by which a secret value is split into shares and a predefined number of shares are required to reconstruct the secret value [30].
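
A threshold scheme of the kind described in this footnote can be sketched with Shamir's polynomial construction, which is one standard instance. Whether the protocols discussed in the article use this particular scheme is not stated here, so the choice, the field modulus `PRIME`, and the function names are all assumptions made for illustration.

```python
import secrets

# Assumption: a small prime field (2**61 - 1 is a Mersenne prime), for illustration only.
PRIME = 2**61 - 1

def split(secret, n, t, p=PRIME):
    """Split `secret` into n shares; any t of them reconstruct it."""
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % p
        shares.append((x, y))
    return shares

def reconstruct(shares, p=PRIME):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % p
                den = (den * (xi - xj)) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = split(42, n=5, t=3)
from_first_three = reconstruct(shares[:3])
from_other_three = reconstruct(shares[1:4])
```

Any three of the five shares recover the secret, while fewer than three reveal nothing about it in the information-theoretic sense, which is the "predefined number of shares" property the footnote refers to.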
 
4
Oblivious transfer (OT) is a method by which a sender transfers one of several values to a receiver such that the sender remains oblivious to which value was selected, while the receiver learns only the selected value [32].
 
5
Like the counting Bloom filter (described in the Methods section), the Count-Min sketch [72] is a space-efficient probabilistic data structure that encodes a set of elements and supports approximate queries on how often each inserted element occurs.
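
As a rough illustration of the data structure described in this footnote, the sketch below keeps a small table of counters and answers frequency queries with the minimum counter across rows. The class name, the default `width` and `depth`, and the SHA-256 based row hashing are assumptions chosen for brevity, not details from [72] or from the article.

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: derive one hash per row by salting SHA-256 with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum is the tightest estimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for record_id in ["a", "a", "a", "b"]:
    sketch.add(record_id)
```

The estimate never falls below the true count, and the error comes only from hash collisions, so it shrinks as `width` grows. This is the "some error" mentioned in the footnote.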
 
Literature
1.
Ross MK, Wei W, Ohno-Machado L. “Big data” and the electronic health record. IMIA Yearb. 2014;9:97–104.
2.
Kohane IS, Drazen JM, Campion EW. A glimpse of the next 100 years in medicine. N Engl J Med. 2012;367:2538–9.
3.
Geissbuhler A, Safran C, Buchan I, Bellazzi R, Labkoff S, Eilenberg K, et al. Trustworthy reuse of health data: a transnational perspective. Int J Med Inf. 2013;82:1–9.
4.
Hripcsak G, Bloomrosen M, FlatelyBrennan P, Chute CG, Cimino J, Detmer DE, et al. Health data use, stewardship, and governance: ongoing gaps and challenges: a report from AMIA’s 2012 Health Policy Meeting. J Am Med Inform Assoc. 2013;21:204–11.
5.
Lober WB, Thomas Karras B, Wagner MM, Marc Overhage J, Davidson AJ, Fraser H, et al. Roundtable on bioterrorism detection: information system–based surveillance. J Am Med Inform Assoc. 2002;9:105–15.
7.
El Emam K, Hu J, Mercer J, Peyton L, Kantarcioglu M, Malin B, et al. A secure protocol for protecting the identity of providers when disclosing data for disease surveillance. J Am Med Inform Assoc. 2011;18:212–7.
8.
9.
Holmes JH, Elliott TE, Brown JS, Raebel MA, Davidson A, Nelson AF, et al. Clinical research data warehouse governance for distributed research networks in the USA: a systematic review of the literature. J Am Med Inform Assoc. 2014;21:730–6.
10.
Finnell JT, Overhage JM, Grannis S. All health care is not local: an evaluation of the distribution of emergency department care delivered in Indiana. AMIA Annu Symp Proc. 2011;2011:409–16.
11.
Gichoya J, Gamache RE, Vreeman DJ, Dixon BE, Finnell JT, Grannis S. An evaluation of the rates of repeat notifiable disease reporting and patient crossover using a health information exchange-based automated electronic laboratory reporting system. AMIA Annu Symp Proc. 2012;2012:1229–36.
12.
13.
14.
Laurie G, Jones KH, Stevens L, Dobbs C. A review of evidence relating to harm resulting from uses of health and biomedical data [Internet]. The Nuffield Council on Bioethics (NCOB); 2014 Jun p. 210. Available from: http://nuffieldbioethics.org/wp-content/uploads/FINAL-Report-on-Harms-Arising-from-Use-of-Health-and-Biomedical-Data-30-JUNE-2014.pdf
15.
Du W, Atallah MJ. Privacy-preserving cooperative statistical analysis. In: Williams AD, editor. Comput. Secur. Appl. Conf. 2001 ACSAC 2001 Proc. 17th Annu. IEEE. 2001. p. 102–10.
16.
Du W, Han YS, Chen S. Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Berry MW, editor. Proc. Fourth SIAM Int. Conf. Data Min. SIAM. 2004. p. 222–33.
17.
Kantarcioglu M. A survey of privacy-preserving methods across horizontally partitioned data. In: Aggarwal CC, Yu PS, editors. Priv.-Preserv. Data Min. New York: Springer; 2008. p. 313–35.
18.
Vaidya J. A survey of privacy-preserving methods across vertically partitioned data. In: Aggarwal CC, Yu PS, editors. Priv.-Preserv. Data Min. New York: Springer; 2008. p. 337–58.
19.
Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu MY. Tools for privacy preserving distributed data mining. ACM SIGKDD Explor Newsl. 2002;4:28–34.
20.
Hailemichael MA, Yigzaw KY, Bellika JG. Emnet: a tool for privacy-preserving statistical computing on distributed health data. In: Granja C, Budrionis A, editors. Proc. 13th Scand. Conf. Health Inform. Linköping: Linköping University Electronic Press; 2015. p. 33–40.
21.
Andersen A, Yigzaw KY, Karlsen R. Privacy preserving health data processing. IEEE 16th Int. Conf. E-Health Netw. Appl. Serv. Heal. IEEE; 2014. p. 225–30.
22.
Vatsalan D, Christen P, Verykios VS. A taxonomy of privacy-preserving record linkage techniques. Inf Syst. 2013;38:946–69.
23.
Pinkas B, Schneider T, Zohner M. Faster private set intersection based on OT extension. In: Fu K, Jung J, editors. Proc. 23rd USENIX Secur. Symp. San Diego: USENIX Association; 2014. p. 797–812.
24.
Quantin C, Bouzelat H, Allaert FAA, Benhamiche AM, Faivre J, Dusserre L. How to ensure data security of an epidemiological follow-up: quality assessment of an anonymous record linkage procedure. Int J Med Inf. 1998;49:117–22.
25.
Agrawal R, Evfimievski A, Srikant R. Information sharing across private databases. Proc. 2003 ACM SIGMOD Int. Conf. Manag. Data. New York, NY, USA: ACM; 2003. p. 86–97.
26.
El Emam K, Samet S, Hu J, Peyton L, Earle C, Jayaraman GC, et al. A protocol for the secure linking of registries for HPV surveillance. PLoS One. 2012;7:e39915.
27.
Adam N, White T, Shafiq B, Vaidya J, He X. Privacy preserving integration of health care data. AMIA Annu. Symp. Proc. 2007. 2007. p. 1–5.
28.
Lai PK, Yiu S-M, Chow KP, Chong CF, Hui LCK. An efficient bloom filter based solution for multiparty private matching. Secur. Manag. 2006. p. 286–292.
29.
Many D, Burkhart M, Dimitropoulos X. Fast private set operations with SEPIA. Technical report, ETH Zurich; 2012.
30.
Beimel A. Secret-sharing schemes: a survey. In: Chee YM, Guo Z, Shao F, Tang Y, Wang H, Xing C, editors. Coding Cryptol. Berlin: Springer; 2011. p. 11–46.
31.
Dong C, Chen L, Wen Z. When private set intersection meets big data: an efficient and scalable protocol. Proc. 2013 ACM SIGSAC Conf. Comput. Commun. Secur. New York, NY, USA: ACM; 2013. p. 789–800.
32.
Kilian J. Founding cryptography on oblivious transfer. Proc. Twent. Annu. ACM Symp. Theory Comput. New York, NY, USA: ACM; 1988. p. 20–31.
33.
Karapiperis D, Vatsalan D, Verykios VS, Christen P. Large-scale multi-party counting set intersection using a space efficient global synopsis. In: Renz M, Shahabi C, Zhou X, Cheema MA, editors. Database Syst. Adv. Appl. Springer International Publishing; 2015. p. 329–45.
34.
Paillier P. Public-key cryptosystems based on composite degree residuosity classes. In: Stern J, editor. Adv. Cryptol. — EUROCRYPT’99. Berlin: Springer; 1999. p. 223–38.
35.
Karr AF, Lin X, Sanil AP, Reiter JP. Secure regression on distributed databases. J Comput Graph Stat. 2005;14:263–79.
36.
Bellika JG, Henriksen TS, Yigzaw KY. The Snow system - a decentralized medical data processing system. In: Llatas CF, García-Gómez JM, editors. Data Min. Clin. Med. Springer; 2014.
37.
Stewart BA, Fernandes S, Rodriguez-Huertas E, Landzberg M. A preliminary look at duplicate testing associated with lack of electronic health record interoperability for transferred patients. J Am Med Inform Assoc JAMIA. 2010;17:341–4.
38.
Lazarus R, Kleinman KP, Dashevsky I, DeMaria A, Platt R. Using automated medical records for rapid identification of illness syndromes (syndromic surveillance): the example of lower respiratory infection. BMC Public Health. 2001;1:1.
39.
40.
Curtis LH, Weiner MG, Boudreau DM, Cooper WO, Daniel GW, Nair VP, et al. Design considerations, architecture, and use of the Mini-Sentinel distributed data system. Pharmacoepidemiol Drug Saf. 2012;21:23–31.
41.
Weber GM, Murphy SN, McMurry AJ, MacFadden D, Nigrin DJ, Churchill S, et al. The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories. J Am Med Inform Assoc. 2009;16:624–30.
42.
El Emam K, Mercer J, Moreau K, Grava-Gubins I, Buckeridge D, Jonker E. Physician privacy concerns when disclosing patient data for public health purposes during a pandemic influenza outbreak. BMC Public Health. 2011;11:454.
43.
Lindell Y, Pinkas B. Secure multiparty computation for privacy-preserving data mining. J Priv Confidentiality. 2009;1:5.
45.
Cramer R, Damgård I. Multiparty computation, an introduction. In: Castellet M, editor. Contemp. Cryptol. Basel: Birkhäuser Basel; 2005. p. 41–87.
46.
Goldreich O. Foundations of cryptography: basic applications. 1st ed. New York: Cambridge University Press; 2004.
47.
Vaidya J, Clifton C. Leveraging the “Multi” in secure multi-party computation. Proc. 2003 ACM Workshop Priv. Electron. Soc. New York, NY, USA: ACM; 2003. p. 53–9.
48.
Bloom BH. Space/time trade-offs in hash coding with allowable errors. Commun ACM. 1970;13:422–6.
49.
Tarkoma S, Rothenberg CE, Lagerspetz E. Theory and practice of bloom filters for distributed systems. Commun Surv Tutor IEEE. 2012;14:131–55.
50.
Fan L, Cao P, Almeida J, Broder AZ. Summary cache: a scalable wide-area Web cache sharing protocol. IEEE ACM Trans Netw. 2000;8:281–93.
51.
Dimitriou T, Michalas A. Multi-party trust computation in decentralized environments. 2012 5th Int. Conf. New Technol. Mobil. Secur. NTMS. 2012. p. 1–5.
52.
Dimitriou T, Michalas A. Multi-party trust computation in decentralized environments in the presence of malicious adversaries. Ad Hoc Netw. 2014;15:53–66.
53.
Karr AF, Fulp WJ, Vera F, Young SS, Lin X, Reiter JP. Secure, privacy-preserving analysis of distributed databases. Technometrics. 2007;49:335–45.
54.
Hernández MA, Stolfo SJ. Real-world data is dirty: data cleansing and the merge/purge problem. Data Min Knowl Discov. 1998;2:9–37.
55.
Hernández MA, Stolfo SJ. The merge/purge problem for large databases. Proc. 1995 ACM SIGMOD Int. Conf. Manag. Data. New York, NY, USA: ACM; 1995. p. 127–38.
56.
Lunde AS, Lundeborg S, Lettenstrom GS, Thygesen L, Huebner J. The person-number systems of Sweden, Norway, Denmark, and Israel. Vital Health Stat 2. 1980;84:1–59.
57.
Ludvigsson JF, Otterblad-Olausson P, Pettersson BU, Ekbom A. The Swedish personal identity number: possibilities and pitfalls in healthcare and medical research. Eur J Epidemiol. 2009;24:659–67.
60.
El Emam K, Buckeridge D, Tamblyn R, Neisa A, Jonker E, Verma A. The re-identification risk of Canadians from longitudinal demographics. BMC Med Inform Decis Mak. 2011;11:46.
61.
Koot M, Noordende G, Laat C. A study on the re-identifiability of Dutch citizens. Workshop Priv. Enhancing Technol. PET. 2010.
62.
Potosky AL, Riley GF, Lubitz JD, Mentnech RM, Kessler LG. Potential for cancer related health services research using a linked Medicare-tumor registry database. Med Care. 1993;31:732–48.
63.
Warren JL, Klabunde CN, Schrag D, Bach PB, Riley GF. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care. 2002;40:IV3–IV18.
64.
Saint-Andre P, Smith K, Tronçon R. XMPP: the definitive guide: building real-time applications with jabber technologies. 1st ed. Sebastopol: O’Reilly Media, Inc.; 2009.
66.
Friedman C, Rigby M. Conceptualising and creating a global learning health system. Int J Med Inf. 2013;82:e63–71.
68.
Christen P. A survey of indexing techniques for scalable record linkage and deduplication. IEEE Trans Knowl Data Eng. 2012;24:1537–55.
69.
Pohlig SC, Hellman ME. An improved algorithm for computing logarithms over GF(p) and its cryptographic significance (Corresp.). IEEE Trans Inf Theory. 1978;24:106–10.
71.
Durham EA, Kantarcioglu M, Xue Y, Toth C, Kuzu M, Malin B. Composite bloom filters for secure record linkage. IEEE Trans Knowl Data Eng. 2014;26:2956–68.
72.
Cormode G, Muthukrishnan S. An improved data stream summary: the count-min sketch and its applications. J Algorithms. 2005;55:58–75.
Metadata
Title
Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation
Authors
Kassaye Yitbarek Yigzaw
Antonis Michalas
Johan Gustav Bellika
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2017
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-016-0389-x